Identifiers
What are the identifiers in the data source yaml?
Identifiers are the join key columns in your data source that can be used to join to other data sources.
Within a data source, the required parameters for an identifier are name
and type
. The name
refers to either the key’s column name from the underlying data table, or the name
may serve as an alias with the column name referenced in the expr
parameter.
The identifier types in Transform are primary, foreign, or unique. Identifiers can also be single identifiers or composite identifiers. Identifiers can also be used as a dimension allowing you to aggregate a metric to the granularity of that identifier.
Note that the MetricFlow framework refers to the identifiers (join keys) in a data source via the name
parameter. Within a data source, identifier names must be unique (you cannot have repeated identifier names within a data source just as you cannot have repeated column names within a data table). However, identifier names do not have to be unique across different data sources.
Identifier Types
Transform uses the identifier's type to determine how to join data sources. The join logic depends on the type of identifier. Check out our join logic page for more info on how Transform joins data sources.
Primary: A primary key has one and only one record for each row in the table, and it is also inclusive of every record in the data warehouse.
Unique: A unique key has one and only one record for each row in the table, but it may only have a subset of every record in the data warehouse. They can also have nulls.
Foreign: A foreign key can have zero, one or many of the same records. They can also have nulls.
Example:
identifiers:
- name: transaction
type: primary
expr: id_transaction
- name: order
type: foreign
expr: id_order
- name: user
type: foreign
expr: SUBSTRING(id_order FROM 2)
Composite Keys
MetricFlow supports configuring data sources that contain composite keys (a combination of two or more columns to uniquely identify an entity occurrence or a table row). Composite keys can be defined in the identifiers section of the data source, just like the single identifiers.
Note: Composite keys can only include primary or foreign identifiers. Unique identifiers cannot be included in a composite key.
For the following example file containing a composite key, assume:
- Suppose we have a table
people.users
that consists of all users (denoted by the field nameid
) and the teams that they belong to (denoted by the fieldteam_id
). - In the example configuration file below, we have defined the single identifiers (
team_id
andid
), and then created a composite identifier (user_team
) that refers to theteam_id
andid
columns that are combined to uniquely identify a particular user-team row combination.
Example:
data source:
name: users
description: Users and their teams
owners:
- owner@company.com
sql_table: people.users
identifiers:
- name: team_id
type: foreign
- name: id
type: foreign
- name: user_team # Composite key created
type: primary # Composite keys can also be of foreign type
identifiers:
- ref: team_id
- ref: id
There is no limit to the number of identifiers that can be combined into your composite key. Common examples of setting up composite keys include event logs, where tables often require the combination of 2 or more columns (i.e. timestamp, machine_id, event_type) to generate a unique identifier key.
Note: MetricFlow will never implicitly create a composite key. For example, if another data source named users_v2
also happens to have the identifiers team_id
and id
, MetricFlow will not assume that the data source users_v2
can be joined with data source users
via the composite key user_team
unless explicitly defined.
You do not have to explicitly pre-define all identifiers that make up your composite key. Our recommended approach is to consider whether the particular field can stand alone as an identifier, or whether it is only usable as part of a composite key.
Example: The data source users
below has a composite key named user_message
. In the composite key, there is a reference to the identifier message
, which has been previously defined on its own. Notice that the composite key user_message
also references another identifier named user
, which is instead defined within the composite identifier.
data source:
name: users
description: Users and messages sent
owners:
- owner@company.com
sql_table: people.users
identifiers:
- name: message
expr: message_id
type: foreign
- name: user_message # Composite Key Created
type: primary # Composite keys can also be foreign
identifiers:
- name: user
expr: user_id
- ref: message
type: foreign
Querying Composite Keys in MetricFlow
In MetricFlow, you can query using a where clause by specifying each specific field that makes up your composite key. For example, if your composite key is made up of user_id
and message_id
, you can run a query such as.
MetricFlow
mf query --metrics messages --dimensions metric_time --where "user_id = 10 and team_id = 7"
Transform
mql query --metrics messages --dimensions metric_time --where "user_id = 10 and team_id = 7"