Skip to main content

Identifiers

What are the identifiers in the data source yaml?

Identifiers are the join key columns in your data source that can be used to join to other data sources.

Within a data source, the required parameters for an identifier are name and type. The name refers to either the key’s column name from the underlying data table, or the name may serve as an alias with the column name referenced in the expr parameter.

The identifier types in Transform are primary, foreign, or unique. Identifiers can also be single identifiers or composite identifiers. Identifiers can also be used as a dimension allowing you to aggregate a metric to the granularity of that identifier.

Note that the MetricFlow framework refers to the identifiers (join keys) in a data source via the name parameter. Within a data source, identifier names must be unique (you cannot have repeated identifier names within a data source just as you cannot have repeated column names within a data table). However, identifier names do not have to be unique across different data sources.

Identifier Types

Transform uses the identifier's type to determine how to join data sources. The join logic depends on the type of identifier. Check out our join logic page for more info on how Transform joins data sources.

Primary: A primary key has one and only one record for each row in the table, and it is also inclusive of every record in the data warehouse.

Unique: A unique key has one and only one record for each row in the table, but it may only have a subset of every record in the data warehouse. They can also have nulls.

Foreign: A foreign key can have zero, one or many of the same records. They can also have nulls.

Example:

identifiers:
- name: transaction
type: primary
expr: id_transaction
- name: order
type: foreign
expr: id_order
- name: user
type: foreign
expr: SUBSTRING(id_order FROM 2)

Composite Keys

MetricFlow supports configuring data sources that contain composite keys (a combination of two or more columns to uniquely identify an entity occurrence or a table row). Composite keys can be defined in the identifiers section of the data source, just like the single identifiers.

Note: Composite keys can only include primary or foreign identifiers. Unique identifiers cannot be included in a composite key.

For the following example file containing a composite key, assume:

  • Suppose we have a table people.users that consists of all users (denoted by the field name id) and the teams that they belong to (denoted by the field team_id).
  • In the example configuration file below, we have defined the single identifiers (team_id and id), and then created a composite identifier (user_team) that refers to the team_id and id columns that are combined to uniquely identify a particular user-team row combination.

Example:

data source: 
name: users
description: Users and their teams
owners:
- owner@company.com
sql_table: people.users

identifiers:
- name: team_id
type: foreign
- name: id
type: foreign
- name: user_team # Composite key created
type: primary # Composite keys can also be of foreign type
identifiers:
- ref: team_id
- ref: id

There is no limit to the number of identifiers that can be combined into your composite key. Common examples of setting up composite keys include event logs, where tables often require the combination of 2 or more columns (i.e. timestamp, machine_id, event_type) to generate a unique identifier key.

Note: MetricFlow will never implicitly create a composite key. For example, if another data source named users_v2 also happens to have the identifiers team_id and id, MetricFlow will not assume that the data source users_v2 can be joined with data source users via the composite key user_team unless explicitly defined.

You do not have to explicitly pre-define all identifiers that make up your composite key. Our recommended approach is to consider whether the particular field can stand alone as an identifier, or whether it is only usable as part of a composite key.

Example: The data source users below has a composite key named user_message. In the composite key, there is a reference to the identifier message, which has been previously defined on its own. Notice that the composite key user_message also references another identifier named user, which is instead defined within the composite identifier.

data source: 
name: users
description: Users and messages sent
owners:
- owner@company.com
sql_table: people.users

identifiers:
- name: message
expr: message_id
type: foreign
- name: user_message # Composite Key Created
type: primary # Composite keys can also be foreign
identifiers:
- name: user
expr: user_id
- ref: message
type: foreign

Querying Composite Keys in MetricFlow

In MetricFlow, you can query using a where clause by specifying each specific field that makes up your composite key. For example, if your composite key is made up of user_id and message_id, you can run a query such as.

MetricFlow

mf query --metrics messages --dimensions metric_time --where "user_id = 10 and team_id = 7"

Transform

mql query --metrics messages --dimensions metric_time --where "user_id = 10 and team_id = 7"