Skip to main content

Identifiers

What are the identifiers in the data source yaml?

Identifiers are the join key columns in your data source that can be used to join to other data sources. The identifier types in Transform are primary, foreign, or unique. Identifiers can be single identifiers or composite identifiers. Additionally, identifiers can be referenced directly by column name from the data source or they can be altered using an expression. Identifiers can also be used as a dimension allowing you to aggregate a metric to the granularity of that identifier. Identifiers need to be unique in the same data source, but do not have to be unique in different data sources.

Identifier Types

Transform uses the identifier's type to determine how to join data sources. The join logic depends on the type of identifier. Check out our join logic page for more info on how Transform joins data sources.

Primary: A primary key has one and only one record for each row in the table and it is also inclusive of every record in the data warehouse.

Unique: A unique key has one and only one record for each row in the table, but it may only have a subset of every record in the data warehouse. They can also have nulls.

Foreign: A foreign key can have zero, one or many of the same records. They can also have nulls.

Example:

identifiers:
- name: transaction
type: primary
expr: id_transaction
- name: order
type: foreign
expr: id_order
- name: user
type: foreign
expr: SUBSTRING(id_order FROM 2)

Composite Keys

MetricFlow supports configuring data sources that contain composite keys. In database design, a composite key is a key that consists of two or more attributes (table columns) that together uniquely identify an entity occurrence or a table row instead of a key that uses a single column or attribute. Composite keys can be used in foreign or primary identifiers, but not with unique identifiers. Identifiers with composite keys can be defined in the identifiers section of the data source like any other identifiers.

Examples

A table has a composite key made up of two fields: user_id and message_id. Additionally, for this example, assume the user_id field cannot be joined on its own and is only joinable when tied to a particular message

Example 1:

To configure a composite key

  • Name the Identifier using the same name field as when naming any other Identifier. In this example we'll define a Identifier called user_message
  • Specify the identifier type via type field. Here it's primary .
  • Use the identifiers field to define the components of the composite key.
  • You can specify new identifiers directly in the composite key like user_id or use the ref field to reference identifiers you've already defined i.e ref: message
  • Composite identifiers with the same name should also have sub-identifiers with the same name. You can leverage expr to rename a sub-identifier.
data source: users
identifiers:
- name: message
expr: message_id
type: foreign
#Here we define the compostite key by refrecing the message identifier
- name: user_message
type: primary #Composite keys can also be a foreign
identifiers:
- name: user
expr: user_id
- ref: message

Transform will never implicitly create a composite key. If a data source happens to have team_id and user_id, we won't assume that data source can be joined with user_team unless explicitly defined.

Note: If keys that make up an identifier cannot stand alone or be joined to other data sources we recommend that you define those fields directly in the composite statement as we did with user. Notice in this example that we only defined one field before the composite key definition (message_id) and defined the other (user_id) directly in the composite key. You can still choose to explicitly define all identifiers that make up your composite key. Our recommended approach is to consider whether the field can stand alone or whether it is only usable as part of a composite key.

Example 2:

You can optionally define all fields of a composite key ahead of the definition of the composite primary key.

data_source:    
name: users
description: Users and relationships to the teams they are on
owners:
- owner@company.com
sql_table: people.users

identifiers:
- name: team_id
type: foreign
- name: id
type: foreign
- name: user_team
type: primary
identifiers:
- ref: team_id
- ref: id

Example 3:

You can define any number of columns as your composite key. Here we have three components in a composite key. In this case, the fields are team_id, id, and team_region, respectively.

data_source:    
name: users
description: Users and relationships to the teams they are on
owners:
- owner@company.com
sql_table: people.users

identifiers:
- name: team_id
type: foreign
- name: id
type: foreign
- name: team_region
type: foreign
- name: user_team
type: primary
identifiers:
- ref: team_id
- ref: id
- ref: team_region

Querying Composite Keys in MetricFlow

You can query using a where clause in MetricFlow by specifying each specific field that makes up your composite key. For example, if your composite key is made up of user_id and message_id, you can run a query such as.

MetricFlow

mf query --metrics messages --dimensions ds --where "user_id = 10 and team_id = 7"

Transform

mql query --metrics messages --dimensions ds --where "user_id = 10 and team_id = 7"

Derived Identifiers

A common design pattern you may have in your data warehouse is to have differently prioritized dimensions. For example, a user_id from one table and person_id from a different table may refer to the same user. One field may be preferred to use in a query, with the other only being used if the first is null.

You can use a derived_identifier to create the coalesce in Transform! See an example below.

derived_identifier:
name: c_user_id
owners:
- support@transformdata.io
expr: COALESCE(user_id, person_id)
expr_elements:
- person_id
- user_id

This identifier example is derived from the following two datasources (summarized):

---
data_source:
name: user_core_source
table: user_core
identifiers:
- name: user_id
type: primary
...
---
data_source:
name: bookings
table: fct_bookings
identifiers:
- name: person_id
type: foreign

After defined, you can reference the identifier in any MetricFlow query:

MetricFlow

mf query --metrics bookings --dimensions c_user_id

Transform

mql query --metrics bookings --dimensions c_user_id