Skip to main content

Join Logic

Joins are a powerful part of MetricFlow and allow you to make all valid dimensions available for your metrics on the fly, even if they are defined in a different data source. Joins also alow you to construct metrics using measures from different data sources.

MetricFlow uses the identifiers defined in your data source configs as the join keys between tables. Assuming the identifiers are set, MetricFlow can perform joins under the hood by creating a graph using the data sources as nodes and the join paths as edges. MetricFlow will generate the appropriate join type and avoid fan-out or chasm joins with other tables based on the identifier types.

Example:

This example uses two data sources with a common identifier and shows a MetricFlow query that requires a join between the two data sources.

Say you have two data sources, transactions and user_signup:

data_source:
name: transactions
identifiers:
- name: id
type: primary
- name: user_id
type: foreign
measures:
- name: average_purchase_price
agg: avg
expr: purchase_price
create_metric: true
--
data_source:
name: user_signup
identifiers:
- name: user_id
type: primary
dimension:
- name: type
type: categorical

MetricFlow will use user_id as the join key. This is a foreign to primary relationship and will be implmented as a left join. When requesting a dimension from another data source the convention is to prefix the dimension name with a dunder (double underscore) and identifier. We can now ask for the metric average_purchase_price defined in transactions by the type dimension from user_signup by issuing the following query:

mql query --metrics average_purchase_price -- dimensions ds –-dimensions user__id_type.
note

If you have multiple dimensions with the same name, it would be possible for an ambiguous join path to be created. To avoid this, use a dimension which includes the prefixed identifier that's meant to disambiguate where the dimension comes from. You can see the fully scoped dimension name by running mql list-metrics in the CLI.

MetricFlow implements most joins as left joins, and does not allow fan-out and chasm joins. To make sure we are not constructing dangerous joins, foreign to foreign, primary to foreign, and unique to foreign joins are not allowed.

See the table below for a quick reference to what join types MetricFlow constructs based on the identifiers.

Identifier type - Table AIdentifier type - Table BJoin type
PrimaryPrimaryLeft
PrimaryUniqueLeft
PrimaryForeignFanout (Not Allowed)
UniquePrimaryLeft
UniqueUniqueLeft
UniqueForeignFanout (Not Allowed)
ForeignPrimaryLeft
ForeignUniqueLeft
ForeignForeignFanout (Not Allowed)

Multi-Hop Joins

Transform allows users to join measures and dimensions across a graph of identifiers. We call this a 'multi-hop join' as a user can 'hop' across one table to another.

For example, given the schema below, it would be possible in Transform to create the metric 'Average Purchase Price by Country' using the purchase_price measure from the sales table and the country_name dimension from the country_dim table.

Multi-Hop-Join

We can translate this schema into MetricFlow data sources:

data_source:
name: sales
identifiers:
- name: id
type: primary
- name: user_id
type: foreign
measures:
- name: average_purchase_price
agg: avg
expr: purchase_price
create_metric: true
--
data_source:
name: user_signup
identifiers:
- name: user_id
type: primary
- name: country_id
type: Unique
dimension:
- name: signup_date
type: time
type_params:
is_primary: true
--
data_source:
dame: country_dim
identifiers:
- name: country_id
type: primary
dimension:
- name: country_name
type: categorical

Using Multi-Hop Joins

To query dimensions without a multi-hop join involved, you use the dimension name prefixed with a dunder (double underscore) and identifier. The same paradigm applies for dimensions retrieved by a multi-hop join, except you may have multiple sets of dundered identifiers, since you are hopping across multiple data sources.

An example query using this structure would be as follows. Notice how the dimension user_id__country_id__country_name includes two sets of dunders. One for the identifier user_id and one for the identifier country_id:

MetricFlow

mf query --metrics average_purchase_price dimensions --ds --dimensions user_id__country_id__country_name

Transform

mql query --metrics average_purchase_price dimensions --ds --dimensions user_id__country_id__country_name

In this example - Transform joins the user_signup to sales on user_id, then joins country_dim to user_signup on country_id.