
Measures

Measures are aggregations that can either produce your final metric directly or serve as the building blocks for more complex metrics.

Measures have a few inputs in the data source configuration: name, description, aggregation, expr, create_metric, and non_additive_dimension.

Name (Required)

The name of the measure can either reference a database column directly or be any name you choose. If the measure's name differs from the column name in the table, you must add an expr that specifies the column. The measure's name is what you reference when creating a metric. Note that measure names must be unique across all data sources in a project.
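For example, a minimal sketch (the column order_total and both measure names below are hypothetical) showing one measure whose name matches its column directly and one whose name differs and therefore needs an expr:

measures:
  - name: order_total # name matches the underlying column, so no expr is needed
    description: Total value of all orders
    agg: sum
  - name: order_value_usd # name differs from the column, so expr points at the column
    description: Total value of all orders in USD
    expr: order_total
    agg: sum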

Description (Required)

The description explains what the measure calculates. We strongly recommend writing verbose, human-readable descriptions in this field.

Aggregation (Required)

The aggregation determines how the field will be aggregated. For example, a sum aggregation type over a granularity of day would sum the values across a given day.

MetricFlow supports the following aggregation types:

Aggregation Type    Description
sum                 Sum across the values
min                 Minimum across the values
max                 Maximum across the values
average             Average across the values
sum_boolean         A sum for a boolean type
count_distinct      Distinct count of values
median              Median (p50) calculation across the values
percentile          Percentile calculation across the values
tip

All of the aggregation types above, except for count_distinct, produce additive measures. MetricFlow can aggregate additive measures further to reuse existing datasets; for example, transactions by day can be rolled up to transactions by week. Non-additive measures like count_distinct don't have this property and must be calculated from the source every time. This can be more costly and slower, but the resulting metrics are worth the wait in many cases.
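As an illustrative sketch (the column and measure names are hypothetical), the comments below call out which measure can be re-aggregated and which must be recomputed from the source:

measures:
  - name: daily_transactions # additive: daily results can be rolled up into weekly results
    description: Count of transactions
    expr: 1
    agg: sum
  - name: distinct_customers # non-additive: must be recalculated from the source for each granularity
    description: Distinct count of customers with a transaction
    expr: customer_id
    agg: count_distinct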

Expr (Optional)

Use the expr parameter if the name you have specified does not match the name of a column in your database table (in this case, name effectively serves as an alias for your measure). You may also set expr to any valid SQL to manipulate an underlying column into the output you need.
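For instance, a hedged sketch (the columns revenue_cents and is_refunded are hypothetical) using expr both to convert units and to apply conditional logic:

measures:
  - name: gross_revenue_usd
    description: Gross revenue converted from cents to USD
    expr: revenue_cents / 100.0 # arbitrary SQL manipulating the underlying column
    agg: sum
  - name: refunded_orders
    description: Count of refunded orders
    expr: CASE WHEN is_refunded = True THEN 1 ELSE 0 END
    agg: sum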

Note: Always consult your database's SQL documentation when using SQL functions in the expr parameter, as outputs may differ depending on your database provider.

caution

If you are using Snowflake and call any week-level function inside the expr parameter, the function now follows the ISO standard, which starts the week on Monday.

Examples:

  1. You are running Snowflake warehouse instances with account- or session-level overrides that fix the WEEK_START parameter to a value other than 0 or 1: you will now see Monday as the week start in all cases.

  2. You are using the DAYOFWEEK function inside the expr parameter with the legacy Snowflake default of WEEK_START = 0: DAYOFWEEK will now return the ISO-standard values of 1 (Monday) through 7 (Sunday) instead of Snowflake’s legacy default values of 0 (Monday) through 6 (Sunday).
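As an illustrative sketch (the column ts and the measure name are hypothetical), here is a measure whose expr depends on week-level behavior, with comments reflecting the ISO semantics described above:

measures:
  - name: weekend_transactions
    description: Count of transactions occurring on a weekend
    # Under the ISO standard, DAYOFWEEK returns 1 (Monday) through 7 (Sunday),
    # so Saturday and Sunday are 6 and 7 regardless of any WEEK_START override.
    expr: CASE WHEN DAYOFWEEK(ts) IN (6, 7) THEN 1 ELSE 0 END
    agg: sum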

Create Metric (Optional)

You can optionally create a metric directly from a measure by setting create_metric: True, as a shortcut to defining a separate metric.

  • Note that the description and owners of a metric created with create_metric: True will default to the description and owners of the data source.
  • You may further specify the display name of the metric via create_metric_display_name: "Your Metric Display Name"

Note: If you prefer to keep all of your metric definitions in a single file or location, we do not recommend using the create_metric: True shortcut, as it allows metric definitions to exist in multiple places across your Transform model.
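If you do use the shortcut, a minimal sketch (the measure and column names are hypothetical) looks like this:

measures:
  - name: total_revenue
    description: Total revenue across all transactions
    expr: revenue
    agg: sum
    create_metric: True # exposes this measure directly as a metric
    create_metric_display_name: "Total Revenue" # optional display name for the created metric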

Example file using various aggregation types

data_source:
  name: transactions
  description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.
  owners: support@transform.co
  sql_table: schema.transactions

  # --- IDENTIFIERS ---
  identifiers:
    - name: transaction_id
      type: primary
    - name: customer_id
      type: foreign
    - name: store_id
      type: foreign
    - name: product_id
      type: foreign

  # --- MEASURES ---
  measures:
    - name: transaction_amount_usd
      description: Total USD value of transactions
      expr: transaction_amount_usd
      agg: sum
    - name: transaction_amount_usd_avg
      description: Average USD value of transactions
      expr: transaction_amount_usd
      agg: average
    - name: transaction_amount_usd_max
      description: Maximum USD value of transactions
      expr: transaction_amount_usd
      agg: max
    - name: transaction_amount_usd_min
      description: Minimum USD value of transactions
      expr: transaction_amount_usd
      agg: min
    - name: quick_buy_transactions
      description: The total transactions bought as quick buy
      expr: quick_buy_flag
      agg: sum_boolean
    - name: distinct_transactions_count
      description: Distinct count of transactions
      expr: transaction_id
      agg: count_distinct
    - name: transactions
      description: The average value of transactions
      expr: transaction_amount_usd
      agg: average
      create_metric: True
      create_metric_display_name: Avg Value of Transactions (Daily)
    - name: transactions_amount_usd_valid # Notice how we use expr to compute the aggregation based on a condition
      description: The total USD value of valid transactions only
      expr: CASE WHEN is_valid = True THEN transaction_amount_usd ELSE 0 END
      agg: sum
    - name: p99_transaction_value
      description: The 99th percentile transaction value
      expr: transaction_amount_usd
      agg: percentile
      agg_params:
        percentile: 0.99
        use_discrete_percentile: False # False calculates the continuous percentile; True calculates the discrete percentile
      create_metric: True
    - name: median_transaction_value
      description: The median transaction value
      expr: transaction_amount_usd
      agg: median
      create_metric: True

  # --- DIMENSIONS ---
  dimensions:
    - name: ds
      type: time
      expr: date_trunc('day', ts) # expr refers to the underlying column ts
      type_params:
        is_primary: true
        time_granularity: day
    - name: is_bulk_transaction
      type: categorical
      expr: case when quantity > 10 then true else false end

Non-Additive Dimensions (Optional)

Some measures cannot be aggregated over certain dimensions (typically time) without producing incorrect results. Examples include bank account balances (it does not make sense to carry balances over month-to-month) and monthly recurring revenue (you cannot sum daily recurring revenue to arrive at monthly recurring revenue). For such measures, you can specify non-additive dimensions.

note

The current support for non-additive dimensions assumes dimensional tables; otherwise, the calculation of certain measures may not be correct.

For example, suppose you have a subscription-based business and want to calculate the monthly recurring revenue of users (their active subscriptions) at the end of the month. To calculate the monthly recurring revenue (MRR), the underlying dimensional table would require a daily snapshot of the date, user, and subscription revenue.

To demonstrate the config files for non-additive measures, consider a subscription table with one row per date per registered user, capturing the user's active subscription plan(s) and each plan's subscription value (revenue), with the following columns:

  • date_transaction: The daily date-spine.
  • user_id: The ID of the registered user.
  • subscription_plan: A column indicating the subscription plan ID.
  • subscription_value: A column indicating the monthly subscription value (revenue) of a particular subscription plan ID.

The parameters under non_additive_dimension specify the dimension that the measure should not be aggregated over.

  • name: Required. This will be the name of the time dimension (that has already been defined in the data source) that the measure should not be aggregated over.
  • window_choice: Required. Choose either min or max, where min reflects the beginning of the time period and max reflects the end of the time period.
  • window_groupings: Optional. Provide the identifiers that you would like to group by.
data_source:
  name: subscription_table
  description: A subscription table with one row per date for each active user and their subscription plans.
  owners:
    - user@company.com
  sql_table: your_schema.subscription_table

  identifiers:
    - name: user_id
      type: foreign

  dimensions:
    - name: ds
      type: time
      expr: date_transaction
      type_params:
        is_primary: True
        time_granularity: day

  measures:
    - name: count_users_end_of_month
      description: Count of users at the end of the month
      expr: 1
      agg: sum
      create_metric: True
      non_additive_dimension:
        name: ds
        window_choice: max
    - name: mrr_end_of_month
      description: Aggregate by summing all users' active subscription plans at end of month
      expr: subscription_value
      agg: sum
      create_metric: True
      non_additive_dimension:
        name: ds
        window_choice: max
    - name: mrr_by_user_end_of_month
      description: Group by user_id to achieve each user's MRR at the end of the month
      expr: subscription_value
      agg: sum
      create_metric: True
      non_additive_dimension:
        name: ds
        window_choice: max
        window_groupings:
          - user_id

We can then query the semi-additive metrics (note the create_metric: True parameter) as follows:

  • Note the inclusion of the non-additive time dimension in the CLI and the specified granularity. You can easily change to a weekly basis if desired via metric_time__week, and Transform's framework will adjust the window to reflect weekly granularity for the appropriate min/max snapshot.

MetricFlow:

mf query --metrics mrr_end_of_month --dimensions metric_time__month --order metric_time__month
mf query --metrics mrr_end_of_month --dimensions metric_time__week --order metric_time__week

Transform:

mql query --metrics mrr_end_of_month --dimensions metric_time__month --order metric_time__month
mql query --metrics mrr_end_of_month --dimensions metric_time__week --order metric_time__week