Measures
Measures are aggregations that can either produce your final metric directly or serve as the foundational building blocks for a metric.
Measures have a few inputs in the data source configuration: `name`, `description`, `aggregation`, `expr`, `create_metric`, and `non_additive_dimension`.
Name (Required)
The `name` of the measure can be chosen by the user or optionally reference a database column directly. If the `name` of the measure is different from the column in the table, you must add an `expr` that will denote the column name. The `name` of the measure will be referenced when creating a metric. Note that measure names must be unique across all data sources in a project.
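For example, a measure can use a friendlier `name` than the underlying column, with `expr` pointing at the column itself. A minimal sketch (the measure and the `gross_amount` column are hypothetical):

```yaml
measures:
  - name: total_gross_revenue      # name you will reference when creating metrics
    description: Sum of the gross transaction amount in USD
    expr: gross_amount             # hypothetical column in the underlying table
    agg: sum
```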
Description (Required)
The description section describes the measure being calculated. We strongly recommend writing verbose, human-readable descriptions in this field.
Aggregation (Required)
The aggregation determines how the field will be aggregated. For example, a `sum` aggregation over a granularity of `day` would sum the values across a given day.
Aggregations supported by MetricFlow are as follows:
| Aggregation Types | Description |
|---|---|
| sum | Sum across the values |
| min | Minimum across the values |
| max | Maximum across the values |
| average | Average across the values |
| sum_boolean | A sum for a boolean type |
| count_distinct | Distinct count of values |
| median | Median (p50) calculation across the values |
| percentile | Percentile calculation across the values |
tip
All the aggregation types above, except for count_distinct, produce additive measures. MetricFlow can aggregate these sequentially to reuse datasets; for example, transactions by day can be rolled up into transactions by week. Non-additive measures like count_distinct don't have this property and must be calculated from the source every time. This can be more costly and slower, but the resulting metrics are worth the wait in many cases.
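To illustrate the difference, here is a minimal sketch of one additive and one non-additive measure (the measure and column names are illustrative):

```yaml
measures:
  # Additive: a weekly total can be computed by summing the daily totals
  - name: transactions_usd
    description: Total USD value of transactions
    expr: transaction_amount_usd   # illustrative column name
    agg: sum
  # Non-additive: weekly distinct counts cannot be derived by adding daily
  # distinct counts (the same customer may appear on several days), so this
  # must be recomputed from the source at each requested granularity
  - name: unique_customers
    description: Distinct count of customers
    expr: customer_id              # illustrative column name
    agg: count_distinct
```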
Expr (Optional)
Use the `expr` parameter if you have specified a `name` parameter that does not match the name of a column in your database table (in this case, `name` effectively serves as an alias for your measure). You may also use the `expr` parameter with any valid SQL to manipulate an underlying column into the desired output.
Note: Always refer to your database-specific SQL documentation when using SQL functions in the `expr` parameter, as outputs may differ depending on your specific database provider.
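For instance, `expr` can hold a SQL expression rather than a bare column name. A hedged sketch (the `transaction_amount_usd` and `refund_amount_usd` columns are hypothetical):

```yaml
measures:
  - name: net_transaction_amount_usd
    description: Transaction amount net of refunds, in USD
    # Any valid SQL for your warehouse can go in expr; the columns here are hypothetical
    expr: transaction_amount_usd - COALESCE(refund_amount_usd, 0)
    agg: sum
```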
caution
If you are using Snowflake and use any week-level function inside the `expr` parameter, the function will now return results based on the ISO-standard week start of MONDAY.
Examples:
- You are running Snowflake warehouse instances with account- or session-level overrides for the WEEK_START parameter that fix it to a value other than 0 or 1: you will now see MONDAY as the week start in all cases.
- You are using the DAYOFWEEK function inside the `expr` parameter with the legacy Snowflake default of WEEK_START = 0: DAYOFWEEK will now return the ISO-standard values of 1 (Monday) through 7 (Sunday) instead of Snowflake's legacy default values of 0 (Monday) through 6 (Sunday).
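As a hedged illustration of why this matters, the measure below (hypothetical name and column) relies on DAYOFWEEK inside `expr`; under the ISO-standard behavior described above, Saturday and Sunday are 6 and 7:

```yaml
measures:
  - name: weekend_transactions
    description: Count of transactions placed on a Saturday or Sunday
    # With ISO semantics, DAYOFWEEK returns 1 (Monday) through 7 (Sunday),
    # so 6 and 7 correspond to Saturday and Sunday.
    expr: CASE WHEN DAYOFWEEK(transaction_date) IN (6, 7) THEN 1 ELSE 0 END
    agg: sum
```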
Create Metric (Optional)
You can optionally create a metric directly from a measure by setting `create_metric: True` as a shortcut for creating a metric.
- Note that the description and owners of a metric created from a measure with `create_metric: True` will default to the description and owners of the data source.
- You may further specify the display name of the metric via `create_metric_display_name: "Your Metric Display Name"`.
Note: If you prefer to have a single file location or a single mechanism for storing all of your metric definitions, we do not recommend the `create_metric: True` shortcut, as it allows metric definitions to exist in multiple places across your Transform model.
Example file using various aggregation types
```yaml
data_source:
  name: transactions
  description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.
  owners: support@transform.co
  sql_table: schema.transactions

  # --- IDENTIFIERS ---
  identifiers:
    - name: transaction_id
      type: primary
    - name: customer_id
      type: foreign
    - name: store_id
      type: foreign
    - name: product_id
      type: foreign

  # --- MEASURES ---
  measures:
    - name: transaction_amount_usd
      description: Total USD value of transactions
      expr: transaction_amount_usd
      agg: sum
    - name: transaction_amount_usd_avg
      description: Average USD value of transactions
      expr: transaction_amount_usd
      agg: average
    - name: transaction_amount_usd_max
      description: Maximum USD value of transactions
      expr: transaction_amount_usd
      agg: max
    - name: transaction_amount_usd_min
      description: Minimum USD value of transactions
      expr: transaction_amount_usd
      agg: min
    - name: quick_buy_transactions
      description: The total transactions bought as quick buy
      expr: quick_buy_flag
      agg: sum_boolean
    - name: distinct_transactions_count
      description: Distinct count of transactions
      expr: transaction_id
      agg: count_distinct
    - name: transactions
      description: The average value of transactions
      expr: transaction_amount_usd
      agg: average
      create_metric: True
      create_metric_display_name: Avg Value of Transactions (Daily)
    - name: transactions_amount_usd_valid # Notice here how we use expr to compute the aggregation based on a condition
      description: The total USD value of valid transactions only
      expr: CASE WHEN is_valid = True THEN transaction_amount_usd ELSE 0 END
      agg: sum
    - name: p99_transaction_value
      description: The 99th percentile transaction value
      expr: transaction_amount_usd
      agg: percentile
      agg_params:
        percentile: .99
        use_discrete_percentile: False # True will calculate the discrete percentile; False will calculate the continuous percentile
      create_metric: True
    - name: median_transaction_value
      description: The median transaction value
      expr: transaction_amount_usd
      agg: median
      create_metric: True

  # --- DIMENSIONS ---
  dimensions:
    - name: ds
      type: time
      expr: date_trunc('day', ts) # expr refers to underlying column ts
      type_params:
        is_primary: true
        time_granularity: day
    - name: is_bulk_transaction
      type: categorical
      expr: case when quantity > 10 then true else false end
```
Non-Additive Dimensions (Optional)
Some measures cannot be aggregated over certain dimensions (typically time), because doing so may produce incorrect results. Examples include bank account balances (it does not make sense to carry balances over month to month) and monthly recurring revenue (you cannot sum daily recurring revenue to arrive at monthly recurring revenue). For such measures, we provide the ability to specify non-additive dimensions.
note
The current support for non-additive dimensions assumes dimensional tables; otherwise, the calculation of certain measures may not be correct.
For example, suppose you have a subscription-based business and want to calculate the monthly recurring revenue of users (their active subscriptions) at the end of the month. To calculate the monthly recurring revenue (MRR), the underlying dimensional table would require a daily snapshot of the date, user, and subscription revenue.
To demonstrate the config files for non-additive measures, consider a subscription table that includes 1 row per date of the registered user, the user's active subscription plan(s), and the plan's subscription value (revenue) -- with the following columns:
- `date_transaction`: The daily date-spine.
- `user_id`: The id pertaining to the registered user.
- `subscription_plan`: A column to indicate the subscription plan ID.
- `subscription_value`: A column to indicate the monthly subscription value (revenue) of a particular subscription plan ID.
Parameters under `non_additive_dimension` specify dimensions that the measure should not be aggregated over.
- `name`: Required. This will be the name of the time dimension (that has already been defined in the data source) that the measure should not be aggregated over.
- `window_choice`: Required. Choose either `min` or `max`, where `min` reflects the beginning of the time period and `max` reflects the end of the time period.
- `window_groupings`: Optional. Provide the identifiers that you would like to group by.
```yaml
data_source:
  name: subscription_table
  description: A subscription table with one row per date for each active user and their subscription plans.
  owners:
    - user@company.com
  sql_table: your_schema.subscription_table

  identifiers:
    - name: user_id
      type: foreign

  dimensions:
    - name: ds
      type: time
      expr: date_transaction
      type_params:
        is_primary: True
        time_granularity: day

  measures:
    - name: count_users_end_of_month
      description: Count of users at the end of the month
      expr: 1
      agg: sum
      create_metric: True
      non_additive_dimension:
        name: ds
        window_choice: min
    - name: mrr_end_of_month
      description: Aggregate by summing all users' active subscription plans at end of month
      expr: subscription_value
      agg: sum
      create_metric: True
      non_additive_dimension:
        name: ds
        window_choice: max
    - name: mrr_by_user_end_of_month
      description: Group by user_id to achieve each user's MRR at the end of the month
      expr: subscription_value
      agg: sum
      create_metric: True
      non_additive_dimension:
        name: ds
        window_choice: max
        window_groupings:
          - user_id
```
We can proceed to query the semi-additive metrics (note the `create_metric: True` parameter) as such:
- Note the inclusion of the non-additive time dimension and the specified granularity in the CLI query. You can easily change to a weekly basis if desired via `metric_time__week`, and Transform's framework will adjust the window to reflect weekly granularity for the appropriate min/max snapshot.
MetricFlow:
```bash
mf query --metrics mrr_end_of_month --dimensions metric_time__month --order metric_time__month
mf query --metrics mrr_end_of_month --dimensions metric_time__week --order metric_time__week
```
Transform:
```bash
mql query --metrics mrr_end_of_month --dimensions metric_time__month --order metric_time__month
mql query --metrics mrr_end_of_month --dimensions metric_time__week --order metric_time__week
```