Skip to main content

Dimensions

Dimensions Overview

Dimensions are non-aggregatable expressions that are used to define the level of aggregation that a MetricFlow user would like a metric to be aggregated to. You can also think of dimensions as the various data cuts (the slices and dices) that you would like to see a metric grouped by. Dimensions cannot be aggregated, so they are then considered to be a property of the primary or unique identifiers of the table.

Dimensions are defined within data sources along with measures and identifiers. You can think of dimensions as the columns in your data table or query that are non-aggregatable — they provide categorical or time-based context to enrich queries, and are usually included in the GROUP BY clause of your SQL query.

Consider the following data source as the basis for the examples to follow.

data_source:
name: transactions
description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.
owners: support@transform.co
sql_table: schema.transactions

# --- IDENTIFIERS ---
identifiers:
- name: transaction_id
type: primary
- name: customer_id
type: foreign
- name: store_id
type: foreign
- name: product_id
type: foreign

# --- MEASURES ---
measures:
- name: revenue
description:
expr: price * quantity
agg: sum

# --- DIMENSIONS ---
dimensions:
- name: ds
type: time
expr: date_trunc('day', ts)
type_params:
is_primary: true
time_granularity: day
- name: is_bulk_transaction
type: categorical
expr: case when quantity > 10 then true else false end

All dimensions require a name, type and in most cases, an expr parameter.

  • name refers to the name for the dimension, which will be exposed to the end user. The name parameter can also act as an alias if the underlying column or SQL query reference is a different string within the expr parameter.

    • Note that dimension names must be unique within a data source, but they are not required to be unique across different data sources. MetricFlow will identify the correct dimension via the proper join identifiers. For further discussion on join paths, see the Join logic page.
  • type refers to the type of dimension created within the data source. Current types include:

    • Categorical: Usually a way to group rows within a table, such as geography, product type, color, etc.
    • Time: Time dimensions will point to an underlying date field in the warehouse. All time dimensions must be TIMESTAMP or the DB engine equivalent.
  • expr (optional) refers to the underlying column or SQL query to define the dimension. If no expr parameter is present for a dimension, MetricFlow will look for a column that matches the name parameter.

    • For valid inputs into the expr parameter: you may input the column name itself, or a valid SQL expression such as a case statement.

Dimensions Types

Categorical Dimensions

Category dimensions allow metrics to be grouped by different "category" slices and dices, such as product type, color, or geographical area. These may point to existing columns in the data table, or be calculated using a SQL expression with the expr parameter.

As different business functional areas may have different ways to categorize the data, it can be very valuable to create new dimensions to capture those bespoke groupings.

In the example file provided below, is_bulk_transaction is a categorical dimension where we have provided a case statement for further grouping based on the underlying column quantity. is_bulk_transaction will then be available to the end user to filter or group by in the UI.

Example:

dimensions: 
- name: is_bulk_transaction
type: categorical
expr: case when quantity > 10 then true else false end

Time Dimensions

tip

Time dimension columns must be datetime data type if you are using BigQuery as your Data Warehouse. If they are stored as timestamp or another data type, you can cast the dimensions to datatime in the expr property. i.e expr: cast(timestamp as datetime) All time dimensions must be TIMESTAMP or the DB engine equivalent.

Time dimensions are used to aggregate metrics against different levels of time granularity. Current granularities supported by MetricFlow include day, week, month, quarter, and year (see the parameter time_granularity for further details).

Time dimensions in MetricFlow have a few more parameters, which are specified under the type_params section, as seen in the example below.

Example:

dimensions: 
- name: created_at
type: time
expr: date_trunc('day', ts_created) #ts_created is the underlying column name from the table
is_partition: True
type_params:
is_primary: True
time_granularity: day
- name: deleted_at
type: time
expr: date_trunc('day', ts_deleted) #ts_deleted is the underlying column name from the table
is_partition: True
type_params:
is_primary: False
time_granularity: day

measures:
- name: users_deleted
expr: 1
agg: sum
agg_time_dimension: deleted_at
create_metric: True
- name: users_created
expr: 1
agg: sum
create_metric: True

Within type_params:

  • is_primary: Set the is_primary parameter to True to indicate if a time dimension is to be the primary, or preferred, time dimension for a measure or metric.

    • Note that a primary time dimension is required for data sources that contain measures. The primary time dimension is used for graphing the x-axis within the Transform UI. For materializations, the primary time dimension is used as the common time dimension for all metrics. If you have other time dimensions in your data source, these time dimensions will require is_primary: False.

    • To assign a non-primary time dimension to measures in MetricFlow, specify an agg_time_dimension parameter for the measure and refer to any time dimension that is defined in the dimension section.

      • In the example above, we have a data source with two different time dimensions defined as created_at and deleted_at.

        • Within the data source, the time dimension created_at has been specified as the primary time dimension via is_primary: True.
        • The time dimension deleted_at has been specified as the non-primary time dimension via is_primary: False.
        • The measure users_created defaults to the primary time dimension (created_at)
        • On the other hand, the measure users_deleted utilizes the time dimension deleted_at as its time dimension via the parameter agg_time_dimension.
      • Below, the CLI Command example shows querying a single metric by its time dimension. For a single metric, the default time dimension is the specified primary time dimension, which may be referred to as metric_time or the time dimension's name itself.

        • MetricFlow
        # Both CLI queries below will return `users_created` by the primary time dimension `created_at`)
        mf query --metrics users_created --dimensions metric_time --order metric_time
        mf query --metrics users_created --dimensions created_at --order created_at

        # Specify a time dimension by name if you choose to query the measure with a time dimension that has not been specified as its agg_time_dimension
        mf query --metrics users_deleted --dimensions created_at --order created_at
        • Transform
        # Both CLI queries below will return `users_created` by the primary time dimension `created_at`)
        mql query --metrics users_created --dimensions metric_time --order metric_time
        mql query --metrics users_created --dimensions created_at --order created_at

        # Specify a time dimension by name if you choose to query the measure with a time dimension that has not been specified as its agg_time_dimension
        mql query --metrics users_deleted --dimensions created_at --order created_at

      • Below, the CLI Command example shows querying multiple metrics by differing time dimensions. The MetricFlow framework recognizes the multiple time dimensions that are referenced by two separate metrics: users_created utilizes the primary time dimension created_at while users_deleted utilizes time dimension deleted_at. The dimension argument metric_time will identify the necessary primary time dimensions and agg_time_dimensions and join the metric values together based on the conjoined time spines.

        • MetricFlow:
        mf query --metrics users_created,users_deleted --dimensions metric_time --order metric_time 
        • Transform:
        mql query --metrics users_created,users_deleted --dimensions metric_time --order metric_time 
  • time_granularity: Time granularity refers to the minimum granularity that a measure or metric should be reported by. For example, a metric that is specified with weekly granularity would not be allowed to be shown with daily granularity.

    • Current options for time granularity are day, week, month, quarter, and year.
    • Moreover, the specified time granularity allows for proper aggregation when querying metrics of different granularities. For example, when querying two metrics (one with a DAY granularity and another with a MONTH granularity), MetricFlow will only return a result with the MONTH granularity by default.
    • Within the Transform UI, the specified time granularity also allows graphs to render correctly (e.g. a DAY granularity cannot be specified for a MONTHLY metric).
  • is_partition: To assert that a dimension exists over a specific time window(e.g. a date-partitioned dimensional table), add the is_partition: True parameter to the dimension. MetricFlow will add this date to any joins on this table to ensure that the correct dimensional values are joined to measures in other tables.

MetricFlow also allows for easy aggregation of metrics on the fly via the time-granularity option in the CLI. Suppose we have a measure named messages_per_month where the original time_granularity of the time dimension metric_time was set to monthly as seen below:

Example:

data_source:
name: messages_month
description: Each row represents a message truncated to the month.
owners: support@transform.co
sql_table: schema.messages

# --- IDENTIFIERS ---
identifiers:
- name: message_id
type: primary
- name: user_id_from
type: foreign
- name: user_id_to
type: foreign
- name: message_user_from_to # Composite key
type: unique
identifiers:
- ref: message_id
- ref: user_id_from
- ref: user_id_to

# --- MEASURES ---
measures:
- name: messages_by_month
description: The total number of messages sent by month
expr: 1
agg: sum
create_metric: True

# --- DIMENSIONS ---
dimensions:
- name: ds
type: time
expr: date_trunc('month', ts) #Underlying column is named “ts”
type_params:
is_primary: true
time_granularity: month

The CLI command example below shows aggregating the messages_per_month metric up to yearly via the time-granularity option:

MetricFlow:

mf query --metrics messages_per_month --dimensions metric_time --order metric_time --time-granularity year  

Transform:

mql query --metrics messages_per_month --dimensions metric_time --order metric_time --time-granularity year