Skip to main content

No project description provided

Project description

DotCube

dotML

⚠️ still in alpha ⚠️ dotML is a lightweight semantic layer with minimal abstractions. It enables fetching metrics, filtered by dimensions, by parsing metric queries based on a data model. The data model defines dimensions and metrics similar to LookML and supports joins on primary keys. The data model is defined in YAML files. The package generates a SQL query that you can use to fetch your results.

Installation

pip install dotml

Usage

dotML can be used both as a Python package and through its command line interface.

As a Python Library

You can import the load_cubes, and generate_sql_query functions from their respective modules to use in your Python code.

CLI Commands

# list all cubes
dotml cubes

# list all metrics for a cube
dotml fields <cube_name>

# query a set of metrics and dimensions
dotml query "<query_json>"

The query command expects a JSON5 string as its argument. Here's an example:

dotml query "{
'fields': ['orders.booking_date_month', 'orders.revenue', 'orders_items.quantity'],
'filters': ['${orders.country_id} = 67'],
'sorts': ['orders.booking_date_month'],
'limit': 100
}"

Is this for me?

dotML is for you if are a tool builder and want to:

  1. query metrics and dimensions consistently and correctly
  2. want to support highly flexible analytics without writing complex SQL

We build dotML for Dot, but believe it can be useful for other tool builders as well.

In the future, dotML might also be for you if you are a data analyst and want to:

  1. have consistent metrics across all your tools
  2. implement a semantic layer for your data warehouse

Why dotML?

OLAP cubes, hypercubes, metric stores and semantic layers are all different names for the same thing: a data model that defines dimensions and metrics, and a query engine that can parse metric queries and generate SQL queries to fetch the results. All the existing solutions we found are tightly coupled with a specific tool or database, and are not easy to use with other tools.

What about dbt Metricflow?

There is a chance that dbt's metricflow could take this place, but today ( 15.06.2023):

  1. it does not have a permissive license for tool builders,
  2. is still in beta and,
  3. is pretty complex: 1359 files vs 12 files of dotML
  4. seems harder to learn, because of more abstractions in the data model

However, dotML only produces sql queries based on the defined data model. There is no UI, no caching, and no permissions.

Today it does:

  1. support joins on primary keys
  2. nested metrics and dimensions
  3. post-processing of metrics with window function (e.g. average over 7 days, cumulative sum, etc.)

Defining Cubes

Cubes are defined in YAML files with the following structure:

name: <cube_name>
table: <table_path>
dimensions:
  - name: <dimension_name>
    sql: <sql_expression>
  ...
metrics:
  - name: <measure_name>
    sql: <sql_expression>
  ...

Basically all SQL databases are supported: PostgreSQL, Snowflake, Redshift, BigQuery, Databricks SQL, Trino, Druid, Oracle, MSSQL ...

Full Example

Let's say we have an online store with the following tables.

Orders

id booking_date country_id status total
1 2023-05-12 68 cancelled 109
2 2023-06-02 62 cancelled 32
3 2023-04-20 67 confirmed 213
4 2023-04-30 64 cancelled 22
5 2023-04-28 65 delivered 5
6 2023-06-13 69 shipped 98
7 2023-04-17 62 delivered 30

Orders Items

id order_id product_id quantity price
10 1 8 9 26
20 2 15 8 91
21 2 2 3 20
30 3 19 9 5
31 3 13 10 71
32 3 15 4 26
40 4 17 9 10

Each table gets modeled as a cube in dotML. A cube encodes the business logic on how data should be aggregated in dimensions and metrics.

cubes:
  - name: orders
    table: my_db.prod.orders
    always_filter:
      - "${table}.booking_date >= '2019-01-01'"
      - "${table}.status = 'confirmed'"
    dimensions:
      - name: id
        sql: ${table}.id
        primary_key: true
      - name: booking_date
        sql: date_trunc(${time_frame},${table}.booking_date)
        variants:
          - time_frame: [day, week, month, quarter, year]
      - name: country_id
        sql: ${table}.country_id
    metrics:
      - name: revenue
        sql: sum(${table}.total)
      - name: average_order_value
        sql: avg(${table}.total)
      - name: revenue_big_orders
        sql: sum( iff(${table}.total > 100, ${table}.total, 0) )
    window_metrics:
      - name: average_order_value_rolling_30_days
        sql: sum(${revenue}) over (order by ${booking_date_day} rows between 30 preceding and current row)

  - name: orders_items
    table: my_db.prod.orders_line_items
    dimensions:
      - name: id
        sql: ${table}.id
        primary_key: true
      - name: order_id
        sql: ${table}.order_id
      - name: product
        sql: ${table}.product_id
    metrics:
      - name: quantity
        sql: sum(${table}.quantity)

Now, we can query the data model with dotML.
Notice how we are querying metrics across different cubes.
dotML automatically prevents join fanouts

from dotML import load_cubes, generate_sql_query

cubes = load_cubes("data_model.yaml")
query = {
    "fields": ["orders.booking_date_month", "orders.revenue", "orders_items.quantity"],
    "filters": ["${orders.country_id} = '67'"],
    "sorts": ["orders.booking_date_month"],
    "limit": 100
}
sql_query = generate_sql_query(cubes, query)
# here you can execute the sql query with your favorite database client

The result looks like this:

booking_date_month revenue quantity
2023-04-01 802 125
2023-05-01 1329 212
2023-06-01 1689 158

The generated SQL query looks like this:

with orders_R5S_dimension as (
    select  orders_R5S.id as pk0
    from demo_db.public.orders as orders_R5S
    where (orders_R5S.booking_date >= '2019-01-01') and (orders_R5S.status = 'confirmed')
    group by 1
), orders_items_0B7_dimension as (
    select  
        orders_items_0B7.id as pk0,
        date_trunc(month,orders_R5S.booking_date) as booking_date_month
    from demo_db.public.orders_line_items as orders_items_0B7
    right join demo_db.public.orders as orders_R5S
          on orders_R5S.id = orders_items_0B7.order_id
    group by 1, 2
), orders_R5S_metrics as (
    select  date_trunc(month,orders_R5S.booking_date) as booking_date_month,
            orders_R5S.country_id,
            sum(orders_R5S.total) as revenue
    from demo_db.public.orders as orders_R5S 
            join orders_R5S_dimension as orders_R5S_dimension 
            on orders_R5S.id = orders_R5S_dimension.pk0
    where (orders_R5S.booking_date >= '2019-01-01') and (orders_R5S.status = 'confirmed')
    group by 1
), orders_items_0B7_metrics as (
    select  orders_items_0B7_dimension.booking_date_month,
            orders_items_0B7_dimension.country_id,
    sum(orders_items_0B7.quantity) as quantity
    from demo_db.public.orders_line_items as orders_items_0B7 
            join orders_items_0B7_dimension as orders_items_0B7_dimension 
            on orders_items_0B7.id = orders_items_0B7_dimension.pk0
    group by 1
)
select 
    orders_R5S_metrics.booking_date_month, 
    orders_R5S_metrics.revenue, 
    orders_items_0B7_metrics.quantity
from orders_R5S_metrics as orders_R5S_metrics
join orders_items_0B7_metrics as orders_items_0B7_metrics
    on orders_R5S.booking_date_month = orders_items_0B7.booking_date_month 
    and orders_R5S.country_id = orders_items_0B7.country_id
where (orders_R5S_metrics.country_id = '67')
order by 1 limit 100

Contributing

If you have suggestions for how dotML could be improved, or want to report a bug, open an issue! We'd love all and any contributions.

Two areas where we would especially appreciate help are:

  • a Tableau compiler, so that dotML cubes can be used as a data source in Tableau (or another BI tool, you care about)
  • a dbt_package_wrapper, so that cubes can get materialized by dbt

Reach out to: hi@sled.so

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dotml-0.1.9.tar.gz (19.0 kB view hashes)

Uploaded Source

Built Distribution

dotml-0.1.9-py3-none-any.whl (15.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page