A lightweight semantic layer that infers metrics and dimensions from your database schema

These details have not been verified by PyPI

Project description

💫 Cantrip

Cantrip is a lightweight semantic layer that reads metrics and infers dimensions from your database schema. Define metrics as SQL views, and use views to add metadata like types, join hints, and foreign keys. Cantrip will infer dimensions, join paths, and type information automatically. No configuration needed.

How it works

Cantrip reads your database schema (tables, views, foreign keys, column types) and builds a semantic model:

Metrics are SQL views containing a single aggregate expression (e.g., SUM, COUNT, AVG).
Dimensions are columns on tables connected via foreign keys.
Join paths are discovered automatically by following FK relationships.
Metadata (types, join hints, time grains, foreign keys) is declared via annotations on view columns.

When you request a query combining metrics and dimensions, Cantrip figures out which tables to join, how to group, how to filter, and then generates the SQL.

Currently supports:

SQLite
BigQuery
Postgres
DuckDB
Snowflake
MySQL
Redshift
Trino
Databricks
SQL Server
ClickHouse
Oracle

Suggest a database here.

Quick example

Given these tables and views in a (SQLite) database:

CREATE TABLE dim_users (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    buyer_id INTEGER REFERENCES dim_users(customer_id),
    quantity INTEGER,
    unit_price REAL
);

-- A metric: one view, one aggregate
CREATE VIEW total_revenue AS
SELECT SUM(quantity * unit_price) AS "total_revenue [type=REAL]"
FROM fact_orders;

Cantrip automatically discovers:

total_revenue as a metric (type REAL)
dim_users.name and dim_users.country as dimensions
The join path fact_orders.buyer_id → dim_users.customer_id

You can then query:

from sqlalchemy import create_engine
from cantrip.implementations.sqlite import SQLiteSemanticLayer

engine = create_engine("sqlite:///my.db")
sl = SQLiteSemanticLayer(engine)

sv = next(iter(sl.get_semantic_views()))
metrics = sl.get_metrics(sv)
dimensions = sl.get_dimensions(sv)

# Build a query for total_revenue broken down by country
revenue = next(m for m in metrics if m.name == "total_revenue")
country = next(d for d in dimensions if d.name == "dim_users.country")
query = sl.get_query(sv, {revenue}, {country}, set())

print(query.sql)

SELECT
  SUM(quantity * unit_price) AS total_revenue,
  dim_users.country AS "dim_users.country"
FROM fact_orders
JOIN dim_users
  ON fact_orders.buyer_id = dim_users.customer_id
GROUP BY
  dim_users.country

Filters

Dimensions and metrics support Python operators for building filters. The filter type is inferred automatically — dimensions produce WHERE clauses, metrics produce HAVING clauses:

# Filter on a dimension (WHERE)
query = sl.get_query(sv, {revenue}, {country}, {country == "br"})

SELECT
  SUM(quantity * unit_price) AS total_revenue,
  dim_users.country AS "dim_users.country"
FROM fact_orders
JOIN dim_users
  ON fact_orders.buyer_id = dim_users.customer_id
WHERE
  dim_users.country = 'br'
GROUP BY
  dim_users.country

# Filter on a metric (HAVING)
query = sl.get_query(sv, {revenue}, {country}, {revenue > 1000})

Filters can be composed with & (AND), | (OR), and ~ (NOT):

query = sl.get_query(sv, {revenue}, {country}, {
    (country == "br") | (country == "us"),
    revenue > 1000,
})

Additional filter methods:

country.isin("br", "us", "mx")     # IN
quantity.between(10, 100)          # BETWEEN
country.like("%bra%")              # LIKE
country.is_null()                  # IS NULL
country.is_not_null()              # IS NOT NULL

Annotations

Views serve double duty in Cantrip: they define metrics (via aggregate expressions) and carry metadata (via annotations on column aliases or descriptions). The way annotations are expressed depends on the database.

SQLite

In SQLite, annotations are embedded directly in the column alias using [key=value] syntax:

-- Declare the metric type explicitly
CREATE VIEW total_sales AS
SELECT SUM(amount) AS "total_sales [type=REAL]" FROM orders;

-- Hint which FK column to use for joins
CREATE VIEW sellers AS
SELECT COUNT(DISTINCT seller_id) AS "sellers [type=INTEGER, join=seller_id]"
FROM orders;

-- Declare a foreign key when the database doesn't have one
CREATE VIEW fact_events_annotated AS
SELECT user_id AS "user_id [fk=dim_users.customer_id]", amount FROM events;

BigQuery

BigQuery doesn't allow special characters in column aliases. Instead, annotations go in column descriptions using ALTER VIEW ... ALTER COLUMN ... SET OPTIONS(description='...'):

-- Declare the metric type explicitly
CREATE OR REPLACE VIEW `my_project.my_dataset.total_sales` AS
SELECT SUM(amount) AS total_sales FROM orders;

ALTER VIEW `my_project.my_dataset.total_sales`
ALTER COLUMN total_sales SET OPTIONS(description='type=FLOAT64');

-- Hint which FK column to use for joins
CREATE OR REPLACE VIEW `my_project.my_dataset.sellers` AS
SELECT COUNT(DISTINCT seller_id) AS sellers FROM orders;

ALTER VIEW `my_project.my_dataset.sellers`
ALTER COLUMN sellers SET OPTIONS(description='type=INT64, join=seller_id');

-- Declare a foreign key when the database doesn't have one
CREATE OR REPLACE VIEW `my_project.my_dataset.fact_events_annotated` AS
SELECT user_id, amount FROM events;

ALTER VIEW `my_project.my_dataset.fact_events_annotated`
ALTER COLUMN user_id SET OPTIONS(description='fk=dim_users.customer_id');

DuckDB / PostgreSQL

DuckDB and PostgreSQL use the same bracket-style annotations as SQLite:

CREATE VIEW total_sales AS
SELECT SUM(amount) AS "total_sales [type=DOUBLE PRECISION]" FROM orders;

Snowflake

Snowflake annotations use inline column comments in CREATE VIEW:

-- Declare the metric type explicitly
CREATE OR REPLACE VIEW total_sales (
    total_sales COMMENT 'type=FLOAT'
) AS
SELECT SUM(amount) AS total_sales FROM orders;

-- Hint which FK column to use for joins
CREATE OR REPLACE VIEW sellers (
    sellers COMMENT 'type=INTEGER, join=seller_id'
) AS
SELECT COUNT(DISTINCT seller_id) AS sellers FROM orders;

-- Declare a foreign key when the database doesn't have one
CREATE OR REPLACE VIEW fact_events_annotated (
    event_id,
    user_id COMMENT 'fk=dim_users.customer_id',
    amount
) AS
SELECT event_id, user_id, amount FROM events;

Note: COMMENT ON COLUMN only works for tables in Snowflake, not views. The inline CREATE VIEW (col COMMENT '...') syntax is the only way to annotate view columns.

Supported annotation keys

Key	Description
`type`	Column type (`INTEGER`, `REAL`, `TEXT`, `DATE`, `DATETIME`, etc. for SQLite; `INT64`, `FLOAT64`, `STRING`, etc. for BigQuery)
`join`	Column name to prefer when joining to dimension tables
`grain`	Time grain (`year`, `quarter`, `month`, `week`, `day`, `hour`)
`fk`	Declare a foreign key as `table.column` (for databases without FK constraints)
`partition`	Partition filter for date-partitioned dimensions (`latest` or `event`)
`geo_grain`	Geographic grain for enrichment joins (e.g., `country`, `us_state`, `zip_code`)

Supported databases

SQLite

pip install cantrip

from cantrip.implementations.sqlite import SQLiteSemanticLayer

BigQuery

pip install cantrip[bigquery]

from cantrip.implementations.bigquery import BigQuerySemanticLayer

Uses INFORMATION_SCHEMA for metadata discovery and supports standard SQL time functions natively.

DuckDB

pip install cantrip[duckdb]

from cantrip.implementations.duckdb import DuckDBSemanticLayer

In-process like SQLite but with PostgreSQL-compatible SQL. Uses INFORMATION_SCHEMA for metadata and bracket-style annotations in column aliases.

PostgreSQL

pip install cantrip[postgres]

from cantrip.implementations.postgres import PostgresSemanticLayer

Uses INFORMATION_SCHEMA for metadata, real FOREIGN KEY constraints for join discovery, and bracket-style annotations in column aliases. Supports FILTER clauses and CTEs for multi-context queries. Optional PostGIS support for spatial dimensions.

Snowflake

pip install cantrip[snowflake]

from cantrip.implementations.snowflake import SnowflakeSemanticLayer

Uses INFORMATION_SCHEMA for metadata, SHOW IMPORTED KEYS / SHOW PRIMARY KEYS for FK/PK discovery, and column comments for annotations. Supports GEOGRAPHY/GEOMETRY spatial types via ST_WITHIN. Identifiers are uppercased by default following Snowflake conventions.

Spatial dimensions

Cantrip supports spatial (geographic) grouping via a geometries table named cantrip__dim_geometries. If present, Cantrip discovers spatial grains (e.g., country, city, zip_code) and exposes them as Space.* dimensions — the geographic counterpart to Time.* dimensions.

How it works

Point column: Your fact table has a geometry/geography column (e.g., POINT in SQLite, GEOGRAPHY in BigQuery).
Geometries table: A cantrip__dim_geometries table defines the spatial regions to group by:

-- SQLite (SpatiaLite)
CREATE TABLE cantrip__dim_geometries (
    id INTEGER PRIMARY KEY,
    grain TEXT NOT NULL,      -- e.g. 'country', 'city'
    value TEXT NOT NULL,      -- e.g. 'United States', 'New York'
    geometry MULTIPOLYGON     -- region boundary
);

-- BigQuery
CREATE TABLE cantrip__dim_geometries (
    grain STRING NOT NULL,
    value STRING NOT NULL,
    geometry GEOGRAPHY NOT NULL
);

Metric: If the fact table has exactly one spatial column, Cantrip finds it automatically. If there are multiple (e.g., pickup_geom and dropoff_geom), use join=<column> to pick one:

-- SQLite: explicit join hint (only needed with multiple spatial columns)
CREATE VIEW pickups AS
SELECT COUNT(*) AS "total_pickups [join=pickup_geom]"
FROM trips;

-- BigQuery equivalent
ALTER VIEW pickups
ALTER COLUMN total_pickups SET OPTIONS(description='join=pickup_geom');

Query: Request Space.* dimensions just like any other dimension:

country = next(d for d in dimensions if d.name == "Space.country")
query = sl.get_query(sv, {checkins_metric}, {country}, set())

GeoJSON output

Spatial dimension values are returned as GeoJSON Feature strings, containing both the region name and its geometry — ready for display or map rendering:

{
  "type": "Feature",
  "properties": {"name": "United States"},
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [[[[-130, 20], [-60, 20], [-60, 55], [-130, 55], [-130, 20]]]]
  }
}

Database support

Database	Point type	Spatial function	Index
SQLite	`POINT` (SpatiaLite)	`ST_Within`	R-tree `SpatialIndex` (physical tables only)
BigQuery	`GEOGRAPHY`	`ST_WITHIN`	Automatic
DuckDB	`GEOMETRY` (spatial extension)	`ST_Within`	Automatic
PostgreSQL	`geometry` / `geography` (PostGIS)	`ST_Within`	GiST index recommended
Snowflake	`GEOGRAPHY` / `GEOMETRY`	`ST_WITHIN`	Automatic

Calendar table

Cantrip supports an optional calendar dimension table named cantrip__dim_calendar. If present, its columns (except date) become time-related dimensions that are automatically joined to any metric with a date/datetime column:

CREATE TABLE cantrip__dim_calendar (
    date DATE PRIMARY KEY,
    us_holiday TEXT,
    is_weekend BOOLEAN
);

Partitioned dimensions

Data warehouses commonly use date-partitioned slowly-changing dimensions — a full snapshot of every entity per date partition (ds). Cantrip supports these via the partition= annotation.

Wrap the partitioned table in a view with partition=latest on the date column, then declare the FK from your fact table:

-- SQLite / DuckDB / PostgreSQL
CREATE VIEW dim_users AS
SELECT
    user_id,
    name,
    country,
    ds AS "ds [partition=latest]"
FROM dim_all_users;

CREATE VIEW fact_orders_annotated AS
SELECT
    order_id,
    buyer_id AS "buyer_id [fk=dim_users.user_id]",
    quantity
FROM fact_orders;

-- Snowflake
CREATE OR REPLACE VIEW dim_users (
    user_id, name, country,
    ds COMMENT 'partition=latest'
) AS
SELECT user_id, name, country, ds FROM dim_all_users;

Cantrip will:

Exclude the partition column (ds) from dimensions
Add AND dim_users.ds = (SELECT MAX(ds) FROM dim_users) to every join involving that dimension

For point-in-time joins, use partition=event to join at the metric's event time:

-- Join to the user snapshot as of the order date
CREATE VIEW dim_users_historical AS
SELECT
    user_id,
    name,
    country,
    ds AS "ds [partition=event]"
FROM dim_all_users;

Cantrip resolves the event time column automatically when the fact table has a single time column (DATE, TIME, or TIMESTAMP). If the fact table has multiple time columns, use the metric's join= annotation to disambiguate (e.g., join=event_time). The generated predicate is AND dim_users.ds = CAST(fact.event_time AS DATE). The CAST uses the partition column's type — if ds is a TIMESTAMP, the CAST targets TIMESTAMP instead of DATE, preserving hour-level granularity. Only the fact side is cast, so indexes on the dimension's partition column are used.

Derived metrics

Define metrics as expressions over other metrics — ratios, differences, index scores — without writing raw aggregations:

-- Click-through rate: ratio of two metrics
CREATE VIEW ctr AS
SELECT total_clicks / total_views AS "ctr [type=REAL]"
FROM total_clicks
CROSS JOIN total_views;

-- Revenue per unit with a description
CREATE VIEW revenue_per_unit AS
SELECT
    'Revenue per unit sold' AS cantrip__description,
    total_revenue / total_units_sold AS "revenue_per_unit [type=REAL]"
FROM total_revenue
CROSS JOIN total_units_sold;

A derived metric is detected automatically when a view's SELECT has no aggregation and all its FROM sources are other metrics. Use CROSS JOIN to combine metrics (each metric returns a single row).

Column names and annotations

For databases that use bracket annotations (DuckDB, Postgres, SQLite), brackets become part of the column name, which prevents derived metrics from referencing sub-metrics by clean names. Use the view column-renaming syntax to get clean DB column names while preserving annotations:

-- Column rename gives clean name "total_revenue" for derived metrics to reference
CREATE VIEW total_revenue (cantrip__description, total_revenue) AS
SELECT
    'Total revenue' AS cantrip__description,
    SUM(quantity * unit_price) AS "total_revenue [type=FLOAT, join=event_time]"
FROM fact_orders;

BigQuery and Snowflake don't need this — their annotations live in column descriptions/comments, not aliases.

Postgres note: Postgres strips the column-renaming header and normalizes SELECT aliases to match, losing bracket annotations. For Postgres, derived metrics can reference the full annotated column names — cantrip automatically strips the brackets when matching:

-- Postgres: reference the full annotated column name
CREATE VIEW revenue_per_unit AS
SELECT "total_revenue [type=FLOAT8, join=event_time]"
     / "total_units_sold [type=INTEGER]"
     AS "revenue_per_unit [type=FLOAT8]"
FROM total_revenue
CROSS JOIN total_units_sold;

Querying derived metrics

Derived metrics work like any other metric — break down by dimensions, filter, sort, combine with regular metrics:

metrics = sl.get_metrics(sv)
ctr = next(m for m in metrics if m.name == "ctr")
revenue = next(m for m in metrics if m.name == "total_revenue")

# Derived metric alone
query = sl.get_query(sv, {ctr}, set(), set())

# Mixed with regular metrics and dimensions
dims = sl.get_dimensions(sv)
country = next(d for d in dims if d.name == "dim_users.country")
query = sl.get_query(sv, {ctr, revenue}, {country}, set())

Compatible dimensions

A derived metric's compatible dimensions are the intersection of its sub-metrics' dimensions. If total_clicks can be broken down by {page, country} and total_views by {page, campaign}, then ctr can only be broken down by {page}.

# Discovery works as usual
compatible = sl.get_compatible_dimensions(sv, {ctr}, set())

Nested derived metrics

Derived metrics can reference other derived metrics:

CREATE VIEW margin AS
SELECT total_revenue - total_cost AS "margin [type=REAL]"
FROM total_revenue CROSS JOIN total_cost;

CREATE VIEW margin_pct AS
SELECT margin / total_revenue * 100 AS "margin_pct [type=REAL]"
FROM margin CROSS JOIN total_revenue;

Cantrip resolves the full graph to leaf (aggregate) metrics, with cycle detection.

Descriptions

Add a human-readable description to any metric, dimension view, or population by including a cantrip__description column with a string literal:

-- Metric with description
CREATE VIEW total_revenue AS
SELECT
    SUM(quantity * unit_price) AS "total_revenue [type=REAL]",
    'Total revenue from all completed orders' AS cantrip__description
FROM fact_orders;

-- Dimension view with description
CREATE VIEW dim_users_annotated AS
SELECT
    user_id,
    name,
    country,
    'User demographics and account info' AS cantrip__description
FROM dim_users;

The description column is automatically stripped from the view before validation — it doesn't count as a metric expression or dimension column. The description is available on the Metric and Dimension objects via the .description attribute (or None when absent). Unlike annotations, descriptions work identically across all backends — no special syntax needed.

Populations

Populations are named, pre-filtered subsets of dimension tables. They let analysts define reusable segments — "Brazilian users", "premium customers" — as SQL views, without needing to know column names or values.

Defining populations

A population is any view that:

Is a valid transform (no GROUP BY, no DISTINCT, no aggregations)
Has a WHERE clause
Reads from exactly one dimension table

CREATE VIEW brazilian_users AS
SELECT * FROM dim_users WHERE country = 'BR';

CREATE VIEW premium_users AS
SELECT * FROM dim_users WHERE tier = 'premium';

Populations are detected automatically — no annotations needed.

Adding a description

CREATE VIEW brazilian_users AS
SELECT
    *,
    'Users from Brazil' AS cantrip__description
FROM dim_users
WHERE country = 'BR';

Using populations in queries

sv = next(iter(sl.get_semantic_views()))
metrics = sl.get_metrics(sv)
dimensions = sl.get_dimensions(sv)
populations = sl.get_populations(sv)

revenue = next(m for m in metrics if m.name == "total_revenue")
category = next(d for d in dimensions if d.name == "dim_products.category")

br = next(p for p in populations if p.name == "brazilian_users")
premium = next(p for p in populations if p.name == "premium_users")

query = sl.get_query(sv, {revenue}, {category}, set(), populations={br, premium})

This generates a UNION ALL query — one sub-query per population — each with a population label column:

SELECT
  'brazilian_users' AS "population",
  SUM(quantity * unit_price) AS total_revenue,
  dim_products.category AS "dim_products.category"
FROM fact_orders
JOIN dim_users ON fact_orders.buyer_id = dim_users.customer_id
JOIN dim_products ON fact_orders.product_id = dim_products.id
WHERE dim_users.country = 'BR'
GROUP BY dim_products.category

UNION ALL

SELECT
  'premium_users' AS "population",
  SUM(quantity * unit_price) AS total_revenue,
  dim_products.category AS "dim_products.category"
FROM fact_orders
JOIN dim_users ON fact_orders.buyer_id = dim_users.customer_id
JOIN dim_products ON fact_orders.product_id = dim_products.id
WHERE dim_users.tier = 'premium'
GROUP BY dim_products.category

Discovery

Use get_compatible_populations to discover which populations are compatible with a set of metrics. A population is valid when its source dimension table is reachable from the metric's fact tables:

# Which populations work with the revenue metric?
valid_pops = sl.get_compatible_populations(sv, {revenue}, set())

# Which metrics work with the brazilian_users population?
valid_metrics = sl.get_compatible_metrics(sv, set(), set(), populations={br})

# Which dimensions work with the revenue metric?
valid_dims = sl.get_compatible_dimensions(sv, {revenue}, set())

All three get_compatible_* methods share the same signature (sv, metrics, dimensions, populations) — each parameter acts as a constraint that narrows the results.

Enrichment dimensions

Enrichment dimensions let you join external data — weather, demographics, economic indicators — to your metrics automatically. Any table named cantrip__dim_<name> (other than the reserved cantrip__dim_calendar and cantrip__dim_geometries) becomes an enrichment source.

How it works

Enrichment tables use reserved column names to declare their join axes:

Reserved column	Axis	Joined via
`date`	Time	`CAST(fact.event_time AS DATE)`
`geo_grain`	Space	Spatial join or `geo_grain=` annotation
`geo_value`	Space	Spatial join or `geo_grain=` annotation

All other columns become dimensions in a virtual namespace derived from the table name.

Defining enrichment tables

-- Time-only: economic indicators
CREATE TABLE cantrip__dim_economics (
    date              DATE PRIMARY KEY,
    cpi               REAL,
    unemployment_rate REAL
);

-- Space-only: demographics by region
CREATE TABLE cantrip__dim_demographics (
    geo_grain TEXT,
    geo_value TEXT,
    population INTEGER,
    median_income REAL,
    PRIMARY KEY (geo_grain, geo_value)
);

-- Time + Space: daily weather by region
CREATE TABLE cantrip__dim_weather (
    date      DATE,
    geo_grain TEXT,
    geo_value TEXT,
    temp_high REAL,
    is_sunny  BOOLEAN,
    PRIMARY KEY (date, geo_grain, geo_value)
);

These produce dimensions like Economics.cpi, Demographics.median_income, Weather.is_sunny.

Availability

Enrichment dimensions are only available for metrics that satisfy the required axes:

Enrichment axes	Metric requires
Time only	An event time column
Space only	A spatial column or a `geo_grain=` annotation at a matching grain
Time + Space	Both

If a metric doesn't have the required axis, the enrichment dimensions simply don't appear — no error, they're just not compatible.

Query generation

When you request an enrichment dimension, Cantrip generates the appropriate JOIN:

econ_cpi = next(d for d in dimensions if d.name == "Economics.cpi")
query = sl.get_query(sv, {revenue}, {econ_cpi}, set())

SELECT
  SUM(quantity * unit_price) AS total_revenue,
  cantrip__dim_economics.cpi AS "Economics.cpi"
FROM fact_orders
JOIN cantrip__dim_economics
  ON cantrip__dim_economics.date = CAST(fact_orders.event_time AS DATE)
GROUP BY
  cantrip__dim_economics.cpi

Space-dependent enrichments piggyback on the spatial join when Space.* dimensions are in the query.

The `geo_grain=` annotation

Dimension tables often have a column that identifies a geographic region — a country code, state abbreviation, FIPS code, etc. The geo_grain= annotation tells Cantrip that a column holds a resolved geo value at a specific grain, enabling enrichment joins without a spatial column or geometries table.

Place the geo_grain= annotation on a dimension view, then reference it from a fact transform via an fk= annotation:

-- SQLite / DuckDB / PostgreSQL

-- Step 1: Annotate the dimension view with geo_grain=
CREATE VIEW dim_users_geo AS
SELECT
    customer_id,
    name,
    country AS "country [geo_grain=country]"
FROM dim_users;

-- Step 2: Fact transform declares FK to the geo-annotated view
CREATE VIEW fact_orders_geo AS
SELECT
    order_id,
    buyer_id AS "buyer_id [fk=dim_users_geo.customer_id]",
    quantity,
    unit_price,
    event_time
FROM fact_orders;

-- Step 3: Metric reads from the transform
CREATE VIEW geo_revenue AS
SELECT SUM(quantity * unit_price) AS "geo_revenue [type=REAL, join=event_time]"
FROM fact_orders_geo;

-- BigQuery (column descriptions)
CREATE OR REPLACE VIEW dim_users_geo AS
SELECT customer_id, name, country FROM dim_users;

ALTER VIEW dim_users_geo
ALTER COLUMN country SET OPTIONS(description='geo_grain=country');

CREATE OR REPLACE VIEW fact_orders_geo AS
SELECT order_id, buyer_id, quantity, unit_price, event_time FROM fact_orders;

ALTER VIEW fact_orders_geo
ALTER COLUMN buyer_id SET OPTIONS(description='fk=dim_users_geo.customer_id');

-- Snowflake (column comments)
CREATE OR REPLACE VIEW dim_users_geo (
    customer_id, name,
    country COMMENT 'geo_grain=country'
) AS SELECT customer_id, name, country FROM dim_users;

CREATE OR REPLACE VIEW fact_orders_geo (
    order_id,
    buyer_id COMMENT 'fk=dim_users_geo.customer_id',
    quantity, unit_price, event_time
) AS SELECT order_id, buyer_id, quantity, unit_price, event_time FROM fact_orders;

With this setup, space-dependent enrichment tables join directly on the annotated column — no ST_Within needed:

SELECT
  SUM(quantity * unit_price) AS geo_revenue,
  cantrip__dim_demographics.median_income AS "Demographics.median_income"
FROM fact_orders
JOIN dim_users_geo
  ON fact_orders.buyer_id = dim_users_geo.customer_id
JOIN cantrip__dim_demographics
  ON cantrip__dim_demographics.geo_grain = 'country'
  AND cantrip__dim_demographics.geo_value = dim_users_geo.country
GROUP BY
  cantrip__dim_demographics.median_income

A few things to note:

The annotated column remains a regular dimension — unlike partition=, it is not excluded from the dimension set. dim_users_geo.country can still be grouped by directly.
The annotation only satisfies enrichment tables that have data at the matching grain. If the enrichment table only has us_state data, a geo_grain=country annotation doesn't help.
When both a Space.* dimension and a geo_grain= annotation are in the same query, the spatial join is preferred — it's already present, so the enrichment piggybacks on it.
A view can have multiple geo_grain= annotations at different grains (e.g., geo_grain=country and geo_grain=us_state).
When a metric's fact table has spatial columns and a geometries table exists, enrichment dimensions that need space will use the spatial join automatically — even if no Space.* dimension is explicitly selected.

Enrichment data scripts

Cantrip includes scripts to download real-world enrichment data and generate backend-optimized DDL (with indexes, partitioning, and clustering). All scripts are in scripts/ and output SQL to stdout (pipe to your database) or load directly via --db.

Demographics (US Census)

Downloads population, median income, and median age from the Census Bureau American Community Survey. No API key required.

# Generate SQL for your backend
python scripts/generate_demographics.py sqlite > demographics.sql
python scripts/generate_demographics.py bigquery --table my_project.ds.cantrip__dim_demographics

# Load directly into a database
python scripts/generate_demographics.py postgres --db postgresql://user:pass@localhost/mydb

# Choose grains (default: us_state + us_county)
python scripts/generate_demographics.py sqlite --grains us_state
python scripts/generate_demographics.py sqlite --grains us_state us_county

Economics (FRED)

Downloads CPI, unemployment rate, federal funds rate, and GDP from the Federal Reserve Economic Data API. Forward-fills monthly/quarterly series to daily rows. Requires a free API key from fred.stlouisfed.org.

# Set API key via argument or environment variable
python scripts/generate_economics.py sqlite --api-key YOUR_KEY
export FRED_API_KEY=YOUR_KEY
python scripts/generate_economics.py postgres | psql mydb

# Custom date range (default: last 5 years)
python scripts/generate_economics.py sqlite --start 2020-01-01 --end 2024-12-31

Weather (Open-Meteo)

Downloads historical daily weather (temperature, precipitation, snowfall) from the Open-Meteo Archive API. Uses state capitals and country capitals as representative points. No API key required.

# State-level weather (default: last year, 50 states)
python scripts/generate_weather.py sqlite

# Country-level weather
python scripts/generate_weather.py bigquery --grains country

# Both grains with custom date range
python scripts/generate_weather.py postgres --grains us_state country --start 2023-01-01

# Direct load
python scripts/generate_weather.py snowflake --db snowflake://user:pass@account/DB/SCHEMA

Incremental updates (cron jobs)

Scripts with a time axis (economics, weather) support --append mode for daily incremental loads. Append mode skips DDL and deletes existing rows in the date range before inserting — making reruns idempotent:

# Daily cron: fetch yesterday's weather and append
python scripts/generate_weather.py postgres \
    --db $DATABASE_URL \
    --start $(date -d yesterday +%Y-%m-%d) \
    --end $(date -d yesterday +%Y-%m-%d) \
    --append

# Or generate SQL for piping
python scripts/generate_economics.py postgres \
    --api-key $FRED_API_KEY \
    --start $(date -d yesterday +%Y-%m-%d) \
    --end $(date -d yesterday +%Y-%m-%d) \
    --append | psql $DATABASE_URL

Backend-specific optimizations

The scripts automatically generate optimized DDL for each backend:

Backend	Time-only	Space-only	Time+Space
SQLite	PK on `date`	PK on `(geo_grain, geo_value)`	PK on `(date, geo_grain, geo_value)`
DuckDB	PK on `date`	PK on `(geo_grain, geo_value)`	PK on `(date, geo_grain, geo_value)`
Postgres	PK on `date`	PK on `(geo_grain, geo_value)`	PK + index on `(geo_grain, geo_value)`
BigQuery	`PARTITION BY date`	`CLUSTER BY geo_grain, geo_value`	`PARTITION BY date CLUSTER BY geo_grain, geo_value`
Snowflake	`CLUSTER BY (date)`	`CLUSTER BY (geo_grain, geo_value)`	`CLUSTER BY (date, geo_grain, geo_value)`

Installation

pip install cantrip              # SQLite only
pip install cantrip[bigquery]    # BigQuery support
pip install cantrip[duckdb]      # DuckDB support
pip install cantrip[postgres]    # PostgreSQL support
pip install cantrip[snowflake]   # Snowflake support
pip install cantrip[calendar]    # Calendar table generation (requires `holidays`)
pip install cantrip[spatial]     # Spatial dimensions (SpatiaLite, DuckDB spatial, PostGIS, BigQuery, or Snowflake)

Development

git clone https://github.com/betodealmeida/cantrip.git
cd cantrip
uv sync --all-extras
make pr   # runs tests (100% coverage required) + pre-commit hooks

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.3.0

Mar 17, 2026

0.2.0

Mar 17, 2026

0.1.0

Mar 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cantrip-0.3.0.tar.gz (247.8 kB view details)

Uploaded Mar 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cantrip-0.3.0-py3-none-any.whl (79.1 kB view details)

Uploaded Mar 17, 2026 Python 3

File details

Details for the file cantrip-0.3.0.tar.gz.

File metadata

Download URL: cantrip-0.3.0.tar.gz
Upload date: Mar 17, 2026
Size: 247.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cantrip-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`9d9976678cece0fafdf1f2f5b68692f7f598af4d8c4f671cd68ced8d519d9c76`
MD5	`c98b9ba02e4f73ec0b983a83930c8789`
BLAKE2b-256	`8d2624ea982faf9d2887a09b55df1917335926dc072baf3da51535cedf095f02`

See more details on using hashes here.

Provenance

The following attestation bundles were made for cantrip-0.3.0.tar.gz:

Publisher: release.yml on betodealmeida/cantrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cantrip-0.3.0.tar.gz
- Subject digest: 9d9976678cece0fafdf1f2f5b68692f7f598af4d8c4f671cd68ced8d519d9c76
- Sigstore transparency entry: 1113741747
- Sigstore integration time: Mar 17, 2026
Source repository:
- Permalink: betodealmeida/cantrip@c959343926bf06e66f4256a425b64661857e84c5
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/betodealmeida
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c959343926bf06e66f4256a425b64661857e84c5
- Trigger Event: push

File details

Details for the file cantrip-0.3.0-py3-none-any.whl.

File metadata

Download URL: cantrip-0.3.0-py3-none-any.whl
Upload date: Mar 17, 2026
Size: 79.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cantrip-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`975ffdc1094caab2bb88e8abe54d043452637111564e794fb352fa48ecacd73e`
MD5	`cd0b2d6513605cb2726ad3e855b068f5`
BLAKE2b-256	`fcb9193a228c73c3038b7d34670b88b09706e9757b352c19246195123095cd98`

See more details on using hashes here.

Provenance

The following attestation bundles were made for cantrip-0.3.0-py3-none-any.whl:

Publisher: release.yml on betodealmeida/cantrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cantrip-0.3.0-py3-none-any.whl
- Subject digest: 975ffdc1094caab2bb88e8abe54d043452637111564e794fb352fa48ecacd73e
- Sigstore transparency entry: 1113741791
- Sigstore integration time: Mar 17, 2026
Source repository:
- Permalink: betodealmeida/cantrip@c959343926bf06e66f4256a425b64661857e84c5
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/betodealmeida
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c959343926bf06e66f4256a425b64661857e84c5
- Trigger Event: push

cantrip 0.3.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

💫 Cantrip

How it works

Quick example

Filters

Annotations

SQLite

BigQuery

DuckDB / PostgreSQL

Snowflake

Supported annotation keys

Supported databases

SQLite

BigQuery

DuckDB

PostgreSQL

Snowflake

Spatial dimensions

How it works

GeoJSON output

Database support

Calendar table

Partitioned dimensions

Derived metrics

Column names and annotations

Querying derived metrics

Compatible dimensions

Nested derived metrics

Descriptions

Populations

Defining populations

Adding a description

Using populations in queries

Discovery

Enrichment dimensions

How it works

Defining enrichment tables

Availability

Query generation

The geo_grain= annotation

Enrichment data scripts

Demographics (US Census)

Economics (FRED)

Weather (Open-Meteo)

Incremental updates (cron jobs)

Backend-specific optimizations

Installation

Development

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

The `geo_grain=` annotation