Skip to main content

A lightweight semantic layer that infers metrics and dimensions from your database schema

Project description

💫 Cantrip

Cantrip is a lightweight semantic layer that reads metrics and infers dimensions from your database schema. Define metrics as SQL views, and use views to add metadata like types, join hints, and foreign keys. Cantrip will infer dimensions, join paths, and type information automatically. No configuration needed.

How it works

Cantrip reads your database schema (tables, views, foreign keys, column types) and builds a semantic model:

  • Metrics are SQL views containing a single aggregate expression (e.g., SUM, COUNT, AVG).
  • Dimensions are columns on tables connected via foreign keys.
  • Join paths are discovered automatically by following FK relationships.
  • Metadata (types, join hints, time grains, foreign keys) is declared via annotations on view columns.

When you request a query combining metrics and dimensions, Cantrip figures out which tables to join, how to group, how to filter, and then generates the SQL.

Currently supports SQLite, BigQuery, DuckDB, PostgreSQL, and Snowflake.

Quick example

Given these tables and views in a (SQLite) database:

CREATE TABLE dim_users (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);

CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    buyer_id INTEGER REFERENCES dim_users(customer_id),
    quantity INTEGER,
    unit_price REAL
);

-- A metric: one view, one aggregate
CREATE VIEW total_revenue AS
SELECT SUM(quantity * unit_price) AS "total_revenue [type=REAL]"
FROM fact_orders;

Cantrip automatically discovers:

  • total_revenue as a metric (type REAL)
  • dim_users.name and dim_users.country as dimensions
  • The join path fact_orders.buyer_id → dim_users.customer_id

You can then query:

from sqlalchemy import create_engine
from cantrip.implementations.sqlite import SQLiteSemanticLayer

engine = create_engine("sqlite:///my.db")
sl = SQLiteSemanticLayer(engine)

sv = next(iter(sl.get_semantic_views()))
metrics = sl.get_metrics(sv)
dimensions = sl.get_dimensions(sv)

# Build a query for total_revenue broken down by country
revenue = next(m for m in metrics if m.name == "total_revenue")
country = next(d for d in dimensions if d.name == "dim_users.country")
query = sl.get_query(sv, {revenue}, {country}, set())

print(query.sql)
SELECT
  SUM(quantity * unit_price) AS total_revenue,
  dim_users.country AS "dim_users.country"
FROM fact_orders
JOIN dim_users
  ON fact_orders.buyer_id = dim_users.customer_id
GROUP BY
  dim_users.country

Filters

Dimensions and metrics support Python operators for building filters. The filter type is inferred automatically — dimensions produce WHERE clauses, metrics produce HAVING clauses:

# Filter on a dimension (WHERE)
query = sl.get_query(sv, {revenue}, {country}, {country == "br"})
SELECT
  SUM(quantity * unit_price) AS total_revenue,
  dim_users.country AS "dim_users.country"
FROM fact_orders
JOIN dim_users
  ON fact_orders.buyer_id = dim_users.customer_id
WHERE
  dim_users.country = 'br'
GROUP BY
  dim_users.country
# Filter on a metric (HAVING)
query = sl.get_query(sv, {revenue}, {country}, {revenue > 1000})

Filters can be composed with & (AND), | (OR), and ~ (NOT):

query = sl.get_query(sv, {revenue}, {country}, {
    (country == "br") | (country == "us"),
    revenue > 1000,
})

Additional filter methods:

country.isin("br", "us", "mx")     # IN
quantity.between(10, 100)          # BETWEEN
country.like("%bra%")              # LIKE
country.is_null()                  # IS NULL
country.is_not_null()              # IS NOT NULL

Annotations

Views serve double duty in Cantrip: they define metrics (via aggregate expressions) and carry metadata (via annotations on column aliases or descriptions). The way annotations are expressed depends on the database.

SQLite

In SQLite, annotations are embedded directly in the column alias using [key=value] syntax:

-- Declare the metric type explicitly
CREATE VIEW total_sales AS
SELECT SUM(amount) AS "total_sales [type=REAL]" FROM orders;

-- Hint which FK column to use for joins
CREATE VIEW sellers AS
SELECT COUNT(DISTINCT seller_id) AS "sellers [type=INTEGER, join=seller_id]"
FROM orders;

-- Declare a foreign key when the database doesn't have one
CREATE VIEW fact_events_annotated AS
SELECT user_id AS "user_id [fk=dim_users.customer_id]", amount FROM events;

BigQuery

BigQuery doesn't allow special characters in column aliases. Instead, annotations go in column descriptions using ALTER VIEW ... ALTER COLUMN ... SET OPTIONS(description='...'):

-- Declare the metric type explicitly
CREATE OR REPLACE VIEW `my_project.my_dataset.total_sales` AS
SELECT SUM(amount) AS total_sales FROM orders;

ALTER VIEW `my_project.my_dataset.total_sales`
ALTER COLUMN total_sales SET OPTIONS(description='type=FLOAT64');

-- Hint which FK column to use for joins
CREATE OR REPLACE VIEW `my_project.my_dataset.sellers` AS
SELECT COUNT(DISTINCT seller_id) AS sellers FROM orders;

ALTER VIEW `my_project.my_dataset.sellers`
ALTER COLUMN sellers SET OPTIONS(description='type=INT64, join=seller_id');

-- Declare a foreign key when the database doesn't have one
CREATE OR REPLACE VIEW `my_project.my_dataset.fact_events_annotated` AS
SELECT user_id, amount FROM events;

ALTER VIEW `my_project.my_dataset.fact_events_annotated`
ALTER COLUMN user_id SET OPTIONS(description='fk=dim_users.customer_id');

DuckDB / PostgreSQL

DuckDB and PostgreSQL use the same bracket-style annotations as SQLite:

CREATE VIEW total_sales AS
SELECT SUM(amount) AS "total_sales [type=DOUBLE PRECISION]" FROM orders;

Snowflake

Snowflake annotations use inline column comments in CREATE VIEW:

-- Declare the metric type explicitly
CREATE OR REPLACE VIEW total_sales (
    total_sales COMMENT 'type=FLOAT'
) AS
SELECT SUM(amount) AS total_sales FROM orders;

-- Hint which FK column to use for joins
CREATE OR REPLACE VIEW sellers (
    sellers COMMENT 'type=INTEGER, join=seller_id'
) AS
SELECT COUNT(DISTINCT seller_id) AS sellers FROM orders;

-- Declare a foreign key when the database doesn't have one
CREATE OR REPLACE VIEW fact_events_annotated (
    event_id,
    user_id COMMENT 'fk=dim_users.customer_id',
    amount
) AS
SELECT event_id, user_id, amount FROM events;

Note: COMMENT ON COLUMN only works for tables in Snowflake, not views. The inline CREATE VIEW (col COMMENT '...') syntax is the only way to annotate view columns.

Supported annotation keys

Key Description
type Column type (INTEGER, REAL, TEXT, DATE, DATETIME, etc. for SQLite; INT64, FLOAT64, STRING, etc. for BigQuery)
join Column name to prefer when joining to dimension tables
grain Time grain (year, quarter, month, week, day, hour)
fk Declare a foreign key as table.column (for databases without FK constraints)
partition Partition filter for date-partitioned dimensions (latest or event)

Supported databases

SQLite

pip install cantrip
from cantrip.implementations.sqlite import SQLiteSemanticLayer

BigQuery

pip install cantrip[bigquery]
from cantrip.implementations.bigquery import BigQuerySemanticLayer

Uses INFORMATION_SCHEMA for metadata discovery and supports standard SQL time functions natively.

DuckDB

pip install cantrip[duckdb]
from cantrip.implementations.duckdb import DuckDBSemanticLayer

In-process like SQLite but with PostgreSQL-compatible SQL. Uses INFORMATION_SCHEMA for metadata and bracket-style annotations in column aliases.

PostgreSQL

pip install cantrip[postgres]
from cantrip.implementations.postgres import PostgresSemanticLayer

Uses INFORMATION_SCHEMA for metadata, real FOREIGN KEY constraints for join discovery, and bracket-style annotations in column aliases. Supports FILTER clauses and CTEs for multi-context queries. Optional PostGIS support for spatial dimensions.

Snowflake

pip install cantrip[snowflake]
from cantrip.implementations.snowflake import SnowflakeSemanticLayer

Uses INFORMATION_SCHEMA for metadata, SHOW IMPORTED KEYS / SHOW PRIMARY KEYS for FK/PK discovery, and column comments for annotations. Supports GEOGRAPHY/GEOMETRY spatial types via ST_WITHIN. Identifiers are uppercased by default following Snowflake conventions.

Spatial dimensions

Cantrip supports spatial (geographic) grouping via a geometries table named cantrip__dim_geometries. If present, Cantrip discovers spatial grains (e.g., country, city, zip_code) and exposes them as Space.* dimensions — the geographic counterpart to Time.* dimensions.

How it works

  1. Point column: Your fact table has a geometry/geography column (e.g., POINT in SQLite, GEOGRAPHY in BigQuery).
  2. Geometries table: A cantrip__dim_geometries table defines the spatial regions to group by:
-- SQLite (SpatiaLite)
CREATE TABLE cantrip__dim_geometries (
    id INTEGER PRIMARY KEY,
    grain TEXT NOT NULL,      -- e.g. 'country', 'city'
    value TEXT NOT NULL,      -- e.g. 'United States', 'New York'
    geometry MULTIPOLYGON     -- region boundary
);

-- BigQuery
CREATE TABLE cantrip__dim_geometries (
    grain STRING NOT NULL,
    value STRING NOT NULL,
    geometry GEOGRAPHY NOT NULL
);
  1. Metric: If the fact table has exactly one spatial column, Cantrip finds it automatically. If there are multiple (e.g., pickup_geom and dropoff_geom), use join=<column> to pick one:
-- SQLite: explicit join hint (only needed with multiple spatial columns)
CREATE VIEW pickups AS
SELECT COUNT(*) AS "total_pickups [join=pickup_geom]"
FROM trips;

-- BigQuery equivalent
ALTER VIEW pickups
ALTER COLUMN total_pickups SET OPTIONS(description='join=pickup_geom');
  1. Query: Request Space.* dimensions just like any other dimension:
country = next(d for d in dimensions if d.name == "Space.country")
query = sl.get_query(sv, {checkins_metric}, {country}, set())

GeoJSON output

Spatial dimension values are returned as GeoJSON Feature strings, containing both the region name and its geometry — ready for display or map rendering:

{
  "type": "Feature",
  "properties": {"name": "United States"},
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [[[[-130, 20], [-60, 20], [-60, 55], [-130, 55], [-130, 20]]]]
  }
}

Database support

Database Point type Spatial function Index
SQLite POINT (SpatiaLite) ST_Within R-tree SpatialIndex (physical tables only)
BigQuery GEOGRAPHY ST_WITHIN Automatic
DuckDB GEOMETRY (spatial extension) ST_Within Automatic
PostgreSQL geometry / geography (PostGIS) ST_Within GiST index recommended
Snowflake GEOGRAPHY / GEOMETRY ST_WITHIN Automatic

Calendar table

Cantrip supports an optional calendar dimension table named cantrip__dim_calendar. If present, its columns (except date) become time-related dimensions that are automatically joined to any metric with a date/datetime column:

CREATE TABLE cantrip__dim_calendar (
    date DATE PRIMARY KEY,
    us_holiday TEXT,
    is_weekend BOOLEAN
);

Partitioned dimensions

Data warehouses commonly use date-partitioned slowly-changing dimensions — a full snapshot of every entity per date partition (ds). Cantrip supports these via the partition= annotation.

Wrap the partitioned table in a view with partition=latest on the date column, then declare the FK from your fact table:

-- SQLite / DuckDB / PostgreSQL
CREATE VIEW dim_users AS
SELECT
    user_id,
    name,
    country,
    ds AS "ds [partition=latest]"
FROM dim_all_users;

CREATE VIEW fact_orders_annotated AS
SELECT
    order_id,
    buyer_id AS "buyer_id [fk=dim_users.user_id]",
    quantity
FROM fact_orders;
-- Snowflake
CREATE OR REPLACE VIEW dim_users (
    user_id, name, country,
    ds COMMENT 'partition=latest'
) AS
SELECT user_id, name, country, ds FROM dim_all_users;

Cantrip will:

  1. Exclude the partition column (ds) from dimensions
  2. Add AND dim_users.ds = (SELECT MAX(ds) FROM dim_users) to every join involving that dimension

For point-in-time joins, use partition=event to join at the metric's event time:

-- Join to the user snapshot as of the order date
CREATE VIEW dim_users_historical AS
SELECT
    user_id,
    name,
    country,
    ds AS "ds [partition=event]"
FROM dim_all_users;

Cantrip resolves the event time column automatically when the fact table has a single time column (DATE, TIME, or TIMESTAMP). If the fact table has multiple time columns, use the metric's join= annotation to disambiguate (e.g., join=event_time). The generated predicate is AND dim_users.ds = CAST(fact.event_time AS DATE). The CAST uses the partition column's type — if ds is a TIMESTAMP, the CAST targets TIMESTAMP instead of DATE, preserving hour-level granularity. Only the fact side is cast, so indexes on the dimension's partition column are used.

Installation

pip install cantrip              # SQLite only
pip install cantrip[bigquery]    # BigQuery support
pip install cantrip[duckdb]      # DuckDB support
pip install cantrip[postgres]    # PostgreSQL support
pip install cantrip[snowflake]   # Snowflake support
pip install cantrip[calendar]    # Calendar table generation (requires `holidays`)
pip install cantrip[spatial]     # Spatial dimensions (SpatiaLite, DuckDB spatial, PostGIS, BigQuery, or Snowflake)

Development

git clone https://github.com/betodealmeida/cantrip.git
cd cantrip
uv sync --all-extras
make pr   # runs tests (100% coverage required) + pre-commit hooks

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cantrip-0.1.0.tar.gz (191.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cantrip-0.1.0-py3-none-any.whl (66.4 kB view details)

Uploaded Python 3

File details

Details for the file cantrip-0.1.0.tar.gz.

File metadata

  • Download URL: cantrip-0.1.0.tar.gz
  • Upload date:
  • Size: 191.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cantrip-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2d09d5551a97e2feb3a5ae0ad9058195e1f4df51e3bc67215e2c7bfd158691cb
MD5 3b1b1aeeaa909ecd34f4198c83a67014
BLAKE2b-256 af7f372b0eb343d965571b5f85a34e300f8d4d16032a7e4985df3b66a4ad0a1c

See more details on using hashes here.

Provenance

The following attestation bundles were made for cantrip-0.1.0.tar.gz:

Publisher: release.yml on betodealmeida/cantrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cantrip-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cantrip-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 66.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cantrip-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7c28571a44910620dad8708b4ebf5dfb1a15a604c59262de4614f928a5f17895
MD5 f0aaec24855747b7f1b4a1b4e3ba1abe
BLAKE2b-256 059a68203ff41645cc03c085e5629589b012f7f8acf1b82f24100f3683f0c26c

See more details on using hashes here.

Provenance

The following attestation bundles were made for cantrip-0.1.0-py3-none-any.whl:

Publisher: release.yml on betodealmeida/cantrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page