Python Open-source semantic layer for Data engineers

These details have not been verified by PyPI

Project description

PySemantic

A lightweight, graph-based Semantic Layer for Python and SQL.

Define metrics once. Generate SQL everywhere.

Your browser does not support the video tag. Watch the demo here.

What is PySemantic?

PySemantic lets you define your data models as Python objects -- dimensions, measures, and entity relationships -- and generates correct, optimized SQL from simple metric queries. No more hand-writing joins, no more duplicated business logic scattered across dashboards and notebooks.

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")

sql = sl.query(
    measures=["total_order_price", "total_number_of_distinct_orders"],
    dimensions=["customer_state", "customer_city"],
    filters=[{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}],
    order_by=["total_order_price DESC"],
    limit=10,
)

SELECT
  customers.customer_city AS customer_city,
  customers.customer_state AS customer_state,
  SUM(order_items.price) AS total_order_price,
  COUNT(DISTINCT order_items.order_id) AS total_number_of_distinct_orders
FROM order_items
LEFT JOIN orders
  ON order_items.order_id = orders.order_id
LEFT JOIN customers
  ON orders.customer_id = customers.customer_id
WHERE
  customers.customer_state IN ('SP', 'RJ')
GROUP BY
  1,
  2
ORDER BY
  total_order_price DESC
LIMIT 10

CLI Magic:

pysemantic query

Joins, table references, WHERE vs HAVING -- all resolved automatically from your model definitions.

Multi-Fact Queries

Need measures from multiple fact tables? PySemantic automatically generates CTE-based SQL that avoids the chasm trap — a pitfall where naive joins between fact tables inflate aggregates due to row fan-out.

multi-fact query

sql = sl.query(
    measures=["total_order_price", "total_customers"],  # order_items + customers
    dimensions=["customer_state"],                       # conformed dimension
    filters=[{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}],
)

WITH cte_order_items AS (
  SELECT
    customers.customer_state AS customer_state,
    SUM(order_items.price) AS total_order_price
  FROM order_items
  LEFT JOIN orders ON order_items.order_id = orders.order_id
  LEFT JOIN customers ON orders.customer_id = customers.customer_id
  WHERE customers.customer_state IN ('SP', 'RJ')
  GROUP BY 1
),
cte_customers AS (
  SELECT
    customers.customer_state AS customer_state,
    COUNT(customers.customer_id) AS total_customers
  FROM customers
  WHERE customers.customer_state IN ('SP', 'RJ')
  GROUP BY 1
)
SELECT
  COALESCE(cte_order_items.customer_state, cte_customers.customer_state) AS customer_state,
  cte_order_items.total_order_price,
  cte_customers.total_customers
FROM cte_order_items
FULL OUTER JOIN cte_customers
  ON cte_order_items.customer_state = cte_customers.customer_state

Key guarantees:

Conformed dimensions only — all dimensions must be reachable from every fact table. Non-conformed dimensions raise NonConformedDimensionError.
Intelligent Filter Pushdown — conformed dimension filters are pushed into every CTE's WHERE; non-conformed filters (Grand Total only) are routed to only the CTEs that can reach them; measure filters are applied to the outer query. Zero manual routing.
Grand Total exception — querying with no dimensions produces scalar aggregates via CROSS JOIN, even across unrelated facts.

Why PySemantic?

	Feature	Description
1	Single source of truth	Define a metric once, use it everywhere
2	Automatic join resolution	Declare entity relationships; PySemantic finds the path
3	SQL injection safe	Structured filters with operator whitelisting and value escaping
4	Multi-fact queries	Combine measures from multiple tables with CTE-based SQL; no chasm trap
5	Intelligent Filter Pushdown	Filters are automatically routed to the correct CTE or outer query — conformed, fact-specific, or measure
6	Dialect support	MySQL, Postgres, and more via SQLGlot
7	Zero infrastructure	Pure Python, no server required at definition time
8	Interactive Studio	Explore your models, graph, and test queries in the browser
9	CLI powered by Typer	Generate SQL and launch the Studio from the terminal

Installation

pip install pysemantic-layer

Or with Poetry:

poetry add pysemantic-layer

Requires Python 3.11+

Quick Start

1. Define your models

Create a directory (e.g. models/) with one Python file per table:

# models/orders.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType

orders = Model(
    name="orders",
    table="orders",
    primary_key="order_id",
    dimensions=[
        Dimension(name="order_status", column="order_status", dtype="string"),
    ],
    measures=[
        Measure(name="total_orders", agg="count", column="order_id"),
        Measure(name="unique_customers", agg="distinct_count", column="customer_id"),
    ],
    entities=[
        Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
        Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
    ],
)

# models/customers.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType

model = Model(
    name="customers",
    table="customers",
    primary_key="customer_id",
    dimensions=[
        Dimension(name="customer_city", column="customer_city", dtype="string"),
        Dimension(name="customer_state", column="customer_state", dtype="string"),
    ],
    measures=[
        Measure(name="total_customers", agg="count", column="customer_id"),
    ],
    entities=[
        Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
    ],
)

PySemantic automatically discovers the join path: orders.customer_id (FOREIGN) links to customers.customer_id (PRIMARY) through the shared entity name customer.

2. Query

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")

sql = sl.query(
    measures=["total_orders"],
    dimensions=["customer_city"],
    filters=[{"field": "customer_state", "operator": "=", "value": "SP"}],
    order_by=["total_orders DESC"],
    limit=10,
)
print(sql)

Generated SQL:

SELECT
  customers.customer_city AS customer_city,
  COUNT(orders.order_id) AS total_orders
FROM orders
LEFT JOIN customers
  ON orders.customer_id = customers.customer_id
WHERE
  customers.customer_state = 'SP'
GROUP BY 1
ORDER BY total_orders DESC
LIMIT 10

PySemantic Studio

PySemantic ships with a built-in interactive web UI powered by Streamlit.

pysemantic studio ./models

pysemantic studio

The Studio has three tabs:

Tab	What it does
Entity Graph	Interactive visualization of your model relationships. Click nodes to isolate, fullscreen mode, drag & zoom.
Data Dictionary	Browse all registered models with their measures, dimensions, entities, and column mappings.
Query Playground	Pick measures & dimensions from dropdowns, add filters, click "Generate SQL" and see the output. Multi-fact supported: combining measures from multiple models produces CTE-based SQL; non-conformed dimensions surface clear errors in real-time.

# Custom port and light theme
pysemantic studio ./models --port 8080 --light

CLI

PySemantic includes a full command-line interface powered by Typer.

pysemantic --help

`pysemantic studio` -- Launch the web UI

pysemantic studio ./models
pysemantic studio ./models --port 8080 --light

`pysemantic query` -- Generate SQL from the terminal

# Single measure
pysemantic query ./models -m total_orders -d customer_city --limit 10

# Multiple measures (comma-separated or repeated)
pysemantic query ./models -m "total_orders,unique_customers" -d order_status

# With filters
pysemantic query ./models \
  -m total_order_price \
  -d customer_state \
  -f "customer_state IN ('SP', 'RJ')" \
  -f "total_order_price > 100" \
  --order-by "total_order_price DESC" \
  --limit 5

Filter syntax: "field OPERATOR value" -- supports =, !=, >, <, >=, <=, IN, NOT IN, LIKE, IS, IS NOT.

`pysemantic graph` -- Export entity graph

pysemantic graph ./models -o my_graph.html

pysemantic graph

Generates a standalone interactive HTML file with your entity graph.

Core Concepts

Model

A Model maps to a database table and defines its semantic metadata:

Component	Purpose	Example
Dimensions	Columns to group or filter by	`customer_city`, `order_status`
Measures	Aggregated metrics	`SUM(price)`, `COUNT(DISTINCT id)`
Entities	Relationship keys (PRIMARY / FOREIGN)	`order_id`, `customer_id`

Entities & Join Resolution

Entities define how models connect. A PRIMARY entity declares ownership of a concept; a FOREIGN entity references it:

order_items  ──(FK: order)──>  orders  ──(FK: customer)──>  customers
     |
     |──(FK: seller)──>  sellers
     |──(FK: product)──>  products

When you query a measure from order_items with a dimension from customers, PySemantic automatically traverses the graph and generates the required LEFT JOIN chain.

Supported Aggregations

`agg` value	SQL output
`sum`	`SUM(column)`
`count`	`COUNT(column)`
`avg`	`AVG(column)`
`min`	`MIN(column)`
`max`	`MAX(column)`
`distinct_count`	`COUNT(DISTINCT column)`

Filters

Filters are passed as dictionaries:

{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}
{"field": "total_order_price", "operator": ">", "value": "1000"}
{"field": "order_status", "operator": "IS", "value": None}

Dimension filters go to WHERE; measure filters go to HAVING -- automatically.

Supported operators: =, !=, >, <, >=, <=, IN, NOT IN, LIKE, ILIKE, IS, IS NOT

SQL Dialect

PySemantic defaults to MySQL and supports all dialects provided by SQLGlot. Pass the dialect parameter when initializing SemanticLayer:

sl = SemanticLayer(model_path="./models", dialect="postgres")

Dialect	Value
MySQL	`"mysql"` (default)
PostgreSQL	`"postgres"`
BigQuery	`"bigquery"`
Snowflake	`"snowflake"`
DuckDB	`"duckdb"`
Databricks	`"databricks"`
Redshift	`"redshift"`
ClickHouse	`"clickhouse"`
Trino / Presto	`"trino"` / `"presto"`
SQLite	`"sqlite"`
... and more	See SQLGlot dialect list

Architecture

User Query (measures, dimensions, filters)
    |
    v
+----------+     +----------+     +----------+
|   AST    |---->| Planner  |---->|Generator |---> SQL string
| (Parser) |     |(Resolver)|     |(Compiler)|
+----------+     +----------+     +----------+
                      |
                +-----+-----+
                | Registry  |
                | + Entity  |
                |   Graph   |
                +-----------+

Layer	Responsibility
AST	Parses raw input into a structured, validated syntax tree
Registry	Loads model files, validates them, builds the entity graph
Planner	Resolves measures/dimensions to models, calculates join paths; detects single-fact vs multi-fact and enforces conformed dimensions
Generator	Translates the logical plan into dialect-specific SQL (flat query or CTE-based for multi-fact)

Initializing the Semantic Layer

PySemantic supports two ways to load your models. Use whichever fits your workflow — the query API is identical either way.

Option 1: Model Directory (recommended for projects)

Point to a directory of .py files. Each file exports a Model object. PySemantic auto-discovers and loads them all.

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")

models/
├── order_items.py   # exports Model(name="order_items", ...)
├── orders.py        # exports Model(name="orders", ...)
├── customers.py     # exports Model(name="customers", ...)
└── products.py      # exports Model(name="products", ...)

Option 2: Explicit Model List (great for notebooks & tests)

Define Model objects inline and pass them directly — no files needed.

from pysemantic.client import SemanticLayer
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType

customers = Model(
    name="customers",
    table="customers",
    primary_key="customer_id",
    dimensions=[
        Dimension(name="customer_city", column="customer_city", dtype="string"),
        Dimension(name="customer_state", column="customer_state", dtype="string"),
    ],
    measures=[
        Measure(name="total_customers", agg="count", column="customer_id"),
    ],
    entities=[
        Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
    ],
)

orders = Model(
    name="orders",
    table="orders",
    primary_key="order_id",
    dimensions=[
        Dimension(name="order_status", column="order_status", dtype="string"),
    ],
    measures=[
        Measure(name="total_orders", agg="count", column="order_id"),
    ],
    entities=[
        Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
        Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
    ],
)

sl = SemanticLayer(models=[customers, orders])

Hot-Reload (for Jupyter / REPL)

Switch models or reload from disk without restarting the kernel:

sl.reload(model_path="./updated_models")   # reload from a different directory
sl.reload(models=[customers, orders])       # reload with a new model list

Note: You must provide either model_path or models, never both. Passing both raises a ValueError.

API Reference

`SemanticLayer`

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")                     # from directory
sl = SemanticLayer(models=[customers, orders])                # from model list
sl = SemanticLayer(model_path="./models", dialect="postgres") # custom dialect

Method	Description
`query(measures, dimensions, filters, order_by, limit)`	Generate a SQL query string
`reload(model_path=None, models=None)`	Hot-reload models from disk or a new list (useful in notebooks)
`generate_graph(output_file)`	Export an interactive entity graph as HTML

`SemanticLayer()` Constructor

Parameter	Type	Default	Description
`model_path`	`str`	—	Path to the directory containing model `.py` files
`models`	`list[Model]`	—	Explicit list of `Model` objects (alternative to `model_path`)
`dialect`	`str`	`"mysql"`	SQL dialect — any SQLGlot-supported dialect

`query()` Parameters

Parameter	Type	Required	Description
`measures`	`list[str]`	Yes	Metric names to aggregate
`dimensions`	`list[str]`	No	Dimension names to group by
`filters`	`list[dict]`	No	Filter conditions as `{field, operator, value}` dicts
`order_by`	`list[str]`	No	Sort columns (append `DESC` for descending)
`limit`	`int`	No	Maximum rows to return

License

MIT

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.5

Mar 13, 2026

0.1.4

Mar 7, 2026

0.1.3

Mar 5, 2026

0.1.2

Mar 5, 2026

0.1.1

Mar 4, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysemantic_layer-0.1.5.tar.gz (50.7 kB view details)

Uploaded Mar 13, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pysemantic_layer-0.1.5-py3-none-any.whl (60.5 kB view details)

Uploaded Mar 13, 2026 Python 3

File details

Details for the file pysemantic_layer-0.1.5.tar.gz.

File metadata

Download URL: pysemantic_layer-0.1.5.tar.gz
Upload date: Mar 13, 2026
Size: 50.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.2 CPython/3.14.3 Linux/6.14.0-1017-azure

File hashes

Hashes for pysemantic_layer-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`9ce8700e3414437e6d0bbaf9639a620b8ad835d9f93f36bfeeba0cf27c8ff01e`
MD5	`626bdaa9b62da2eca716d22bd4107919`
BLAKE2b-256	`677575bd1525e333576a0b6373fc97caac1541aabb07a98b18dd401dc80b5129`

See more details on using hashes here.

File details

Details for the file pysemantic_layer-0.1.5-py3-none-any.whl.

File metadata

Download URL: pysemantic_layer-0.1.5-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 60.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.2 CPython/3.14.3 Linux/6.14.0-1017-azure

File hashes

Hashes for pysemantic_layer-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`912cc6c612cbb6f102349201390ebf988a211e31a744526e4ab46dbf0570f5e8`
MD5	`4fe66704f2ef63e0af42348e128311a2`
BLAKE2b-256	`5fdf4c6037c72d801de76aef0f5ac67036498077a1c3f7db33b61e5e79623706`

See more details on using hashes here.

pysemantic-layer 0.1.5

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

PySemantic

What is PySemantic?

Multi-Fact Queries

Why PySemantic?

Installation

Quick Start

1. Define your models

2. Query

PySemantic Studio

CLI

pysemantic studio -- Launch the web UI

pysemantic query -- Generate SQL from the terminal

pysemantic graph -- Export entity graph

Core Concepts

Model

Entities & Join Resolution

Supported Aggregations

Filters

SQL Dialect

Architecture

Initializing the Semantic Layer

Option 1: Model Directory (recommended for projects)

Option 2: Explicit Model List (great for notebooks & tests)

Hot-Reload (for Jupyter / REPL)

API Reference

SemanticLayer

SemanticLayer() Constructor

query() Parameters

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`pysemantic studio` -- Launch the web UI

`pysemantic query` -- Generate SQL from the terminal

`pysemantic graph` -- Export entity graph

`SemanticLayer`

`SemanticLayer()` Constructor

`query()` Parameters