Python Open-source semantic layer for Data engineers
Project description
PySemantic
Python lightweight semantic layer for data engineers.
PySemantic lets you define your data models as Python objects -- dimensions, measures, and entity relationships -- and generates correct, optimized SQL from simple metric queries. No more hand-writing joins, no more duplicated business logic across dashboards.
Why PySemantic?
- Single source of truth -- Define a metric once, use it everywhere.
- Automatic join resolution -- Declare entity relationships; PySemantic finds the join path.
- SQL injection safe -- Structured filters with operator whitelisting and value escaping.
- Dialect support -- Generates SQL for MySQL, Postgres, and more via SQLGlot.
- Zero infrastructure -- Pure Python, no server, no database required at definition time.
- Interactive graph visualization -- Inspect your entity graph in the browser.
Installation
pip install pysemantic
Or with Poetry:
poetry add pysemantic
Requires Python 3.11+
Quick Start
1. Define your models
Create a directory (e.g. models/) with one Python file per table:
# models/orders.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
orders = Model(
name="orders",
table="orders",
primary_key="order_id",
dimensions=[
Dimension(name="order_status", column="order_status", dtype="string"),
],
measures=[
Measure(name="total_number_of_orders", agg="count", column="order_id"),
Measure(name="unique_customers", agg="distinct_count", column="customer_id"),
],
entities=[
Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
],
)
# models/customers.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
model = Model(
name="customers",
table="customers",
primary_key="customer_id",
dimensions=[
Dimension(name="customer_city", column="customer_city", dtype="string"),
Dimension(name="customer_state", column="customer_state", dtype="string"),
],
measures=[
Measure(name="total_customers", agg="count", column="customer_id"),
],
entities=[
Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
],
)
PySemantic automatically discovers the join path: orders.customer_id (FOREIGN) links to customers.customer_id (PRIMARY) through the shared entity name customer.
2. Query
from pysemantic.client import SemanticLayer
sl = SemanticLayer(model_path="./models")
sql = sl.query(
measures=["total_number_of_orders"],
dimensions=["customer_city"],
filters=[{"field": "customer_state", "operator": "=", "value": "SP"}],
order_by=["total_number_of_orders DESC"],
limit=10,
)
print(sql)
Generated SQL:
SELECT
customers.customer_city AS customer_city,
COUNT(orders.order_id) AS total_number_of_orders
FROM orders
LEFT JOIN customers
ON orders.customer_id = customers.customer_id
WHERE
customers.customer_state = 'SP'
GROUP BY
1
ORDER BY
total_number_of_orders DESC
LIMIT 10
3. Visualize the entity graph
sl.generate_graph(output_file="entity_graph.html")
Opens an interactive HTML graph showing all models, their relationships, and metadata.
Core Concepts
Model
A Model maps to a database table and defines its semantic metadata:
| Component | Purpose | Example |
|---|---|---|
| Dimensions | Columns to group or filter by | customer_city, order_status |
| Measures | Aggregated metrics | SUM(price), COUNT(DISTINCT id) |
| Entities | Relationship keys (PRIMARY / FOREIGN) | order_id, customer_id |
Entities & Join Resolution
Entities define how models connect. A PRIMARY entity declares ownership of a concept; a FOREIGN entity references it:
order_items ──(FK: order)──> orders ──(FK: customer)──> customers
│
├──(FK: seller)──> sellers
└──(FK: product)──> products
When you query a measure from order_items with a dimension from customers, PySemantic automatically traverses the graph and generates the required LEFT JOIN chain.
Supported Aggregations
agg value |
SQL output |
|---|---|
sum |
SUM(column) |
count |
COUNT(column) |
avg |
AVG(column) |
min |
MIN(column) |
max |
MAX(column) |
distinct_count |
COUNT(DISTINCT column) |
Filters
Filters can be passed as dictionaries or strings:
# Dictionary (recommended)
{"field": "customer_state", "operator": "IN", "value": ["SP", "RJ"]}
{"field": "total_order_price", "operator": ">", "value": 1000}
{"field": "order_status", "operator": "IS", "value": None}
# String (simple cases)
"order_status = 'delivered'"
Dimension filters go to WHERE; measure filters go to HAVING -- automatically.
Supported operators: =, !=, >, <, >=, <=, IN, NOT IN, LIKE, ILIKE, IS, IS NOT
Architecture
User Query (measures, dimensions, filters)
│
▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ AST │────>│ Planner │────>│Generator │───> SQL string
│ (Parser) │ │(Resolver)│ │(Compiler)│
└──────────┘ └──────────┘ └──────────┘
│
┌─────┴─────┐
│ Registry │
│ + Entity │
│ Graph │
└───────────┘
| Layer | Responsibility |
|---|---|
| AST | Parses raw input into a structured, validated syntax tree |
| Registry | Loads model files, validates them, builds the entity graph |
| Planner | Resolves measures/dimensions to models, calculates join paths |
| Generator | Translates the logical plan into dialect-specific SQL |
API Reference
SemanticLayer
sl = SemanticLayer(model_path="./models")
| Method | Description |
|---|---|
query(measures, dimensions, filters, order_by, limit) |
Generate a SQL query string |
reload(model_path=None) |
Hot-reload models from disk (useful in notebooks) |
generate_graph(output_file) |
Export an interactive entity graph as HTML |
query() Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
measures |
list[str] |
Yes | Metric names to aggregate |
dimensions |
list[str] |
No | Dimension names to group by |
filters |
list[dict | str] |
No | Filter conditions |
order_by |
list[str] |
No | Sort columns (append DESC for descending) |
limit |
int |
No | Maximum rows to return |
Development
git clone https://github.com/user/pysemantic.git
cd pysemantic
poetry install
Run tests:
poetry run pytest
Run linting:
poetry run pre-commit run --all-files
License
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pysemantic_layer-0.1.1.tar.gz.
File metadata
- Download URL: pysemantic_layer-0.1.1.tar.gz
- Upload date:
- Size: 39.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.11 Darwin/25.3.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
42219d24a3c96b824a908b1d4731b146d4e8ccf81fc0972024d162e66af8fc5c
|
|
| MD5 |
8f5a754f36275f23191a51cc3ed2f0b5
|
|
| BLAKE2b-256 |
867ed9bccfaae814815694c4d6573773526b19cec6baccb40a0b17c0fc26b1eb
|
File details
Details for the file pysemantic_layer-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pysemantic_layer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 50.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.11 Darwin/25.3.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e2c219d28d17a0d8b9615832c202a153ecd8dc16336f1f35f7f46a7c39d4378
|
|
| MD5 |
dedc7612c52c7d03771334a6f366ee1e
|
|
| BLAKE2b-256 |
b951375a53611606e62498f18c627ba7dcc0594d34252df36da58585b91d3d45
|