Python Open-source semantic layer for Data engineers
Project description
PySemantic
A lightweight, graph-based Semantic Layer for Python and SQL.
Define metrics once. Generate SQL everywhere.
What is PySemantic?
PySemantic lets you define your data models as Python objects -- dimensions, measures, and entity relationships -- and generates correct, optimized SQL from simple metric queries. No more hand-writing joins, no more duplicated business logic scattered across dashboards and notebooks.
from pysemantic.client import SemanticLayer
sl = SemanticLayer(model_path="./models")
sql = sl.query(
measures=["total_order_price"],
dimensions=["customer_city"],
filters=[{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}],
order_by=["total_order_price DESC"],
limit=10,
)
SELECT
customers.customer_city AS customer_city,
SUM(order_items.price) AS total_order_price
FROM order_items
LEFT JOIN orders ON order_items.order_id = orders.order_id
LEFT JOIN customers ON orders.customer_id = customers.customer_id
WHERE customers.customer_state IN ('SP', 'RJ')
GROUP BY 1
ORDER BY total_order_price DESC
LIMIT 10
Joins, table references, WHERE vs HAVING -- all resolved automatically from your model definitions.
Why PySemantic?
| Feature | Description | |
|---|---|---|
| 1 | Single source of truth | Define a metric once, use it everywhere |
| 2 | Automatic join resolution | Declare entity relationships; PySemantic finds the path |
| 3 | SQL injection safe | Structured filters with operator whitelisting and value escaping |
| 4 | Dialect support | MySQL, Postgres, and more via SQLGlot |
| 5 | Zero infrastructure | Pure Python, no server required at definition time |
| 6 | Interactive Studio | Explore your models, graph, and test queries in the browser |
| 7 | CLI powered by Typer | Generate SQL and launch the Studio from the terminal |
Installation
pip install pysemantic-layer
Or with Poetry:
poetry add pysemantic-layer
Requires Python 3.11+
Quick Start
1. Define your models
Create a directory (e.g. models/) with one Python file per table:
# models/orders.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
orders = Model(
name="orders",
table="orders",
primary_key="order_id",
dimensions=[
Dimension(name="order_status", column="order_status", dtype="string"),
],
measures=[
Measure(name="total_orders", agg="count", column="order_id"),
Measure(name="unique_customers", agg="distinct_count", column="customer_id"),
],
entities=[
Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
],
)
# models/customers.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType
model = Model(
name="customers",
table="customers",
primary_key="customer_id",
dimensions=[
Dimension(name="customer_city", column="customer_city", dtype="string"),
Dimension(name="customer_state", column="customer_state", dtype="string"),
],
measures=[
Measure(name="total_customers", agg="count", column="customer_id"),
],
entities=[
Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
],
)
PySemantic automatically discovers the join path: orders.customer_id (FOREIGN) links to customers.customer_id (PRIMARY) through the shared entity name customer.
2. Query
from pysemantic.client import SemanticLayer
sl = SemanticLayer(model_path="./models")
sql = sl.query(
measures=["total_orders"],
dimensions=["customer_city"],
filters=[{"field": "customer_state", "operator": "=", "value": "SP"}],
order_by=["total_orders DESC"],
limit=10,
)
print(sql)
Generated SQL:
SELECT
customers.customer_city AS customer_city,
COUNT(orders.order_id) AS total_orders
FROM orders
LEFT JOIN customers
ON orders.customer_id = customers.customer_id
WHERE
customers.customer_state = 'SP'
GROUP BY 1
ORDER BY total_orders DESC
LIMIT 10
PySemantic Studio
PySemantic ships with a built-in interactive web UI powered by Streamlit.
pysemantic studio ./models
The Studio has three tabs:
| Tab | What it does |
|---|---|
| Entity Graph | Interactive visualization of your model relationships. Click nodes to isolate, fullscreen mode, drag & zoom. |
| Data Dictionary | Browse all registered models with their measures, dimensions, entities, and column mappings. |
| Query Playground | Pick measures & dimensions from dropdowns, add filters, click "Generate SQL" and see the output. Invalid combos show your error protections in action. |
# Custom port and light theme
pysemantic studio ./models --port 8080 --light
CLI
PySemantic includes a full command-line interface powered by Typer.
pysemantic --help
pysemantic studio -- Launch the web UI
pysemantic studio ./models
pysemantic studio ./models --port 8080 --light
pysemantic query -- Generate SQL from the terminal
# Single measure
pysemantic query ./models -m total_orders -d customer_city --limit 10
# Multiple measures (comma-separated or repeated)
pysemantic query ./models -m "total_orders,unique_customers" -d order_status
# With filters
pysemantic query ./models \
-m total_order_price \
-d customer_state \
-f "customer_state IN ('SP', 'RJ')" \
-f "total_order_price > 100" \
--order-by "total_order_price DESC" \
--limit 5
Filter syntax: "field OPERATOR value" -- supports =, !=, >, <, >=, <=, IN, NOT IN, LIKE, IS, IS NOT.
pysemantic graph -- Export entity graph
pysemantic graph ./models -o my_graph.html
Generates a standalone interactive HTML file with your entity graph.
Core Concepts
Model
A Model maps to a database table and defines its semantic metadata:
| Component | Purpose | Example |
|---|---|---|
| Dimensions | Columns to group or filter by | customer_city, order_status |
| Measures | Aggregated metrics | SUM(price), COUNT(DISTINCT id) |
| Entities | Relationship keys (PRIMARY / FOREIGN) | order_id, customer_id |
Entities & Join Resolution
Entities define how models connect. A PRIMARY entity declares ownership of a concept; a FOREIGN entity references it:
order_items ──(FK: order)──> orders ──(FK: customer)──> customers
|
|──(FK: seller)──> sellers
|──(FK: product)──> products
When you query a measure from order_items with a dimension from customers, PySemantic automatically traverses the graph and generates the required LEFT JOIN chain.
Supported Aggregations
agg value |
SQL output |
|---|---|
sum |
SUM(column) |
count |
COUNT(column) |
avg |
AVG(column) |
min |
MIN(column) |
max |
MAX(column) |
distinct_count |
COUNT(DISTINCT column) |
Filters
Filters are passed as dictionaries:
{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}
{"field": "total_order_price", "operator": ">", "value": "1000"}
{"field": "order_status", "operator": "IS", "value": None}
Dimension filters go to WHERE; measure filters go to HAVING -- automatically.
Supported operators: =, !=, >, <, >=, <=, IN, NOT IN, LIKE, ILIKE, IS, IS NOT
Architecture
User Query (measures, dimensions, filters)
|
v
+----------+ +----------+ +----------+
| AST |---->| Planner |---->|Generator |---> SQL string
| (Parser) | |(Resolver)| |(Compiler)|
+----------+ +----------+ +----------+
|
+-----+-----+
| Registry |
| + Entity |
| Graph |
+-----------+
| Layer | Responsibility |
|---|---|
| AST | Parses raw input into a structured, validated syntax tree |
| Registry | Loads model files, validates them, builds the entity graph |
| Planner | Resolves measures/dimensions to models, calculates join paths |
| Generator | Translates the logical plan into dialect-specific SQL |
API Reference
SemanticLayer
from pysemantic.client import SemanticLayer
sl = SemanticLayer(model_path="./models")
| Method | Description |
|---|---|
query(measures, dimensions, filters, order_by, limit) |
Generate a SQL query string |
reload(model_path=None) |
Hot-reload models from disk (useful in notebooks) |
generate_graph(output_file) |
Export an interactive entity graph as HTML |
query() Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
measures |
list[str] |
Yes | Metric names to aggregate |
dimensions |
list[str] |
No | Dimension names to group by |
filters |
list[dict] |
No | Filter conditions as {field, operator, value} dicts |
order_by |
list[str] |
No | Sort columns (append DESC for descending) |
limit |
int |
No | Maximum rows to return |
License
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pysemantic_layer-0.1.2.tar.gz.
File metadata
- Download URL: pysemantic_layer-0.1.2.tar.gz
- Upload date:
- Size: 46.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.11 Darwin/25.3.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1ba476fce8eb4f0726e49f186db987bf9e52338e9637215f28db2d04b6fb99d1
|
|
| MD5 |
6d2757a38512cc419e69c9c88fe63455
|
|
| BLAKE2b-256 |
c45ca3fac73b71a44d176588fd85aa0a92505095576959590e832b844a87a108
|
File details
Details for the file pysemantic_layer-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pysemantic_layer-0.1.2-py3-none-any.whl
- Upload date:
- Size: 58.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.11 Darwin/25.3.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b8bdd9efb136a1c04d428ae337747943aca489a837159c4fa8d41ab98bc0dd26
|
|
| MD5 |
6476f7c0f3b5e4e002d10f641700cb1a
|
|
| BLAKE2b-256 |
a9114aadba4511f4a03121440fae42364be2e2ec749c34fbaaa7693519a42091
|