Skip to main content

Python Open-source semantic layer for Data engineers

Project description

PySemantic

A lightweight, graph-based Semantic Layer for Python and SQL.

Define metrics once. Generate SQL everywhere.

Python 3.11+ License: MIT Poetry Typer CLI


Your browser does not support the video tag. Watch the demo here.


What is PySemantic?

PySemantic lets you define your data models as Python objects -- dimensions, measures, and entity relationships -- and generates correct, optimized SQL from simple metric queries. No more hand-writing joins, no more duplicated business logic scattered across dashboards and notebooks.

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")

sql = sl.query(
    measures=["total_order_price", "total_number_of_distinct_orders"],
    dimensions=["customer_state", "customer_city"],
    filters=[{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}],
    order_by=["total_order_price DESC"],
    limit=10,
)
SELECT
  customers.customer_city AS customer_city,
  customers.customer_state AS customer_state,
  SUM(order_items.price) AS total_order_price,
  COUNT(DISTINCT order_items.order_id) AS total_number_of_distinct_orders
FROM order_items
LEFT JOIN orders
  ON order_items.order_id = orders.order_id
LEFT JOIN customers
  ON orders.customer_id = customers.customer_id
WHERE
  customers.customer_state IN ('SP', 'RJ')
GROUP BY
  1,
  2
ORDER BY
  total_order_price DESC
LIMIT 10

CLI Magic:

pysemantic query

Joins, table references, WHERE vs HAVING -- all resolved automatically from your model definitions.


Why PySemantic?

Feature Description
1 Single source of truth Define a metric once, use it everywhere
2 Automatic join resolution Declare entity relationships; PySemantic finds the path
3 SQL injection safe Structured filters with operator whitelisting and value escaping
4 Dialect support MySQL, Postgres, and more via SQLGlot
5 Zero infrastructure Pure Python, no server required at definition time
6 Interactive Studio Explore your models, graph, and test queries in the browser
7 CLI powered by Typer Generate SQL and launch the Studio from the terminal

Installation

pip install pysemantic-layer

Or with Poetry:

poetry add pysemantic-layer

Requires Python 3.11+


Quick Start

1. Define your models

Create a directory (e.g. models/) with one Python file per table:

# models/orders.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType

orders = Model(
    name="orders",
    table="orders",
    primary_key="order_id",
    dimensions=[
        Dimension(name="order_status", column="order_status", dtype="string"),
    ],
    measures=[
        Measure(name="total_orders", agg="count", column="order_id"),
        Measure(name="unique_customers", agg="distinct_count", column="customer_id"),
    ],
    entities=[
        Entity(name="order", entity_type=EntityType.PRIMARY, column="order_id"),
        Entity(name="customer", entity_type=EntityType.FOREIGN, column="customer_id"),
    ],
)
# models/customers.py
from pysemantic.modeling import Model, Dimension, Measure, Entity, EntityType

model = Model(
    name="customers",
    table="customers",
    primary_key="customer_id",
    dimensions=[
        Dimension(name="customer_city", column="customer_city", dtype="string"),
        Dimension(name="customer_state", column="customer_state", dtype="string"),
    ],
    measures=[
        Measure(name="total_customers", agg="count", column="customer_id"),
    ],
    entities=[
        Entity(name="customer", entity_type=EntityType.PRIMARY, column="customer_id"),
    ],
)

PySemantic automatically discovers the join path: orders.customer_id (FOREIGN) links to customers.customer_id (PRIMARY) through the shared entity name customer.

2. Query

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")

sql = sl.query(
    measures=["total_orders"],
    dimensions=["customer_city"],
    filters=[{"field": "customer_state", "operator": "=", "value": "SP"}],
    order_by=["total_orders DESC"],
    limit=10,
)
print(sql)

Generated SQL:

SELECT
  customers.customer_city AS customer_city,
  COUNT(orders.order_id) AS total_orders
FROM orders
LEFT JOIN customers
  ON orders.customer_id = customers.customer_id
WHERE
  customers.customer_state = 'SP'
GROUP BY 1
ORDER BY total_orders DESC
LIMIT 10

PySemantic Studio

PySemantic ships with a built-in interactive web UI powered by Streamlit.

pysemantic studio ./models

pysemantic studio

The Studio has three tabs:

Tab What it does
Entity Graph Interactive visualization of your model relationships. Click nodes to isolate, fullscreen mode, drag & zoom.
Data Dictionary Browse all registered models with their measures, dimensions, entities, and column mappings.
Query Playground Pick measures & dimensions from dropdowns, add filters, click "Generate SQL" and see the output. Invalid combos show your error protections in action.
# Custom port and light theme
pysemantic studio ./models --port 8080 --light

CLI

PySemantic includes a full command-line interface powered by Typer.

pysemantic --help

pysemantic studio -- Launch the web UI

pysemantic studio ./models
pysemantic studio ./models --port 8080 --light

pysemantic query -- Generate SQL from the terminal

# Single measure
pysemantic query ./models -m total_orders -d customer_city --limit 10

# Multiple measures (comma-separated or repeated)
pysemantic query ./models -m "total_orders,unique_customers" -d order_status

# With filters
pysemantic query ./models \
  -m total_order_price \
  -d customer_state \
  -f "customer_state IN ('SP', 'RJ')" \
  -f "total_order_price > 100" \
  --order-by "total_order_price DESC" \
  --limit 5

Filter syntax: "field OPERATOR value" -- supports =, !=, >, <, >=, <=, IN, NOT IN, LIKE, IS, IS NOT.

pysemantic graph -- Export entity graph

pysemantic graph ./models -o my_graph.html

pysemantic graph

Generates a standalone interactive HTML file with your entity graph.


Core Concepts

Model

A Model maps to a database table and defines its semantic metadata:

Component Purpose Example
Dimensions Columns to group or filter by customer_city, order_status
Measures Aggregated metrics SUM(price), COUNT(DISTINCT id)
Entities Relationship keys (PRIMARY / FOREIGN) order_id, customer_id

Entities & Join Resolution

Entities define how models connect. A PRIMARY entity declares ownership of a concept; a FOREIGN entity references it:

order_items  ──(FK: order)──>  orders  ──(FK: customer)──>  customers
     |
     |──(FK: seller)──>  sellers
     |──(FK: product)──>  products

When you query a measure from order_items with a dimension from customers, PySemantic automatically traverses the graph and generates the required LEFT JOIN chain.

Supported Aggregations

agg value SQL output
sum SUM(column)
count COUNT(column)
avg AVG(column)
min MIN(column)
max MAX(column)
distinct_count COUNT(DISTINCT column)

Filters

Filters are passed as dictionaries:

{"field": "customer_state", "operator": "IN", "value": "('SP', 'RJ')"}
{"field": "total_order_price", "operator": ">", "value": "1000"}
{"field": "order_status", "operator": "IS", "value": None}

Dimension filters go to WHERE; measure filters go to HAVING -- automatically.

Supported operators: =, !=, >, <, >=, <=, IN, NOT IN, LIKE, ILIKE, IS, IS NOT


Architecture

User Query (measures, dimensions, filters)
    |
    v
+----------+     +----------+     +----------+
|   AST    |---->| Planner  |---->|Generator |---> SQL string
| (Parser) |     |(Resolver)|     |(Compiler)|
+----------+     +----------+     +----------+
                      |
                +-----+-----+
                | Registry  |
                | + Entity  |
                |   Graph   |
                +-----------+
Layer Responsibility
AST Parses raw input into a structured, validated syntax tree
Registry Loads model files, validates them, builds the entity graph
Planner Resolves measures/dimensions to models, calculates join paths
Generator Translates the logical plan into dialect-specific SQL

API Reference

SemanticLayer

from pysemantic.client import SemanticLayer

sl = SemanticLayer(model_path="./models")
Method Description
query(measures, dimensions, filters, order_by, limit) Generate a SQL query string
reload(model_path=None) Hot-reload models from disk (useful in notebooks)
generate_graph(output_file) Export an interactive entity graph as HTML

query() Parameters

Parameter Type Required Description
measures list[str] Yes Metric names to aggregate
dimensions list[str] No Dimension names to group by
filters list[dict] No Filter conditions as {field, operator, value} dicts
order_by list[str] No Sort columns (append DESC for descending)
limit int No Maximum rows to return

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysemantic_layer-0.1.3.tar.gz (43.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pysemantic_layer-0.1.3-py3-none-any.whl (55.1 kB view details)

Uploaded Python 3

File details

Details for the file pysemantic_layer-0.1.3.tar.gz.

File metadata

  • Download URL: pysemantic_layer-0.1.3.tar.gz
  • Upload date:
  • Size: 43.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.11.11 Darwin/25.3.0

File hashes

Hashes for pysemantic_layer-0.1.3.tar.gz
Algorithm Hash digest
SHA256 c8deb76015e8078927a61079b31d4513729383bfe374b4ba9a315b0d18956b3b
MD5 de658d399ffcd4404a40d9c241514b54
BLAKE2b-256 c24d101b5efb8b22c1350795a97b2d28d2ea63b24b6b11d36ff5843e3cc40269

See more details on using hashes here.

File details

Details for the file pysemantic_layer-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pysemantic_layer-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 55.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.11.11 Darwin/25.3.0

File hashes

Hashes for pysemantic_layer-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 0c885d81804bbd27f6c86942643bb873292d087f6ef293052c8207b911965e13
MD5 e12a1a1abcc4a6ffe39de3b7d8f2131d
BLAKE2b-256 7a08545f84078638eab51b248a9e9c277313475b7cca8911ef7732d87d6d89a1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page