OpenMedallion
Declarative medallion pipelines in pure open-source Python — local first, cloud portable, fast by default.
OpenMedallion is an opinionated open-source library for building Bronze → Silver → Gold data warehouse and lakehouse pipelines using dlt, Polars, and Hamilton — without depending on expensive enterprise platforms or proprietary tooling.
Why OpenMedallion?
Modern open-source data tools are individually excellent — but combining them into a production-ready medallion architecture is still fragmented.
You already have great tools for ingestion, transformation, loading, orchestration, and validation. But you still have to stitch everything together yourself — writing glue code, defining project structure, creating naming conventions, managing layer boundaries, and maintaining all of it over time.
OpenMedallion exists to reduce that friction.
| Without OpenMedallion | With OpenMedallion |
|---|---|
| Glue code per project | Convention-driven project layout |
| Ad-hoc layer boundaries | Enforced Bronze / Silver / Gold contracts |
| Inline transforms | Composable Python UDFs |
| Manual orchestration | Hamilton DAG — wired automatically |
| Cloud-only dev loop | Local Parquet first, S3 with one config change |
Quickstart
```shell
pip install openmedallion

medallion init my_project                 # scaffold: YAML configs + UDF stubs + kestra_flow.yml
medallion run my_project                  # Bronze → Silver → Gold in one command
medallion run my_project --layer silver   # re-run a single layer
medallion dag                             # print the Hamilton DAG
medallion serve                           # launch the live pipeline tracker UI
```
Key Features
- Declarative YAML config — define pipeline layers without writing boilerplate
- Incremental loads — append and merge modes via dlt cursor columns and primary keys
- Composable UDFs — drop Python functions into udf/silver/ or udf/gold/; no new framework to learn
- Live DAG tracker — Hamilton-powered web UI to visualise and monitor execution
- Local first — run the full pipeline against Parquet files with zero cloud credentials
- Cloud portable — swap filesystem for S3 in one line; logic stays unchanged
- Source agnostic — any dlt source: SQL databases, REST APIs, filesystems, and more
- Fast by default — Polars for all transforms; no pandas bottlenecks
How It Works
OpenMedallion wires three best-in-class open-source tools under a unified declarative config:
```
YAML config
     │
     ▼
Hamilton DAG  ← orchestrates which layer runs and in what order
     │
     ├── Bronze (dlt)     ← ingests raw data from any source into Parquet
     ├── Silver (Polars)  ← typed UDF transforms: rename, cast, filter, enrich
     └── Gold (Polars)    ← YAML-declared group-by aggregations + window metrics
```
| Layer | Tool | Role |
|---|---|---|
| 🟤 Bronze | dlt | Schema-inferred raw load from any source |
| ⚪ Silver | Polars | Typed, composable Python UDFs |
| 🟡 Gold | Polars | YAML-declared group-by metrics |
| 📤 Export | Polars | Parquet + CSV for BI tools |
| 🔗 Orchestration | Hamilton | DAG wiring with live web tracker |
Installation
```shell
pip install openmedallion
```
Optional extras:
```shell
pip install "openmedallion[s3]"    # S3 support via s3fs + boto3
pip install "openmedallion[viz]"   # DAG visualisation via graphviz
```
Requires Python 3.11+
Project Structure
medallion init my_project generates a complete, ready-to-run project:
```
my_project/
├── main.yaml            # pipeline name + layer includes + paths
├── backend/
│   ├── bronze.yaml      # source connection + incremental config
│   ├── silver.yaml      # table transforms (rename, cast, filter, UDFs)
│   ├── gold.yaml        # aggregations (group_by + metrics + window fns)
│   └── udf/
│       ├── silver/      # Python UDFs called from silver.yaml
│       └── gold/        # Python UDFs called from gold.yaml
├── frontend/            # dashboard files (Tableau, Power BI, etc.)
├── data/                # gitignored pipeline outputs
├── summary/             # analysis write-ups
├── kestra_flow.yml      # Kestra orchestration flow — mount via docker-compose.yml
└── README.md            # pre-filled project documentation template
```
Configuration
main.yaml — declare your layers and data paths:
```yaml
pipeline:
  name: customer_warehouse
  includes:
    bronze: bronze.yaml
    silver: silver.yaml
    gold: gold.yaml
  paths:
    bronze: "./data/bronze"
    silver: "./data/silver"
    gold: "./data/gold"
    export: "./data/export"
```
silver.yaml — declarative transforms with optional UDFs:
```yaml
bronze_to_silver:
  tables:
    - source_file: ORDERS.parquet
      output_file: orders.parquet
      transforms:
        - type: rename
          columns:
            ORDER_ID: order_id
            CUSTOMER_ID: customer_id
        - type: cast
          columns:
            order_id: Int64
            amount: Float64
        - type: udf
          file: udf/silver/enrich.py
          function: flag_large_orders
          args:
            threshold: 500.0
```
gold.yaml — YAML-declared aggregations:
```yaml
silver_to_gold:
  projects:
    - name: customer_warehouse
      aggregations:
        - source_file: orders.parquet
          group_by: [customer_id]
          metrics:
            - {column: order_id, agg: count, alias: total_orders}
            - {column: amount, agg: sum, alias: total_spent}
          output_file: customer_summary.parquet
```
Python UDFs
Business logic stays in plain Python — no custom DSL, no magic.
```python
# udf/silver/enrich.py
import polars as pl

def flag_large_orders(df: pl.DataFrame, threshold: float = 500.0) -> pl.DataFrame:
    return df.with_columns(
        (pl.col("amount") >= threshold).alias("is_large_order")
    )
```
Drop the file next to your config, reference it in silver.yaml, done.
Incremental Loads
OpenMedallion supports dlt's native incremental strategies out of the box:
```yaml
# bronze.yaml
source:
  type: sql_database
  dialect: sqlite
  connection_string: "sqlite:///data/mydb.db"
  tables:
    - name: orders
      incremental:
        mode: append              # cursor-based — only new rows
        cursor_column: created_at
        initial_value: "2024-01-01"
    - name: customers
      incremental:
        mode: merge               # upsert — handles updates + deletes
        primary_key: customer_id
```
dlt tracks cursor state automatically. Re-running bronze only pulls the delta.
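Conceptually, append mode keeps a high-water mark for the cursor column and only pulls rows beyond it on each run. A toy pure-Python illustration of that idea (not dlt's actual implementation, which also persists state and handles merge semantics):

```python
def append_delta(rows, cursor_column, last_value):
    """Return rows past the stored high-water mark, plus the new mark.

    Toy sketch of cursor-based incremental loading; dlt tracks and
    persists this state for you.
    """
    delta = [r for r in rows if r[cursor_column] > last_value]
    new_mark = max((r[cursor_column] for r in delta), default=last_value)
    return delta, new_mark

# Two source rows; the stored mark is 2024-01-03, so only one row is new.
rows = [
    {"order_id": 1, "created_at": "2024-01-02"},
    {"order_id": 2, "created_at": "2024-01-05"},
]
delta, mark = append_delta(rows, "created_at", "2024-01-03")
```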
Scheduling with Kestra
medallion init generates a kestra_flow.yml inside every new project — a ready-to-use Kestra flow that orchestrates bronze → silver → gold with per-task observability and retry support.
1. Start a local Kestra server
```shell
# from the repo root — requires Docker
make kestra-up
# UI available at http://localhost:8080
```
2. Register a project flow
Add one volume mount to the kestra service in docker-compose.yml:
```yaml
- ./my_project/kestra_flow.yml:/app/flows/my_project.yml
```
Kestra picks up the file automatically on the next make kestra-up — no copying needed.
3. Trigger a run
From the UI at http://localhost:8080, or via the API:
```shell
curl -X POST \
  http://localhost:8080/api/v1/executions/openmedallion.projects/my_project
```
4. Enable scheduled refresh
Uncomment the triggers: block in kestra_flow.yml:
```yaml
triggers:
  - id: daily_refresh
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 6 * * *"   # every day at 06:00 UTC
```
Restart with make kestra-up and Kestra picks up the change immediately.
Kestra vs GitHub Actions
| | Kestra | GitHub Actions |
|---|---|---|
| Best for | Recurring pipeline runs, local/on-prem data | CI tests + PyPI publish on tag push |
| Scheduling | Cron + backfill | Cron only, no backfill |
| Observability | Per-task logs, run history, retry from failed task | Flat job log |
| Infrastructure | Self-hosted Docker | GitHub-managed runners |
Recommended split: GitHub Actions for CI + publish; Kestra for pipeline scheduling.
Examples
Three self-contained examples — no cloud credentials required. See examples/README.md for a side-by-side comparison.
| Example | Tables | What it demonstrates |
|---|---|---|
| local_parquet_demo/ | 1 | Zero-credential quickstart: full Bronze → Silver → Gold with local Parquet files |
| incremental_sql_demo/ | 2 | Incremental append + merge from SQLite; delta load simulation |
| ecommerce_analytics_demo/ | 3 | Multi-table joins, margin analysis, and monthly trends — most complete example |
When to Use OpenMedallion
A great fit if you:
- Want a standard medallion project layout without inventing one from scratch
- Prefer YAML-first config with Python escape hatches for complex logic
- Need local-first development that can scale to S3 with minimal changes
- Want full ownership of your code and infrastructure
- Are building on a tight budget without enterprise platform procurement
Not a fit if you need:
- A full enterprise data platform (Databricks, Snowflake, BigQuery)
- A no-code or drag-and-drop ETL tool
- A universal framework for every possible pipeline architecture
Tradeoffs
| You get | You accept |
|---|---|
| Lower cost — fully open-source | More engineering responsibility than a managed platform |
| Full control over code and infrastructure | Initial setup and config learning curve |
| No vendor lock-in | You own the infrastructure decisions |
| Transparent, inspectable pipeline | Not a drag-and-drop tool |
Roadmap
| Item | Status |
|---|---|
| Bronze / Silver / Gold pipeline | ✅ 2026.4.1 |
| Hamilton DAG + live tracker | ✅ 2026.4.1 |
| Local Parquet + S3 storage | ✅ 2026.4.1 |
| Incremental append + merge | ✅ 2026.4.1 |
| CLI scaffolding (medallion init) | ✅ 2026.4.1 |
| PyPI publish (OIDC trusted publishing) | ✅ 2026.4.1 |
| LazyFrame UDF contract | 🔜 2026.5 |
| Schema contract enforcement | 🔜 2026.6 |
| Lineage + metadata helpers | 🔜 2026.6 |
| Additional cloud destinations | 🔜 2026.6 |
Contributing
Contributions are welcome. Good areas to contribute:
- Bug fixes and edge-case handling
- Documentation improvements and example additions
- Tests and coverage
- New pipeline templates
- New source or destination adapters
- CLI enhancements
If you are interested in open-source data architecture, your help is appreciated.
License
MIT — free to use, modify, and distribute.
If OpenMedallion looks useful, consider starring the repo — it helps others find it.