OpenMedallion
Declarative medallion pipelines in pure open-source Python — local first, cloud portable, fast by default.
OpenMedallion is an opinionated open-source library for building Bronze → Silver → Gold data warehouse and lakehouse pipelines using dlt, Polars, and Hamilton — without depending on expensive enterprise platforms or proprietary tooling.
Why OpenMedallion?
Modern open-source data tools are individually excellent — but combining them into a production-ready medallion architecture is still fragmented.
You already have great tools for ingestion, transformation, loading, orchestration, and validation. But you still have to stitch everything together yourself — writing glue code, defining project structure, creating naming conventions, managing layer boundaries, and maintaining all of it over time.
OpenMedallion exists to reduce that friction.
| Without OpenMedallion | With OpenMedallion |
|---|---|
| Glue code per project | Convention-driven project layout |
| Ad-hoc layer boundaries | Enforced Bronze / Silver / Gold contracts |
| Inline transforms | Composable Python UDFs |
| Manual orchestration | Hamilton DAG — wired automatically |
| Cloud-only dev loop | Local Parquet first, S3 with one config change |
Quickstart
```shell
pip install openmedallion

medallion init my_project                 # scaffold: YAML configs + UDF stubs + kestra_flow.yml
medallion run my_project                  # Bronze → Silver → Gold in one command
medallion run my_project --layer silver   # re-run a single layer
medallion dag                             # print the Hamilton DAG
medallion serve                           # launch the live pipeline tracker UI
```
Key Features
- Declarative YAML config — define pipeline layers without writing boilerplate
- Incremental loads — append and merge modes via dlt cursor columns and primary keys
- Composable UDFs — drop Python functions into udf/silver/ or udf/gold/; no new framework to learn
- Live DAG tracker — Hamilton-powered web UI to visualise and monitor execution
- Local first — run the full pipeline against Parquet files with zero cloud credentials
- Cloud portable — swap filesystem for S3 in one line; logic stays unchanged
- Source agnostic — any dlt source: SQL databases, REST APIs, filesystems, and more
- Fast by default — Polars for all transforms; no pandas bottlenecks
How It Works
OpenMedallion wires three best-in-class open-source tools under a unified declarative config:
```
YAML config
     │
     ▼
Hamilton DAG  ← orchestrates which layer runs and in what order
     │
     ├── Bronze (dlt)     ← ingests raw data from any source into Parquet
     ├── Silver (Polars)  ← typed UDF transforms: rename, cast, filter, enrich
     └── Gold (Polars)    ← YAML-declared group-by aggregations + window metrics
```
| Layer | Tool | Role |
|---|---|---|
| 🟤 Bronze | dlt | Schema-inferred raw load from any source |
| ⚪ Silver | Polars | Typed, composable Python UDFs |
| 🟡 Gold | Polars | YAML-declared group-by metrics |
| 📤 Export | Polars | Parquet + CSV for BI tools |
| 🔗 Orchestration | Hamilton | DAG wiring with live web tracker |
Installation
```shell
pip install openmedallion
```
Optional extras:
```shell
pip install "openmedallion[s3]"    # S3 support via s3fs + boto3
pip install "openmedallion[viz]"   # DAG visualisation via graphviz
```
Requires Python 3.11+
Project Structure
medallion init my_project generates a complete, ready-to-run project:
```
my_project/
├── main.yaml            # pipeline name + layer includes + paths
├── backend/
│   ├── bronze.yaml      # source connection + incremental config
│   ├── silver.yaml      # table transforms (rename, cast, filter, UDFs)
│   ├── gold.yaml        # aggregations (group_by + metrics + window fns)
│   └── udf/
│       ├── silver/      # Python UDFs called from silver.yaml
│       └── gold/        # Python UDFs called from gold.yaml
├── frontend/            # dashboard files (Tableau, Power BI, etc.)
├── data/                # gitignored pipeline outputs
├── summary/             # analysis write-ups
├── kestra_flow.yml      # Kestra orchestration flow — mount via docker-compose.yml
└── README.md            # pre-filled project documentation template
```
Configuration
main.yaml — declare your layers and data paths:
```yaml
pipeline:
  name: customer_warehouse
  includes:
    bronze: bronze.yaml
    silver: silver.yaml
    gold: gold.yaml
  paths:
    bronze: "./data/bronze"
    silver: "./data/silver"
    gold: "./data/gold"
    export: "./data/export"
```
silver.yaml — declarative transforms with optional UDFs:
```yaml
bronze_to_silver:
  tables:
    - source_file: ORDERS.parquet
      output_file: orders.parquet
      transforms:
        - type: rename
          columns:
            ORDER_ID: order_id
            CUSTOMER_ID: customer_id
        - type: cast
          columns:
            order_id: Int64
            amount: Float64
        - type: udf
          file: udf/silver/enrich.py
          function: flag_large_orders
          args:
            threshold: 500.0
```
gold.yaml — YAML-declared aggregations:
```yaml
silver_to_gold:
  projects:
    - name: customer_warehouse
      aggregations:
        - source_file: orders.parquet
          group_by: [customer_id]
          metrics:
            - {column: order_id, agg: count, alias: total_orders}
            - {column: amount, agg: sum, alias: total_spent}
          output_file: customer_summary.parquet
```
Python UDFs
Business logic stays in plain Python — no custom DSL, no magic.
```python
# udf/silver/enrich.py
import polars as pl

def flag_large_orders(df: pl.DataFrame, threshold: float = 500.0) -> pl.DataFrame:
    return df.with_columns(
        (pl.col("amount") >= threshold).alias("is_large_order")
    )
```
Drop the file next to your config, reference it in silver.yaml, done.
Incremental Loads
OpenMedallion supports dlt's native incremental strategies out of the box:
```yaml
# bronze.yaml
source:
  type: sql_database
  dialect: sqlite
  connection_string: "sqlite:///data/mydb.db"
  tables:
    - name: orders
      incremental:
        mode: append              # cursor-based — only new rows
        cursor_column: created_at
        initial_value: "2024-01-01"
    - name: customers
      incremental:
        mode: merge               # upsert — handles updates + deletes
        primary_key: customer_id
```
dlt tracks cursor state automatically. Re-running bronze only pulls the delta.
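Conceptually, append mode keeps a high-water mark for the cursor column and only pulls rows beyond it on each run. A toy pure-Python illustration of that idea (not dlt's actual implementation, which also persists state and handles merge semantics):

```python
def append_delta(rows, cursor_column, last_value):
    """Return rows past the stored high-water mark, plus the new mark.

    Toy sketch of cursor-based incremental loading; dlt tracks and
    persists this state for you.
    """
    delta = [r for r in rows if r[cursor_column] > last_value]
    new_mark = max((r[cursor_column] for r in delta), default=last_value)
    return delta, new_mark

# Two source rows; the stored mark is 2024-01-03, so only one row is new.
rows = [
    {"order_id": 1, "created_at": "2024-01-02"},
    {"order_id": 2, "created_at": "2024-01-05"},
]
delta, mark = append_delta(rows, "created_at", "2024-01-03")
```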
Scheduling with Kestra
medallion init generates a kestra_flow.yml inside every new project — a ready-to-use Kestra flow that orchestrates bronze → silver → gold with per-task observability and retry support.
1. Start a local Kestra server
```shell
# from the repo root — requires Docker
make kestra-up
# UI available at http://localhost:8080
```
2. Register a project flow
Add one volume mount to the kestra service in docker-compose.yml:
```yaml
- ./my_project/kestra_flow.yml:/app/flows/my_project.yml
```
Kestra picks up the file automatically on the next make kestra-up — no copying needed.
3. Trigger a run
From the UI at http://localhost:8080, or via the API:
```shell
curl -X POST \
  http://localhost:8080/api/v1/executions/openmedallion.projects/my_project
```
4. Enable scheduled refresh
Uncomment the triggers: block in kestra_flow.yml:
```yaml
triggers:
  - id: daily_refresh
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 6 * * *"   # every day at 06:00 UTC
```
Restart with make kestra-up and Kestra picks up the change immediately.
Kestra vs GitHub Actions
| | Kestra | GitHub Actions |
|---|---|---|
| Best for | Recurring pipeline runs, local/on-prem data | CI tests + PyPI publish on tag push |
| Scheduling | Cron + backfill | Cron only, no backfill |
| Observability | Per-task logs, run history, retry from failed task | Flat job log |
| Infrastructure | Self-hosted Docker | GitHub-managed runners |
Recommended split: GitHub Actions for CI + publish; Kestra for pipeline scheduling.
Examples
Three self-contained examples — no cloud credentials required. See examples/README.md for a side-by-side comparison.
| Example | Tables | What it demonstrates |
|---|---|---|
| local_parquet_demo/ | 1 | Zero-credential quickstart: full Bronze → Silver → Gold with local Parquet files |
| incremental_sql_demo/ | 2 | Incremental append + merge from SQLite; delta load simulation |
| ecommerce_analytics_demo/ | 3 | Multi-table joins, margin analysis, and monthly trends — most complete example |
When to Use OpenMedallion
A great fit if you:
- Want a standard medallion project layout without inventing one from scratch
- Prefer YAML-first config with Python escape hatches for complex logic
- Need local-first development that can scale to S3 with minimal changes
- Want full ownership of your code and infrastructure
- Are building on a tight budget without enterprise platform procurement
Not a fit if you need:
- A full enterprise data platform (Databricks, Snowflake, BigQuery)
- A no-code or drag-and-drop ETL tool
- A universal framework for every possible pipeline architecture
Tradeoffs
| You get | You accept |
|---|---|
| Lower cost — fully open-source | More engineering responsibility than a managed platform |
| Full control over code and infrastructure | Initial setup and config learning curve |
| No vendor lock-in | You own the infrastructure decisions |
| Transparent, inspectable pipeline | Not a drag-and-drop tool |
Roadmap
| Item | Status |
|---|---|
| Bronze / Silver / Gold pipeline | ✅ 2026.4.1 |
| Hamilton DAG + live tracker | ✅ 2026.4.1 |
| Local Parquet + S3 storage | ✅ 2026.4.1 |
| Incremental append + merge | ✅ 2026.4.1 |
| CLI scaffolding (medallion init) | ✅ 2026.4.1 |
| PyPI publish (OIDC trusted publishing) | ✅ 2026.4.1 |
| LazyFrame UDF contract | 🔜 2026.5 |
| Schema contract enforcement | 🔜 2026.6 |
| Lineage + metadata helpers | 🔜 2026.6 |
| Additional cloud destinations | 🔜 2026.6 |
Contributing
Contributions are welcome. Good areas to contribute:
- Bug fixes and edge-case handling
- Documentation improvements and example additions
- Tests and coverage
- New pipeline templates
- New source or destination adapters
- CLI enhancements
If you are interested in open-source data architecture, your help is appreciated.
License
MIT — free to use, modify, and distribute.
If OpenMedallion looks useful, consider starring the repo — it helps others find it.