Parameter-driven data engineering framework (Kimball + SCD2) with CLI, Polars/DuckDB by default, Spark optional.
TransmuteDB
⚗️ TransmuteDB is an open-source, parameter-driven data engineering framework for building Kimball-style dimensional models (including Type 2 SCDs) in a modern data lakehouse/warehouse.
It blends Laravel-style scaffolding for developer speed with a declarative, metadata-driven pipeline engine that runs on Python, Polars, or PySpark — making it suitable for everything from local dev on DuckDB to production-scale clusters.
🚀 What It Does
- CLI-First — Manage and run pipelines from the terminal with one command.
- Parameter-Driven — All orchestration logic comes from YAML + metadata tables — no hardcoded pipelines.
- Kimball-Ready — Build facts, dimensions, and Type 2 SCD tables automatically from configs.
- Data Quality First — Built-in null, uniqueness, schema, and type checks with quarantine flows.
- Flexible Compute — Runs on Polars or PySpark.
- Any Warehouse — Start with DuckDB or PostgreSQL; scale to Snowflake, Databricks, Synapse, or others.
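The null and uniqueness checks with a quarantine flow mentioned above can be pictured with a small, library-free sketch. This is not TransmuteDB's API: `run_checks`, `DQResult`, and the `_dq_reasons` column are hypothetical names chosen for illustration, and the real engine presumably reads its rules from the `dq/` configs rather than from function arguments.

```python
from dataclasses import dataclass, field


@dataclass
class DQResult:
    """Rows that passed every check, and rows set aside with reasons."""
    passed: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def run_checks(rows, not_null=(), unique=()):
    """Split rows into passed and quarantined sets (hypothetical helper).

    not_null: column names whose value must not be None.
    unique:   column names whose values must not repeat; the first
              occurrence passes and later duplicates are quarantined.
    """
    result = DQResult()
    seen = {col: set() for col in unique}
    for row in rows:
        reasons = [f"null:{col}" for col in not_null if row.get(col) is None]
        for col in unique:
            value = row.get(col)
            if value is None:
                continue  # missing values are left to the null check
            if value in seen[col]:
                reasons.append(f"duplicate:{col}")
            else:
                seen[col].add(value)
        if reasons:
            # Keep the failing record, annotated with why it failed.
            result.quarantined.append({**row, "_dq_reasons": reasons})
        else:
            result.passed.append(row)
    return result


rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 1, "email": "b@example.com"},
]
result = run_checks(rows, not_null=["email"], unique=["customer_id"])
print(len(result.passed), len(result.quarantined))
```

Keeping the failure reasons on each quarantined record is one way to make a separate quarantine storage path useful for later triage.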
🛠 Architecture
TransmuteDB projects are self-contained and follow this structure:
your_project/
  src/transmutedb/
    cli/            # Typer CLI commands
    core/           # Config models, logging, registry
    connectors/     # DuckDB, REST, SQL
    transforms/     # SCD2, bronze→silver→gold helpers
    templates/      # Jinja2 scaffolding templates
  pipelines/
    <domain>/
      pipeline.yaml # Orchestration + schedules
      sources/      # Source system configs
      models/       # Bronze/Silver/Gold model definitions
      dq/           # Data quality rules
  profiles/         # Optional per-developer overrides
  tests/
📦 Example Features
- Orchestration Engine
  - Reads from `pipeline.yaml` and metadata tables.
  - Handles parallel execution by file, entity, or notebook scope.
- Dimension Builder (`dim_build`)
  - Automatically applies Type 2 SCD logic based on metadata.
- Fact Builder (`fact_build`)
  - Joins to current dimensions and handles surrogate key creation.
- Data Quality Engine
  - Supports uniqueness, null, min/max, schema, and data type checks.
  - Optional record-level quarantine with separate storage paths.
- Dev Mode
  - Spin up pipelines without touching production configs or metadata.
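For readers new to the pattern, here is a minimal sketch of what Type 2 SCD handling and a surrogate key lookup against current dimension rows generally look like. It is plain Python for illustration only and makes no claim about how `dim_build` or `fact_build` are implemented. The function names and the columns `sk`, `valid_from`, `valid_to`, and `is_current` are assumptions, not names taken from TransmuteDB.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension list with Type 2 changes applied.

    dimension: existing rows (dicts) with sk, valid_from, valid_to, is_current.
    incoming:  latest source records keyed by the natural key `key`.
    tracked:   columns whose change should open a new version.
    """
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}
    next_sk = max((row["sk"] for row in result), default=0) + 1
    for record in incoming:
        existing = current.get(record[key])
        if existing is not None:
            if all(existing[col] == record[col] for col in tracked):
                continue  # nothing changed, keep the current version
            # Close out the old version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False
        new_row = {
            "sk": next_sk,
            key: record[key],
            **{col: record[col] for col in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
        result.append(new_row)
        current[record[key]] = new_row
        next_sk += 1
    return result


def attach_surrogate_keys(facts, dimension, key):
    """Replace each fact's natural key with the current row's surrogate key."""
    current = {row[key]: row["sk"] for row in dimension if row["is_current"]}
    return [
        {**{k: v for k, v in fact.items() if k != key},
         f"{key}_sk": current.get(fact[key])}  # None when no match exists
        for fact in facts
    ]


dim = [{"sk": 1, "customer_id": 10, "city": "Oslo",
        "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True}]
incoming = [{"customer_id": 10, "city": "Bergen"},
            {"customer_id": 11, "city": "Paris"}]
updated = apply_scd2(dim, incoming, key="customer_id",
                     tracked=["city"], as_of=date(2024, 6, 1))
facts = attach_surrogate_keys([{"customer_id": 10, "amount": 25.0}],
                              updated, key="customer_id")
print(len(updated), facts)
```

Running the same `incoming` batch a second time adds no rows, since unchanged records are skipped. That is the property that makes reruns safe in this sketch.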
🔧 Quickstart (Alpha Mode, No PyPI)
1. Install TransmuteDB
uv pip install -e .
2. Create a Project
uv run transmutedb init my_project
cd my_project
3. Scaffold Pipeline Components
# Add a new pipeline
uv run transmutedb scaffold pipeline aviation
# Add a dimension model
uv run transmutedb scaffold model silver.customer
# Add data quality rules
uv run transmutedb scaffold dq customer
4. Run the Orchestrator
uv run transmutedb run pipelines/aviation/pipeline.yaml
🧪 Testing
pytest
📍 Roadmap
- Out-of-the-box Airflow DAG generation.
- Built-in backfill support for SCD2 facts/dims.
- Incremental load strategies per source.
- Additional connectors (Snowflake, Synapse, REST APIs).
- CLI-driven data quality dashboards.
🧠 Design Principles
- ✅ Convention over configuration — sensible defaults.
- ✅ Reproducible — same config runs locally or in prod.
- ✅ Observable — rich logs and metadata capture.
- ✅ Warehouse-agnostic — SQL templates adapt to your target.
- ✅ Dev-friendly — zero to pipeline in minutes.
💬 Contributing
TransmuteDB is in active development and welcomes contributions.
- Fork the repo
- Branch from `main`
- Use Conventional Commits
- Include tests & docs for new features
- Open a PR