Parameter-driven data engineering framework (Kimball + SCD2) with CLI, Polars/DuckDB by default, Spark optional.

TransmuteDB

⚗️ TransmuteDB is an open-source, parameter-driven data engineering framework for building Kimball-style dimensional models (including Type 2 SCDs) in a modern data lakehouse/warehouse.

It blends Laravel-style scaffolding for developer speed with a declarative, metadata-driven pipeline engine written in Python. The engine runs on Polars by default or on PySpark when needed, which makes it suitable for everything from local development on DuckDB to production-scale clusters.


🚀 What It Does

  • CLI-First — Manage and run pipelines from the terminal with one command.
  • Parameter-Driven — All orchestration logic comes from YAML + metadata tables — no hardcoded pipelines.
  • Kimball-Ready — Build facts, dimensions, and Type 2 SCD tables automatically from configs.
  • Data Quality First — Built-in null, uniqueness, schema, and type checks with quarantine flows.
  • Flexible Compute — Runs on Polars or PySpark.
  • Any Warehouse — Start with DuckDB or PostgreSQL; scale to Snowflake, Databricks, Synapse, or others.
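
As an illustration of the "Data Quality First" idea above, here is a minimal sketch of null and uniqueness checks that route failing records to a quarantine list. It uses plain Python rather than Polars or PySpark, and the names `DQResult` and `run_checks` are hypothetical; they are not TransmuteDB's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class DQResult:
    """Rows that passed all checks, plus quarantined rows with a reason."""
    passed: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def run_checks(rows, not_null, unique):
    """Apply null and uniqueness rules; failing rows go to quarantine.

    Hypothetical helper for illustration only.
    """
    result = DQResult()
    seen_keys = set()
    for row in rows:
        # Null check: every listed column must be present and not None.
        missing = [col for col in not_null if row.get(col) is None]
        if missing:
            result.quarantined.append((row, "null in " + ", ".join(missing)))
            continue
        # Uniqueness check: the first row with a given key wins.
        key = tuple(row.get(col) for col in unique)
        if key in seen_keys:
            result.quarantined.append((row, "duplicate key"))
            continue
        seen_keys.add(key)
        result.passed.append(row)
    return result


rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 1, "email": "dup@example.com"},
]
result = run_checks(rows, not_null=["customer_id", "email"], unique=["customer_id"])
```

In this sketch the quarantined rows keep their failure reason, so they could be written to a separate storage path and inspected later without blocking the rows that passed.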

🛠 Architecture

TransmuteDB projects are self-contained and follow this structure:

your_project/
  src/transmutedb/
    cli/              # Typer CLI commands
    core/             # Config models, logging, registry
    connectors/       # DuckDB, REST, SQL
    transforms/       # SCD2, bronze→silver→gold helpers
    templates/        # Jinja2 scaffolding templates
  pipelines/
    <domain>/
      pipeline.yaml   # Orchestration + schedules
      sources/        # Source system configs
      models/         # Bronze/Silver/Gold model definitions
      dq/             # Data quality rules
  profiles/           # Optional per-developer overrides
  tests/
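
To make the relationship between `pipeline.yaml` and the optional `profiles/` overrides more concrete, the following sketch merges a per-developer override into a base pipeline config. The config is written as Python dictionaries to stay dependency-free, and every key shown (`name`, `engine`, `target`, `path`) is an assumption for illustration; the real schema may differ.

```python
from copy import deepcopy


def merge_overrides(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base (override wins)."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key rather than replaced.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of pipelines/aviation/pipeline.yaml.
pipeline = {
    "name": "aviation",
    "engine": "polars",
    "target": {"type": "duckdb", "path": "warehouse.duckdb"},
}

# Hypothetical per-developer profile that only redirects the target path.
profile = {"target": {"path": "dev.duckdb"}}

effective = merge_overrides(pipeline, profile)
```

Because the merge works on a copy, the base config is left untouched, which is one way the same pipeline definition could run unchanged locally and in production.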

📦 Example Features

  • Orchestration Engine

    • Reads from pipeline.yaml and metadata tables.
    • Handles parallel execution by file, entity, or notebook scope.
  • Dimension Builder (dim_build)

    • Automatically applies Type 2 SCD logic based on metadata.
  • Fact Builder (fact_build)

    • Joins to current dimensions and handles surrogate key creation.
  • Data Quality Engine

    • Supports uniqueness, null, min/max, schema, and data type checks.
    • Optional record-level quarantine with separate storage paths.
  • Dev Mode

    • Spin up pipelines without touching production configs or metadata.
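
The Dimension Builder and Fact Builder items above can be sketched in plain Python as follows. This is a simplified illustration of Type 2 SCD handling and surrogate key lookup, not the framework's implementation: the column names (`sk`, `valid_from`, `valid_to`, `is_current`), the open-ended date, and the functions `apply_scd2` and `current_surrogate_keys` are all assumptions.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension with Type 2 changes from incoming applied."""
    result = [dict(row) for row in dimension]  # do not mutate the input
    current = {row[key]: row for row in result if row["is_current"]}
    next_sk = max((row["sk"] for row in result), default=0) + 1
    for row in incoming:
        existing = current.get(row[key])
        if existing is not None and all(existing[c] == row[c] for c in tracked):
            continue  # tracked attributes unchanged: keep the current version
        if existing is not None:
            # Expire the old version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False
        new_row = {"sk": next_sk, key: row[key]}
        new_row.update({c: row[c] for c in tracked})
        new_row.update({"valid_from": as_of, "valid_to": HIGH_DATE, "is_current": True})
        result.append(new_row)
        current[row[key]] = new_row
        next_sk += 1
    return result


def current_surrogate_keys(dimension, key):
    """Map each business key to the surrogate key of its current version."""
    return {row[key]: row["sk"] for row in dimension if row["is_current"]}


dim = [{"sk": 1, "customer_id": "C1", "city": "Oslo",
        "valid_from": date(2024, 1, 1), "valid_to": HIGH_DATE, "is_current": True}]
incoming = [{"customer_id": "C1", "city": "Bergen"},
            {"customer_id": "C2", "city": "Lima"}]

updated = apply_scd2(dim, incoming, key="customer_id", tracked=["city"],
                     as_of=date(2024, 6, 1))

# Fact rows pick up the surrogate key of the current dimension version.
sk_lookup = current_surrogate_keys(updated, "customer_id")
facts = [{"customer_id": "C1", "amount": 10}]
fact_rows = [{**f, "customer_sk": sk_lookup[f["customer_id"]]} for f in facts]
```

The changed customer ends up with two rows (one expired, one current) and the new customer with one, so history is preserved while facts join only to the current version.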

🔧 Quickstart (Alpha Mode, No PyPI)

1. Install TransmuteDB (editable install, run from the root of a local clone)

uv pip install -e .

2. Create a Project

uv run transmutedb init my_project
cd my_project

3. Scaffold Pipeline Components

# Add a new pipeline
uv run transmutedb scaffold pipeline aviation

# Add a dimension model
uv run transmutedb scaffold model silver.customer

# Add data quality rules
uv run transmutedb scaffold dq customer

4. Run the Orchestrator

uv run transmutedb run pipelines/aviation/pipeline.yaml

🧪 Testing

pytest

📍 Roadmap

  • Out-of-the-box Airflow DAG generation.
  • Built-in backfill support for SCD2 facts/dims.
  • Incremental load strategies per source.
  • Additional connectors (Snowflake, Synapse, REST APIs).
  • CLI-driven data quality dashboards.

🧠 Design Principles

  • Convention over configuration — sensible defaults.
  • Reproducible — same config runs locally or in prod.
  • Observable — rich logs and metadata capture.
  • Warehouse-agnostic — SQL templates adapt to your target.
  • Dev-friendly — zero to pipeline in minutes.

💬 Contributing

TransmuteDB is in active development and welcomes contributions.

  1. Fork the repo
  2. Branch from main
  3. Use Conventional Commits
  4. Include tests & docs for new features
  5. Open a PR

Download files

Source Distribution: transmutedb-0.1.1a1.tar.gz (10.2 kB)
Built Distribution: transmutedb-0.1.1a1-py3-none-any.whl (10.7 kB, Python 3)
