TransformDash

Hybrid Data Transformation & Dashboard Platform

A modern, dbt-inspired data transformation platform with ML integration: run SQL transformations with dependency management and lineage tracking directly against PostgreSQL, without needing a data warehouse.


Features

Core Capabilities

  • Multi-Layer Architecture: Bronze → Silver → Gold medallion pattern
  • SQL & Python Models: SQL with Jinja templating and Python transformations
  • DAG Orchestration: Automatic dependency resolution and parallel execution (see the sketch after this list)
  • Interactive Web UI: Real-time lineage graphs and dashboards
  • PostgreSQL Support: Transformations run directly in PostgreSQL, with no separate warehouse required
  • Incremental Syntax: Write incremental models (full refresh for now, true incremental on roadmap)
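
Dependency resolution reduces to a topological sort over the model graph. A minimal sketch of the idea using Python's standard-library graphlib (illustrative only; the engine's actual logic lives in transformations/dag.py and orchestration/engine.py):

from graphlib import TopologicalSorter

# model -> set of upstream models, mirroring the sample project
deps = {
    "stg_customers": set(),
    "stg_orders": set(),
    "int_customer_orders": {"stg_customers", "stg_orders"},
    "fct_orders": {"int_customer_orders"},
}

# static_order() yields each model only after all of its dependencies,
# so bronze runs first, then silver, then gold
for model in TopologicalSorter(deps).static_order():
    print("run", model)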

Model Features

  • {{ source() }} and {{ ref() }} macros
  • {{ config() }} for model configuration
  • {% if is_incremental() %} syntax support (currently does full refreshes)
  • YAML-based source definitions
  • View and table materializations

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      TransformDash                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Raw Sources (PostgreSQL)                                   │
│         ↓                                                   │
│  Bronze Layer (stg_* models - Views)                        │
│    • Direct extraction from raw tables                      │
│    • Column aliasing and standardization                    │
│         ↓                                                   │
│  Silver Layer (int_* models - Tables)                       │
│    • Multi-table joins                                      │
│    • Business logic and calculations                        │
│    • Aggregations and window functions                      │
│         ↓                                                   │
│  Gold Layer (fct_*/dim_* models - Tables)                   │
│    • Analytics-ready fact and dimension tables              │
│    • Final business metrics                                 │
│         ↓                                                   │
│  Web Dashboard & API                                        │
│    • Interactive lineage visualization                      │
│    • Model catalog and documentation                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Quick Start

Fastest Way to Try It (Docker - Recommended)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Setup environment for Docker
cp .env.docker .env
# Generate JWT secret and add to .env
python -c 'import secrets; print("JWT_SECRET_KEY=" + secrets.token_urlsafe(32))' >> .env

# Start all services (PostgreSQL + TransformDash)
docker-compose up -d

# Wait for containers to start (about 10 seconds)
sleep 10

# Run database migrations
docker-compose exec web bash run_migrations.sh

# Load sample data
docker-compose exec web python load_sample_data.py

# Train example ML model
docker-compose exec web python ml/train_telco_churn.py

Then visit http://localhost:8000 (default login: admin / admin)

What you get:

  • PostgreSQL database with user authentication
  • Sample e-commerce dataset (24 tables, 100+ customers, 500+ orders)
  • Trained ML model (Telco Customer Churn with realistic metrics)
  • Interactive dashboards and chart builder
  • ML Models tab with prediction capabilities

Local Development Setup (Without Docker)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start PostgreSQL with Docker (just the database)
docker-compose up -d db

# Wait for database to start
sleep 5

# Run migrations
bash run_migrations.sh

# Load sample data
python load_sample_data.py

# Train ML model
PYTHONPATH=. python ml/train_telco_churn.py

# Start the application
python ui/app.py  # Visit http://localhost:8000

# Login: admin / admin

Prerequisites

  • Docker & Docker Compose (for Quick Start)
  • OR Python 3.9+ and PostgreSQL 15+ (for manual installation)
  • Git

Installation Options

Option 1: Docker Compose (Recommended)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Generate a secure JWT secret key
python -c 'import secrets; print(secrets.token_urlsafe(32))' > jwt_key.txt
export JWT_SECRET_KEY=$(cat jwt_key.txt)

# Start all services (includes PostgreSQL)
docker-compose up -d

# Access at http://localhost:8000

Option 2: Install via pip (From Source)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install all dependencies (including ML support)
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Generate a secure JWT secret key
python -c 'import secrets; print(secrets.token_urlsafe(32))'
# Add the key to .env as: JWT_SECRET_KEY=<generated-key>

# Run the web UI
python ui/app.py

Option 3: Docker Only

# Build and run with your own PostgreSQL
docker build -t transformdash:latest .
docker run -d -p 8000:8000 \
  -e TRANSFORMDASH_HOST=your-postgres-host \
  -e TRANSFORMDASH_PASSWORD=your-password \
  transformdash:latest

Option 4: Kubernetes (Production)

# See DEPLOYMENT.md for full instructions
kubectl apply -f k8s/

Option 5: Testing Kubernetes Configs Locally (Minikube)

# Start minikube
minikube start

# Build Docker image in minikube's Docker environment
eval $(minikube docker-env)
docker build -t transformdash:latest .

# Deploy to minikube
kubectl apply -f k8s/

# Access the application
minikube service transformdash-service -n transformdash

For detailed deployment instructions, see DEPLOYMENT.md

Configuration

  1. Set up database credentials:
cp .env.example .env
# Edit .env with your database credentials

Example .env:

TRANSFORMDASH_HOST=localhost
TRANSFORMDASH_PORT=5432
TRANSFORMDASH_DB=transformdash
TRANSFORMDASH_USER=postgres
TRANSFORMDASH_PASSWORD=your_password

APP_HOST=localhost
APP_PORT=5432
APP_DB=production
APP_USER=postgres
APP_PASSWORD=your_password
  2. Initialize databases (if not using Docker Compose):
createdb transformdash
createdb production

Run Your First Transformation

Method 1: Web UI (Recommended)

# Start the web interface
python ui/app.py
# Visit http://localhost:8000

# Navigate to Models and click "▶️ Run Models"

Method 2: Create Charts and Dashboards

# Access the UI at http://localhost:8000
# 1. Go to "Chart Builder" to create visualizations
# 2. Go to "Dashboards" to build interactive dashboards
# 3. Use filters and drill-downs for analysis

Method 3: Train ML Models

# Train an example model
PYTHONPATH=. python ml/examples/train_example_model.py

# View registered models
PYTHONPATH=. python ml/registry/model_registry.py

# Use models in SQL transformations (see ml/README.md)

Method 4: API Access

# View API documentation
open http://localhost:8000/docs

# Execute transformations via API
curl -X POST http://localhost:8000/api/models/execute

# Query data
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"table": "my_model", "limit": 100}'

📁 Project Structure

transformdash/
├── connectors/              # Database connectors
│   ├── redis.py            # Redis connector
│   └── (mongodb, etc.)
├── models/                  # SQL transformation models
│   ├── sources.yml         # Data source definitions
│   ├── bronze/             # Staging layer (stg_*)
│   │   ├── stg_customers.sql
│   │   └── stg_orders.sql
│   ├── silver/             # Intermediate layer (int_*)
│   │   └── int_customer_orders.sql
│   └── gold/               # Analytics layer (fct_*, dim_*)
│       └── fct_orders.sql
├── transformations/         # Core transformation engine
│   ├── model.py            # Transformation model class
│   ├── dag.py              # DAG builder and validator
│   └── model_loader.py     # SQL model loader
├── orchestration/           # Execution engine
│   └── engine.py           # DAG orchestrator
├── ui/                      # Web interface
│   └── app.py              # FastAPI application
├── tests/                   # Test suite
├── config.py               # Environment configuration
├── postgres.py             # PostgreSQL connector
├── requirements.txt        # Python dependencies
└── README.md               # This file

Creating Models

Bronze Layer (Staging)

File: models/bronze/stg_customers.sql

{{ config(materialized='view') }}

-- Bronze layer: Direct extraction with minimal transformation

with transformed_data as (
    select
        id as customer_id,
        email,
        name as customer_name,
        created_at
    from {{ source('raw', 'customers') }}
)

select * from transformed_data

Silver Layer (Intermediate)

File: models/silver/int_customer_orders.sql

{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

-- Silver layer: Join customers with orders

with transformed_data as (
    select
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.email as customer_email,
        o.order_date,
        o.total_amount
    from {{ ref('stg_orders') }} o
    join {{ ref('stg_customers') }} c
        on o.customer_id = c.customer_id

    {% if is_incremental() %}
        -- Only process new orders
        where o.order_date > (select max(order_date) from {{ this }})
    {% endif %}
)

select * from transformed_data

Gold Layer (Analytics)

File: models/gold/fct_orders.sql

{{ config(materialized='table') }}

-- Gold layer: Final fact table

with transformed_data as (
    select
        order_id,
        customer_id,
        customer_name,
        order_date,
        total_amount,
        extract(year from order_date) as order_year,
        extract(month from order_date) as order_month
    from {{ ref('int_customer_orders') }}
)

select * from transformed_data
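
Under the hood these macros are ordinary Jinja globals. A minimal sketch of how rendering could resolve them (illustrative only; the real implementation, including sources.yml lookups and config capture, lives in transformations/model_loader.py):

from jinja2 import Environment

env = Environment()
env.globals["ref"] = lambda name: f"public.{name}"         # model -> relation
env.globals["source"] = lambda src, tbl: f"{src}.{tbl}"    # sources.yml lookup
env.globals["config"] = lambda **kwargs: ""                # captured, emits nothing
env.globals["is_incremental"] = lambda: False              # full refresh for now

with open("models/gold/fct_orders.sql") as f:
    print(env.from_string(f.read()).render())              # plain, executable SQL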

Web UI Features

Dashboard

  • Model Catalog: Browse all transformation models
  • Layer Statistics: Bronze/Silver/Gold model counts
  • Real-time Updates: Refresh models dynamically
  • Code Viewer: Click any model to see its SQL code
  • Run Transformations: One-click execution of entire DAG

Lineage Graph

  • Interactive Visualization: D3.js-powered lineage graphs
  • Dependency Tracking: See how models depend on each other
  • Color-Coded Layers: Bronze (🟫), Silver (⚪), Gold (🟡)

Execution

  • ▶️ Run Button: Execute all transformations in DAG order
  • Status Tracking: See execution progress and results
  • Error Handling: Clear error messages if something fails
  • Metrics: Total time, successes, failures

API Endpoints

  • GET /: Interactive dashboard
  • GET /api/models: List all models with dependencies
  • GET /api/models/{name}/code: Get SQL code for a model
  • POST /api/execute: Run all transformations
  • GET /api/lineage: Get DAG structure
  • GET /api/health: Health check

Configuration

Environment Variables (.env)

# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=your_database
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password

# MongoDB (optional)
MONGO_URI=mongodb://localhost:27017
MONGO_DB=your_mongo_db

# Redis (optional)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

Sources Configuration (models/sources.yml)

version: 2

sources:
  - name: raw
    description: "Your raw data source"
    database: your_database
    schema: public
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - not_null
              - unique
          - name: email
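
A sketch of how a {{ source('raw', 'customers') }} lookup against this file could work (assumes PyYAML; the actual resolution lives in the model loader):

import yaml

with open("models/sources.yml") as f:
    spec = yaml.safe_load(f)

def resolve_source(source_name, table_name):
    for src in spec["sources"]:
        if src["name"] == source_name:
            if any(t["name"] == table_name for t in src["tables"]):
                return f'{src["schema"]}.{table_name}'  # e.g. public.customers
    raise KeyError(f"unknown source: {source_name}.{table_name}")

print(resolve_source("raw", "customers"))  # -> public.customers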

Testing

# Run unit tests
pytest tests/

# Test database connection
python postgres.py

# Test model loader
python transformations/model_loader.py

# Run example pipeline
python run_transformations.py
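
For a quick smoke test of your own setup, a minimal pytest sketch (assumes psycopg2 and the TRANSFORMDASH_* variables from .env; not part of the shipped suite):

# tests/test_connection.py
import os

import psycopg2

def test_can_connect():
    conn = psycopg2.connect(
        host=os.getenv("TRANSFORMDASH_HOST", "localhost"),
        port=os.getenv("TRANSFORMDASH_PORT", "5432"),
        dbname=os.getenv("TRANSFORMDASH_DB", "transformdash"),
        user=os.getenv("TRANSFORMDASH_USER", "postgres"),
        password=os.getenv("TRANSFORMDASH_PASSWORD", ""),
    )
    assert conn.closed == 0  # 0 means the connection is open
    conn.close()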

Development

Adding a New Database Connector

  1. Create connector class in connectors/:
class MyDatabaseConnector:
    def __init__(self, connection_string):
        self.conn = ...  # open the database connection here

    def query_to_dataframe(self, query):
        return pd.read_sql(query, self.conn)

    def __enter__(self):        # context-manager support, so the
        return self             # `with` usage in step 3 works

    def __exit__(self, *exc):
        self.conn.close()
  2. Add to config.py:
MY_DB_URI = os.getenv('MY_DB_URI')
  3. Use in transformations:
from config import MY_DB_URI
from connectors.mydatabase import MyDatabaseConnector

def my_transformation(context):
    with MyDatabaseConnector(MY_DB_URI) as db:
        return db.query_to_dataframe("SELECT * FROM table")

Adding Custom Macros

Extend ModelLoader in transformations/model_loader.py:

def my_custom_macro(self, arg1, arg2):
    return f"processed_{arg1}_{arg2}"

# Register in render_sql method
env.globals['my_macro'] = self.my_custom_macro
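
Standalone, the registration amounts to this (a sketch of the mechanism, not the ModelLoader code itself):

from jinja2 import Environment

def my_custom_macro(arg1, arg2):
    return f"processed_{arg1}_{arg2}"

env = Environment()
env.globals["my_macro"] = my_custom_macro

print(env.from_string("select '{{ my_macro(\"a\", \"b\") }}' as col").render())
# -> select 'processed_a_b' as col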

Use Cases

Data Warehousing

  • Extract data from multiple sources
  • Transform with SQL for performance
  • Load into analytics-ready tables

Business Intelligence

  • Create conformed dimensions
  • Build fact tables for metrics
  • Serve dashboards and reports

Data Engineering

  • Orchestrate complex pipelines
  • Track data lineage
  • Incremental processing for efficiency

Analytics Engineering

  • SQL transformations with dependency management
  • Version-controlled SQL
  • Collaborative data modeling

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Built after working with dbt, Airflow, Airbyte, Superset, Tableau, and custom Rust transformations run as Kubernetes CronJobs; the goal was a single tool that combines transformation, orchestration, and visualization
  • Built with FastAPI, Pandas, and D3.js
  • Follows the Medallion Architecture pattern

Roadmap

  • Add Spark connector for big data
  • Add Netsuite connector
  • Add S3 connector
  • Publish the transformdash pip package
  • Implement data quality testing framework
  • Add CI/CD pipeline templates
  • Create VSCode extension
  • Real-time data streaming
  • Cloud deployment guides (AWS, GCP, Azure)
  • Metric computation layer
  • Row-level security


Download files

Download the file for your platform.

Source Distribution

transformdash-1.0.0.tar.gz (221.2 kB)

Built Distribution

transformdash-1.0.0-py3-none-any.whl (225.3 kB)

File details

Details for the file transformdash-1.0.0.tar.gz.

File metadata

  • Download URL: transformdash-1.0.0.tar.gz
  • Upload date:
  • Size: 221.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for transformdash-1.0.0.tar.gz

  • SHA256: 07384fbda487b906a705a4d15d2c5024d8e10e7d3352397cf84823b238f0e968
  • MD5: 90779130bcfc0556fe5180bf0c9c12e0
  • BLAKE2b-256: 619cf9993e5962247ab89e463613f5eec2be3ba190b4ea997acdd99d1ffa73f3

File details

Details for the file transformdash-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: transformdash-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 225.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for transformdash-1.0.0-py3-none-any.whl

  • SHA256: f3180a7a507c49d22bcd7395f46e3f503e74cd925c88259b8f60d56da8ad7a21
  • MD5: a582738a8e92c3fe3a0b97b2b8377edd
  • BLAKE2b-256: 32a3f7b0bde1bfcb55d3731e544c2421a897527eb8ca252c4fc2879e2cc39e02
