TransformDash

Hybrid Data Transformation & Dashboard Platform

A modern, dbt-inspired data transformation platform with ML integration: run SQL transformations with dependency management and lineage tracking directly against PostgreSQL, without needing a data warehouse.


Features

Core Capabilities

  • Multi-Layer Architecture: Bronze → Silver → Gold medallion pattern
  • SQL & Python Models: SQL with Jinja templating and Python transformations
  • DAG Orchestration: Automatic dependency resolution and parallel execution (see the sketch after this list)
  • Interactive Web UI: Real-time lineage graphs and dashboards
  • PostgreSQL Support: Transformations run directly in PostgreSQL, with no separate warehouse required
  • Incremental Syntax: Write incremental models (full refresh for now, true incremental on roadmap)
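
Dependency resolution reduces to a topological sort over the model graph. A minimal sketch of the idea using Python's standard-library graphlib (illustrative only; the engine's actual logic lives in transformations/dag.py and orchestration/engine.py):

from graphlib import TopologicalSorter

# model -> set of upstream models, mirroring the sample project
deps = {
    "stg_customers": set(),
    "stg_orders": set(),
    "int_customer_orders": {"stg_customers", "stg_orders"},
    "fct_orders": {"int_customer_orders"},
}

# static_order() yields each model only after all of its dependencies,
# so bronze runs first, then silver, then gold
for model in TopologicalSorter(deps).static_order():
    print("run", model)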

Model Features

  • {{ source() }} and {{ ref() }} macros
  • {{ config() }} for model configuration
  • {% if is_incremental() %} syntax support (currently does full refreshes)
  • YAML-based source definitions
  • View and table materializations

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      TransformDash                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Raw Sources (PostgreSQL)                                   │
│         ↓                                                   │
│  Bronze Layer (stg_* models - Views)                        │
│    • Direct extraction from raw tables                      │
│    • Column aliasing and standardization                    │
│         ↓                                                   │
│  Silver Layer (int_* models - Tables)                       │
│    • Multi-table joins                                      │
│    • Business logic and calculations                        │
│    • Aggregations and window functions                      │
│         ↓                                                   │
│  Gold Layer (fct_*/dim_* models - Tables)                   │
│    • Analytics-ready fact and dimension tables              │
│    • Final business metrics                                 │
│         ↓                                                   │
│  Web Dashboard & API                                        │
│    • Interactive lineage visualization                      │
│    • Model catalog and documentation                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Quick Start

Fastest Way to Try It (Docker - Recommended)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Setup environment for Docker
cp .env.docker .env
# Generate JWT secret and add to .env
python -c 'import secrets; print("JWT_SECRET_KEY=" + secrets.token_urlsafe(32))' >> .env

# Start all services (PostgreSQL + TransformDash)
docker-compose up -d

# Wait for containers to start (about 10 seconds)
sleep 10

# Run database migrations
docker-compose exec web bash run_migrations.sh

# Load sample data
docker-compose exec web python load_sample_data.py

# Train example ML model
docker-compose exec web python ml/train_telco_churn.py

Then visit http://localhost:8000 (default login: admin / admin)

What you get:

  • PostgreSQL database with user authentication
  • Sample e-commerce dataset (24 tables, 100+ customers, 500+ orders)
  • Trained ML model (Telco Customer Churn with realistic metrics)
  • Interactive dashboards and chart builder
  • ML Models tab with prediction capabilities

Local Development Setup (Without Docker)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start PostgreSQL with Docker (just the database)
docker-compose up -d db

# Wait for database to start
sleep 5

# Run migrations
bash run_migrations.sh

# Load sample data
python load_sample_data.py

# Train ML model
PYTHONPATH=. python ml/train_telco_churn.py

# Start the application
python ui/app.py  # Visit http://localhost:8000

# Login: admin / admin

Prerequisites

  • Docker & Docker Compose (for Quick Start)
  • OR Python 3.9+ and PostgreSQL 15+ (for manual installation)
  • Git

Installation Options

Option 1: Docker Compose (Recommended)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Generate a secure JWT secret key
python -c 'import secrets; print(secrets.token_urlsafe(32))' > jwt_key.txt
export JWT_SECRET_KEY=$(cat jwt_key.txt)

# Start all services (includes PostgreSQL)
docker-compose up -d

# Access at http://localhost:8000

Option 2: Install via pip (From Source)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install all dependencies (including ML support)
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Generate a secure JWT secret key
python -c 'import secrets; print(secrets.token_urlsafe(32))'
# Add the key to .env as: JWT_SECRET_KEY=<generated-key>

# Run the web UI
python ui/app.py

Option 3: Docker Only

# Build and run with your own PostgreSQL
docker build -t transformdash:latest .
docker run -d -p 8000:8000 \
  -e TRANSFORMDASH_HOST=your-postgres-host \
  -e TRANSFORMDASH_PASSWORD=your-password \
  transformdash:latest

Option 4: Kubernetes (Production)

# See DEPLOYMENT.md for full instructions
kubectl apply -f k8s/

Option 5: Testing Kubernetes Configs Locally (Minikube)

# Start minikube
minikube start

# Build Docker image in minikube's Docker environment
eval $(minikube docker-env)
docker build -t transformdash:latest .

# Deploy to minikube
kubectl apply -f k8s/

# Access the application
minikube service transformdash-service -n transformdash

For detailed deployment instructions, see DEPLOYMENT.md

Configuration

  1. Set up database credentials:
cp .env.example .env
# Edit .env with your database credentials

Example .env:

TRANSFORMDASH_HOST=localhost
TRANSFORMDASH_PORT=5432
TRANSFORMDASH_DB=transformdash
TRANSFORMDASH_USER=postgres
TRANSFORMDASH_PASSWORD=your_password

APP_HOST=localhost
APP_PORT=5432
APP_DB=production
APP_USER=postgres
APP_PASSWORD=your_password
  2. Initialize databases (if not using Docker Compose):
createdb transformdash
createdb production

Run Your First Transformation

Method 1: Web UI (Recommended)

# Start the web interface
python ui/app.py
# Visit http://localhost:8000

# Navigate to Models and click "▶️ Run Models"

Method 2: Create Charts and Dashboards

# Access the UI at http://localhost:8000
# 1. Go to "Chart Builder" to create visualizations
# 2. Go to "Dashboards" to build interactive dashboards
# 3. Use filters and drill-downs for analysis

Method 3: Train ML Models

# Train an example model
PYTHONPATH=. python ml/examples/train_example_model.py

# View registered models
PYTHONPATH=. python ml/registry/model_registry.py

# Use models in SQL transformations (see ml/README.md)

Method 4: API Access

# View API documentation
open http://localhost:8000/docs

# Execute transformations via API
curl -X POST http://localhost:8000/api/models/execute

# Query data
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"table": "my_model", "limit": 100}'

📁 Project Structure

transformdash/
├── connectors/              # Database connectors
│   ├── redis.py            # Redis connector
│   └── (mongodb, etc.)
├── models/                  # SQL transformation models
│   ├── sources.yml         # Data source definitions
│   ├── bronze/             # Staging layer (stg_*)
│   │   ├── stg_customers.sql
│   │   └── stg_orders.sql
│   ├── silver/             # Intermediate layer (int_*)
│   │   └── int_customer_orders.sql
│   └── gold/               # Analytics layer (fct_*, dim_*)
│       └── fct_orders.sql
├── transformations/         # Core transformation engine
│   ├── model.py            # Transformation model class
│   ├── dag.py              # DAG builder and validator
│   └── model_loader.py     # SQL model loader
├── orchestration/           # Execution engine
│   └── engine.py           # DAG orchestrator
├── ui/                      # Web interface
│   └── app.py              # FastAPI application
├── tests/                   # Test suite
├── config.py               # Environment configuration
├── postgres.py             # PostgreSQL connector
├── requirements.txt        # Python dependencies
└── README.md               # This file

Creating Models

Bronze Layer (Staging)

File: models/bronze/stg_customers.sql

{{ config(materialized='view') }}

-- Bronze layer: Direct extraction with minimal transformation

with transformed_data as (
    select
        id as customer_id,
        email,
        name as customer_name,
        created_at
    from {{ source('raw', 'customers') }}
)

select * from transformed_data

Silver Layer (Intermediate)

File: models/silver/int_customer_orders.sql

{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

-- Silver layer: Join customers with orders

with transformed_data as (
    select
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.email as customer_email,
        o.order_date,
        o.total_amount
    from {{ ref('stg_orders') }} o
    join {{ ref('stg_customers') }} c
        on o.customer_id = c.customer_id

    {% if is_incremental() %}
        -- Only process new orders
        where o.order_date > (select max(order_date) from {{ this }})
    {% endif %}
)

select * from transformed_data

Gold Layer (Analytics)

File: models/gold/fct_orders.sql

{{ config(materialized='table') }}

-- Gold layer: Final fact table

with transformed_data as (
    select
        order_id,
        customer_id,
        customer_name,
        order_date,
        total_amount,
        extract(year from order_date) as order_year,
        extract(month from order_date) as order_month
    from {{ ref('int_customer_orders') }}
)

select * from transformed_data
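
Under the hood these macros are ordinary Jinja globals. A minimal sketch of how rendering could resolve them (illustrative only; the real implementation, including sources.yml lookups and config capture, lives in transformations/model_loader.py):

from jinja2 import Environment

env = Environment()
env.globals["ref"] = lambda name: f"public.{name}"         # model -> relation
env.globals["source"] = lambda src, tbl: f"{src}.{tbl}"    # sources.yml lookup
env.globals["config"] = lambda **kwargs: ""                # captured, emits nothing
env.globals["is_incremental"] = lambda: False              # full refresh for now

with open("models/gold/fct_orders.sql") as f:
    print(env.from_string(f.read()).render())              # plain, executable SQL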

Web UI Features

Dashboard

  • Model Catalog: Browse all transformation models
  • Layer Statistics: Bronze/Silver/Gold model counts
  • Real-time Updates: Refresh models dynamically
  • Code Viewer: Click any model to see its SQL code
  • Run Transformations: One-click execution of entire DAG

Lineage Graph

  • Interactive Visualization: D3.js-powered lineage graphs
  • Dependency Tracking: See how models depend on each other
  • Color-Coded Layers: Bronze (🟫), Silver (⚪), Gold (🟡)

Execution

  • ▶️ Run Button: Execute all transformations in DAG order
  • Status Tracking: See execution progress and results
  • Error Handling: Clear error messages if something fails
  • Metrics: Total time, successes, failures

API Endpoints

  • GET /: Interactive dashboard
  • GET /api/models: List all models with dependencies
  • GET /api/models/{name}/code: Get SQL code for a model
  • POST /api/execute: Run all transformations
  • GET /api/lineage: Get DAG structure
  • GET /api/health: Health check

Configuration

Environment Variables (.env)

# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=your_database
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password

# MongoDB (optional)
MONGO_URI=mongodb://localhost:27017
MONGO_DB=your_mongo_db

# Redis (optional)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

Sources Configuration (models/sources.yml)

version: 2

sources:
  - name: raw
    description: "Your raw data source"
    database: your_database
    schema: public
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - not_null
              - unique
          - name: email
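
A sketch of how a {{ source('raw', 'customers') }} lookup against this file could work (assumes PyYAML; the actual resolution lives in the model loader):

import yaml

with open("models/sources.yml") as f:
    spec = yaml.safe_load(f)

def resolve_source(source_name, table_name):
    for src in spec["sources"]:
        if src["name"] == source_name:
            if any(t["name"] == table_name for t in src["tables"]):
                return f'{src["schema"]}.{table_name}'  # e.g. public.customers
    raise KeyError(f"unknown source: {source_name}.{table_name}")

print(resolve_source("raw", "customers"))  # -> public.customers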

Testing

# Run unit tests
pytest tests/

# Test database connection
python postgres.py

# Test model loader
python transformations/model_loader.py

# Run example pipeline
python run_transformations.py
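
For a quick smoke test of your own setup, a minimal pytest sketch (assumes psycopg2 and the TRANSFORMDASH_* variables from .env; not part of the shipped suite):

# tests/test_connection.py
import os

import psycopg2

def test_can_connect():
    conn = psycopg2.connect(
        host=os.getenv("TRANSFORMDASH_HOST", "localhost"),
        port=os.getenv("TRANSFORMDASH_PORT", "5432"),
        dbname=os.getenv("TRANSFORMDASH_DB", "transformdash"),
        user=os.getenv("TRANSFORMDASH_USER", "postgres"),
        password=os.getenv("TRANSFORMDASH_PASSWORD", ""),
    )
    assert conn.closed == 0  # 0 means the connection is open
    conn.close()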

Development

Adding a New Database Connector

  1. Create connector class in connectors/:
class MyDatabaseConnector:
    def __init__(self, connection_string):
        self.conn = ...  # open the database connection here

    def query_to_dataframe(self, query):
        return pd.read_sql(query, self.conn)

    def __enter__(self):        # context-manager support, so the
        return self             # `with` usage in step 3 works

    def __exit__(self, *exc):
        self.conn.close()
  2. Add to config.py:
MY_DB_URI = os.getenv('MY_DB_URI')
  3. Use in transformations:
from config import MY_DB_URI
from connectors.mydatabase import MyDatabaseConnector

def my_transformation(context):
    with MyDatabaseConnector(MY_DB_URI) as db:
        return db.query_to_dataframe("SELECT * FROM table")

Adding Custom Macros

Extend ModelLoader in transformations/model_loader.py:

def my_custom_macro(self, arg1, arg2):
    return f"processed_{arg1}_{arg2}"

# Register in render_sql method
env.globals['my_macro'] = self.my_custom_macro
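
Standalone, the registration amounts to this (a sketch of the mechanism, not the ModelLoader code itself):

from jinja2 import Environment

def my_custom_macro(arg1, arg2):
    return f"processed_{arg1}_{arg2}"

env = Environment()
env.globals["my_macro"] = my_custom_macro

print(env.from_string("select '{{ my_macro(\"a\", \"b\") }}' as col").render())
# -> select 'processed_a_b' as col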

Use Cases

Data Warehousing

  • Extract data from multiple sources
  • Transform with SQL for performance
  • Load into analytics-ready tables

Business Intelligence

  • Create conformed dimensions
  • Build fact tables for metrics
  • Serve dashboards and reports

Data Engineering

  • Orchestrate complex pipelines
  • Track data lineage
  • Incremental processing for efficiency

Analytics Engineering

  • SQL transformations with dependency management
  • Version-controlled SQL
  • Collaborative data modeling

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Built after working with dbt, Airflow, Airbyte, Superset, Tableau, and custom Rust transformations run as Kubernetes CronJobs; the goal was a single tool that combines transformation, orchestration, and visualization
  • Built with FastAPI, Pandas, and D3.js
  • Follows the Medallion Architecture pattern

Roadmap

  • Add Spark connector for big data
  • Add Netsuite connector
  • Add S3 connector
  • Publish the transformdash pip package
  • Implement data quality testing framework
  • Add CI/CD pipeline templates
  • Create VSCode extension
  • Real-time data streaming
  • Cloud deployment guides (AWS, GCP, Azure)
  • Metric computation layer
  • Row-level security


Download files

Download the file for your platform.

Source Distribution

transformdash-1.0.0.tar.gz (221.2 kB)

Built Distribution

transformdash-1.0.0-py3-none-any.whl (225.3 kB)

File details

Details for the file transformdash-1.0.0.tar.gz.

File metadata

  • Download URL: transformdash-1.0.0.tar.gz
  • Upload date:
  • Size: 221.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for transformdash-1.0.0.tar.gz

  • SHA256: 07384fbda487b906a705a4d15d2c5024d8e10e7d3352397cf84823b238f0e968
  • MD5: 90779130bcfc0556fe5180bf0c9c12e0
  • BLAKE2b-256: 619cf9993e5962247ab89e463613f5eec2be3ba190b4ea997acdd99d1ffa73f3

File details

Details for the file transformdash-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: transformdash-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 225.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for transformdash-1.0.0-py3-none-any.whl

  • SHA256: f3180a7a507c49d22bcd7395f46e3f503e74cd925c88259b8f60d56da8ad7a21
  • MD5: a582738a8e92c3fe3a0b97b2b8377edd
  • BLAKE2b-256: 32a3f7b0bde1bfcb55d3731e544c2421a897527eb8ca252c4fc2879e2cc39e02
