A modern, dbt-inspired data transformation platform with ML integration

TransformDash

Hybrid Data Transformation & Dashboard Platform

Run SQL transformations with dependency management and lineage tracking directly against PostgreSQL, without needing a data warehouse.


Live Demo

Login options:

  • Viewer: demo / demo (read-only on source data, can run transformations)
  • Admin: admin / admin (full access)

Experience TransformDash without installing anything:

  • Pre-loaded with 100 customers, 500 orders across 24 tables
  • Run transformations and see the Bronze → Silver → Gold pipeline in action
  • Build interactive dashboards and explore data visualizations
  • Test ML model predictions on sample data

Demo Notes:

  • Shared demo environment - source data is protected (read-only)
  • You can run transformations and create dashboards
  • First load takes 30-60 seconds if the server is sleeping (free tier)
  • Demo data maintained automatically

Features

Core Capabilities

  • Multi-Layer Architecture: Bronze → Silver → Gold medallion pattern
  • SQL & Python Models: SQL with Jinja templating and Python transformations
  • DAG Orchestration: Automatic dependency resolution and parallel execution
  • Interactive Web UI: Real-time lineage graphs and dashboards
  • PostgreSQL Support: Runs transformations natively in PostgreSQL; no separate data warehouse required
  • Incremental Syntax: Write incremental models (full refresh for now, true incremental on roadmap)
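The automatic dependency resolution above can be pictured with Python's standard-library graphlib; the model names and dependency map below are illustrative, not TransformDash's internal API:

```python
from graphlib import TopologicalSorter

# Hypothetical model -> upstream-dependency map, as ref() calls would imply
deps = {
    "stg_customers": set(),
    "stg_orders": set(),
    "int_customer_orders": {"stg_customers", "stg_orders"},
    "fct_orders": {"int_customer_orders"},
}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    batch = sorted(ts.get_ready())  # models whose upstreams are all done
    for model in batch:             # each batch could run in parallel
        ts.done(model)              # mark finished so downstream models unlock
    print(batch)
# ['stg_customers', 'stg_orders']
# ['int_customer_orders']
# ['fct_orders']
```

Models within one batch have no dependencies on each other, which is what makes parallel execution safe.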

AI-Powered Search (Optional)

  • Semantic Search: Natural language queries to find models (e.g., "customer revenue models")
  • FAISS Vector Search: Fast similarity search using sentence embeddings
  • Smart Model Discovery: Search by meaning, not just keywords
  • Graceful Degradation: Optional feature - install dependencies only if needed
  • Installation: pip install -r dbt_assistant/requirements.txt
  • See dbt_assistant/README.md for details
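Conceptually, the semantic search ranks models by embedding similarity. Here is a toy sketch with hand-made vectors; the real feature uses sentence-transformer embeddings indexed with FAISS, and these 3-d vectors and model names are made up for illustration:

```python
import math

# Toy 3-d "embeddings" standing in for sentence-transformer vectors
model_vecs = {
    "fct_customer_revenue": [0.9, 0.1, 0.0],
    "stg_orders":           [0.2, 0.8, 0.1],
    "dim_products":         [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, top_k=2):
    # Rank models by similarity to the query embedding, best first
    ranked = sorted(model_vecs, key=lambda m: cosine(query_vec, model_vecs[m]),
                    reverse=True)
    return ranked[:top_k]

print(search([1.0, 0.0, 0.0]))  # a query embedding "near" customer-revenue models
# ['fct_customer_revenue', 'stg_orders']
```

FAISS replaces the linear scan in `search` with an approximate-nearest-neighbor index, which matters once there are many models.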

Model Features

  • {{ source() }} and {{ ref() }} macros
  • {{ config() }} for model configuration
  • {% if is_incremental() %} syntax support (currently does full refreshes)
  • YAML-based source definitions
  • View and table materializations
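To illustrate what the macros do, here is a simplified stand-in for the rendering step. The actual loader uses Jinja templating; the lookup tables and schema names below are hypothetical:

```python
import re

# Hypothetical resolution tables: macro arguments -> concrete table names
SOURCES = {("raw", "customers"): "public.customers"}
REFS = {"stg_orders": "bronze.stg_orders"}

def render(sql):
    # Replace {{ source('x', 'y') }} and {{ ref('x') }} with real table names
    sql = re.sub(r"\{\{\s*source\('(\w+)',\s*'(\w+)'\)\s*\}\}",
                 lambda m: SOURCES[(m.group(1), m.group(2))], sql)
    sql = re.sub(r"\{\{\s*ref\('(\w+)'\)\s*\}\}",
                 lambda m: REFS[m.group(1)], sql)
    return sql

print(render("select * from {{ source('raw', 'customers') }}"))
# select * from public.customers
```

Because `ref()` is resolved by the loader, it also tells the engine which models depend on which, which is how the lineage graph is built.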

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      TransformDash                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Raw Sources (PostgreSQL)                                   │
│         ↓                                                   │
│  Bronze Layer (stg_* models - Views)                       │
│    • Direct extraction from raw tables                      │
│    • Column aliasing and standardization                    │
│         ↓                                                   │
│  Silver Layer (int_* models - Tables)                      │
│    • Multi-table joins                                      │
│    • Business logic and calculations                        │
│    • Aggregations and window functions                      │
│         ↓                                                   │
│  Gold Layer (fct_*/dim_* models - Tables)                  │
│    • Analytics-ready fact and dimension tables              │
│    • Final business metrics                                 │
│         ↓                                                   │
│  Web Dashboard & API                                        │
│    • Interactive lineage visualization                      │
│    • Model catalog and documentation                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
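The naming conventions in the diagram map model prefixes to layers. A small sketch of that mapping (the function name is illustrative, not part of TransformDash's API):

```python
# Map model-name prefixes to medallion layers, as in the diagram above
LAYER_PREFIXES = {
    "stg_": "bronze",
    "int_": "silver",
    "fct_": "gold",
    "dim_": "gold",
}

def layer_of(model_name):
    for prefix, layer in LAYER_PREFIXES.items():
        if model_name.startswith(prefix):
            return layer
    return "unknown"

print(layer_of("stg_customers"))  # bronze
print(layer_of("fct_orders"))     # gold
```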

Quick Start

Fastest Way to Try It (Docker - Recommended)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Setup environment for Docker
cp .env.docker .env
# Generate JWT secret and add to .env
python -c 'import secrets; print("JWT_SECRET_KEY=" + secrets.token_urlsafe(32))' >> .env

# Start all services (PostgreSQL + TransformDash)
docker-compose up -d

# Wait for containers to start (about 10 seconds)
sleep 10

# Run database migrations
docker-compose exec web bash run_migrations.sh

# Load sample data
docker-compose exec web python load_sample_data.py

# Train example ML model
docker-compose exec web python ml/train_telco_churn.py

Then visit http://localhost:8000 (default login: admin / admin)

What you get:

  • PostgreSQL database with user authentication
  • Sample e-commerce dataset (24 tables, 100+ customers, 500+ orders)
  • Trained ML model (Telco Customer Churn with realistic metrics)
  • Interactive dashboards and chart builder
  • ML Models tab with prediction capabilities

Install from PyPI (Simplest)

# Install the package
pip install transformdash

# Set up PostgreSQL (required - choose one option):

# Option A: Use Docker for PostgreSQL
docker run -d --name transformdash-db \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=mypassword \
  -e POSTGRES_DB=transformdash \
  postgres:15

# Option B: Use your existing PostgreSQL server
# (Make sure you have PostgreSQL 15+ running)

# Create .env file with your database credentials
cat > .env << 'EOF'
# Generate this: python -c 'import secrets; print(secrets.token_urlsafe(32))'
JWT_SECRET_KEY=your-secret-key-here

# Main TransformDash database (stores dashboards, charts, users)
TRANSFORMDASH_HOST=localhost
TRANSFORMDASH_PORT=5432
TRANSFORMDASH_DB=transformdash
TRANSFORMDASH_USER=postgres
TRANSFORMDASH_PASSWORD=mypassword

# Your analytics database (the data you want to analyze)
APP_HOST=localhost
APP_PORT=5432
APP_DB=production
APP_USER=postgres
APP_PASSWORD=mypassword
EOF

# Start the application
python -m ui.app

# Visit http://localhost:8000 (login: admin / admin)

Important Notes:

  • PostgreSQL is required - TransformDash is a PostgreSQL-based platform and cannot run without a PostgreSQL database
  • CSV upload is available through the UI, but files are loaded into PostgreSQL tables
  • The app provides dbt-like transformations + interactive dashboards for PostgreSQL

Local Development Setup (Without Docker)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start PostgreSQL with Docker (just the database)
docker-compose up -d db

# Wait for database to start
sleep 5

# Run migrations
bash run_migrations.sh

# Load sample data
python load_sample_data.py

# Train ML model
PYTHONPATH=. python ml/train_telco_churn.py

# Start the application
python ui/app.py  # Visit http://localhost:8000

# Login: admin / admin

Prerequisites

  • Docker & Docker Compose (for Quick Start)
  • OR Python 3.9+ and PostgreSQL 15+ (for manual installation)
  • Git

Installation Options

Option 1: Docker Compose (Recommended)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Generate a secure JWT secret key
python -c 'import secrets; print(secrets.token_urlsafe(32))' > jwt_key.txt
export JWT_SECRET_KEY=$(cat jwt_key.txt)

# Start all services (includes PostgreSQL)
docker-compose up -d

# Access at http://localhost:8000

Option 2: Install via pip (From Source)

# Clone the repository
git clone https://github.com/kraftaa/transformdash.git
cd transformdash

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install all dependencies (including ML support)
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Generate a secure JWT secret key
python -c 'import secrets; print(secrets.token_urlsafe(32))'
# Add the key to .env as: JWT_SECRET_KEY=<generated-key>

# Run the web UI
python ui/app.py

Option 3: Docker Only

# Build and run with your own PostgreSQL
docker build -t transformdash:latest .
docker run -d -p 8000:8000 \
  -e TRANSFORMDASH_HOST=your-postgres-host \
  -e TRANSFORMDASH_PASSWORD=your-password \
  transformdash:latest

Option 4: Kubernetes (Production)

# See DEPLOYMENT.md for full instructions
kubectl apply -f k8s/

Option 5: Testing Kubernetes Configs Locally (Minikube)

# Start minikube
minikube start

# Build Docker image in minikube's Docker environment
eval $(minikube docker-env)
docker build -t transformdash:latest .

# Deploy to minikube
kubectl apply -f k8s/

# Access the application
minikube service transformdash-service -n transformdash

For detailed deployment instructions, see DEPLOYMENT.md

Configuration

  1. Set up database credentials:
cp .env.example .env
# Edit .env with your database credentials

Example .env:

TRANSFORMDASH_HOST=localhost
TRANSFORMDASH_PORT=5432
TRANSFORMDASH_DB=transformdash
TRANSFORMDASH_USER=postgres
TRANSFORMDASH_PASSWORD=your_password

APP_HOST=localhost
APP_PORT=5432
APP_DB=production
APP_USER=postgres
APP_PASSWORD=your_password
  2. Initialize databases (if not using Docker Compose):
createdb transformdash
createdb production
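As a sketch of how these variables can be consumed, the helper below builds a connection string from the names used in the example .env. The function and its defaults are illustrative, not necessarily what config.py does:

```python
import os

def build_dsn(prefix="TRANSFORMDASH"):
    # Read PREFIX_HOST, PREFIX_PORT, etc., falling back to illustrative defaults
    host = os.getenv(f"{prefix}_HOST", "localhost")
    port = os.getenv(f"{prefix}_PORT", "5432")
    db = os.getenv(f"{prefix}_DB", "transformdash")
    user = os.getenv(f"{prefix}_USER", "postgres")
    password = os.getenv(f"{prefix}_PASSWORD", "")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

os.environ["TRANSFORMDASH_PASSWORD"] = "mypassword"
print(build_dsn())
# postgresql://postgres:mypassword@localhost:5432/transformdash
```

The same pattern with `prefix="APP"` would cover the analytics-database variables.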

Run Your First Transformation

Method 1: Web UI (Recommended)

# Start the web interface
python ui/app.py
# Visit http://localhost:8000

# Navigate to Models and click "▶️ Run Models"

Method 2: Create Charts and Dashboards

# Access the UI at http://localhost:8000
# 1. Go to "Chart Builder" to create visualizations
# 2. Go to "Dashboards" to build interactive dashboards
# 3. Use filters and drill-downs for analysis

Method 3: Train ML Models

# Train an example model
PYTHONPATH=. python ml/examples/train_example_model.py

# View registered models
PYTHONPATH=. python ml/registry/model_registry.py

# Use models in SQL transformations (see ml/README.md)

Method 4: API Access

# View API documentation
open http://localhost:8000/docs

# Execute transformations via API
curl -X POST http://localhost:8000/api/models/execute

# Query data
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"table": "my_model", "limit": 100}'

📁 Project Structure

transformdash/
├── connectors/              # Database connectors
│   ├── redis.py            # Redis connector
│   └── (mongodb, etc.)
├── dbt_assistant/          # Optional AI search module
│   ├── core.py            # AI search assistant
│   ├── parser.py          # SQL model parser
│   ├── embed_search.py    # FAISS semantic search
│   ├── requirements.txt   # Optional dependencies
│   └── README.md          # AI search documentation
├── models/                  # SQL transformation models
│   ├── sources.yml         # Data source definitions
│   ├── bronze/             # Staging layer (stg_*)
│   │   ├── stg_customers.sql
│   │   └── stg_orders.sql
│   ├── silver/             # Intermediate layer (int_*)
│   │   └── int_customer_orders.sql
│   └── gold/               # Analytics layer (fct_*, dim_*)
│       └── fct_orders.sql
├── transformations/         # Core transformation engine
│   ├── model.py            # Transformation model class
│   ├── dag.py              # DAG builder and validator
│   └── model_loader.py     # SQL model loader
├── orchestration/           # Execution engine
│   └── engine.py           # DAG orchestrator
├── ui/                      # Web interface
│   └── app.py              # FastAPI application
├── tests/                   # Test suite
├── config.py               # Environment configuration
├── postgres.py             # PostgreSQL connector
├── requirements.txt         # Python dependencies
└── README.md               # This file

Creating Models

Bronze Layer (Staging)

File: models/bronze/stg_customers.sql

{{ config(materialized='view') }}

-- Bronze layer: Direct extraction with minimal transformation

with transformed_data as (
    select
        id as customer_id,
        email,
        name as customer_name,
        created_at
    from {{ source('raw', 'customers') }}
)

select * from transformed_data

Silver Layer (Intermediate)

File: models/silver/int_customer_orders.sql

{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

-- Silver layer: Join customers with orders

with transformed_data as (
    select
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.email as customer_email,
        o.order_date,
        o.total_amount
    from {{ ref('stg_orders') }} o
    join {{ ref('stg_customers') }} c
        on o.customer_id = c.customer_id

    {% if is_incremental() %}
        -- Only process new orders
        where o.order_date > (select max(order_date) from {{ this }})
    {% endif %}
)

select * from transformed_data
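Since incremental models currently run as full refreshes, the `{% if is_incremental() %}` block above is effectively skipped. A simplified stand-in for that behavior (the real engine uses Jinja; this regex version is only illustrative):

```python
import re

def render_full_refresh(sql):
    # Drop {% if is_incremental() %} ... {% endif %} blocks entirely,
    # which is what a full refresh amounts to
    return re.sub(r"\{%\s*if is_incremental\(\)\s*%\}.*?\{%\s*endif\s*%\}",
                  "", sql, flags=re.DOTALL)

sql = """
select * from orders
{% if is_incremental() %}
where order_date > (select max(order_date) from target)
{% endif %}
"""
print(render_full_refresh(sql))
```

When true incremental execution lands, `is_incremental()` would instead render to true on subsequent runs, keeping the `where` filter.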

Gold Layer (Analytics)

File: models/gold/fct_orders.sql

{{ config(materialized='table') }}

-- Gold layer: Final fact table

with transformed_data as (
    select
        order_id,
        customer_id,
        customer_name,
        order_date,
        total_amount,
        extract(year from order_date) as order_year,
        extract(month from order_date) as order_month
    from {{ ref('int_customer_orders') }}
)

select * from transformed_data

Web UI Features

Dashboard

  • Model Catalog: Browse all transformation models
  • Layer Statistics: Bronze/Silver/Gold model counts
  • Real-time Updates: Refresh models dynamically
  • Code Viewer: Click any model to see its SQL code
  • Run Transformations: One-click execution of entire DAG

Lineage Graph

  • Interactive Visualization: D3.js-powered lineage graphs
  • Dependency Tracking: See how models depend on each other
  • Color-Coded Layers: Bronze (🟫), Silver (⚪), Gold (🟡)

Execution

  • ▶️ Run Button: Execute all transformations in DAG order
  • Status Tracking: See execution progress and results
  • Error Handling: Clear error messages if something fails
  • Metrics: Total time, successes, failures

API Endpoints

  • GET /: Interactive dashboard
  • GET /api/models: List all models with dependencies
  • GET /api/models/{name}/code: Get SQL code for a model
  • POST /api/execute: Run all transformations
  • GET /api/lineage: Get DAG structure
  • GET /api/health: Health check
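The query endpoint can be called from Python with only the standard library. The path and payload mirror the curl example earlier; the server must be running for the commented-out call to succeed:

```python
import json
import urllib.request

API = "http://localhost:8000"

def query_request(table, limit=100):
    # Build the POST /api/query request shown in the curl example
    body = json.dumps({"table": table, "limit": limit}).encode()
    return urllib.request.Request(
        f"{API}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = query_request("my_model", limit=100)
# with urllib.request.urlopen(req) as resp:   # requires a running server
#     rows = json.loads(resp.read())
print(req.full_url)
# http://localhost:8000/api/query
```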

Configuration

Environment Variables (.env)

# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=your_database
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password

# MongoDB (optional)
MONGO_URI=mongodb://localhost:27017
MONGO_DB=your_mongo_db

# Redis (optional)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

Sources Configuration (models/sources.yml)

version: 2

sources:
  - name: raw
    description: "Your raw data source"
    database: your_database
    schema: public
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - not_null
              - unique
          - name: email

Testing

# Run unit tests
pytest tests/

# Test database connection
python postgres.py

# Test model loader
python transformations/model_loader.py

# Run example pipeline
python run_transformations.py

Development

Adding a New Database Connector

  1. Create connector class in connectors/:
import pandas as pd

class MyDatabaseConnector:
    def __init__(self, connection_string):
        self.conn = ...  # open the database connection here

    def query_to_dataframe(self, query):
        return pd.read_sql(query, self.conn)

    def __enter__(self):  # enables `with MyDatabaseConnector(...) as db:`
        return self

    def __exit__(self, exc_type, exc, tb):
        self.conn.close()
  2. Add to config.py:
MY_DB_URI = os.getenv('MY_DB_URI')
  3. Use in transformations:
from connectors.mydatabase import MyDatabaseConnector
from config import MY_DB_URI

def my_transformation(context):
    with MyDatabaseConnector(MY_DB_URI) as db:
        return db.query_to_dataframe("SELECT * FROM table")

Adding Custom Macros

Extend ModelLoader in transformations/model_loader.py:

def my_custom_macro(self, arg1, arg2):
    return f"processed_{arg1}_{arg2}"

# Register in render_sql method
env.globals['my_macro'] = self.my_custom_macro

Use Cases

Data Warehousing

  • Extract data from multiple sources
  • Transform with SQL for performance
  • Load into analytics-ready tables

Business Intelligence

  • Create conformed dimensions
  • Build fact tables for metrics
  • Serve dashboards and reports

Data Engineering

  • Orchestrate complex pipelines
  • Track data lineage
  • Incremental processing for efficiency

Analytics Engineering

  • SQL transformations with dependency management
  • Version-controlled SQL
  • Collaborative data modeling

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Built after working with dbt, Airflow, Airbyte, Superset, and Tableau, and after running custom Rust transformations in Kubernetes via CronJobs; the goal was a single tool that combines transformation, orchestration, and visualization
  • Built with FastAPI, Pandas, and D3.js
  • Follows the Medallion Architecture pattern


Roadmap

  • Building transformdash pip package
  • Add Spark connector for big data
  • Add Netsuite connector
  • Add S3 connector
  • Implement data quality testing framework
  • Add CI/CD pipeline templates
  • Create VSCode extension
  • Real-time data streaming
  • Cloud deployment guides (AWS, GCP, Azure)
  • Metric computation layer
  • Row-level security
