
Generate beautiful, interactive column-level lineage for dbt projects


miswag-dbt-lineage

Generate beautiful, interactive column-level lineage for your dbt projects

PyPI version Python 3.9+ License

miswag-dbt-lineage is a lightweight, dbt-native tool that generates a static website with interactive column-level lineage visualization. No backend, no servers, just beautiful, deployable lineage documentation.


Features

  • Column-level lineage: trace data flow through transformations
  • Table-level lineage: visualize model dependencies
  • Interactive visualization: pan, zoom, and explore your data pipelines
  • Static output: deploy to S3, GCS, GitHub Pages, or any static host
  • dbt-native: works with your existing dbt artifacts (no code changes needed)
  • Fast: handles 1000+ models and 10,000+ columns
  • Beautiful UI: dark theme, color-coded layers, transformation indicators

What It Does

  1. Reads your dbt artifacts (manifest.json, catalog.json)
  2. Extracts column-level lineage using SQL parsing (powered by sqlglot)
  3. Generates a static website with an interactive lineage explorer
  4. Produces output you can deploy anywhere: S3, GCS, Azure Blob, GitHub Pages, etc.

Installation

pip install miswag-dbt-lineage

Or install from source:

git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage
pip install -e .

Quick Start

Basic Usage

# Navigate to your dbt project
cd my-dbt-project

# Generate lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json

All-in-One Build

# Runs 'dbt docs generate' + generates lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage build

View Locally

cd target/lineage_website
python -m http.server 8080
# Open http://localhost:8080

Usage

Commands

generate: Generate a lineage site from artifacts

miswag-dbt-lineage generate [OPTIONS]

Options:

  • --manifest, -m PATH: Path to manifest.json (default: target/manifest.json)
  • --catalog, -c PATH: Path to catalog.json (optional but recommended)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --dialect, -d TEXT: SQL dialect: clickhouse, postgres, snowflake, bigquery, etc. (default: clickhouse)
  • --verbose: Enable verbose logging
  • --help: Show help

Example:

miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json \
  --output docs/lineage \
  --dialect snowflake

build: Run dbt docs generate, then generate the lineage site

miswag-dbt-lineage build [OPTIONS]

Options:

  • --project-dir, -p PATH: dbt project directory (default: .)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --skip-dbt-docs: Skip running dbt docs generate
  • --dialect, -d TEXT: SQL dialect (default: clickhouse)
  • --help: Show help

Example:

miswag-dbt-lineage build --dialect postgres

Supported SQL Dialects

  • clickhouse (default)
  • postgres
  • snowflake
  • bigquery
  • redshift
  • databricks
  • mysql
  • tsql (SQL Server)
  • And more: see the sqlglot docs

Deployment

The generated site is a fully static collection of HTML/CSS/JS files. Deploy it anywhere:

AWS S3

aws s3 sync target/lineage_website s3://my-bucket/lineage-docs/
aws s3 website s3://my-bucket --index-document index.html

Google Cloud Storage

gsutil -m rsync -r target/lineage_website gs://my-bucket/lineage-docs/
gsutil web set -m index.html gs://my-bucket

Azure Blob Storage

az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination '$web' \
  --source target/lineage_website

GitHub Pages

# Push to gh-pages branch
cd target/lineage_website
git init
git checkout -b gh-pages
git add .
git commit -m "Deploy lineage site"
git remote add origin https://github.com/your-org/your-repo.git
git push -f origin gh-pages

Features Walkthrough

Table Lineage

  • Visualize upstream & downstream model dependencies
  • Color-coded layers (source, staging, intermediate, mart, seed)
  • Click any model to see its lineage
  • Inline model metadata (layer, materialization, columns, tests, deps)
  • Adjustable depth (1-5 levels)
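
As a rough illustration of what an adjustable depth means for table lineage, the sketch below walks a dependency graph breadth-first and stops after a fixed number of hops. It is not the tool's own code: the function name neighbors_within, the edge-list format, and the model names are all hypothetical.

```python
from collections import deque


def neighbors_within(edges, start, depth, direction="upstream"):
    """Return {node: distance} for nodes within `depth` hops of `start`.

    `edges` is a list of (upstream, downstream) pairs (assumed format).
    """
    adjacency = {}
    for upstream, downstream in edges:
        if direction == "upstream":
            adjacency.setdefault(downstream, []).append(upstream)
        else:
            adjacency.setdefault(upstream, []).append(downstream)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # depth limit reached, do not expand further
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]  # report only the neighbors, not the start node
    return seen


# Hypothetical model names, for illustration only.
sample_edges = [
    ("source.raw_orders", "stg_orders"),
    ("stg_orders", "int_orders"),
    ("int_orders", "fct_orders"),
    ("fct_orders", "rpt_sales"),
]
upstream_two = neighbors_within(sample_edges, "fct_orders", depth=2)
downstream_one = neighbors_within(sample_edges, "fct_orders", depth=1, direction="downstream")
print(upstream_two)
print(downstream_one)
```

With a depth of 2, the upstream walk from fct_orders stops at stg_orders and never reaches the source, which is the effect a depth control has on the rendered graph.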

Column Lineage

  • Trace column-to-column data flow
  • Transformation type indicators (DIRECT, RENAMED, FUNCTION, CASE, AGG, CALC)
  • Color-coded edges for transformation types
  • Inline column metadata (name, type, model, transformation SQL)
  • Click any column to pivot to its lineage
  • Adjustable depth (1-5 levels)
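
To make the six transformation labels concrete, here is a deliberately simple string-based heuristic that assigns one of them to a single SELECT expression. This is an assumption-laden toy, not the tool's logic: the tool parses SQL with sqlglot, whereas this sketch only uses regular expressions, and the classify function and the aggregate list are hypothetical.

```python
import re

# Small, illustrative set of aggregate function names (assumption).
AGGREGATES = {"sum", "count", "avg", "min", "max"}
# A bare column reference, optionally qualified by a table alias.
IDENTIFIER = re.compile(r"^(?:[A-Za-z_]\w*\.)?([A-Za-z_]\w*)$")
# An expression that begins with a function call.
FUNCTION_CALL = re.compile(r"^([A-Za-z_]\w*)\s*\(")


def classify(expression, output_name):
    """Label how `expression` produces the output column `output_name`."""
    expr = expression.strip()
    match = IDENTIFIER.match(expr)
    if match:
        same_name = match.group(1).lower() == output_name.lower()
        return "DIRECT" if same_name else "RENAMED"
    if re.match(r"case\b", expr, re.IGNORECASE):
        return "CASE"
    call = FUNCTION_CALL.match(expr)
    if call:
        return "AGG" if call.group(1).lower() in AGGREGATES else "FUNCTION"
    return "CALC"  # anything else, e.g. arithmetic on columns


examples = [
    ("order_id", "order_id"),
    ("o.customer_id", "cust_id"),
    ("lower(email)", "email"),
    ("case when amount > 0 then 1 else 0 end", "is_paid"),
    ("sum(amount)", "total_amount"),
    ("price * quantity", "revenue"),
]
labels = [classify(expr, name) for expr, name in examples]
for (expr, name), label in zip(examples, labels):
    print(f"{name:<14} {label:<9} {expr}")
```

A real classifier has to handle nesting (for example, a CASE inside an aggregate), which is why a proper SQL parser is the better foundation.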

Catalog Views

  • Models: browse all models with metadata
  • Sources: view all data sources
  • Tests: see all data quality tests
  • Search and filter by layer, directory, etc.

How It Works

Architecture

dbt artifacts → SQL parsing → Lineage graph → Static website
    ↓               ↓              ↓               ↓
manifest.json   sqlglot      lineage.json    index.html
catalog.json                                  + data/

Lineage Resolution

  1. Read dbt artifacts: Parse manifest.json and catalog.json
  2. Extract dependencies: Identify model → model relationships
  3. Parse compiled SQL: Use sqlglot to analyze SELECT statements
  4. Resolve columns: Match columns across CTEs, aliases, and transformations
  5. Classify transformations: Detect aggregations, functions, CASE expressions, etc.
  6. Generate graph: Build node/edge graph with metadata
  7. Create static site: Bundle HTML + JSON for deployment
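
Steps 1 and 2 can be sketched with the standard library alone. The example below relies on the documented manifest.json layout, where each entry under nodes carries a resource_type and a depends_on.nodes list. The function names and the sample project "shop" are hypothetical, and this is only a minimal sketch, not the tool's extractor.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Read a dbt manifest.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def model_edges(manifest):
    """Return sorted (upstream, downstream) pairs for every model node."""
    edges = set()
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        for parent in node.get("depends_on", {}).get("nodes", []):
            edges.add((parent, unique_id))
    return sorted(edges)


# Tiny hand-written stand-in for a real manifest (hypothetical project "shop").
# With a real project: manifest = load_manifest("target/manifest.json")
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.fct_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_fct_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.fct_orders"]},
        },
    }
}

edges = model_edges(sample_manifest)
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

The resulting edge list is the table-level graph; column-level lineage additionally requires parsing each model's compiled SQL, which this sketch does not attempt.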

Configuration

Layer Classification

By default, models are classified into layers based on naming conventions:

  • source: source.*
  • staging: .stg_, staging
  • intermediate: .int_, intermediate
  • mart: .mart, .fct_, .dim_, marts
  • seed: seed.*

You can customize this in the extractor code (miswag_dbt_lineage/extractor.py).
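
The naming rules above could be expressed roughly as follows. This is a hypothetical sketch rather than the contents of extractor.py: the function name classify_layer, matching against both the node's unique ID and its file path, and the "unknown" fallback are all assumptions.

```python
# Substring markers per layer, taken from the list above.
LAYER_RULES = [
    ("staging", (".stg_", "staging")),
    ("intermediate", (".int_", "intermediate")),
    ("mart", (".mart", ".fct_", ".dim_", "marts")),
]


def classify_layer(unique_id, path=""):
    """Guess a layer from a dbt unique_id and, optionally, the model's file path."""
    # Sources and seeds are identified by their unique_id prefix.
    if unique_id.startswith("source."):
        return "source"
    if unique_id.startswith("seed."):
        return "seed"
    haystack = f"{unique_id} {path}".lower()
    for layer, markers in LAYER_RULES:
        if any(marker in haystack for marker in markers):
            return layer
    return "unknown"  # assumed fallback when no rule matches


# Hypothetical node IDs and paths, for illustration only.
samples = [
    ("source.shop.raw.orders", ""),
    ("model.shop.stg_orders", ""),
    ("model.shop.int_orders_joined", ""),
    ("model.shop.fct_orders", ""),
    ("seed.shop.country_codes", ""),
    ("model.shop.orders", "models/marts/orders.sql"),
    ("model.shop.helper", ""),
]
layers = [classify_layer(uid, path) for uid, path in samples]
for (uid, _), layer in zip(samples, layers):
    print(f"{uid:<32} {layer}")
```

Keeping the rules in a single ordered list makes it easy to add a project-specific prefix or to change which rule wins when several match.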


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone repo
git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Acknowledgments

  • Built for the dbt community
  • Powered by sqlglot for SQL parsing
  • Inspired by dbt docs and various lineage visualization tools

If you find this useful, please star the repo!
