
Generate beautiful, interactive column-level lineage for dbt projects


miswag-dbt-lineage

🔍 Generate beautiful, interactive column-level lineage for your dbt projects

PyPI version Python 3.9+ License

miswag-dbt-lineage is a lightweight, dbt-native tool that generates a static website with interactive column-level lineage visualization. No backend, no servers: just beautiful, deployable lineage documentation.

[Screenshot: Lineage Portal]

✨ Features

  • 🔗 Column-level lineage: trace data flow through transformations
  • 📊 Table-level lineage: visualize model dependencies
  • 🎨 Interactive visualization: pan, zoom, and explore your data pipelines
  • 🚀 Static output: deploy to S3, GCS, GitHub Pages, or any static host
  • 🎯 dbt-native: works with your existing dbt artifacts (no code changes needed)
  • ⚡ Fast: handles 1,000+ models and 10,000+ columns
  • 🌈 Beautiful UI: dark theme, color-coded layers, transformation indicators

🎯 What It Does

  1. Reads your dbt artifacts (manifest.json, catalog.json)
  2. Extracts column-level lineage using SQL parsing (powered by sqlglot)
  3. Generates a static website with an interactive lineage explorer
  4. Deploys anywhere: S3, GCS, Azure Blob, GitHub Pages, etc.

📦 Installation

pip install miswag-dbt-lineage

Or install from source:

git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage
pip install -e .

🚀 Quick Start

Basic Usage

# Navigate to your dbt project
cd my-dbt-project

# Generate lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json

All-in-One Build

# Runs 'dbt docs generate' + generates lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage build

View Locally

cd target/lineage_website
python -m http.server 8080
# Open http://localhost:8080

📚 Usage

Commands

generate: Generate lineage site from artifacts

miswag-dbt-lineage generate [OPTIONS]

Options:

  • --manifest, -m PATH: Path to manifest.json (default: target/manifest.json)
  • --catalog, -c PATH: Path to catalog.json (optional but recommended)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --dialect, -d TEXT: SQL dialect, e.g. clickhouse, postgres, snowflake, bigquery (default: clickhouse)
  • --verbose: Enable verbose logging
  • --help: Show help

Example:

miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json \
  --output docs/lineage \
  --dialect snowflake
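
If you want to call the command from a Python script (for example in an orchestration task), the sketch below assembles the argument list from the options documented above. The helper name `build_generate_command` is hypothetical and not part of the package; only the flag names and defaults come from this README.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    manifest: str = "target/manifest.json",
    catalog: Optional[str] = "target/catalog.json",
    output: str = "target/lineage_website",
    dialect: str = "clickhouse",
    verbose: bool = False,
) -> List[str]:
    """Assemble the argument list for `miswag-dbt-lineage generate`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["miswag-dbt-lineage", "generate", "--manifest", manifest]
    if catalog:  # the catalog is optional, so pass it only when given
        cmd += ["--catalog", catalog]
    cmd += ["--output", output, "--dialect", dialect]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_generate_command(output="docs/lineage", dialect="snowflake")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues when paths contain spaces.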

build: Build lineage (runs dbt docs + generate)

miswag-dbt-lineage build [OPTIONS]

Options:

  • --project-dir, -p PATH: dbt project directory (default: .)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --skip-dbt-docs: Skip running dbt docs generate
  • --dialect, -d TEXT: SQL dialect (default: clickhouse)
  • --help: Show help

Example:

miswag-dbt-lineage build --dialect postgres

Supported SQL Dialects

  • clickhouse (default)
  • postgres
  • snowflake
  • bigquery
  • redshift
  • databricks
  • mysql
  • tsql (SQL Server)
  • And more: see the sqlglot docs

🌐 Deployment

The generated site is a fully static collection of HTML/CSS/JS files. Deploy it anywhere:

AWS S3

aws s3 sync target/lineage_website s3://my-bucket/lineage-docs/
aws s3 website s3://my-bucket --index-document index.html

Google Cloud Storage

gsutil -m rsync -r target/lineage_website gs://my-bucket/lineage-docs/
gsutil web set -m index.html gs://my-bucket

Azure Blob Storage

az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination '$web' \
  --source target/lineage_website

GitHub Pages

# Push to gh-pages branch
cd target/lineage_website
git init
git checkout -b gh-pages
git add .
git commit -m "Deploy lineage site"
git remote add origin https://github.com/your-org/your-repo.git
git push -f origin gh-pages

🎨 Features Walkthrough

Table Lineage

  • ✅ Visualize upstream & downstream model dependencies
  • ✅ Color-coded layers (source, staging, intermediate, mart, seed)
  • ✅ Click any model to see its lineage
  • ✅ Inline model metadata (layer, materialization, columns, tests, deps)
  • ✅ Adjustable depth (1-5 levels)
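
To make the depth setting concrete, here is a minimal sketch of a depth-limited upstream walk over a model graph. It is an illustration of the idea, not the tool's actual implementation; the function name and the model names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def upstream_within_depth(
    parents: Dict[str, List[str]], start: str, max_depth: int
) -> Dict[str, int]:
    """Breadth-first walk returning each upstream node and its distance from `start`."""
    distances: Dict[str, int] = {}
    queue = deque([(start, 0)])
    seen: Set[str] = {start}
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                distances[parent] = depth + 1
                queue.append((parent, depth + 1))
    return distances


# Hypothetical model names, for illustration only.
parents = {
    "fct_orders": ["int_orders_enriched"],
    "int_orders_enriched": ["stg_orders", "stg_customers"],
    "stg_orders": ["raw.orders"],
    "stg_customers": ["raw.customers"],
}
print(upstream_within_depth(parents, "fct_orders", max_depth=2))
```

With `max_depth=2` the raw sources are left out because they sit three hops away; the same walk over a child map would give the downstream view.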

Column Lineage

  • ✅ Trace column-to-column data flow
  • ✅ Transformation type indicators (DIRECT, RENAMED, FUNCTION, CASE, AGG, CALC)
  • ✅ Color-coded edges for transformation types
  • ✅ Inline column metadata (name, type, model, transformation SQL)
  • ✅ Click any column to pivot to its lineage
  • ✅ Adjustable depth (1-5 levels)
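
As a rough intuition for the transformation labels, the toy classifier below assigns one of the six types to a single SELECT expression. It is a simplified, regex-based heuristic written for this README, not the sqlglot-based logic the tool uses; the function name, the list of aggregate functions, and the order in which the rules are checked are all assumptions.

```python
import re
from typing import Optional

_BARE_COLUMN = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)*$")
_AGGREGATE = re.compile(r"\b(sum|count|avg|min|max)\s*\(", re.IGNORECASE)
_FUNCTION_CALL = re.compile(r"\b[A-Za-z_]\w*\s*\(")


def classify_transformation(expression: str, alias: Optional[str] = None) -> str:
    """Label one SELECT expression with a transformation type (toy heuristic)."""
    expr = expression.strip()
    if _BARE_COLUMN.match(expr):
        # A plain (optionally qualified) column reference.
        column_name = expr.split(".")[-1]
        if alias is None or alias.lower() == column_name.lower():
            return "DIRECT"
        return "RENAMED"
    if re.match(r"case\b", expr, re.IGNORECASE):
        return "CASE"
    if _AGGREGATE.search(expr):
        return "AGG"
    if _FUNCTION_CALL.search(expr):
        return "FUNCTION"
    return "CALC"  # anything else, e.g. arithmetic on columns


for expr, alias in [
    ("amount", "amount"),
    ("o.amount", "total"),
    ("CASE WHEN amount > 0 THEN 1 ELSE 0 END", "is_paid"),
    ("sum(amount)", "total_amount"),
    ("lower(email)", "email"),
    ("price * qty", "revenue"),
]:
    print(f"{expr!r:45} -> {classify_transformation(expr, alias)}")
```

A real implementation works on the parsed SQL tree rather than on strings, which is what makes nested expressions, CTEs, and dialect differences tractable.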

Catalog Views

  • ✅ Models: browse all models with metadata
  • ✅ Sources: view all data sources
  • ✅ Tests: see all data quality tests
  • ✅ Search and filter by layer, directory, etc.

🛠️ How It Works

Architecture

dbt artifacts → SQL parsing → Lineage graph → Static website
    ↓               ↓              ↓               ↓
manifest.json   sqlglot      lineage.json    index.html
catalog.json                                  + data/

Lineage Resolution

  1. Read dbt artifacts: Parse manifest.json and catalog.json
  2. Extract dependencies: Identify model → model relationships
  3. Parse compiled SQL: Use sqlglot to analyze SELECT statements
  4. Resolve columns: Match columns across CTEs, aliases, and transformations
  5. Classify transformations: Detect aggregations, functions, CASE expressions, etc.
  6. Generate graph: Build node/edge graph with metadata
  7. Create static site: Bundle HTML + JSON for deployment
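
Steps 1 and 2 can be sketched with the standard library alone. The example below relies on the `nodes`, `resource_type`, and `depends_on.nodes` fields of dbt's manifest.json; the helper names and the sample project are hypothetical, and this is not the package's own extractor.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_manifest(path: str = "target/manifest.json") -> Dict:
    """Read a dbt manifest from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def model_edges(manifest: Dict) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs for every model in a dbt manifest."""
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        for upstream in node.get("depends_on", {}).get("nodes", []):
            edges.append((upstream, unique_id))
    return edges


# A tiny in-memory stand-in for manifest.json (project and model names are hypothetical).
sample = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.fct_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.fct_orders"]},
        },
    }
}
print(model_edges(sample))
```

On a real project you would call `model_edges(load_manifest())` after `dbt docs generate` has written the artifacts.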

📖 Configuration

Layer Classification

By default, models are classified into layers based on naming conventions:

  • source: source.*
  • staging: .stg_, staging
  • intermediate: .int_, intermediate
  • mart: .mart, .fct_, .dim_, marts
  • seed: seed.*

You can customize this in the extractor code (miswag_dbt_lineage/extractor.py).
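
The sketch below shows one way such naming rules could be expressed. It is not the code in extractor.py: the rule table, the function name `classify_layer`, the `"other"` fallback, the order in which rules are checked, and the reading of `source.*` and `seed.*` as prefixes of the dbt unique_id are all assumptions made for illustration.

```python
from typing import List, Tuple

# (layer, prefixes the unique_id may start with, substrings it may contain)
LAYER_RULES: List[Tuple[str, Tuple[str, ...], Tuple[str, ...]]] = [
    ("source", ("source.",), ()),
    ("seed", ("seed.",), ()),
    ("staging", (), (".stg_", "staging")),
    ("intermediate", (), (".int_", "intermediate")),
    ("mart", (), (".mart", ".fct_", ".dim_", "marts")),
]


def classify_layer(unique_id: str, default: str = "other") -> str:
    """Map a dbt unique_id to a layer using the naming conventions listed above."""
    uid = unique_id.lower()
    for layer, prefixes, substrings in LAYER_RULES:
        # str.startswith accepts a tuple; an empty tuple never matches.
        if uid.startswith(prefixes) or any(s in uid for s in substrings):
            return layer
    return default


for uid in [
    "source.shop.raw.orders",
    "model.shop.stg_orders",
    "model.shop.int_orders_enriched",
    "model.shop.fct_orders",
    "seed.shop.country_codes",
]:
    print(uid, "->", classify_layer(uid))
```

Keeping the rules in a table means a project with different prefixes (say `base_` instead of `stg_`) only needs a different list, not different logic.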


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone repo
git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🙏 Acknowledgments

  • Built for the dbt community
  • Powered by sqlglot for SQL parsing
  • Inspired by dbt docs and various lineage visualization tools

โญ If you find this useful, please star the repo!
