
Generate beautiful, interactive column-level lineage for dbt projects


miswag-dbt-lineage

๐Ÿ” Generate beautiful, interactive column-level lineage for your dbt projects

Requires Python 3.9+.

miswag-dbt-lineage is a lightweight, dbt-native tool that generates a static website with interactive column-level lineage visualization. No backend, no servers: just beautiful, deployable lineage documentation.


✨ Features

  • 🔗 Column-level lineage: trace data flow through transformations
  • 📊 Table-level lineage: visualize model dependencies
  • 🎨 Interactive visualization: pan, zoom, and explore your data pipelines
  • 🚀 Static output: deploy to S3, GCS, GitHub Pages, or any static host
  • 🎯 dbt-native: works with your existing dbt artifacts (no code changes needed)
  • ⚡ Fast: handles 1,000+ models and 10,000+ columns
  • 🌈 Beautiful UI: dark theme, color-coded layers, transformation indicators

🎯 What It Does

  1. Reads your dbt artifacts (manifest.json, catalog.json)
  2. Extracts column-level lineage using SQL parsing (powered by sqlglot)
  3. Generates a static website with an interactive lineage explorer
  4. Produces output you can deploy anywhere: S3, GCS, Azure Blob, GitHub Pages, etc.
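Reading the artifacts already yields table-level dependencies, because dbt records each node's parents in manifest.json under `nodes[<unique_id>]["depends_on"]["nodes"]`. The following is a minimal sketch of that first step, not the package's own code; `load_manifest` and `table_edges` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Read a dbt manifest.json from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def table_edges(manifest: dict) -> list[tuple[str, str]]:
    """Return sorted (upstream, downstream) unique_id pairs.

    dbt lists each node's parents under nodes[<unique_id>]["depends_on"]["nodes"];
    parents may be other models, seeds, snapshots, or sources.
    """
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in ("model", "snapshot"):
            continue  # tests, analyses, etc. are not treated as lineage nodes here
        for parent in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent, unique_id))
    return sorted(edges)


# Usage against a real project:
# edges = table_edges(load_manifest("target/manifest.json"))
```

Column-level lineage (step 2) needs more than this, since the manifest only knows which models a model reads from, not which columns feed which.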

📦 Installation

pip install miswag-dbt-lineage

Or install from source:

git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage
pip install -e .

🚀 Quick Start

Basic Usage

# Navigate to your dbt project
cd my-dbt-project

# Generate lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json

All-in-One Build

# Runs 'dbt docs generate' + generates lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage build

View Locally

cd target/lineage_website
python -m http.server 8080
# Open http://localhost:8080
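If you prefer not to `cd` into the output directory, `python -m http.server 8080 --directory target/lineage_website` does the same from the project root. The equivalent from Python passes `directory=` to the standard library's `SimpleHTTPRequestHandler`; `make_server` below is a hypothetical helper, not part of miswag-dbt-lineage.

```python
import functools
import http.server


def make_server(directory: str = "target/lineage_website",
                port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Build (but do not start) a static file server rooted at `directory`."""
    # SimpleHTTPRequestHandler serves files relative to `directory`, not the CWD.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # Bind to loopback only, so the site is not exposed on the local network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Start it with `make_server().serve_forever()` and open http://localhost:8080. Unlike the bare `python -m http.server` command, which listens on all interfaces by default, this sketch binds to 127.0.0.1.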

📚 Usage

Commands

generate: Generate lineage site from artifacts

miswag-dbt-lineage generate [OPTIONS]

Options:

  • --manifest, -m PATH: Path to manifest.json (default: target/manifest.json)
  • --catalog, -c PATH: Path to catalog.json (optional but recommended)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --dialect, -d TEXT: SQL dialect, such as clickhouse, postgres, snowflake, or bigquery (default: clickhouse)
  • --verbose: Enable verbose logging
  • --help: Show help
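The options list does not say why the catalog is recommended. One likely reason is that catalog.json, written by `dbt docs generate`, records the warehouse's column names and data types (`nodes[<unique_id>]["columns"][<name>]["type"]`), which compiled SQL alone does not reveal. The sketch below shows that lookup; `load_catalog` and `column_types` are hypothetical helpers, not the package's implementation.

```python
import json
from pathlib import Path


def load_catalog(path: str) -> dict:
    """Load catalog.json, or return an empty catalog if the file is missing."""
    p = Path(path)
    if not p.exists():
        return {"nodes": {}, "sources": {}}
    return json.loads(p.read_text(encoding="utf-8"))


def column_types(catalog: dict, unique_id: str) -> dict[str, str]:
    """Map column name -> warehouse data type for one model or source."""
    node = catalog.get("nodes", {}).get(unique_id) or catalog.get("sources", {}).get(unique_id)
    if node is None:
        return {}  # e.g. the model was never built, so the warehouse has no columns for it
    return {name: col.get("type", "") for name, col in node.get("columns", {}).items()}
```

Without a catalog, a tool like this would have to fall back to whatever column names it can infer from the SQL, with no type information.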

Example:

miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json \
  --output docs/lineage \
  --dialect snowflake

build: Build lineage (runs dbt docs generate, then generate)

miswag-dbt-lineage build [OPTIONS]

Options:

  • --project-dir, -p PATH: dbt project directory (default: .)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --skip-dbt-docs: Skip running dbt docs generate
  • --dialect, -d TEXT: SQL dialect (default: clickhouse)
  • --help: Show help

Example:

miswag-dbt-lineage build --dialect postgres
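Assuming `build` simply chains the two steps, its behaviour can be approximated as below, for example to reproduce it in a CI script. The function mirrors the documented options but is hypothetical; `dbt docs generate --project-dir` is a real dbt invocation, and the injectable `run` parameter lets you inspect the commands without executing them.

```python
import subprocess
from pathlib import Path
from typing import Callable


def build(project_dir: str = ".", output: str = "target/lineage_website",
          dialect: str = "clickhouse", skip_dbt_docs: bool = False,
          run: Callable[..., object] = subprocess.run) -> list[list[str]]:
    """Hypothetical re-creation of `build`: optionally run dbt docs, then generate."""
    target = Path(project_dir) / "target"
    commands = []
    if not skip_dbt_docs:
        # Writes target/manifest.json and target/catalog.json
        commands.append(["dbt", "docs", "generate", "--project-dir", project_dir])
    commands.append([
        "miswag-dbt-lineage", "generate",
        "--manifest", str(target / "manifest.json"),
        "--catalog", str(target / "catalog.json"),
        "--output", output,
        "--dialect", dialect,
    ])
    for cmd in commands:
        run(cmd, check=True)  # check=True stops at the first failing step
    return commands
```

Running the steps separately is useful when `dbt docs generate` is already part of your pipeline; that case corresponds to `--skip-dbt-docs`.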

Supported SQL Dialects

  • clickhouse (default)
  • postgres
  • snowflake
  • bigquery
  • redshift
  • databricks
  • mysql
  • tsql (SQL Server)
  • And more: see the sqlglot docs

๐ŸŒ Deployment

The generated site is a fully static collection of HTML/CSS/JS files. Deploy it anywhere:

AWS S3

aws s3 sync target/lineage_website s3://my-bucket/lineage-docs/
aws s3 website s3://my-bucket --index-document index.html

Google Cloud Storage

gsutil -m rsync -r target/lineage_website gs://my-bucket/lineage-docs/
gsutil web set -m index.html gs://my-bucket

Azure Blob Storage

az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination '$web' \
  --source target/lineage_website

GitHub Pages

# Push to gh-pages branch
cd target/lineage_website
git init
git checkout -b gh-pages
git add .
git commit -m "Deploy lineage site"
git remote add origin https://github.com/your-org/your-repo.git
git push -f origin gh-pages

🎨 Features Walkthrough

Table Lineage

  • ✅ Visualize upstream and downstream model dependencies
  • ✅ Color-coded layers (source, staging, intermediate, mart, seed)
  • ✅ Click any model to see its lineage
  • ✅ Inline model metadata (layer, materialization, columns, tests, deps)
  • ✅ Adjustable depth (1-5 levels)
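An adjustable depth amounts to a depth-limited breadth-first search over the dependency edges. The sketch below illustrates the idea in Python; it is not the portal's front-end code, and `neighbors_within` is a hypothetical name taking (upstream, downstream) pairs.

```python
from collections import deque


def neighbors_within(edges, start, depth, direction="upstream"):
    """Return {node: distance} for nodes reachable from `start` in at most `depth` hops.

    `edges` holds (upstream, downstream) pairs; `direction` chooses whether to walk
    toward parents ("upstream") or toward children ("downstream").
    """
    adjacency = {}
    for up, down in edges:
        src, dst = (down, up) if direction == "upstream" else (up, down)
        adjacency.setdefault(src, []).append(dst)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested depth
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # BFS: the first visit gives the shortest distance
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen
```

Breadth-first order matters: it guarantees each model is shown at its shortest distance, so raising the depth from 2 to 3 only adds nodes and never moves existing ones.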

Column Lineage

  • ✅ Trace column-to-column data flow
  • ✅ Transformation type indicators (DIRECT, RENAMED, FUNCTION, CASE, AGG, CALC)
  • ✅ Color-coded edges for transformation types
  • ✅ Inline column metadata (name, type, model, transformation SQL)
  • ✅ Click any column to pivot to its lineage
  • ✅ Adjustable depth (1-5 levels)
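The tool detects these transformation types with sqlglot, and its exact rules are not documented here. To illustrate what each label means, the hypothetical `classify` function below applies simple regex heuristics to a single SELECT expression. It does not use a parser, so it will misread nested or quoted SQL. A real implementation should inspect the parsed syntax tree instead.

```python
import re

AGGREGATES = {"sum", "count", "avg", "min", "max"}
# A bare column, optionally qualified by a table/CTE alias (e.g. o.amount)
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
# Any name immediately followed by "(" is treated as a function call
_CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def classify(expression: str, output_name: str) -> str:
    """Label how `output_name` is derived from `expression` (heuristic only)."""
    expr = expression.strip()
    if _IDENT.match(expr):
        source = expr.split(".")[-1]  # drop the qualifier, keep the column name
        return "DIRECT" if source.lower() == output_name.lower() else "RENAMED"
    if re.match(r"(?i)^case\b", expr):
        return "CASE"
    functions = {name.lower() for name in _CALL.findall(expr)}
    if functions & AGGREGATES:
        return "AGG"
    if functions:
        return "FUNCTION"
    return "CALC"  # arithmetic or other expressions without a function call
```

For example, `classify("SUM(amount)", "total")` returns `"AGG"` and `classify("price * qty", "revenue")` returns `"CALC"`.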

Catalog Views

  • ✅ Models: browse all models with metadata
  • ✅ Sources: view all data sources
  • ✅ Tests: see all data quality tests
  • ✅ Search and filter by layer, directory, etc.

๐Ÿ› ๏ธ How It Works

Architecture

dbt artifacts  →  SQL parsing  →  Lineage graph  →  Static website
      ↓                ↓                ↓                  ↓
manifest.json       sqlglot       lineage.json        index.html
catalog.json                                          + data/
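The lineage graph in the middle of this pipeline can be pictured as a plain JSON document that the static page fetches, which is why no backend is needed. The diagram names lineage.json but not its schema, so the `nodes`/`edges` layout and both helpers below are assumptions made for illustration.

```python
import json
from pathlib import Path


def build_lineage_document(edges, layers):
    """Assemble a node/edge graph (illustrative schema, not the tool's)."""
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        "nodes": [{"id": n, "layer": layers.get(n, "unknown")} for n in node_ids],
        "edges": [{"source": up, "target": down} for up, down in edges],
    }


def write_lineage(document, path):
    """Serialize the graph to disk so a static page can load it."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return path
```

Keeping the graph in a separate JSON file, rather than inlining it in index.html, lets the same front-end be reused across projects and rebuilt by regenerating only the data.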

Lineage Resolution

  1. Read dbt artifacts: Parse manifest.json and catalog.json
  2. Extract dependencies: Identify model → model relationships
  3. Parse compiled SQL: Use sqlglot to analyze SELECT statements
  4. Resolve columns: Match columns across CTEs, aliases, and transformations
  5. Classify transformations: Detect aggregations, functions, CASE expressions, etc.
  6. Generate graph: Build node/edge graph with metadata
  7. Create static site: Bundle HTML + JSON for deployment
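Step 4 is where most of the work lies: an output column may be renamed several times across CTEs before it reaches an upstream model. Assume a parser (sqlglot, in this tool) has already produced a mapping from each output column to its input column for every CTE and for the outer SELECT. Tracing lineage then reduces to following those links. The data structure and `resolve_column` below are hypothetical and handle only one-to-one mappings; an expression such as `price * qty` would need a list of inputs per column.

```python
def resolve_column(column, scopes, start_scope="__outer__"):
    """Follow one output column back through CTE/alias links to its origin.

    `scopes` maps a scope name (a CTE, or "__outer__" for the final SELECT) to
    {output_column: (input_scope, input_column)}. A scope missing from `scopes`
    is treated as an upstream model or source table, which ends the walk.
    """
    scope, name = start_scope, column
    path = [(scope, name)]
    visited = {(scope, name)}
    while scope in scopes:
        if name not in scopes[scope]:
            raise KeyError(f"{name!r} is not produced by scope {scope!r}")
        scope, name = scopes[scope][name]
        if (scope, name) in visited:
            raise ValueError("cycle detected while resolving column lineage")
        visited.add((scope, name))
        path.append((scope, name))
    return path


# WITH orders AS (SELECT id AS order_id, amt FROM raw_orders),
#      final  AS (SELECT order_id, amt AS amount FROM orders)
# SELECT order_id, amount AS total FROM final
scopes = {
    "__outer__": {"order_id": ("final", "order_id"), "total": ("final", "amount")},
    "final": {"order_id": ("orders", "order_id"), "amount": ("orders", "amt")},
    "orders": {"order_id": ("raw_orders", "id"), "amt": ("raw_orders", "amt")},
}
```

Here `resolve_column("total", scopes)` walks `total → final.amount → orders.amt → raw_orders.amt`, so `total` would be drawn as an edge from `raw_orders.amt`.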

📖 Configuration

Layer Classification

By default, models are classified into layers based on naming conventions:

  • source: source.*
  • staging: .stg_, staging
  • intermediate: .int_, intermediate
  • mart: .mart, .fct_, .dim_, marts
  • seed: seed.*

You can customize this in the extractor code (miswag_dbt_lineage/extractor.py).
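The list above gives the patterns but not how they are matched. Suppose `source.*` and `seed.*` are prefixes of the dbt unique_id, and the other entries are substrings of the unique_id or of the model's file path. Under that assumption, the rules look roughly like this. `classify_layer`, `LAYER_RULES`, and the `other` fallback are hypothetical; read miswag_dbt_lineage/extractor.py for the actual logic before customizing it.

```python
# (layer, substrings), checked in order; the first match wins
LAYER_RULES = [
    ("staging", (".stg_", "staging")),
    ("intermediate", (".int_", "intermediate")),
    ("mart", (".mart", ".fct_", ".dim_", "marts")),
]


def classify_layer(unique_id: str, path: str = "") -> str:
    """Assign a layer from naming conventions (assumed matching semantics)."""
    if unique_id.startswith("source."):
        return "source"
    if unique_id.startswith("seed."):
        return "seed"
    haystack = f"{unique_id} {path}".lower()
    for layer, patterns in LAYER_RULES:
        if any(p in haystack for p in patterns):
            return layer
    return "other"  # hypothetical fallback for models matching no convention
```

For example, `classify_layer("model.shop.orders_enriched", "models/intermediate/orders_enriched.sql")` returns `"intermediate"` because of the directory name, even though the model lacks an `int_` prefix.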


๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone repo
git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

๐Ÿ“ License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.


๐Ÿ™ Acknowledgments

  • Built for the dbt community
  • Powered by sqlglot for SQL parsing
  • Inspired by dbt docs and various lineage visualization tools

📧 Contact


โญ If you find this useful, please star the repo!



Download files


Source Distribution

miswag_dbt_lineage-0.1.2.tar.gz (36.4 kB)

Uploaded Source

Built Distribution


miswag_dbt_lineage-0.1.2-py3-none-any.whl (32.9 kB)

Uploaded Python 3

File details

Details for the file miswag_dbt_lineage-0.1.2.tar.gz.

File metadata

  • Download URL: miswag_dbt_lineage-0.1.2.tar.gz
  • Upload date:
  • Size: 36.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for miswag_dbt_lineage-0.1.2.tar.gz
  • SHA256: 9db50fa04ec35a6d10404fcac4eea650938395ec37d1c2241883c6479c46e830
  • MD5: e89b5e789ef4bdcdbf7fad537a4e8e20
  • BLAKE2b-256: c56ef2289ee5b6974bb7d51d4512cf0a7216c2caac358ecbed0b3ff57d228717


File details

Details for the file miswag_dbt_lineage-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for miswag_dbt_lineage-0.1.2-py3-none-any.whl
  • SHA256: 2f0805daee5cf22d94b5277a2bb688e5077aa0ec275b0b62194c55f11664dd56
  • MD5: d278a07766fa91ad5bc8da01a63e3271
  • BLAKE2b-256: 7428c851e4c3892211d2bbd95d48a4102a62936e6deca2497d96c549821c4ece

