
Generate beautiful, interactive column-level lineage for dbt projects


miswag-dbt-lineage

🔍 Generate beautiful, interactive column-level lineage for your dbt projects

PyPI version Python 3.9+ License

miswag-dbt-lineage is a lightweight, dbt-native tool that generates a static website with interactive column-level lineage visualization. No backend, no servers: just beautiful, deployable lineage documentation.


✨ Features

  • 🔗 Column-level lineage: trace data flow through transformations
  • 📊 Table-level lineage: visualize model dependencies
  • 🎨 Interactive visualization: pan, zoom, and explore your data pipelines
  • 🚀 Static output: deploy to S3, GCS, GitHub Pages, or any static host
  • 🎯 dbt-native: works with your existing dbt artifacts (no code changes needed)
  • ⚡ Fast: handles 1,000+ models and 10,000+ columns
  • 🌈 Beautiful UI: dark theme, color-coded layers, transformation indicators

🎯 What It Does

  1. Reads your dbt artifacts (manifest.json, catalog.json)
  2. Extracts column-level lineage using SQL parsing (powered by sqlglot)
  3. Generates a static website with an interactive lineage explorer
  4. Deploys anywhere: S3, GCS, Azure Blob, GitHub Pages, etc.
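
To make step 1 concrete, here is a minimal sketch of reading model dependencies out of a dbt manifest. It is not the package's own code. The `nodes`, `resource_type`, and `depends_on.nodes` keys come from the dbt manifest format, while `SAMPLE_MANIFEST` and `model_dependencies` are hypothetical names used only for illustration.

```python
import json

# Tiny stand-in for target/manifest.json (hypothetical project "shop").
# A real manifest carries many more fields per node.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.fct_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    },
    "test.shop.not_null_fct_orders_id": {
      "resource_type": "test",
      "depends_on": {"nodes": ["model.shop.fct_orders"]}
    }
  }
}
""")


def model_dependencies(manifest: dict) -> dict:
    """Map each model's unique_id to the unique_ids it depends on."""
    deps = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        deps[unique_id] = list(node.get("depends_on", {}).get("nodes", []))
    return deps


if __name__ == "__main__":
    for model, parents in model_dependencies(SAMPLE_MANIFEST).items():
        print(model, "<-", parents)
```

To try it on a real project, replace `SAMPLE_MANIFEST` with `json.load(open("target/manifest.json"))`.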

📦 Installation

pip install miswag-dbt-lineage

Or install from source:

git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage
pip install -e .

🚀 Quick Start

Basic Usage

# Navigate to your dbt project
cd my-dbt-project

# Generate lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json

All-in-One Build

# Runs 'dbt docs generate' + generates lineage site (output defaults to target/lineage_website)
miswag-dbt-lineage build

View Locally

cd target/lineage_website
python -m http.server 8080
# Open http://localhost:8080

📚 Usage

Commands

generate: Generate lineage site from artifacts

miswag-dbt-lineage generate [OPTIONS]

Options:

  • --manifest, -m PATH: Path to manifest.json (default: target/manifest.json)
  • --catalog, -c PATH: Path to catalog.json (optional but recommended)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --dialect, -d TEXT: SQL dialect: clickhouse, postgres, snowflake, bigquery, etc. (default: clickhouse)
  • --verbose: Enable verbose logging
  • --help: Show help

Example:

miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json \
  --output docs/lineage \
  --dialect snowflake

build: Build lineage (runs dbt docs generate, then generates the lineage site)

miswag-dbt-lineage build [OPTIONS]

Options:

  • --project-dir, -p PATH: dbt project directory (default: .)
  • --output, -o PATH: Output directory (default: target/lineage_website)
  • --skip-dbt-docs: Skip running dbt docs generate
  • --dialect, -d TEXT: SQL dialect (default: clickhouse)
  • --help: Show help

Example:

miswag-dbt-lineage build --dialect postgres

Supported SQL Dialects

  • clickhouse (default)
  • postgres
  • snowflake
  • bigquery
  • redshift
  • databricks
  • mysql
  • tsql (SQL Server)
  • And more: see sqlglot docs

🌐 Deployment

The generated site is a fully static collection of HTML/CSS/JS files. Deploy it anywhere:

AWS S3

aws s3 sync target/lineage_website s3://my-bucket/lineage-docs/
aws s3 website s3://my-bucket --index-document index.html

Google Cloud Storage

gsutil -m rsync -r target/lineage_website gs://my-bucket/lineage-docs/
gsutil web set -m index.html gs://my-bucket

Azure Blob Storage

az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination '$web' \
  --source target/lineage_website

GitHub Pages

# Push to gh-pages branch
cd target/lineage_website
git init
git checkout -b gh-pages
git add .
git commit -m "Deploy lineage site"
git remote add origin https://github.com/your-org/your-repo.git
git push -f origin gh-pages

🎨 Features Walkthrough

Table Lineage

  • ✅ Visualize upstream & downstream model dependencies
  • ✅ Color-coded layers (source, staging, intermediate, mart, seed)
  • ✅ Click any model to see its lineage
  • ✅ Inline model metadata (layer, materialization, columns, tests, deps)
  • ✅ Adjustable depth (1-5 levels)

Column Lineage

  • ✅ Trace column-to-column data flow
  • ✅ Transformation type indicators (DIRECT, RENAMED, FUNCTION, CASE, AGG, CALC)
  • ✅ Color-coded edges for transformation types
  • ✅ Inline column metadata (name, type, model, transformation SQL)
  • ✅ Click any column to pivot to its lineage
  • ✅ Adjustable depth (1-5 levels)
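
To give a feel for the transformation type indicators, here is a deliberately crude string heuristic that assigns one of the six labels to a select expression. It is only an illustration: the tool relies on SQL parsing via sqlglot, and the category definitions, the `classify` function, and the `AGG_FUNCS` list below are assumptions, not the package's rules.

```python
import re

# Assumed set of aggregate functions; real dialects have many more.
AGG_FUNCS = ("sum", "count", "avg", "min", "max")
# A bare (optionally table-qualified) column reference such as "o.customer_id".
IDENTIFIER = re.compile(r"^[A-Za-z_][\w.]*$")


def classify(expression: str, output_name: str) -> str:
    """Guess a transformation label for `expression AS output_name`."""
    expr = expression.strip()
    lowered = expr.lower()
    if lowered.startswith("case "):
        return "CASE"
    if re.match(r"^(%s)\s*\(" % "|".join(AGG_FUNCS), lowered):
        return "AGG"
    if re.match(r"^\w+\s*\(", lowered):
        return "FUNCTION"
    if IDENTIFIER.match(expr):
        # Compare the column part only, ignoring any table qualifier.
        column = expr.split(".")[-1]
        return "DIRECT" if column.lower() == output_name.lower() else "RENAMED"
    # Anything else (arithmetic, concatenation, ...) is treated as a calculation.
    return "CALC"


if __name__ == "__main__":
    samples = [
        ("o.customer_id", "customer_id"),
        ("id", "order_id"),
        ("upper(name)", "name"),
        ("sum(amount)", "total_amount"),
        ("CASE WHEN amount > 0 THEN 1 ELSE 0 END", "is_paid"),
        ("price * quantity", "line_total"),
    ]
    for expression, alias in samples:
        print(f"{expression!r} AS {alias} -> {classify(expression, alias)}")
```

A parser-based classifier inspects the expression tree instead of the raw text, which avoids the obvious blind spots of this sketch (nested functions, quoted identifiers, dialect-specific syntax).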

Catalog Views

  • ✅ Models: browse all models with metadata
  • ✅ Sources: view all data sources
  • ✅ Tests: see all data quality tests
  • ✅ Search and filter by layer, directory, etc.

🛠️ How It Works

Architecture

dbt artifacts → SQL parsing → Lineage graph → Static website
    ↓               ↓              ↓               ↓
manifest.json   sqlglot      lineage.json    index.html
catalog.json                                  + data/

Lineage Resolution

  1. Read dbt artifacts: Parse manifest.json and catalog.json
  2. Extract dependencies: Identify model → model relationships
  3. Parse compiled SQL: Use sqlglot to analyze SELECT statements
  4. Resolve columns: Match columns across CTEs, aliases, and transformations
  5. Classify transformations: Detect aggregations, functions, CASE expressions, etc.
  6. Generate graph: Build node/edge graph with metadata
  7. Create static site: Bundle HTML + JSON for deployment
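
Once the graph from step 6 exists, exploring lineage to an adjustable depth is a bounded graph walk. The sketch below shows one way to do that with a breadth-first search. The edge list, column names, and `upstream` function are hypothetical, and the actual layout of lineage.json is not assumed here.

```python
from collections import deque

# Hypothetical column-level edges: (upstream column, downstream column).
EDGES = [
    ("raw.orders.amount", "stg_orders.amount"),
    ("stg_orders.amount", "int_orders.amount_usd"),
    ("int_orders.amount_usd", "fct_revenue.total_usd"),
]


def upstream(column: str, edges, max_depth: int) -> dict:
    """Return upstream columns within max_depth hops, mapped to their hop count."""
    # Index edges by downstream column so parents can be looked up directly.
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen = {}
    queue = deque([(column, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached; do not expand further
        for parent in parents.get(current, []):
            if parent not in seen:
                seen[parent] = depth + 1
                queue.append((parent, depth + 1))
    return seen


if __name__ == "__main__":
    print(upstream("fct_revenue.total_usd", EDGES, max_depth=2))
```

Downstream traversal works the same way with the index built on the upstream side of each edge.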

📖 Configuration

Layer Classification

By default, models are classified into layers based on naming conventions:

  • source: source.*
  • staging: .stg_, staging
  • intermediate: .int_, intermediate
  • mart: .mart, .fct_, .dim_, marts
  • seed: seed.*

You can customize this in the extractor code (miswag_dbt_lineage/extractor.py).
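
As a rough picture of what such a customization might look like, the sketch below applies the naming conventions listed above to a dbt unique_id. It is not the code in extractor.py: `classify_layer`, `LAYER_PATTERNS`, the matching order, and the "other" fallback are all assumptions made for illustration.

```python
# Substring patterns per layer, taken from the conventions listed above.
# Checked in order; the first layer with a matching pattern wins (assumption).
LAYER_PATTERNS = [
    ("staging", (".stg_", "staging")),
    ("intermediate", (".int_", "intermediate")),
    ("mart", (".mart", ".fct_", ".dim_", "marts")),
]


def classify_layer(unique_id: str) -> str:
    """Guess the layer of a dbt node from its unique_id."""
    if unique_id.startswith("source."):
        return "source"
    if unique_id.startswith("seed."):
        return "seed"
    for layer, patterns in LAYER_PATTERNS:
        if any(pattern in unique_id for pattern in patterns):
            return layer
    return "other"  # hypothetical fallback for names matching no convention


if __name__ == "__main__":
    for uid in (
        "source.shop.raw.orders",
        "model.shop.stg_orders",
        "model.shop.int_orders_enriched",
        "model.shop.fct_orders",
        "seed.shop.country_codes",
        "model.shop.orders",
    ):
        print(uid, "->", classify_layer(uid))
```

Adding a layer under this scheme means appending one entry to `LAYER_PATTERNS`.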


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone repo
git clone https://github.com/hameeddataeng/miswag-dbt-lineage.git
cd miswag-dbt-lineage

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


🙏 Acknowledgments

  • Built for the dbt community
  • Powered by sqlglot for SQL parsing
  • Inspired by dbt docs and various lineage visualization tools


⭐ If you find this useful, please star the repo!

