Generate beautiful, interactive column-level lineage for dbt projects

miswag-dbt-lineage

๐Ÿ” Generate beautiful, interactive column-level lineage for your dbt projects

PyPI version Python 3.9+ License

miswag-dbt-lineage is a lightweight, dbt-native tool that generates a static website with interactive column-level lineage visualization. No backend, no servers: just beautiful, deployable lineage documentation.

[Screenshot: Lineage Portal]

Features

  • Column-level lineage: trace data flow through transformations
  • Table-level lineage: visualize model dependencies
  • Interactive visualization: pan, zoom, and explore your data pipelines
  • Static output: deploy to S3, GCS, GitHub Pages, or any static host
  • dbt-native: works with your existing dbt artifacts (no code changes needed)
  • Fast: handles 1,000+ models and 10,000+ columns
  • Beautiful UI: dark theme, color-coded layers, transformation indicators

What It Does

  1. Reads your dbt artifacts (manifest.json, catalog.json)
  2. Extracts column-level lineage using SQL parsing (powered by sqlglot)
  3. Generates a static website with an interactive lineage explorer
  4. Produces output you can deploy anywhere: S3, GCS, Azure Blob, GitHub Pages, etc.

Installation

pip install miswag-dbt-lineage

Or install from source:

git clone https://github.com/miswag/miswag-dbt-lineage.git
cd miswag-dbt-lineage
pip install -e .

Quick Start

Basic Usage

# Navigate to your dbt project
cd my-dbt-project

# Generate lineage site
miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json \
  --output lineage-site

All-in-One Build

# Runs 'dbt docs generate' + generates lineage site
miswag-dbt-lineage build --output lineage-site

View Locally

cd lineage-site
python -m http.server 8080
# Open http://localhost:8080

Usage

Commands

generate: Generate lineage site from artifacts

miswag-dbt-lineage generate [OPTIONS]

Options:

  • --manifest, -m PATH: Path to manifest.json (default: target/manifest.json)
  • --catalog, -c PATH: Path to catalog.json (optional but recommended)
  • --output, -o PATH: Output directory (default: lineage-site)
  • --dialect, -d TEXT: SQL dialect: clickhouse, postgres, snowflake, bigquery, etc. (default: clickhouse)
  • --verbose: Enable verbose logging
  • --help: Show help

Example:

miswag-dbt-lineage generate \
  --manifest target/manifest.json \
  --catalog target/catalog.json \
  --output docs/lineage \
  --dialect snowflake
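
When the command is driven from a Python script (for example in CI), it can help to assemble the argument list in one place. The sketch below is only an illustration: `build_generate_command` is a hypothetical helper, not part of the package, and it uses only the flags and defaults listed above. The `--catalog` flag is added only when a path is given, since the catalog is optional.

```python
from typing import List, Optional


def build_generate_command(
    manifest: str = "target/manifest.json",
    catalog: Optional[str] = None,
    output: str = "lineage-site",
    dialect: str = "clickhouse",
) -> List[str]:
    """Assemble the argv for `miswag-dbt-lineage generate` (hypothetical helper)."""
    cmd = ["miswag-dbt-lineage", "generate", "--manifest", manifest]
    if catalog is not None:
        # catalog.json is optional, so the flag is only added when a path is given
        cmd += ["--catalog", catalog]
    cmd += ["--output", output, "--dialect", dialect]
    return cmd


cmd = build_generate_command(
    catalog="target/catalog.json",
    output="docs/lineage",
    dialect="snowflake",
)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a single shell string avoids quoting problems when paths contain spaces.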

build: Build lineage (runs dbt docs generate, then generate)

miswag-dbt-lineage build [OPTIONS]

Options:

  • --project-dir, -p PATH: dbt project directory (default: .)
  • --output, -o PATH: Output directory (default: lineage-site)
  • --skip-dbt-docs: Skip running dbt docs generate
  • --dialect, -d TEXT: SQL dialect (default: clickhouse)
  • --help: Show help

Example:

miswag-dbt-lineage build --output lineage-site --dialect postgres

Supported SQL Dialects

  • clickhouse (default)
  • postgres
  • snowflake
  • bigquery
  • redshift
  • databricks
  • mysql
  • tsql (SQL Server)
  • And more: see the sqlglot docs

๐ŸŒ Deployment

The generated site is a fully static collection of HTML/CSS/JS files. Deploy it anywhere:

AWS S3

aws s3 sync lineage-site s3://my-bucket/lineage-docs/
aws s3 website s3://my-bucket --index-document index.html

Google Cloud Storage

gsutil -m rsync -r lineage-site gs://my-bucket/lineage-docs/
gsutil web set -m index.html gs://my-bucket

Azure Blob Storage

az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination '$web' \
  --source lineage-site

GitHub Pages

# Push to gh-pages branch
cd lineage-site
git init
git checkout -b gh-pages
git add .
git commit -m "Deploy lineage site"
git remote add origin https://github.com/your-org/your-repo.git
git push -f origin gh-pages

Features Walkthrough

Table Lineage

  • Visualize upstream & downstream model dependencies
  • Color-coded layers (source, staging, intermediate, mart, seed)
  • Click any model to see its lineage
  • Inline model metadata (layer, materialization, columns, tests, deps)
  • Adjustable depth (1-5 levels)
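
To make the adjustable depth concrete, here is a minimal sketch of a depth-limited upstream walk over a dependency graph. The adjacency map, the model names, and the `upstream_within` function are all hypothetical; how the site itself stores and traverses its graph is not documented here.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical adjacency map: model -> models it depends on directly.
UPSTREAM: Dict[str, List[str]] = {
    "mart_orders": ["int_orders_enriched"],
    "int_orders_enriched": ["stg_orders", "stg_customers"],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
}


def upstream_within(model: str, depth: int, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `model` reachable in at most `depth` hops."""
    seen: Set[str] = set()
    queue = deque([(model, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return seen


print(sorted(upstream_within("mart_orders", 1, UPSTREAM)))
print(sorted(upstream_within("mart_orders", 2, UPSTREAM)))
```

A downstream walk works the same way over the reversed map. A breadth-first search is used here so that each node is first reached at its shortest distance from the starting model.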

Column Lineage

  • Trace column-to-column data flow
  • Transformation type indicators (DIRECT, RENAMED, FUNCTION, CASE, AGG, CALC)
  • Color-coded edges for transformation types
  • Inline column metadata (name, type, model, transformation SQL)
  • Click any column to pivot to its lineage
  • Adjustable depth (1-5 levels)
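
As a rough illustration of what the transformation labels could mean, the sketch below assigns one of the six labels to a single SELECT expression using a deliberately naive string heuristic. This is not how the tool works: it parses SQL with sqlglot, and its actual rules are not shown in this README. The `classify` function, the aggregate list, and the reading of each label are assumptions inferred from the label names.

```python
import re

# Assumed (incomplete) set of aggregate function names.
AGGREGATES = {"sum", "count", "avg", "min", "max"}
# A bare, optionally qualified identifier such as `amount` or `o.amount`.
IDENTIFIER = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)*$")
# An expression that starts with a function call such as `upper(`.
FUNCTION_CALL = re.compile(r"^([A-Za-z_]\w*)\s*\(")


def classify(expression: str, output_name: str) -> str:
    """Guess a transformation label for `expression AS output_name`."""
    expr = expression.strip()
    if IDENTIFIER.match(expr):
        source_name = expr.split(".")[-1]
        return "DIRECT" if source_name.lower() == output_name.lower() else "RENAMED"
    if re.match(r"^case\b", expr, re.IGNORECASE):
        return "CASE"
    call = FUNCTION_CALL.match(expr)
    if call:
        return "AGG" if call.group(1).lower() in AGGREGATES else "FUNCTION"
    return "CALC"  # anything else, e.g. arithmetic between columns


for expr, name in [
    ("o.order_id", "order_id"),
    ("o.amount", "total"),
    ("upper(c.name)", "name"),
    ("CASE WHEN o.amount > 0 THEN 1 ELSE 0 END", "is_paid"),
    ("sum(o.amount)", "revenue"),
    ("o.price * o.quantity", "line_total"),
]:
    print(f"{name}: {classify(expr, name)}")
```

A string heuristic like this misjudges nested or mixed expressions (for instance `sum(a) + 1`), which is one reason an AST-based approach is the more robust choice.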

Catalog Views

  • Models: browse all models with metadata
  • Sources: view all data sources
  • Tests: see all data quality tests
  • Search and filter by layer, directory, etc.

๐Ÿ› ๏ธ How It Works

Architecture

dbt artifacts → SQL parsing → Lineage graph → Static website
    ↓               ↓              ↓                ↓
manifest.json     sqlglot     lineage.json      index.html
catalog.json                                    + data/

Lineage Resolution

  1. Read dbt artifacts: parse manifest.json and catalog.json
  2. Extract dependencies: identify model → model relationships
  3. Parse compiled SQL: use sqlglot to analyze SELECT statements
  4. Resolve columns: match columns across CTEs, aliases, and transformations
  5. Classify transformations: detect aggregations, functions, CASE expressions, etc.
  6. Generate graph: build node/edge graph with metadata
  7. Create static site: bundle HTML + JSON for deployment
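
Steps 1 and 2 can be sketched with the standard library alone. The example below reads a tiny stand-in for manifest.json and collects model → model edges from each node's `depends_on.nodes` list, which is where dbt records direct parents. The sample data and the `model_dependencies` function are illustrative assumptions, not the package's actual code.

```python
import json
from typing import Dict, List

# Tiny stand-in for target/manifest.json; real manifests carry many more keys.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.fct_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.stg_orders", "seed.shop.country_codes"]}
    },
    "test.shop.not_null_fct_orders_id": {
      "resource_type": "test",
      "depends_on": {"nodes": ["model.shop.fct_orders"]}
    }
  }
}
""")


def model_dependencies(manifest: dict) -> Dict[str, List[str]]:
    """Map each model's unique_id to the models it depends on directly."""
    deps: Dict[str, List[str]] = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        parents = node.get("depends_on", {}).get("nodes", [])
        # keep only model parents; sources and seeds are filtered out here
        deps[unique_id] = [p for p in parents if p.startswith("model.")]
    return deps


print(model_dependencies(SAMPLE_MANIFEST))
```

With a real project, `SAMPLE_MANIFEST` would be replaced by `json.load(open("target/manifest.json"))`. Whether sources and seeds are kept as parents is a design choice; they are dropped here only to match the "model → model" wording of step 2.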

Configuration

Layer Classification

By default, models are classified into layers based on naming conventions:

  • source: source.*
  • staging: .stg_, staging
  • intermediate: .int_, intermediate
  • mart: .mart, .fct_, .dim_, marts
  • seed: seed.*

You can customize this in the extractor code (miswag_dbt_lineage/extractor.py).
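
The naming conventions above can be read as a chain of prefix and substring checks. The sketch below is one possible reading, not the code in extractor.py: `classify_layer` is a hypothetical function, it assumes the patterns are matched against the dbt unique_id, and both the order of the checks and the `"unknown"` fallback are assumptions.

```python
def classify_layer(unique_id: str) -> str:
    """Guess a layer from a dbt unique_id using the naming conventions above."""
    uid = unique_id.lower()
    if uid.startswith("source."):
        return "source"
    if uid.startswith("seed."):
        return "seed"
    if ".stg_" in uid or "staging" in uid:
        return "staging"
    if ".int_" in uid or "intermediate" in uid:
        return "intermediate"
    if any(token in uid for token in (".mart", ".fct_", ".dim_", "marts")):
        return "mart"
    return "unknown"  # assumed fallback when no convention matches


for uid in [
    "source.shop.raw.orders",
    "model.shop.stg_orders",
    "model.shop.int_orders_joined",
    "model.shop.fct_orders",
    "seed.shop.country_codes",
]:
    print(uid, "->", classify_layer(uid))
```

Projects with different prefixes (for example `base_` instead of `stg_`) would add their own tokens to the relevant check.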


๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone repo
git clone https://github.com/miswag/miswag-dbt-lineage.git
cd miswag-dbt-lineage

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

๐Ÿ“ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


๐Ÿ™ Acknowledgments

  • Built for the dbt community
  • Powered by sqlglot for SQL parsing
  • Inspired by dbt docs and various lineage visualization tools

Contact


โญ If you find this useful, please star the repo!

Download files

Source Distribution

miswag_dbt_lineage-0.1.0.tar.gz (35.9 kB view details)

Uploaded Source

Built Distribution

miswag_dbt_lineage-0.1.0-py3-none-any.whl (32.4 kB view details)

Uploaded Python 3

File details

Details for the file miswag_dbt_lineage-0.1.0.tar.gz.

File metadata

  • Download URL: miswag_dbt_lineage-0.1.0.tar.gz
  • Upload date:
  • Size: 35.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for miswag_dbt_lineage-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        0ce86b6d167de6d0de4d7f1a4a9381caa145e0fb74e42aebd14985c1ea6fcdee
MD5           09569ac03b50e75f0131ec11d87e7c30
BLAKE2b-256   3a94d3a606819b667f9bdc95d7188946bc1e057036ac55c99b8118b45c442808

File details

Details for the file miswag_dbt_lineage-0.1.0-py3-none-any.whl.

File hashes

Hashes for miswag_dbt_lineage-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        19082b623cd8fb5eb923e472fbe79967d901fb4f84c2413966d08e46cce0be2b
MD5           4f033bec22225d6dd570d48cefd0d13b
BLAKE2b-256   c55964f2e5752d7f13d08cae473eaf687dc2fd8f823c17a2961246ff6df722b4
