Skip to main content

A sophisticated SQL lineage visualization tool for Medallion Architectures.

Project description

SQL DAG Flow

"Static Data Lineage for Modern Data Engineers. No databases, just code."

SQL DAG Flow is a lightweight, open-source Python library designed to transform your SQL code into visual architecture.

Unlike traditional lineage tools that require active database connections or query log access, SQL DAG Flow performs static analysis (parsing) of your local .sql files. This allows for instant, secure dependency visualization, bottleneck identification, and Data Lineage documentation without leaving your development environment.

Specially optimized for the Medallion Architecture (Bronze, Silver, Gold) and modern stacks (DuckDB, BigQuery, Snowflake), it bridges the gap between the code you write and the architecture you design.

💡 Philosophy: Why this exists

  • Local-First & Zero-Config: You don't need to configure servers, cloud credentials, or Docker containers. If you have SQL files, you have a diagram.
  • Security by Design: By relying on static analysis, your code never leaves your machine and no access to sensitive production data is required.
  • Living Documentation: The diagram is generated from the code. If the code changes, the documentation updates, eliminating obsolete manually-drawn diagrams.

🎯 Objectives & Use Cases

  • 1. Legacy Code Audit & Refactoring:
    • The Problem: You join a new project with 200+ undocumented SQL scripts. Nobody knows what breaks what.
    • The Solution: Run sql-dag-flow to instantly map the "spaghetti" dependencies. Identify orphan tables, circular dependencies, and the impact of changing a Silver layer table.
  • 2. Automated Architecture Documentation:
    • The Problem: Architecture diagrams in Lucidchart or Visio are always outdated.
    • The Solution: Generate interactive pipeline visualizations (ETL/ELT) to include in your Pull Requests, Wikis, or client deliverables.
  • 3. Medallion Architecture Validation:
    • The Problem: It's hard to verify if the logical separation of layers (Bronze → Silver → Gold) is being respected.
    • The Solution: The tool visually groups your scripts by folder structure, allowing you to validate that data flows correctly between quality layers without improper "jumps".
  • 4. Accelerated Onboarding:
    • The Problem: Explaining data flow to new engineers takes hours of whiteboard drawing.
    • The Solution: Deliver an interactive map where new team members can explore where data comes from, view associated SQL code, and understand business logic without reading thousands of lines of code.

SQL DAG Flow Screenshot

🚀 Key Features

  • Automatic Parsing & Visualization: Recursively scans your project folders to find .sql files and detect dependencies (FROM, JOIN, CTEs) using sqlglot.

  • Medallion Architecture Support: Automatically categorizes and colors nodes based on folder structure:

    • 🟤 Bronze: Raw ingestion layers.
    • Silver: Cleaned and conformed data.
    • 🟡 Gold: Business-level aggregates.
  • Smart Folder Selection:

    • Selective Exploration: Choose specific subfolders to analyze using an interactive tree view.
    • Deep Filtering: Focus only on relevant parts of your pipeline.
  • Advanced Organization:

    • Selection Toolbar: Multi-select nodes and align them horizontally/vertically.
    • Node Hiding: Hide specific nodes or entire trees to declutter the view.
    • Auto Layout: Automatically arrange nodes using Dagre layout engine.
  • Configuration Management:

    • Save & Load: Persist your layouts, hidden nodes, and viewport settings to JSON.
    • Workspaces: manage multiple project configurations.
  • Rich Metadata:

    • Details Panel: View full SQL content and schema details.
    • Annotations: Add sticky notes with Markdown support, resize them, and create visual groups.
  • Visual Cues:

    • Solid Border: Indicates a Table.
    • Dashed Border: Indicates a View (auto-detected).
  • Premium UI/UX:

    • Dark/Light Modes: Themed for your preference.
    • Export: Save as high-resolution PNG or vector SVG.
  • Discovery Mode (New 🔍):

    • Visualize Missing Files: Detects dependencies referenced in your SQL that don't satisfy the parser (e.g. valid external tables or missing files).
    • Ghost Nodes: These appear as "External" (Orange) nodes.
    • Quick Creation: Right-click any ghost node to instantly create the corresponding SQL file with a pre-filled template.
  • Enhanced Sidebar:

    • Grouped View: Nodes organized by type (Standard, External, CTE).
    • Usage Counts: See incoming dependency counts at a glance.
    • Quick Navigation: Click any node to center the graph on it.
  • CTE Visualization:

    • Internal Dependencies: Detects and visualizes Common Table Expressions (CTEs) within your SQL.
    • Code Transparency: Select a CTE node to view its specific SQL definition formatted in the Details Panel.

🌍 Supported Dialects

Powered by sqlglot, supporting:

  • BigQuery (Default)
  • Snowflake
  • PostgreSQL
  • Spark / Databricks
  • Amazon Redshift
  • DuckDB
  • MySQL
  • ...and more.

📦 Installation

Install easily via pip:

pip install sql-dag-flow

To update to the latest version:

pip install --upgrade sql-dag-flow

▶️ Usage

1. Command Line Interface (CLI)

You can run the tool directly from your terminal:

# Analyze the current directory
sql-dag-flow

# Analyze a specific SQL project
sql-dag-flow /path/to/my/dbt_project

2. Python API

Integrate it into your Python scripts or notebooks:

from sql_dag_flow import start

# Start the server and open the browser
start(directory="./my_sql_project")

📂 Project Structure Expectations

SQL DAG Flow is opinionated but flexible. It looks for standard Medallion Architecture naming conventions to assign colors:

  • Bronze Layer: Any folder named bronze, raw, landing, or staging.
  • Silver Layer: Any folder named silver, intermediate, or conformed.
  • Gold Layer: Any folder named gold, mart, serving, or presentation.
  • Other: Any other folder is categorized as "Other" (Gray).

🎨 Visual Legend & Color Palettes

SQL DAG Flow uses distinct colors to identify node types. You can switch between these palettes in the Settings.

Node Type Layer / Meaning Standard Vivid Pastel
Bronze Raw Ingestion 🟤 Brown (#8B4513) 🟠 Orange (#FF5722) 🟤 Pale Brown (#D7CCC8)
Silver Cleaned / Conformed ⚪ Gray (#708090) 🔵 Blue (#29B6F6) ⚪ Blue Grey (#CFD8DC)
Gold Business Aggregates 🟡 Gold (#DAA520) 🟡 Yellow (#FFEB3B) 🟡 Pale Yellow (#FFF9C4)
External Missing / Ghost Node 🟠 Dark Orange (#D35400) 🟠 Neon Orange (#FF9800) 🟠 Peach (#FFE0B2)
CTE Internal Common Table Expression 💖 Pink (#E91E63) 💗 Deep Pink (#F50057) 🌸 Light Pink (#F8BBD0)
Other Uncategorized 🔵 Teal (#4CA1AF) 💠 Cyan (#00BCD4) 🧊 Pale Cyan (#B2EBF2)

🛠️ Configuration & Customization

Settings

Click the Settings (Gear) icon in the bottom toolbar to:

  • Change SQL Dialect: Ensure your specific SQL syntax is parsed correctly.
  • Toggle Node Style: Switch between "Full" (colored body) and "Minimal" (colored border) styles.
  • Change Palette: Switch between Standard, Vivid, and Pastel color palettes.

Saving Layouts

Your graph layout (positions, hidden nodes) is not permanent by default. To save your work:

  1. Click Save in the top bar.
  2. Choose a filename (e.g., marketing_flow.json).
  3. Next time, click Load to restore that exact view.

🤝 Contributing

Contributions are welcome!

  1. Fork the repository.
  2. Create a feature branch.
  3. Submit a Pull Request.

Created by Flavio Sandoval

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql_dag_flow-0.1.9.tar.gz (452.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sql_dag_flow-0.1.9-py3-none-any.whl (451.1 kB view details)

Uploaded Python 3

File details

Details for the file sql_dag_flow-0.1.9.tar.gz.

File metadata

  • Download URL: sql_dag_flow-0.1.9.tar.gz
  • Upload date:
  • Size: 452.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sql_dag_flow-0.1.9.tar.gz
Algorithm Hash digest
SHA256 fe95a02783899e051044cd758f566068256f98e05fc797e4997db3aaa9e53e0b
MD5 04eff5a74671619d121c00e118677271
BLAKE2b-256 ae4a59357d9e1545ea2bb58d961f93bc4ed92faacd98207a439c13204da96791

See more details on using hashes here.

File details

Details for the file sql_dag_flow-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: sql_dag_flow-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 451.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sql_dag_flow-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 4ca190a461e43e685a33bfe93edc756139ed732697caedaf66468e3ad656931b
MD5 6db795385a5a7a9ddb13acd1c4dea5fb
BLAKE2b-256 5b5cfa293cd44b362e63e7bdd70ae79b26ec3b092e9eae01cce0226828a26dde

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page