Extract dependency graphs from dbt (including model and column dependencies) and convert them to Cypher queries
Project description
dbt-to-cypher
Extract dependency graphs from dbt projects (including model and column dependencies) and convert them to Cypher queries.
These queries can be used for graph database visualization and analysis in Neo4j or any other Cypher-compatible database.
Purpose
This tool allows data engineers, analysts, and data scientists to gain insight into their data lineage and dependencies at both the model and column levels through a graph-based approach, improving data understanding and governance.
Features
- Model-level dependencies: Extract relationships between dbt models.
- Column-level lineage: Track data flow at the column level
- Cypher generation: Convert dependency graphs to Neo4j Cypher queries
- CLI tool: Easy command-line interface for quick conversions
- Library API: Programmatic access for integration into your workflows
Use Cases and Practical Applications
The dbt-to-cypher library is designed to provide actionable insights into data lineage and dependencies within dbt projects. It is particularly useful in large organizations or complex data pipelines where understanding relationships between models and columns is critical for data governance, debugging, and reporting. Some practical use cases include:
1. Data Lineage Visualization
Data engineers often struggle to understand how data flows through multiple layers of transformations in a dbt project. With dbt-to-cypher, you can generate Cypher scripts that create nodes and relationships in Neo4j, allowing you to visualize the full dependency graph. This visualization can help identify:
- Bottlenecks in your data pipeline
- Circular dependencies
- Unused or redundant models
- Models with high fan-in or fan-out dependencies
A clear visual representation of your data lineage can also help teams onboard new members more quickly and document complex pipelines effectively.
2. Impact Analysis
When making changes to dbt models, it is crucial to understand the potential impact on downstream models. By using dbt-to-cypher to generate a complete dependency graph, you can quickly query Neo4j to identify all downstream nodes that will be affected by a particular change. This capability is especially useful for:
- Regression testing
- Change management in production environments
- Risk assessment before deploying updates
Impact analysis reduces the risk of introducing errors and ensures that changes propagate safely through your pipeline.
3. Column-Level Lineage
In addition to model-level dependencies, dbt-to-cypher tracks column-level transformations. This allows you to:
- Understand how individual fields propagate through the pipeline
- Identify sensitive data flows for compliance purposes
- Trace the origin of a specific value or aggregated metric
Column-level lineage is critical for organizations dealing with regulated data or complex reporting structures, where understanding the precise flow of information is required for audits.
4. Integration with BI and Analytics Tools
Once the dependency graph is stored in Neo4j, it can be queried programmatically or connected to BI tools to generate reports, dashboards, or automated alerts. For example:
- Automatically notify analysts when upstream data changes
- Generate dependency tables for documentation
- Integrate lineage data into metrics tracking and observability pipelines
This makes it easier to maintain high-quality, trustworthy data across the organization.
5. Continuous Integration and Governance
dbt-to-cypher can be integrated into CI/CD pipelines. Every time new models are added or modified, the library can regenerate the Cypher scripts and update the graph database automatically. This ensures that your data lineage is always up-to-date and provides a single source of truth for governance and auditing. Benefits include:
- Automated checks for model and column consistency
- Improved collaboration between data engineering, analytics, and governance teams
- Faster onboarding for new projects or team members
This workflow enhances both reliability and transparency across data operations.
Example Scenario
Consider a retail analytics team tracking sales metrics across multiple regions. Using dbt-to-cypher, they can:
- Parse dbt models and catalog to generate a dependency graph.
- Load the graph into Neo4j for visualization.
- Identify which models contribute to monthly sales reports.
- Trace specific columns to their source data for validation.
- Update their CI/CD pipeline to regenerate lineage whenever dbt models are updated.
This approach not only simplifies debugging but also strengthens the organization’s confidence in its data quality and reporting accuracy.
Installation
Using uv (recommended):
uv pip install dbt-to-cypher
Using pip:
pip install dbt-to-cypher
Quick Start
Command Line
# Basic usage
dbt-to-cypher /path/to/dbt/project
# Save output to file
dbt-to-cypher /path/to/dbt/project -o output.cypher
Python API
The dbt_to_cypher module provides a high-level API for programmatic access:
from dbt_to_cypher import extract_dbt_project
cypher_script = extract_dbt_project(
project_path="/path/to/dbt/project",
output_path="output.cypher", # Optional
)
print(cypher_script)
Development
See DEVELOPMENT.md for development setup instructions.
Quick Development Setup
# Install uv
pip install uv
# Create virtual environment and install dependencies
uv venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
uv pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
Project Structure
dbt-to-cypher/
├── src/
│ └── dbt_to_cypher/
│ ├── __init__.py # Package initialization and exports
│ ├── dbt_to_cypher.py # Core API for dbt-to-cypher conversion
│ ├── extractor.py # dbt dependency extraction
│ ├── graph.py # Dependency graph management
│ ├── cypher.py # Cypher query generation
│ └── cli.py # Command-line interface
├── tests/ # Test suite
├── pyproject.toml # PEP 621 project metadata
├── .pre-commit-config.yaml # Pre-commit hooks configuration
└── README.md
Core Modules
- dbt_to_cypher.py: Main API module providing high-level functions for the conversion pipeline
- extractor.py: Parses dbt
manifest.jsonandcatalog.jsonto extract model and column-level dependencies - graph.py: Builds and manages a NetworkX-based dependency graph with models and columns as nodes
- cypher.py: Generates Neo4j Cypher CREATE statements from the dependency graph
Requirements
- Python 3.9+
- dbt project with generated
manifest.jsonandcatalog.json(rundbt compileanddbt docs generatefirst)
FAQ and Troubleshooting
Q1: Why do I get errors when running dbt-to-cypher?
Common causes include missing manifest.json or catalog.json files. Make sure you run dbt compile and dbt docs generate in your project before using this library.
Q2: Can I use dbt-to-cypher with partial dbt projects?
Yes, but some features like column-level lineage require complete model definitions. For partial projects, only model-level dependencies may be generated.
Q3: Installation issues on Windows or Mac
Ensure your virtual environment is properly activated. Use the recommended uv venv workflow or a standard Python venv. Some OSes may require python3 -m venv .venv and activation through .venv\Scripts\activate (Windows) or source .venv/bin/activate (Linux/Mac).
Q4: Performance tips
For large dbt projects, generating full column-level graphs can be resource-intensive. Consider filtering models or columns to focus on critical pipelines or using Neo4j’s batch import features for faster ingestion.
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Release a new version in PyPi
Update version name in pyproject.toml, and create a new release in Github to release a new version.
The new release will need to be in main branch only.
It will automatically publish to PyPi.
Roadmap
- Support for dbt sources and seeds
- Export to additional graph formats (GraphML, DOT)
- Support extracting model and column dependencies alone
- Support direct population of dependencies into a graph database (preferably Neo4j)
- Interactive visualization tools
- Support for dbt metrics and exposures
Author
laphoang
Acknowledgments
Built with modern Python packaging standards (PEP 621), uv for environment management, and best practices for Python libraries and data engineering workflows.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dbt_to_cypher-0.1.1.tar.gz.
File metadata
- Download URL: dbt_to_cypher-0.1.1.tar.gz
- Upload date:
- Size: 86.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9e64b5530add9163a0d6dd39ddbb67eabb127ed0ed8ac061fc236647440f905e
|
|
| MD5 |
4d64537e1584466cb7a52dbf4707e212
|
|
| BLAKE2b-256 |
7aa127ade69cfe95f0493bb72f27eb2cac1f24f1ee14320d8d5c146d7f2159a4
|
File details
Details for the file dbt_to_cypher-0.1.1-py3-none-any.whl.
File metadata
- Download URL: dbt_to_cypher-0.1.1-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3c3beb33be0dc6704fd0a6ec2959324b5653832e213e3ccf404d9a8d9253400b
|
|
| MD5 |
b6091371b4c56622666c8c86162cbb73
|
|
| BLAKE2b-256 |
7d760b874f0f53c02872e3e4e12fe75fadb5626ceb56dce9e630a592a27a536f
|