Load DBT metadata into graph databases (Neo4j and FalkorDB)
DBT Graph Loader
Transform your DBT project's lineage and metadata into queryable knowledge graphs
DBT Graph Loader is a Python library that loads DBT (Data Build Tool) metadata into graph databases, enabling you to explore, query, and visualize your data lineage as an interactive knowledge graph.
🚀 Features
- 🔄 Multiple Graph Databases: Native support for Neo4j and FalkorDB
- 📊 Complete DBT Coverage: Models, sources, tests, macros, seeds, snapshots, and operations
- 🔗 Rich Relationships: Dependencies, references, macro usage, and test coverage mapping
- 📁 Flexible Input: Load from manifest.json and catalog.json files or strings
- 🛠️ Easy CLI: Simple command-line interface for batch operations
- 🐍 Python API: Programmatic access for integration into data pipelines
- 📈 Graph Analytics: Built-in statistics and insights about your data lineage
- 🐳 Docker Ready: Easy containerization and deployment
📦 Installation
Using Poetry (Recommended)
poetry add dbt-graph-loader
Using pip
pip install dbt-graph-loader
Development Installation
# Clone the repository
git clone https://github.com/ponderedw/dbt-graph-loader.git
cd dbt-graph-loader
# Install with Poetry
poetry install
# Or with pip
pip install -e .
🎯 Quick Start
1. Generate DBT Metadata Files
First, ensure you have the required DBT files:
cd your-dbt-project
dbt compile # Generates manifest.json
dbt docs generate # Generates catalog.json (optional but recommended)
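Before loading, it can help to confirm what the generated manifest actually contains. The sketch below assumes the standard manifest.json layout, in which nodes, sources, and macros are top-level dictionaries keyed by unique ID and each entry carries a resource_type. The function name summarize_manifest is hypothetical and is not part of DBT Graph Loader.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(manifest: dict) -> Counter:
    """Count manifest entries by resource type (model, test, seed, source, macro, ...)."""
    counts = Counter()
    for section in ("nodes", "sources", "macros"):
        for entry in manifest.get(section, {}).values():
            # Fall back to the section name if an entry has no resource_type.
            counts[entry.get("resource_type", section)] += 1
    return counts


def summarize_manifest_file(path: str = "target/manifest.json") -> Counter:
    """Read a manifest from disk and summarize it."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarize_manifest(manifest)
```

Calling summarize_manifest_file() from the DBT project root returns a Counter such as one entry per resource type, which gives a rough expectation for the node counts the graph should contain after loading.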
2. Load into Neo4j
# Using CLI
dbt-graph-loader neo4j \
--uri bolt://localhost:7687 \
--username neo4j \
--password your_password \
--manifest target/manifest.json \
--catalog target/catalog.json
3. Load into FalkorDB
# Using CLI
dbt-graph-loader falkordb \
--host localhost \
--port 6379 \
--graph-name my_dbt_lineage \
--manifest target/manifest.json \
--catalog target/catalog.json
📋 Supported DBT Resources
| Resource Type | Description | Properties Captured |
|---|---|---|
| Models | DBT models and their transformations | Materialization, dependencies, descriptions, tags |
| Sources | External data sources | Freshness rules, schemas, descriptions |
| Seeds | CSV files loaded as tables | File metadata, configurations |
| Snapshots | Slowly changing dimension tables | Strategies, unique keys, timestamps |
| Tests | Data quality tests | Severity levels, test parameters, attached nodes |
| Macros | Reusable SQL code blocks | Arguments, package info, usage patterns |
| Operations | Pre/post hooks and run operations | Execution context, dependencies |
🔗 Graph Relationships
The loader creates rich relationships between your DBT resources:
- DEPENDS_ON: Direct dependencies between any resources
- REFERENCES: Model-to-model references via ref() functions
- USES_MACRO: Macro usage relationships
- TESTS: Test-to-resource relationships
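To make these relationship types concrete, here is a rough sketch of how such edges could be derived from a manifest's depends_on entries. It is an illustration under stated assumptions, not the loader's actual implementation: the function name iter_edges is hypothetical, and the rules for when to emit REFERENCES and TESTS are guesses based on the descriptions above.

```python
from typing import Iterator, Tuple

# (from_unique_id, relationship_type, to_unique_id)
Edge = Tuple[str, str, str]


def iter_edges(manifest: dict) -> Iterator[Edge]:
    """Yield graph edges from the depends_on section of each manifest node."""
    nodes = manifest.get("nodes", {})
    for unique_id, node in nodes.items():
        depends_on = node.get("depends_on", {})
        node_type = node.get("resource_type")
        for target in depends_on.get("nodes", []):
            yield (unique_id, "DEPENDS_ON", target)
            target_type = nodes.get(target, {}).get("resource_type")
            if node_type == "test":
                # Assumption: a test "tests" every node it depends on.
                yield (unique_id, "TESTS", target)
            elif node_type == "model" and target_type == "model":
                # Assumption: model-to-model dependencies come from ref().
                yield (unique_id, "REFERENCES", target)
        for macro_id in depends_on.get("macros", []):
            yield (unique_id, "USES_MACRO", macro_id)
```

Note the direction: every edge points from the dependent resource to the thing it depends on, which matches the query examples where a model points at its source.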
🛠️ Usage
Command Line Interface
Neo4j Options
dbt-graph-loader neo4j --help
Options:
--uri TEXT Neo4j connection URI (required)
--username TEXT Neo4j username (required)
--password TEXT Neo4j password (required)
--manifest TEXT Path to manifest.json (required)
--catalog TEXT Path to catalog.json (optional)
FalkorDB Options
dbt-graph-loader falkordb --help
Options:
--host TEXT FalkorDB host (default: localhost)
--port INTEGER FalkorDB port (default: 6379)
--graph-name TEXT Graph name (default: dbt_graph)
--username TEXT FalkorDB username (optional)
--password TEXT FalkorDB password (optional)
--manifest TEXT Path to manifest.json (required)
--catalog TEXT Path to catalog.json (optional)
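For batch jobs it may be convenient to assemble the CLI call from Python. The sketch below uses only the neo4j flags listed above; the helper names build_neo4j_command and run_loader are hypothetical, and run_loader assumes the dbt-graph-loader executable is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_neo4j_command(
    uri: str,
    username: str,
    password: str,
    manifest: str,
    catalog: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `dbt-graph-loader neo4j`."""
    command = [
        "dbt-graph-loader", "neo4j",
        "--uri", uri,
        "--username", username,
        "--password", password,
        "--manifest", manifest,
    ]
    if catalog:
        # --catalog is optional, so only pass it when a path is given.
        command += ["--catalog", catalog]
    return command


def run_loader(command: List[str]) -> None:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with passwords that contain special characters.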
Python API
Neo4j Integration
from dbt_graph_loader.loaders.neo4j_loader import DBTNeo4jLoader
# Initialize the loader
loader = DBTNeo4jLoader(
    neo4j_uri="bolt://localhost:7687",
    username="neo4j",
    password="your_password"
)
try:
    # Load from files
    loader.load_dbt_to_neo4j_from_files(
        manifest_path="target/manifest.json",
        catalog_path="target/catalog.json"
    )
    # View statistics
    loader.get_graph_stats()
finally:
    # Always release the database connection
    loader.close()
FalkorDB Integration
from dbt_graph_loader.loaders.falkordb_loader import DBTFalkorDBLoader
# Initialize the loader
loader = DBTFalkorDBLoader(
    host="localhost",
    port=6379,
    graph_name="dbt_lineage",
    username="your_username",  # if auth enabled
    password="your_password"   # if auth enabled
)
try:
    # Load from files
    loader.load_dbt_to_falkordb(
        manifest_path="target/manifest.json",
        catalog_path="target/catalog.json"
    )
    # Load from strings (useful for APIs)
    with open("target/manifest.json") as f:
        manifest_str = f.read()
    with open("target/catalog.json") as f:
        catalog_str = f.read()
    loader.load_dbt_to_falkordb_from_strings(manifest_str, catalog_str)
    # View statistics
    loader.get_graph_stats()
finally:
    # Always release the database connection
    loader.close()
Convenience Functions
from dbt_graph_loader import load_to_neo4j, load_to_falkordb
# Simple Neo4j loading
load_to_neo4j(
    uri="bolt://localhost:7687",
    username="neo4j",
    password="password",
    manifest_path="target/manifest.json",
    catalog_path="target/catalog.json"
)
# Simple FalkorDB loading
load_to_falkordb(
    host="localhost",
    port=6379,
    graph_name="dbt_lineage",
    manifest_path="target/manifest.json",
    catalog_path="target/catalog.json"
)
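In a pipeline, connection settings usually come from the environment rather than from literals. The following sketch collects FalkorDB settings with the same defaults the CLI documents (localhost, 6379, dbt_graph). The environment variable names FALKORDB_HOST, FALKORDB_PORT, and FALKORDB_GRAPH are assumptions made for illustration, as is the helper name falkordb_settings.

```python
import os
from typing import Mapping, Optional


def falkordb_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect FalkorDB connection settings, falling back to the CLI defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("FALKORDB_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        "port": int(env.get("FALKORDB_PORT", "6379")),
        "graph_name": env.get("FALKORDB_GRAPH", "dbt_graph"),
    }
```

The resulting dictionary uses the keyword names shown in the convenience function above, so it could be passed along as load_to_falkordb(**falkordb_settings(), manifest_path="target/manifest.json", catalog_path="target/catalog.json").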
🔍 Example Queries
Once your DBT metadata is loaded, you can query the graph using Cypher (Neo4j) or OpenCypher (FalkorDB).
Neo4j Cypher Examples
// Find all models that depend on a specific source
MATCH (m:Model)-[:DEPENDS_ON]->(s:Source {name: "raw_data.customers"})
RETURN m.name, m.materialized, m.description
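// Get the complete downstream lineage from a model
// (DEPENDS_ON points from a resource to what it depends on, so downstream means following it backwards)
MATCH path = (start:Model {name: "dim_customers"})<-[:DEPENDS_ON*]-(downstream)
RETURN path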
// Find models without any tests
MATCH (m:Model)
WHERE NOT EXISTS {
MATCH (t:Test)-[:TESTS]->(m)
}
RETURN m.name, m.schema, m.materialized
// Identify the most referenced models
MATCH (m:Model)<-[:REFERENCES]-(referencing)
RETURN m.name, count(referencing) as reference_count
ORDER BY reference_count DESC
LIMIT 10
// Find macro usage patterns
MATCH (m:Model)-[:USES_MACRO]->(macro:Macro)
RETURN macro.name, count(m) as usage_count
ORDER BY usage_count DESC
// Discover circular dependencies (if any)
MATCH path = (n)-[:DEPENDS_ON*]->(n)
WHERE length(path) > 1
RETURN path
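Edge direction matters for lineage queries. In the first query a model points at the source it depends on, so DEPENDS_ON runs from the dependent resource to its dependency, and downstream lineage means walking those edges in reverse. A plain-Python sketch of that traversal follows; the resource names other than dim_customers and raw_data.customers are made up for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def downstream_of(start: str, depends_on: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every resource that directly or transitively depends on `start`.

    Each edge is (dependent, dependency), mirroring (a)-[:DEPENDS_ON]->(b).
    """
    # Index edges by dependency so they can be followed backwards.
    dependents: Dict[str, List[str]] = {}
    for dependent, dependency in depends_on:
        dependents.setdefault(dependency, []).append(dependent)

    found: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child != start and child not in found:
                found.add(child)
                queue.append(child)
    return found


edges = [
    ("stg_customers", "raw_data.customers"),
    ("dim_customers", "stg_customers"),
    ("fct_orders", "dim_customers"),
]
print(sorted(downstream_of("dim_customers", edges)))
```

Following the edges forwards from dim_customers would instead return its upstream resources (stg_customers and raw_data.customers).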
FalkorDB OpenCypher Examples
// Models by materialization type
MATCH (m:Model)
RETURN m.materialized, count(m) as model_count
ORDER BY model_count DESC
// Source freshness analysis
MATCH (s:Source)
WHERE s.freshness_warn_after IS NOT NULL
RETURN s.name, s.freshness_warn_after, s.freshness_error_after
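// Test coverage by schema
// (aggregate per model first so that models with several tests are counted once)
MATCH (m:Model)
OPTIONAL MATCH (t:Test)-[:TESTS]->(m)
WITH m, count(t) as test_count
RETURN m.schema,
  count(m) as total_models,
  sum(test_count) as total_tests,
  round(100.0 * sum(CASE WHEN test_count > 0 THEN 1 ELSE 0 END) / count(m), 2) as test_coverage_pct
ORDER BY test_coverage_pct DESC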
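When computing coverage, note that an OPTIONAL MATCH produces one row per model-test pair, so counting rows directly counts a model with several tests more than once. The sketch below works through the arithmetic on per-model rows instead. It assumes coverage means the share of models with at least one test, and the function name coverage_by_schema and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def coverage_by_schema(
    rows: Iterable[Tuple[str, str, int]],
) -> Dict[str, Tuple[int, int, float]]:
    """Map schema -> (total_models, total_tests, pct of models with >= 1 test).

    Each input row is (schema, model_name, test_count) with one row per model.
    """
    totals: Dict[str, List[int]] = {}
    for schema, _model, test_count in rows:
        stats = totals.setdefault(schema, [0, 0, 0])
        stats[0] += 1                            # models in this schema
        stats[1] += test_count                   # tests attached to them
        stats[2] += 1 if test_count > 0 else 0   # models with any test
    return {
        schema: (models, tests, round(100.0 * tested / models, 2))
        for schema, (models, tests, tested) in totals.items()
    }


rows = [
    ("analytics", "dim_customers", 2),
    ("analytics", "fct_orders", 0),
    ("staging", "stg_customers", 1),
]
print(coverage_by_schema(rows))
```

Here the analytics schema has two models and two tests but only 50% coverage, because both tests are attached to the same model.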
🐳 Docker Integration
FastAPI Integration Example
from fastapi import FastAPI, UploadFile, File
from dbt_graph_loader.loaders.neo4j_loader import DBTNeo4jLoader
import os
app = FastAPI()
@app.post("/upload-dbt-metadata/")
async def upload_dbt_metadata(
    manifest_file: UploadFile = File(...),
    catalog_file: UploadFile = File(...)
):
    # Read both uploads fully into memory
    manifest_content = await manifest_file.read()
    catalog_content = await catalog_file.read()
    # Connection settings come from the environment
    loader = DBTNeo4jLoader(
        neo4j_uri=os.getenv("NEO4J_URI"),
        username=os.getenv("NEO4J_USERNAME"),
        password=os.getenv("NEO4J_PASSWORD")
    )
    try:
        loader.load_dbt_to_neo4j_from_strings(
            manifest_content.decode('utf-8'),
            catalog_content.decode('utf-8')
        )
        return {"status": "success", "message": "DBT metadata loaded"}
    finally:
        loader.close()
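An upload endpoint like this one may want to reject malformed files before opening a database connection. The helper below is a minimal sketch of such a check; the name parse_manifest_upload is hypothetical, and the only structural assumption is that a DBT manifest is a JSON object with a top-level nodes key.

```python
import json


def parse_manifest_upload(raw: bytes) -> dict:
    """Decode and sanity-check uploaded manifest bytes, raising ValueError if invalid."""
    try:
        manifest = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"manifest is not valid UTF-8 JSON: {exc}") from exc
    if not isinstance(manifest, dict) or "nodes" not in manifest:
        raise ValueError("manifest JSON has no top-level 'nodes' object")
    return manifest
```

In a FastAPI handler, the ValueError could be caught and turned into an HTTP 400 response so that clients get a clear message instead of a server error.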
📊 Graph Schema
Node Properties
Models
- unique_id, name, database, schema, materialized
- description, tags, package_name, path, enabled
- language, checksum, access, relation_name
Sources
- unique_id, name, source_name, identifier
- database, schema, description, loader
- freshness_warn_after, freshness_error_after, columns
Tests
- unique_id, name, column_name, severity, enabled
- test_name, test_kwargs, package_name
Macros
- unique_id, name, package_name, path
- description, arguments
Seeds
- unique_id, name, database, schema, path
- delimiter, materialized, enabled
Snapshots
- unique_id, name, database, schema, strategy
- unique_key, updated_at, materialized
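As an illustration of where the Models properties could come from, the sketch below flattens one manifest node into the property names listed above. The mapping is an assumption based on the standard manifest.json layout (for example, materialized and enabled living under config, and checksum being an object with its own checksum field); the loader's actual extraction logic may differ, and the function name model_properties is hypothetical.

```python
def model_properties(node: dict) -> dict:
    """Flatten a manifest model node into scalar or list graph properties."""
    config = node.get("config") or {}
    checksum = node.get("checksum") or {}
    return {
        "unique_id": node["unique_id"],
        "name": node["name"],
        "database": node.get("database"),
        "schema": node.get("schema"),
        "materialized": config.get("materialized"),
        "description": node.get("description", ""),
        "tags": list(node.get("tags", [])),
        "package_name": node.get("package_name"),
        "path": node.get("path"),
        "enabled": config.get("enabled", True),
        "language": node.get("language"),
        # Keep only the digest string; nested objects are not valid property values.
        "checksum": checksum.get("checksum"),
        "access": node.get("access"),
        "relation_name": node.get("relation_name"),
    }
```

Flattening matters because graph databases such as Neo4j store primitive values and lists of primitives as properties, not nested maps.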
🧪 Development
Setup Development Environment
# Clone repository
git clone https://github.com/ponderedw/dbt-graph-loader.git
cd dbt-graph-loader
# Install dependencies
poetry install
# Build package
poetry build
📋 Prerequisites
For Neo4j
- Neo4j 4.0+ (local installation or cloud)
- Python 3.8+
For FalkorDB
- FalkorDB instance (Redis-compatible graph database)
- Python 3.8+
DBT Requirements
- DBT project with generated manifest.json (required)
- Generated catalog.json (optional but recommended for richer metadata)