Declarative knowledge graph modeling tool inspired by dbt
grai.build
Schema-as-code for graph databases - Documentation like dbt, migrations for Neo4j
What is grai.build?
grai.build brings dbt's documentation experience to graph databases - define your schema in YAML, generate beautiful docs, and manage migrations.
It manages your graph schema, not your data. You define entities and relations in YAML, and grai.build:
- ✅ Validates your schema for consistency
- ✅ Generates Cypher constraints and indexes
- ✅ Documents your graph structure automatically (like `dbt docs`)
- ✅ Tracks lineage with interactive visualizations
- ✅ Integrates with your CI/CD pipeline
What it's NOT:
- ❌ Not an ETL tool (use Airflow, Prefect, or dbt for data loading)
- ❌ Not a data transformation framework (dbt does this for SQL)
- ❌ Not a replacement for your existing data infrastructure
Think of it as:
- Like dbt: Declarative YAML definitions, beautiful documentation, lineage tracking
- Like Alembic/Flyway: Database migrations and schema management
- For graphs: Manages Neo4j schema while your pipelines handle data
Quick Start
Installation

```shell
pip install grai-build
```
Create Your First Project
```shell
# Initialize a new project
grai init my-graph-project
cd my-graph-project

# Validate and build
grai build

# Generate documentation (like dbt docs)
grai docs --serve

# Deploy schema to Neo4j
grai run --uri bolt://localhost:7687 --user neo4j --password secret

# Load sample data for local testing
grai run --load-csv --password secret
```
Project Structure

```
my-graph-project/
├── grai.yml              # Project manifest
├── entities/
│   ├── customer.yml      # Entity definitions
│   └── product.yml
├── relations/
│   └── purchased.yml     # Relation definitions
└── target/               # Compiled output
    └── neo4j/
        └── compiled.cypher
```
Example

Entity: `entities/customer.yml`

```yaml
entity: customer
source: analytics.customers
keys: [customer_id]
properties:
  - name: customer_id
    type: string
  - name: name
    type: string
  - name: region
    type: string
```
Relation: `relations/purchased.yml`

```yaml
relation: PURCHASED
from: customer
to: product
source: analytics.orders
mappings:
  from_key: customer_id
  to_key: product_id
properties:
  - name: order_id
    type: string
  - name: order_date
    type: datetime
```
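The kind of cross-reference check `grai validate` performs on definitions like these can be sketched in plain Python. This is a minimal, illustrative sketch only: the function name, error messages, and dict layout are assumptions, not grai.build's actual internals.

```python
# Parsed YAML shown as plain dicts (illustrative stand-ins for the
# real parsed models).
entities = {
    "customer": {"keys": ["customer_id"], "properties": ["customer_id", "name", "region"]},
    "product": {"keys": ["product_id"], "properties": ["product_id", "name"]},
}

relation = {
    "relation": "PURCHASED",
    "from": "customer",
    "to": "product",
    "mappings": {"from_key": "customer_id", "to_key": "product_id"},
}

def validate_relation(rel, entities):
    """Return a list of human-readable schema errors (empty if valid)."""
    errors = []
    # Both endpoints must name a defined entity.
    for endpoint in ("from", "to"):
        if rel[endpoint] not in entities:
            errors.append(f"{rel['relation']}: unknown entity '{rel[endpoint]}'")
    # Each mapping key must be a declared key of its endpoint entity.
    src, dst = rel["from"], rel["to"]
    if src in entities and rel["mappings"]["from_key"] not in entities[src]["keys"]:
        errors.append(f"{rel['relation']}: from_key is not a key of '{src}'")
    if dst in entities and rel["mappings"]["to_key"] not in entities[dst]["keys"]:
        errors.append(f"{rel['relation']}: to_key is not a key of '{dst}'")
    return errors

print(validate_relation(relation, entities))  # [] — the example schema is consistent
```

A relation referencing an undefined entity, or a mapping key that is not declared on the endpoint, would surface here before anything touches the database.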
Compile to Cypher

```shell
grai build
```

Output (`target/neo4j/compiled.cypher`):
```cypher
// Create Customer nodes
MERGE (n:customer {customer_id: row.customer_id})
SET n.name = row.name,
    n.region = row.region;

// Create Product nodes
MERGE (n:product {product_id: row.product_id})
SET n.name = row.name;

// Create PURCHASED relations
MATCH (from:customer {customer_id: row.customer_id})
MATCH (to:product {product_id: row.product_id})
MERGE (from)-[r:PURCHASED]->(to)
SET r.order_id = row.order_id,
    r.order_date = row.order_date;
```
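The general shape of this compilation step, for the entity case, can be sketched as a toy function. This is not grai.build's actual compiler, just the idea: key properties go into the `MERGE` pattern, the rest into a `SET` clause.

```python
# Toy entity-to-Cypher compiler (illustrative only; names and output
# format are assumptions modeled on the compiled.cypher example above).

def compile_entity(entity):
    """Render a Cypher MERGE template for one entity definition."""
    label = entity["entity"]
    key = entity["keys"][0]
    merge = f"MERGE (n:{label} {{{key}: row.{key}}})"
    # Non-key properties are applied with SET.
    non_keys = [p["name"] for p in entity["properties"] if p["name"] not in entity["keys"]]
    if not non_keys:
        return merge + ";"
    sets = ",\n    ".join(f"n.{p} = row.{p}" for p in non_keys)
    return f"{merge}\nSET {sets};"

customer = {
    "entity": "customer",
    "keys": ["customer_id"],
    "properties": [
        {"name": "customer_id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "region", "type": "string"},
    ],
}

print(compile_entity(customer))  # matches the Customer block above
```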
Features
- Declarative modeling - Define your graph schema in YAML (like dbt models)
- Schema validation - Catch errors before deployment
- Documentation generation - Beautiful HTML docs with `grai docs` (like `dbt docs generate`/`dbt docs serve`)
- Lineage visualization - Interactive graph and Mermaid diagrams showing data flow
- Multi-backend support - Start with Neo4j, expand to Gremlin later
- CLI-first - Integrates into your CI/CD pipeline
- Type-safe - Built with Pydantic for robust validation
- Extensible - Easy to add custom backends and transformations
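The "type-safe" point means each YAML definition maps onto a typed model. A rough sketch of what those models look like, using stdlib dataclasses as a stand-in for grai.build's actual Pydantic models (the field names here are inferred from the YAML examples, not its real API):

```python
# Dataclass stand-ins for the typed schema models (illustrative;
# grai.build itself uses Pydantic).
from dataclasses import dataclass, field

@dataclass
class Property:
    name: str
    type: str

@dataclass
class Entity:
    entity: str
    source: str
    keys: list[str]
    properties: list[Property] = field(default_factory=list)

customer = Entity(
    entity="customer",
    source="analytics.customers",
    keys=["customer_id"],
    properties=[Property("customer_id", "string"), Property("name", "string")],
)

print(customer.entity, [p.name for p in customer.properties])
```

With Pydantic, a misspelled field or a wrong type in the YAML fails loudly at parse time rather than surfacing as a malformed Cypher statement later.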
Real-World Usage
Local Development

```shell
# 1. Define schema
vim entities/customer.yml

# 2. Validate
grai validate

# 3. Generate documentation
grai docs --serve   # Opens browser with interactive docs

# 4. Deploy schema
grai run --schema-only

# 5. Test with sample data
grai run --load-csv
```
Production Deployment

```yaml
# .github/workflows/deploy-schema.yml
name: Deploy Graph Schema

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install grai.build
        run: pip install grai-build
      - name: Validate Schema
        run: grai validate
      - name: Deploy to Production
        run: |
          grai run --schema-only \
            --uri ${{ secrets.NEO4J_URI }} \
            --user ${{ secrets.NEO4J_USER }} \
            --password ${{ secrets.NEO4J_PASSWORD }}
```
With Your ETL Pipeline

```python
# Your Airflow DAG
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator  # needed for load_data below

from your_etl import load_customers_to_neo4j

dag = DAG('graph_pipeline')

# 1. grai.build ensures the schema is up to date
deploy_schema = BashOperator(
    task_id='deploy_schema',
    bash_command='grai run --schema-only',
    dag=dag,
)

# 2. Your ETL loads the actual data
load_data = PythonOperator(
    task_id='load_data',
    python_callable=load_customers_to_neo4j,
    dag=dag,
)

deploy_schema >> load_data
```
Architecture

```
grai/
├── cli/                 # Typer-based CLI commands
├── core/
│   ├── models.py        # Pydantic models (Entity, Relation, Property)
│   ├── parser/          # YAML → Python models
│   ├── validator/       # Schema validation
│   ├── compiler/        # Generate Cypher/Gremlin
│   ├── loader/          # Execute against databases
│   └── utils/           # Shared utilities
└── templates/           # Project templates
```
Development

Setup

```shell
# Clone the repo
git clone https://github.com/asantora05/grai.build.git
cd grai.build

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black grai/
ruff check grai/
```
Documentation

Generate beautiful, interactive documentation for your graph:

```shell
# Generate and serve documentation locally
grai docs --serve

# Generate to a custom directory
grai docs --output ./my-docs

# Just generate (don't serve)
grai docs
```
The documentation includes:
- Project overview with stats
- Entity catalog with properties
- Relation catalog with mappings
- Interactive graph visualization (D3.js)
- Lineage diagrams (Mermaid.js)
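The lineage diagrams are plain Mermaid text. A rough sketch of how relation triples can be rendered as a Mermaid flowchart, similar in spirit to what `grai docs` produces (the function and exact output format here are illustrative, not grai.build's own generator):

```python
# Toy lineage-to-Mermaid renderer (illustrative only).

def to_mermaid(relations):
    """relations: list of (from_entity, RELATION_NAME, to_entity) triples."""
    lines = ["graph LR"]
    for src, name, dst in relations:
        # Mermaid edge with a label: A -->|LABEL| B
        lines.append(f"    {src} -->|{name}| {dst}")
    return "\n".join(lines)

diagram = to_mermaid([("customer", "PURCHASED", "product")])
print(diagram)
# graph LR
#     customer -->|PURCHASED| product
```

Pasting that output into any Mermaid renderer draws the customer → product lineage edge.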
For development guidance, check out the instructions.
Roadmap
- Core Pydantic models
- YAML parser
- Schema validator
- Cypher compiler
- Neo4j loader
- CLI commands (`init`, `build`, `validate`, `run`, `docs`)
- Graph IR export (JSON)
- Documentation generation (dbt-style)
- Lineage visualization (Mermaid + D3.js)
- Graph visualization improvements
- Gremlin backend support
- Incremental sync
- Schema versioning and migrations
Current Status
v0.3.0 - Feature-complete MVP with documentation
- ✅ Core Models - Pydantic models for Entity, Relation, Property
- ✅ YAML Parser - Parse and load entity/relation definitions
- ✅ Schema Validator - Validate references and mappings
- ✅ Cypher Compiler - Generate Neo4j constraints and indexes
- ✅ Neo4j Loader - Execute Cypher against Neo4j instances
- ✅ Documentation Generator - Interactive HTML docs (like dbt docs)
- ✅ Lineage Tracking - Visualize data flow and dependencies
- ✅ Graph Visualizer - D3.js and Cytoscape visualizations
- ✅ Build Cache - Incremental builds for faster iteration
- ✅ CLI Commands - Full command suite (`init`, `build`, `validate`, `run`, `docs`, etc.)
257 tests passing | High coverage across all modules
See it in action:

```shell
# Initialize example project
grai init my-project
cd my-project

# Generate and view documentation
grai docs --serve
```
Contributing
Contributions are welcome! This is an early-stage project, so there's plenty of room for improvement.
License
MIT License - see LICENSE for details.
Inspiration
This project is inspired by:
- dbt - Analytics engineering workflow
- SQLMesh - Data transformation framework
- Amundsen - Data discovery and metadata
Built with ❤️ for the graph database community
File details: `grai_build-0.3.2.tar.gz`

- Size: 73.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e3eb646752e726bade76e5bd38af8db478498a3f2b622ba784676345b44d2fc |
| MD5 | 134032a0a9d433ab4f8e71e7fe5cb19d |
| BLAKE2b-256 | e3a5dc6247b0d4f686941bb617b0824e3b8de1c6b9fcb4a8644e05bdc8dbc47e |
File details: `grai_build-0.3.2-py3-none-any.whl`

- Size: 55.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | a0edfeb8afca1bb9c4422ce2ebf535bd8907ac52685dbd9768adc15fe42cf8ce |
| MD5 | c558c0d1af2fbf21a6335d8c428db879 |
| BLAKE2b-256 | 40f4a4fe203f3fd473b466420652904a01ae410893bf68fe689d9727ca88fc71 |