Declarative knowledge graph modeling tool inspired by dbt

grai.build

Schema-as-code for graph databases - Documentation like dbt, migrations for Neo4j

Python 3.11+ | License: MIT | Code style: black

📘 What is grai.build?

grai.build brings dbt's documentation experience to graph databases - define your schema in YAML, generate beautiful docs, and manage migrations.

It manages your graph schema, not your data. You define entities and relations in YAML, and grai.build:

  • ✅ Validates your schema for consistency
  • ✅ Generates Cypher constraints and indexes
  • ✅ Documents your graph structure automatically (like dbt docs)
  • ✅ Tracks lineage with interactive visualizations
  • ✅ Integrates with your CI/CD pipeline

What it's NOT:

  • ❌ Not an ETL tool (use Airflow, Prefect, or dbt for data loading)
  • ❌ Not a data transformation framework (dbt does this for SQL)
  • ❌ Not a replacement for your existing data infrastructure

Think of it as:

  • Like dbt: Declarative YAML definitions, beautiful documentation, lineage tracking
  • Like Alembic/Flyway: Database migrations and schema management
  • For graphs: Manages Neo4j schema while your pipelines handle data

🚀 Quick Start

Installation

pip install grai-build

Create Your First Project

# Initialize a new project
grai init my-graph-project
cd my-graph-project

# Validate and build
grai build

# Generate documentation (like dbt docs)
grai docs --serve

# Deploy schema to Neo4j
grai run --uri bolt://localhost:7687 --user neo4j --password secret

# Load sample data for local testing
grai run --load-csv --password secret
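
Typing the password inline leaves it in shell history. As a minimal sketch, a small wrapper could assemble the same `grai run` arguments from environment-style settings instead. The helper name `build_grai_run_args` and the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` are assumptions for illustration, not part of grai.build; only the flags shown above are used.

```python
from typing import Mapping

REQUIRED = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")  # assumed variable names


def build_grai_run_args(env: Mapping[str, str], schema_only: bool = False) -> list[str]:
    """Assemble a `grai run` argument list from environment-style settings."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    args = ["grai", "run"]
    if schema_only:
        args.append("--schema-only")
    args += [
        "--uri", env["NEO4J_URI"],
        "--user", env["NEO4J_USER"],
        "--password", env["NEO4J_PASSWORD"],
    ]
    return args
```

Passing the result to `subprocess.run(build_grai_run_args(os.environ), check=True)` would then invoke the CLI, provided grai-build is installed and on the PATH. The password is still visible in the process arguments, so this only keeps it out of scripts and shell history.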

📂 Project Structure

my-graph-project/
├── grai.yml              # Project manifest
├── entities/
│   ├── customer.yml      # Entity definitions
│   └── product.yml
├── relations/
│   └── purchased.yml     # Relation definitions
└── target/               # Compiled output
    └── neo4j/
        └── compiled.cypher

📝 Example

Entity: entities/customer.yml

entity: customer
source: analytics.customers
keys: [customer_id]
properties:
  - name: customer_id
    type: string
  - name: name
    type: string
  - name: region
    type: string

Relation: relations/purchased.yml

relation: PURCHASED
from: customer
to: product
source: analytics.orders
mappings:
  from_key: customer_id
  to_key: product_id
properties:
  - name: order_id
    type: string
  - name: order_date
    type: datetime
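
A relation is only meaningful if both endpoints exist and the mapped keys belong to them. The sketch below shows one way such a reference check could work on already-parsed definitions. The function `validate_relation` and the dictionary shapes are illustrative assumptions, and the `product` entity is assumed to be keyed on `product_id`; this is not grai.build's validator.

```python
def validate_relation(relation: dict, entities: dict[str, dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for side, key_field in (("from", "from_key"), ("to", "to_key")):
        target = relation.get(side)
        if target not in entities:
            errors.append(f"{relation['relation']}: unknown entity {target!r} in '{side}'")
            continue
        key = relation.get("mappings", {}).get(key_field)
        if key not in entities[target]["keys"]:
            errors.append(f"{relation['relation']}: {key_field} {key!r} is not a key of {target!r}")
    return errors


# Parsed definitions, reduced to the fields the check needs.
entities = {
    "customer": {"keys": ["customer_id"]},
    "product": {"keys": ["product_id"]},  # assumed; product.yml is not shown
}
purchased = {
    "relation": "PURCHASED",
    "from": "customer",
    "to": "product",
    "mappings": {"from_key": "customer_id", "to_key": "product_id"},
}
print(validate_relation(purchased, entities))
```

Collecting every problem in a list, rather than raising on the first one, lets a single run report all broken references at once.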

Compile to Cypher

grai build

Output (target/neo4j/compiled.cypher):

// Create Customer nodes
MERGE (n:customer {customer_id: row.customer_id})
SET n.name = row.name,
    n.region = row.region;

// Create Product nodes
MERGE (n:product {product_id: row.product_id})
SET n.name = row.name;

// Create PURCHASED relations
MATCH (from:customer {customer_id: row.customer_id})
MATCH (to:product {product_id: row.product_id})
MERGE (from)-[r:PURCHASED]->(to)
SET r.order_id = row.order_id,
    r.order_date = row.order_date;
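
The node statements above follow a simple pattern: MERGE on the key properties, then SET everything else. As an illustration of that pattern only, the hypothetical helper `compile_node_merge` below renders the same text from a label, its keys, and its property names. It is a sketch, not the compiler grai.build ships.

```python
def compile_node_merge(label: str, keys: list[str], properties: list[str]) -> str:
    """Render a MERGE statement keyed on `keys`, setting the remaining properties."""
    key_map = ", ".join(f"{k}: row.{k}" for k in keys)
    lines = [f"MERGE (n:{label} {{{key_map}}})"]
    # Keys are already matched by MERGE, so only non-key properties are SET.
    assignments = [f"n.{p} = row.{p}" for p in properties if p not in keys]
    if assignments:
        lines.append("SET " + ",\n    ".join(assignments))
    return "\n".join(lines) + ";"


print(compile_node_merge("customer", ["customer_id"], ["customer_id", "name", "region"]))
```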

🎯 Features

  • Declarative modeling - Define your graph schema in YAML (like dbt models)
  • Schema validation - Catch errors before deployment
  • Documentation generation - Beautiful HTML docs with grai docs (like dbt docs generate/serve)
  • Lineage visualization - Interactive graph and Mermaid diagrams showing data flow
  • Multi-backend support - Start with Neo4j, expand to Gremlin later
  • CLI-first - Integrates into your CI/CD pipeline
  • Type-safe - Built with Pydantic for robust validation
  • Extensible - Easy to add custom backends and transformations

🏗️ Real-World Usage

Local Development

# 1. Define schema
vim entities/customer.yml

# 2. Validate
grai validate

# 3. Generate documentation
grai docs --serve  # Opens browser with interactive docs

# 4. Deploy schema
grai run --schema-only

# 5. Test with sample data
grai run --load-csv

Production Deployment

# .github/workflows/deploy-schema.yml
name: Deploy Graph Schema

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
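    steps:
      - uses: actions/checkout@v3

      # The runner needs Python and grai-build before any grai command can run
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install grai.build
        run: pip install grai-build

      - name: Validate Schema
        run: grai validate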

      - name: Deploy to Production
        run: |
          grai run --schema-only \
            --uri ${{ secrets.NEO4J_URI }} \
            --user ${{ secrets.NEO4J_USER }} \
            --password ${{ secrets.NEO4J_PASSWORD }}

With Your ETL Pipeline

# Your Airflow DAG
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator  # needed by load_data below
from your_etl import load_customers_to_neo4j

dag = DAG('graph_pipeline')

# 1. grai.build ensures schema is up-to-date
deploy_schema = BashOperator(
    task_id='deploy_schema',
    bash_command='grai run --schema-only',
    dag=dag
)

# 2. Your ETL loads the actual data
load_data = PythonOperator(
    task_id='load_data',
    python_callable=load_customers_to_neo4j,
    dag=dag
)

deploy_schema >> load_data
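
The DAG imports `load_customers_to_neo4j` from your own ETL code, which grai.build does not provide. A minimal sketch of what such a loader might look like follows: it sends rows in batches and binds each one as `row` with `UNWIND`, matching the `row.` references in the compiled Cypher. The names `batched` and `load_customers`, the batch size, and the injected `run_query` callable are all assumptions for illustration.

```python
from typing import Any, Callable, Iterable, Iterator

# UNWIND binds each element of the $rows parameter to `row`.
CUSTOMER_MERGE = """
UNWIND $rows AS row
MERGE (n:customer {customer_id: row.customer_id})
SET n.name = row.name,
    n.region = row.region
"""


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: list[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_customers(run_query: Callable[..., Any], rows: Iterable[dict], batch_size: int = 1000) -> int:
    """Send rows to the database in batches; return the number of rows sent."""
    sent = 0
    for batch in batched(rows, batch_size):
        run_query(CUSTOMER_MERGE, rows=batch)
        sent += len(batch)
    return sent
```

With the official `neo4j` Python driver, `run_query` could be the `run` method of a session opened from `GraphDatabase.driver(uri, auth=(user, password))`. Injecting the callable keeps the batching logic testable without a running database.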

📦 Architecture

grai/
├── cli/              # Typer-based CLI commands
├── core/
│   ├── models.py     # Pydantic models (Entity, Relation, Property)
│   ├── parser/       # YAML → Python models
│   ├── validator/    # Schema validation
│   ├── compiler/     # Generate Cypher/Gremlin
│   ├── loader/       # Execute against databases
│   └── utils/        # Shared utilities
└── templates/        # Project templates

🧪 Development

Setup

# Clone the repo
git clone https://github.com/asantora05/grai.build.git
cd grai.build

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black grai/
ruff check grai/

📖 Documentation

Generate beautiful, interactive documentation for your graph:

# Generate and serve documentation locally
grai docs --serve

# Generate to custom directory
grai docs --output ./my-docs

# Just generate (don't serve)
grai docs

The documentation includes:

  • 📊 Project overview with stats
  • 📦 Entity catalog with properties
  • 🔗 Relation catalog with mappings
  • 🕸️ Interactive graph visualization (D3.js)
  • 🔄 Lineage diagrams (Mermaid.js)
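
To give a feel for the Mermaid lineage diagrams, here is a small sketch that renders entities and relations as Mermaid flowchart text. The function `to_mermaid` and its input shapes are assumptions for illustration; the diagrams grai.build generates may be laid out differently.

```python
def to_mermaid(entities: list[str], relations: list[tuple[str, str, str]]) -> str:
    """Render entities and (from, relation, to) triples as a Mermaid flowchart."""
    lines = ["graph LR"]
    for name in entities:
        lines.append(f"    {name}[{name}]")
    for source, label, target in relations:
        lines.append(f"    {source} -->|{label}| {target}")
    return "\n".join(lines)


print(to_mermaid(["customer", "product"], [("customer", "PURCHASED", "product")]))
```

Pasting the printed text into any Mermaid renderer draws `customer` and `product` as boxes joined by an arrow labelled `PURCHASED`.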

For development guidance, check out the instructions.

🗺️ Roadmap

  • Core Pydantic models
  • YAML parser
  • Schema validator
  • Cypher compiler
  • Neo4j loader
  • CLI commands (init, build, validate, run, docs)
  • Graph IR export (JSON)
  • Documentation generation (dbt-style)
  • Lineage visualization (Mermaid + D3.js)
  • Schema versioning and migrations
  • Graph visualization improvements
  • Gremlin backend support
  • Incremental sync

📊 Current Status

v0.3.2 - Feature-complete MVP with migrations

  • ✅ Core Models - Pydantic models for Entity, Relation, Property
  • ✅ YAML Parser - Parse and load entity/relation definitions
  • ✅ Schema Validator - Validate references and mappings
  • ✅ Cypher Compiler - Generate Neo4j constraints and indexes
  • ✅ Neo4j Loader - Execute Cypher against Neo4j instances
  • ✅ Documentation Generator - Interactive HTML docs (like dbt docs)
  • ✅ Lineage Tracking - Visualize data flow and dependencies
  • ✅ Graph Visualizer - D3.js and Cytoscape visualizations
  • ✅ Build Cache - Incremental builds for faster iteration
  • ✅ CLI Commands - Full command suite (init, build, validate, run, docs, etc.)
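
How the build cache decides what to rebuild is not described here. One common approach, sketched below purely as an assumption, is to fingerprint the project's YAML files and skip the build when the fingerprint is unchanged. The names `fingerprint` and `needs_rebuild` and the cache file location are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(project_dir: Path) -> str:
    """Hash the relative path and contents of every .yml file under project_dir."""
    digest = hashlib.sha256()
    for path in sorted(project_dir.rglob("*.yml")):
        digest.update(path.relative_to(project_dir).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(project_dir: Path, cache_file: Path) -> bool:
    """Compare the current fingerprint with the stored one, then store the new one."""
    current = fingerprint(project_dir)
    previous = cache_file.read_text() if cache_file.exists() else None
    cache_file.write_text(current)
    return current != previous
```

Hashing file contents rather than modification times means a fresh checkout or a touched-but-unchanged file does not trigger a rebuild.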

257 tests passing | High coverage across all modules

See it in action:

# Initialize example project
grai init my-project
cd my-project

# Generate and view documentation
grai docs --serve

🤝 Contributing

Contributions are welcome! This is an early-stage project, so there's plenty of room for improvement.

📄 License

MIT License - see LICENSE for details.

💡 Inspiration

This project is inspired by:

  • dbt - Analytics engineering workflow
  • SQLMesh - Data transformation framework
  • Amundsen - Data discovery and metadata

Built with ❤️ for the graph database community
