CLI-first Django-based engine for Data Vault modeling and dbt project generation
TurboVault Engine
🎯 What is TurboVault Engine?
TurboVault Engine is a CLI-first, Django-based automation engine that accelerates Data Vault 2.0 implementations. It:
- Ingests source metadata from Excel files, database catalogs, or previously exported JSON files
- Maps metadata into a consistent Data Vault domain model (Hubs, Links, Satellites)
- Generates complete, production-ready dbt projects with datavault4dbt macros
- Validates your model before generation with comprehensive error checking
Perfect for: Data Engineers looking to rapidly prototype, standardize, or automate their Data Vault implementations.
✨ Key Features
🏗️ Complete dbt Project Generation
- Automatic model generation - SQL models with datavault4dbt macros
- YAML schemas - Complete dbt documentation for all models
- Organized structure - Clean folder hierarchy (staging, raw_vault, business_vault)
- Template customization - Customize any template via Django Admin
- Validation - Pre-generation checks to catch errors early
📦 Data Vault Modeling
- Hubs - Standard and reference hubs with business keys
- Links - Standard and non-historized links connecting multiple hubs
- Satellites - Standard, multi-active, non-historized, effectivity, reference, and record-tracking satellites
- PITs - Point-in-Time table generation
- Reference Tables - Reference data modeling
- Snapshot Controls - Configurable snapshot logic for temporal tracking
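To make the Point-in-Time idea concrete, here is a minimal Python sketch of the lookup a PIT table materializes: for each snapshot date, the most recent satellite load date at or before that snapshot. The function name and the use of plain dates are assumptions for illustration; the generated dbt models do this in SQL via datavault4dbt, not with this code.

```python
from bisect import bisect_right
from datetime import date


def pit_rows(snapshot_dates, satellite_load_dates):
    """Pair each snapshot date with the latest load date that is <= the snapshot."""
    loads = sorted(satellite_load_dates)
    rows = []
    for snapshot in sorted(snapshot_dates):
        # Number of load dates that are <= snapshot.
        idx = bisect_right(loads, snapshot)
        # None means "no satellite record yet" (real PITs typically point to a
        # ghost record instead; that detail is omitted in this sketch).
        rows.append((snapshot, loads[idx - 1] if idx else None))
    return rows


if __name__ == "__main__":
    loads = [date(2024, 1, 1), date(2024, 1, 10)]
    snapshots = [date(2023, 12, 31), date(2024, 1, 5), date(2024, 1, 10)]
    for row in pit_rows(snapshots, loads):
        print(row)
```

The snapshot dates play the role of the Snapshot Controls listed above: they decide which points in time receive a row.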
🔧 Source Management
- Source Systems - Define database schemas and connections
- Source Tables - Map physical tables with record source and load date
- Prejoins - Cross-table joins for complex link mappings
- Stage Models - Automatic staging layer with hashkeys and hashdiffs
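For readers new to hashkeys and hashdiffs, the following Python sketch shows the general idea behind the two values the staging layer computes. It is an illustration only: the choice of MD5, the `||` delimiter, and the trim/upper-case normalization are assumptions, and the actual rules are defined by the datavault4dbt macros on your target platform.

```python
import hashlib


def hashkey(*business_keys: str, delimiter: str = "||") -> str:
    """Hash a (possibly composite) business key after normalizing each part."""
    # Assumption: trim and upper-case so " c-1001 " and "C-1001" hash alike.
    normalized = [str(part).strip().upper() for part in business_keys]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


def hashdiff(attributes: dict[str, object], delimiter: str = "||") -> str:
    """Hash descriptive attributes in a fixed (sorted) column order."""
    values = [
        "" if attributes[name] is None else str(attributes[name]).strip()
        for name in sorted(attributes)
    ]
    return hashlib.md5(delimiter.join(values).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(hashkey("C-1001"))
    print(hashdiff({"name": "Ada", "city": "Hanover"}))
```

A hashkey identifies a hub or link row; a hashdiff lets a satellite load detect whether any descriptive attribute changed without comparing columns one by one.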
🖥️ Developer Experience
- Modern CLI - Built with Typer and Rich for beautiful terminal output
- Web Initializer - Interactive, multi-step project creation wizard
- Django Admin - Full web interface for model and template management
- Config-Driven - YAML configuration for automation and CI/CD
- Comprehensive Testing - pytest test suite with 20+ tests
🚀 Quick Start
Prerequisites
- Python 3.12+
- pip (Python package manager)
- (Optional) Database drivers if using external databases:
  - PostgreSQL: psycopg2-binary
  - MySQL: mysqlclient
  - SQL Server: mssql-django
  - Oracle: cx_Oracle
  - Snowflake: django-snowflake
Installation
Install from PyPI:
pip install turbovault-engine
Install directly from GitHub (latest development version):
pip install git+https://github.com/ScalefreeCOM/turbovault-engine.git
We recommend installing into a dedicated virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install turbovault-engine
Initialize Your Workspace & First Project
TurboVault uses a two-step setup. First, create and enter a dedicated folder for your workspace:
mkdir my-turbovault-workspace
cd my-turbovault-workspace
Step 1 — Initialise the workspace (once per directory):
# Interactive (recommended for first time)
turbovault workspace init
# Or fully non-interactive:
turbovault workspace init \
  --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv \
  --admin-username admin --admin-password changeme --admin-email admin@example.com
This creates turbovault.yml, initialises the database, runs all migrations, and populates default templates.
Step 2 — Create a project (once per project):
# Interactive wizard
turbovault project init --interactive
# Non-interactive with flags (great for CI/scripts)
turbovault project init --name my_project --source ./metadata.xlsx \
  --stage-schema stage --rdv-schema rdv
# Import from a previously exported JSON file (round-trip)
turbovault project init --name my_project --source ./exports/model.json
# Or from a per-project config file
turbovault project init --config config.example.yml
This creates projects/my_project/config.yml and the projects/my_project/exports/ folder.
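If you want to script the two-step setup (for example in CI), one option is to drive the CLI from Python. The sketch below only assembles and runs the commands shown above; the helper names `setup_commands` and `run_setup` are hypothetical, and it assumes the `turbovault` executable is on your PATH.

```python
import subprocess


def setup_commands(project: str, source: str) -> list[list[str]]:
    """Return the workspace-init and project-init commands as argv lists."""
    workspace = [
        "turbovault", "workspace", "init",
        "--db-engine", "sqlite3", "--db-name", "db.sqlite3",
        "--stage-schema", "stage", "--rdv-schema", "rdv",
    ]
    project_cmd = [
        "turbovault", "project", "init",
        "--name", project, "--source", source,
        "--stage-schema", "stage", "--rdv-schema", "rdv",
    ]
    return [workspace, project_cmd]


def run_setup(project: str, source: str, workspace_dir: str) -> None:
    """Run both steps inside the workspace directory, stopping on the first failure."""
    for argv in setup_commands(project, source):
        subprocess.run(argv, cwd=workspace_dir, check=True)


if __name__ == "__main__":
    for argv in setup_commands("my_project", "./metadata.xlsx"):
        print(" ".join(argv))
```

Passing argv lists rather than a shell string avoids quoting problems when project names or paths contain spaces.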
Populate and Maintain your Data Vault model
You can check, define, and change your Data Vault model via the Django Admin interface. To launch the web interface:
# Launch the web interface
turbovault serve
Sign in with the credentials you set up during workspace initialization.
Generate Your dbt Project
# Generate dbt project from your Data Vault model
turbovault generate --project my_project
# Generate with custom output path
turbovault generate --project my_project --output ./my_dbt
# Generate with ZIP archive
turbovault generate --project my_project --zip
# Skip satellite v1 views
turbovault generate --project my_project --no-v1-satellites
📋 CLI Commands
| Command | Description |
|---|---|
| turbovault workspace init | Initialise directory as a workspace (creates turbovault.yml + DB) |
| turbovault workspace status | Show workspace health (DB, projects, migrations) |
| turbovault project init | Create a new project in the workspace |
| turbovault project list | List all projects in the workspace |
| turbovault generate | Generate dbt project or export model to JSON / DBML |
| turbovault serve | Start Django admin server for model management |
| turbovault reset | Reset the database |
| turbovault --help | Show all available commands |
Command Examples
# --- Workspace ---
# Initialise workspace (non-interactive)
turbovault workspace init --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv
# Check workspace health
turbovault workspace status
# --- Projects ---
# Interactive project creation
turbovault project init --interactive
# Create from YAML config
turbovault project init --config config.yml
# List all projects
turbovault project list
# --- Generation ---
# Generate dbt project with validation
turbovault generate --project sales_datavault
# Generate in lenient mode (skip invalid entities)
turbovault generate --project sales_datavault --mode lenient
# Generate with ZIP and no v1 satellites
turbovault generate -p sales_datavault --zip --no-v1-satellites
# Export Data Vault model to JSON
turbovault generate --type json --project sales_datavault
# Start admin on custom port
turbovault serve --port 9000
🗄️ Domain Model
TurboVault Engine uses a comprehensive Data Vault domain model:
Core Entities
| Entity | Description |
|---|---|
| Project | Top-level container for all metadata |
| Group | Logical grouping for organizing entities into subfolders |
| Source System | Database/schema source definitions |
| Source Table | Physical source tables with metadata |
| Hub | Data Vault hubs (standard or reference) |
| Link | Relationships between hubs (standard or non-historized) |
| Satellite | Descriptive attributes for hubs/links (6 types) |
| PIT | Point-in-Time tables for temporal joins |
| Reference Table | Reference data structures |
| Snapshot Control | Temporal snapshot configuration |
Advanced Features
- Prejoins - Define cross-table joins for link mappings
- Multi-source support - Multiple sources feeding the same entity
- Satellite variants - Standard, multi-active, effectivity, non-historized, reference, record-tracking
- Template customization - All SQL and YAML templates customizable via Admin
⚙️ Configuration
TurboVault uses two config files with clearly separated responsibilities:
{workspace}/
├── turbovault.yml          ← workspace-level: database, global defaults
└── projects/
    └── my_project/
        └── config.yml      ← project-level: schemas, naming patterns, output
turbovault.yml — Workspace Config
Created once by turbovault workspace init. Contains the database connection and optional global defaults:
# Database connection (required)
database:
  engine: sqlite3   # sqlite3 | postgresql | mysql | mssql | snowflake
  name: db.sqlite3

# Optional: global defaults applied to every new project
defaults:
  stage_schema: stage
  rdv_schema: rdv
  bdv_schema: bdv
PostgreSQL example:
database:
  engine: postgresql
  name: turbovault_db
  user: turbovault_user
  password: your_password
  host: localhost
  port: 5432
Supported Databases:
- SQLite (default): no extra packages needed
- PostgreSQL: pip install psycopg2-binary
- MySQL/MariaDB: pip install mysqlclient
- SQL Server: pip install mssql-django
- Oracle: pip install cx_Oracle
- Snowflake: pip install django-snowflake
projects/<name>/config.yml — Project Config
Created once by turbovault project init. Contains everything specific to one project:
project:
  name: "my_datavault"
  description: "My Data Vault Implementation"

# Optional: import source metadata on project init
source:
  type: excel   # excel | sqlite | json
  path: "./metadata/sources.xlsx"

configuration:
  stage_schema: "stage"
  rdv_schema: "rdv"
  bdv_schema: "bdv"

output:
  create_zip: false
See config.example.yml for the full set of options.
Documentation:
- Configuration Overview - Two-config system explained with folder structure
- Project Config Schema Reference - Complete config.yml field reference
- Database Configuration Guide - Detailed turbovault.yml database setup
📊 Anonymous Usage Statistics
TurboVault Engine collects lightweight, anonymous usage statistics (command invoked, TurboVault version, Python version, OS family, and install type) to help us understand real-world usage and improve the tool. No personal data, project names, or Data Vault model content is ever sent.
Telemetry is enabled by default. To opt out, you can either:
- Set the environment variable: TURBOVAULT_DISABLE_TELEMETRY=1
- Add the following to your turbovault.yml: disable_anonymous_usage_stats: true
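When TurboVault is invoked from a script, the environment-variable opt-out can be applied per call instead of globally. The sketch below assumes only the variable name documented above; the helper names are hypothetical.

```python
import os
import subprocess


def telemetry_free_env(base=None):
    """Copy an environment mapping and add the documented opt-out variable."""
    env = dict(os.environ if base is None else base)
    env["TURBOVAULT_DISABLE_TELEMETRY"] = "1"
    return env


def run_without_telemetry(argv):
    """Run a turbovault command with telemetry disabled for this process only."""
    return subprocess.run(argv, env=telemetry_free_env(), check=True)


if __name__ == "__main__":
    env = telemetry_free_env({"PATH": "/usr/bin"})
    print(env["TURBOVAULT_DISABLE_TELEMETRY"])
```

Copying the environment rather than mutating `os.environ` keeps the opt-out scoped to the child process.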
🎨 Template Customization
All SQL and YAML templates can be customized:
- Start admin: turbovault serve
- Navigate to: Model Templates in Django Admin
- Edit any template to customize generation
- Higher priority templates are selected first
Templates are automatically populated from files during turbovault workspace init.
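The priority rule can be pictured with a short Python sketch. It is an assumption-laden illustration: the `Template` fields, the `select_template` function, and the reading of "higher priority" as "larger number" are all hypothetical, since the actual template model is only visible in Django Admin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    entity_type: str  # hypothetical field, e.g. "hub" or "satellite"
    name: str
    priority: int  # assumption: a larger number means higher priority


def select_template(templates, entity_type):
    """Pick the highest-priority template registered for an entity type."""
    candidates = [t for t in templates if t.entity_type == entity_type]
    if not candidates:
        raise LookupError(f"No template for entity type {entity_type!r}")
    return max(candidates, key=lambda t: t.priority)


if __name__ == "__main__":
    templates = [
        Template("hub", "default_hub", priority=0),
        Template("hub", "custom_hub", priority=10),
    ]
    print(select_template(templates, "hub").name)
```

Under these assumptions, a customized template with a higher priority overrides the default without the default having to be deleted.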
Manual Template Management
Advanced / contributor use: The following commands require access to the backend/ Django project.
# Populate templates from files
cd backend && python manage.py populate_templates
# Overwrite existing templates
python manage.py populate_templates --overwrite
✅ Validation
Pre-generation validation catches common errors:
| Entity | Rule | Code |
|---|---|---|
| Hub (standard) | Must have hashkey | HUB_001 |
| Hub | Must have ≥1 business key | HUB_002 |
| Link | Must have hashkey | LNK_001 |
| Link | Must reference ≥2 hubs | LNK_002 |
| Satellite | Must have parent entity | SAT_001 |
| Model | SQL generated but YAML missing | YML_001 |
Validation modes:
- --mode strict (default): Stop on first error
- --mode lenient: Skip invalid, continue with valid
- --skip-validation: Skip all validation
📤 Export Formats
JSON Export
# Export full Data Vault model as JSON
turbovault generate --type json --project my_project
# Custom output path
turbovault generate --type json --project my_project --json-output ./exports/model.json
# Compact format
turbovault generate --type json --project my_project --json-format compact
Exports the complete model to JSON, including:
- Project metadata
- All hubs, links, satellites
- Stage definitions with hashkeys/hashdiffs
- PITs and reference tables
- Snapshot controls
JSON Import (Round-Trip)
A JSON export can be re-imported as the source for a new project, enabling project migration, backup/restore, and sharing model definitions across workspaces:
# 1. Export the model from the source workspace
turbovault generate --type json --project my_project --json-output ./model.json
# 2. Import it into a new workspace (or project name)
turbovault project init --name my_project_copy --source ./model.json
# Or use a config.yml:
# source:
# type: json
# path: "./model.json"
Everything — hubs, links, satellites, stages, snapshot controls, PITs, reference tables — is restored exactly as it was in the original project.
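One way to check a round-trip yourself is to export both the original and the copied project and compare the two JSON files structurally. The sketch below makes no assumption about the export schema beyond it being valid JSON; the function names are hypothetical, and `ignore_keys` exists because fields such as the project name would be expected to differ between the original and the copy.

```python
import json
from pathlib import Path


def load_model(path, ignore_keys=()):
    """Load an exported model and drop top-level keys that may legitimately differ."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        for key in ignore_keys:
            data.pop(key, None)
    return data


def same_model(export_a, export_b, ignore_keys=()):
    """Compare two exports structurally (key order and whitespace do not matter)."""
    return load_model(export_a, ignore_keys) == load_model(export_b, ignore_keys)


if __name__ == "__main__":
    import sys

    # Usage: python compare_exports.py original.json copy.json
    print(same_model(sys.argv[1], sys.argv[2]))
```

Which top-level keys to ignore depends on the actual export layout, so inspect one export file before relying on the comparison.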
DBML Export
# Export Data Vault model as a DBML diagram
turbovault generate --type dbml --project my_project
# Custom output path
turbovault generate --type dbml --project my_project --dbml-output ./exports/model.dbml
Exports the model as DBML (Database Markup Language), which can be rendered in dbdiagram.io to visualize entity relationships.
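For readers who have not seen DBML, the following Python sketch prints a tiny hand-written example of the notation: one hub, one satellite, and a reference between them. The table and column names are hypothetical, and the text is not claimed to match what TurboVault emits; it only shows the kind of syntax dbdiagram.io accepts.

```python
def table_dbml(name, columns, primary_key=None):
    """Render one DBML table block from a {column: type} mapping."""
    lines = [f"Table {name} {{"]
    for column, col_type in columns.items():
        suffix = " [pk]" if column == primary_key else ""
        lines.append(f"  {column} {col_type}{suffix}")
    lines.append("}")
    return "\n".join(lines)


def ref_dbml(child, parent, column):
    """Render a many-to-one reference from child.column to parent.column."""
    return f"Ref: {child}.{column} > {parent}.{column}"


if __name__ == "__main__":
    hub = table_dbml(
        "customer_h",
        {"hk_customer_h": "varchar", "customer_id": "varchar"},
        primary_key="hk_customer_h",
    )
    sat = table_dbml("customer_s", {"hk_customer_h": "varchar", "ldts": "timestamp"})
    print("\n\n".join([hub, sat, ref_dbml("customer_s", "customer_h", "hk_customer_h")]))
```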
dbt Project
turbovault generate --project my_project
Generates a ready-to-use dbt project with:
- SQL models using datavault4dbt macros
- YAML schemas for all models
- Complete folder structure
- packages.yml with datavault4dbt dependency
🤝 Contributing
We welcome and appreciate community contributions! To keep the project sustainable while ensuring the software remains open and accessible, we follow a Dual-Licensing model.
📜 Licensing & Open Source
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0).
The AGPL is a "strong copyleft" license. If you modify this software and provide it as a service over a network (SaaS), you must make your modified source code available to your users under the same license.
✍️ Contributor License Agreement (CLA)
To contribute code, all contributors are required to sign our Contributor License Agreement (CLA).
- Why? This ensures that you have the right to contribute the code and grants us the necessary rights to include your work in future versions of the project, including potential commercial or non-AGPL distributions.
- How? FIXME
💼 Commercial Usage & Licensing
We understand that the AGPL-3.0 may not be suitable for every organization's internal policies or proprietary products.
If you wish to use this project in a commercial or proprietary setting without the "copyleft" requirements of the AGPL, we offer alternative commercial licenses. This allows you to:
- Use the software without disclosing your own source code.
- Receive dedicated support and enterprise-grade warranties.
- Support the development team.
Please contact us at contact@scalefree.com to discuss a commercial license tailored to your needs.
📚 Documentation
Getting Started
Configuration
- Configuration Overview
- Database Configuration Guide
- Project Config Schema Reference
- Environment Variables Reference
Concepts
- Architecture Overview
- Architecture Details
- Domain Model Specification
- Excel Metadata Format
- JSON Import (Round-Trip)
- Validation Rules Reference
📄 License
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0) - see the LICENSE file for details.
🙏 Acknowledgements
Built with:
- Django - Web framework
- Typer - CLI framework
- Rich - Terminal formatting
- Pydantic - Data validation
- Jinja2 - Template engine
- datavault4dbt - dbt macros
Built by Scalefree