CLI-first Django-based engine for Data Vault modeling and dbt project generation

TurboVault Engine

Transform source metadata into production-ready Data Vault dbt projects

CI | GitHub Release | Python 3.12+ | Django | License: AGPL v3


🎯 What is TurboVault Engine?

TurboVault Engine is a CLI-first, Django-based automation engine that accelerates Data Vault 2.0 implementations. It:

  • Ingests source metadata from Excel files or database catalogs
  • Maps metadata into a consistent Data Vault domain model (Hubs, Links, Satellites)
  • Generates complete, production-ready dbt projects with datavault4dbt macros
  • Validates your model before generation with comprehensive error checking

Perfect for: Data Engineers looking to rapidly prototype, standardize, or automate their Data Vault implementations.

┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Source    │ --> │  TurboVault      │ --> │  dbt Project    │
│  Metadata   │     │  Engine          │     │  (Ready to Run) │
│  (Excel/DB) │     │                  │     │                 │
└─────────────┘     └──────────────────┘     └─────────────────┘

✨ Key Features

🏗️ Complete dbt Project Generation

  • Automatic model generation - SQL models with datavault4dbt macros
  • YAML schemas - Complete dbt documentation for all models
  • Organized structure - Clean folder hierarchy (staging, raw_vault, business_vault)
  • Template customization - Customize any template via Django Admin
  • Validation - Pre-generation checks to catch errors early

📦 Data Vault Modeling

  • Hubs - Standard and reference hubs with business keys
  • Links - Standard and non-historized links connecting multiple hubs
  • Satellites - Standard, multi-active, non-historized, effectivity, reference, and record-tracking satellites
  • PITs - Point-in-Time table generation
  • Reference Tables - Reference data modeling
  • Snapshot Controls - Configurable snapshot logic for temporal tracking

🔧 Source Management

  • Source Systems - Define database schemas and connections
  • Source Tables - Map physical tables with record source and load date
  • Prejoins - Cross-table joins for complex link mappings
  • Stage Models - Automatic staging layer with hashkeys and hashdiffs
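To make the hashkey and hashdiff terms concrete, here is a minimal Python sketch of the general Data Vault idea. It is not the datavault4dbt implementation: the MD5 algorithm, the `||` delimiter, the normalization rules, and the function names `hashkey` and `hashdiff` are all assumptions made for illustration.

```python
import hashlib

DELIMITER = "||"  # assumed separator between key parts


def _digest(parts):
    """MD5 over the delimiter-joined parts, returned as 32 hex characters."""
    return hashlib.md5(DELIMITER.join(parts).encode("utf-8")).hexdigest()


def hashkey(*business_keys):
    """Hash one or more business keys; casing and padding are normalized away."""
    return _digest(str(key).strip().upper() for key in business_keys)


def hashdiff(attributes):
    """Hash descriptive attributes in a stable order (sorted by column name)."""
    return _digest(
        "" if attributes[name] is None else str(attributes[name])
        for name in sorted(attributes)
    )


# The same business key yields the same hashkey regardless of casing or padding.
print(hashkey("cust-001") == hashkey("  CUST-001 "))
# A changed attribute changes the hashdiff, which signals a new satellite row.
print(hashdiff({"name": "Ada", "city": "Berlin"}) != hashdiff({"name": "Ada", "city": "Paris"}))
```

In a generated project this hashing happens in SQL inside the staging models; the Python version only shows why hashkeys identify an entity and hashdiffs detect attribute changes.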

🖥️ Developer Experience

  • Modern CLI - Built with Typer and Rich for beautiful terminal output
  • Web Initializer - Interactive, multi-step project creation wizard
  • Django Admin - Full web interface for model and template management
  • Config-Driven - YAML configuration for automation and CI/CD
  • Comprehensive Testing - pytest test suite with 20+ tests

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • pip (Python package manager)
  • (Optional) Database drivers if using external databases:
    • PostgreSQL: psycopg2-binary
    • MySQL: mysqlclient
    • SQL Server: mssql-django
    • Oracle: cx_Oracle
    • Snowflake: django-snowflake

Installation

Install from PyPI:

pip install turbovault-engine

Install directly from GitHub (latest development version):

pip install git+https://github.com/ScalefreeCOM/turbovault-engine.git

We recommend installing into a dedicated virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install turbovault-engine

Initialize Your Workspace & First Project

TurboVault uses a two-step setup. First, create and enter a dedicated folder for your workspace:

mkdir my-turbovault-workspace
cd my-turbovault-workspace

Step 1 - Initialise the workspace (once per directory):

# Interactive (recommended for first time)
turbovault workspace init

# Or fully non-interactive:
turbovault workspace init \
  --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv \
  --admin-username admin --admin-password changeme --admin-email admin@example.com

This creates turbovault.yml, initialises the database, runs all migrations, and populates default templates.
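For scripted setups, the non-interactive call can be assembled in Python so that the admin password does not have to be hard-coded. This is a minimal sketch: the flags are the ones shown above, while the helper name `workspace_init_argv` and the environment variable `TV_ADMIN_PASSWORD` are assumptions for illustration. The command is only built here, not executed.

```python
import os
import shlex


def workspace_init_argv(admin_password, db_name="db.sqlite3"):
    """Build the argument list for a non-interactive `turbovault workspace init`."""
    return [
        "turbovault", "workspace", "init",
        "--db-engine", "sqlite3", "--db-name", db_name,
        "--stage-schema", "stage", "--rdv-schema", "rdv",
        "--admin-username", "admin",
        "--admin-password", admin_password,
        "--admin-email", "admin@example.com",
    ]


# Hypothetical variable name; falls back to the placeholder used in the example above.
password = os.environ.get("TV_ADMIN_PASSWORD", "changeme")
argv = workspace_init_argv(password)

# Show only the subcommand so the password never ends up in logs.
print(shlex.join(argv[:3]))
# To actually run it: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems if the password contains spaces or special characters.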

Step 2 - Create a project (once per project):

# Interactive wizard
turbovault project init --interactive

# Non-interactive with flags (great for CI/scripts)
turbovault project init --name my_project --source ./metadata.xlsx \
  --stage-schema stage --rdv-schema rdv

# Or from a config file
turbovault project init --config config.example.yml

Populate and Maintain your Data Vault model

You can inspect, define, and modify your Data Vault model via the Django Admin interface. To launch the web interface:

# Launch the web interface
turbovault serve

Sign in with the credentials you set up during workspace initialization.

Generate Your dbt Project

# Generate dbt project from your Data Vault model
turbovault generate --project my_project

# Generate with custom output path
turbovault generate --project my_project --output ./my_dbt

# Generate with ZIP archive
turbovault generate --project my_project --zip

# Skip satellite v1 views
turbovault generate --project my_project --no-v1-satellites
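In a CI job, the generate options above are often toggled from pipeline settings. The sketch below builds the argument list from Python; the function name `generate_argv` and its parameter names are hypothetical, and it assumes the flags shown above can be freely combined.

```python
def generate_argv(project, output=None, zip_archive=False, v1_satellites=True):
    """Build the argument list for `turbovault generate` from optional settings."""
    argv = ["turbovault", "generate", "--project", project]
    if output is not None:
        argv += ["--output", str(output)]
    if zip_archive:
        argv.append("--zip")
    if not v1_satellites:
        argv.append("--no-v1-satellites")
    return argv


print(generate_argv("my_project"))
print(generate_argv("my_project", output="./my_dbt", zip_archive=True, v1_satellites=False))
# To actually run it: subprocess.run(generate_argv("my_project"), check=True)
```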

📋 CLI Commands

Command                       Description
turbovault workspace init     Initialise directory as a workspace (creates turbovault.yml + DB)
turbovault workspace status   Show workspace health (DB, projects, migrations)
turbovault project init       Create a new project in the workspace
turbovault project list       List all projects in the workspace
turbovault generate           Generate dbt project or export model to JSON / DBML
turbovault serve              Start Django admin server for model management
turbovault reset              Reset the database
turbovault --help             Show all available commands

Command Examples

# --- Workspace ---
# Initialise workspace (non-interactive)
turbovault workspace init --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv

# Check workspace health
turbovault workspace status

# --- Projects ---
# Interactive project creation
turbovault project init --interactive

# Create from YAML config
turbovault project init --config config.yml

# List all projects
turbovault project list

# --- Generation ---
# Generate dbt project with validation
turbovault generate --project sales_datavault

# Generate in lenient mode (skip invalid entities)
turbovault generate --project sales_datavault --mode lenient

# Generate with ZIP and no v1 satellites
turbovault generate -p sales_datavault --zip --no-v1-satellites

# Export Data Vault model to JSON
turbovault generate --type json --project sales_datavault

# Start admin on custom port
turbovault serve --port 9000

🗄️ Domain Model

TurboVault Engine uses a comprehensive Data Vault domain model:

Core Entities

Entity            Description
Project           Top-level container for all metadata
Group             Logical grouping for organizing entities into subfolders
Source System     Database/schema source definitions
Source Table      Physical source tables with metadata
Hub               Data Vault hubs (standard or reference)
Link              Relationships between hubs (standard or non-historized)
Satellite         Descriptive attributes for hubs/links (6 types)
PIT               Point-in-Time tables for temporal joins
Reference Table   Reference data structures
Snapshot Control  Temporal snapshot configuration

Advanced Features

  • Prejoins - Define cross-table joins for link mappings
  • Multi-source support - Multiple sources feeding the same entity
  • Satellite variants - Standard, multi-active, effectivity, non-historized, reference, record-tracking
  • Template customization - All SQL and YAML templates customizable via Admin

⚙️ Configuration

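A project is configured via a YAML file such as config.yml, which you pass to turbovault project init --config (the workspace itself is described by turbovault.yml):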

project:
  name: "my_datavault"
  description: "My Data Vault Implementation"

source:
  type: excel
  path: "./metadata/sources.xlsx"

# Optional: Configure external database (PostgreSQL, MySQL, etc.)
# Default is SQLite if not specified
database:
  engine: postgresql
  name: turbovault_db
  user: turbovault_user
  password: your_password
  host: localhost
  port: 5432

configuration:
  stage_schema: "stage"
  rdv_schema: "rdv"
  bdv_schema: "bdv"

output:
  dbt_project_dir: "./generated/dbt_project"
  create_zip: false

Supported Databases:

  • SQLite (default) - No configuration needed
  • PostgreSQL - pip install psycopg2-binary
  • MySQL/MariaDB - pip install mysqlclient
  • SQL Server - pip install mssql-django
  • Oracle - pip install cx_Oracle
  • Snowflake - pip install django-snowflake
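For provisioning scripts, the driver list above can be kept as a small lookup table. In this sketch the package names come from the list; the dictionary keys and the helper `install_command` are hypothetical labels and may not match the engine identifiers TurboVault expects.

```python
# Package names as listed above; keys are illustrative labels only.
DRIVER_PACKAGES = {
    "sqlite": None,  # default, no extra driver needed
    "postgresql": "psycopg2-binary",
    "mysql": "mysqlclient",
    "sqlserver": "mssql-django",
    "oracle": "cx_Oracle",
    "snowflake": "django-snowflake",
}


def install_command(database):
    """Return the pip command for a database driver, or None if none is needed."""
    package = DRIVER_PACKAGES[database]
    return None if package is None else f"pip install {package}"


print(install_command("postgresql"))
print(install_command("sqlite"))
```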

See config.example.yml for a complete example.

📊 Anonymous Usage Statistics

TurboVault Engine collects lightweight, anonymous usage statistics (command invoked, TurboVault version, Python version, OS family, and install type) to help us understand real-world usage and improve the tool. No personal data, project names, or Data Vault model content is ever sent.

Telemetry is enabled by default. To opt out, you can either:

  1. Set the environment variable: TURBOVAULT_DISABLE_TELEMETRY=1
  2. Add the following to your turbovault.yml:
    disable_anonymous_usage_stats: true
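When TurboVault is launched from a Python script, the environment-variable opt-out can be applied to the child process only. A minimal sketch, where the helper name `env_without_telemetry` is hypothetical and the variable name is the one documented above:

```python
import os


def env_without_telemetry(base=None):
    """Copy an environment mapping and add the documented opt-out variable."""
    env = dict(os.environ if base is None else base)
    env["TURBOVAULT_DISABLE_TELEMETRY"] = "1"
    return env


env = env_without_telemetry({"PATH": "/usr/bin"})
print(env["TURBOVAULT_DISABLE_TELEMETRY"])
# Pass it to a child process, for example:
# subprocess.run(["turbovault", "workspace", "status"], env=env, check=True)
```

Copying the mapping leaves the parent process environment untouched, so the opt-out applies only to the commands started with `env=env`.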
    

🎨 Template Customization

All SQL and YAML templates can be customized:

  1. Start admin: turbovault serve
  2. Navigate to: Model Templates in Django Admin
  3. Edit any template to customize generation
  4. Higher priority templates are selected first
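The priority rule in step 4 can be pictured as follows. This is only an illustrative sketch: the `Template` fields, the `pick_template` helper, and the assumption that a larger number means higher priority are not taken from TurboVault's code.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    entity_type: str
    priority: int  # assumed: larger value wins


def pick_template(templates, entity_type):
    """Return the highest-priority template for an entity type, or None."""
    candidates = [t for t in templates if t.entity_type == entity_type]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t.priority)


templates = [
    Template("default_hub", "hub", priority=0),
    Template("custom_hub", "hub", priority=10),
    Template("default_link", "link", priority=0),
]
print(pick_template(templates, "hub").name)
```

Under this reading, a customized template only needs a higher priority than the default to take effect, and the default stays available as a fallback.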

Templates are automatically populated from files during turbovault workspace init.

Manual Template Management

Advanced / contributor use: The following commands require access to the backend/ Django project.

# Populate templates from files
cd backend && python manage.py populate_templates

# Overwrite existing templates
python manage.py populate_templates --overwrite

✅ Validation

Pre-generation validation catches common errors:

Entity          Rule                            Code
Hub (standard)  Must have hashkey               HUB_001
Hub             Must have ≥1 business key       HUB_002
Link            Must have hashkey               LNK_001
Link            Must reference ≥2 hubs          LNK_002
Satellite       Must have parent entity         SAT_001
Model           SQL generated but YAML missing  YML_001

Validation modes:

  • --mode strict (default): Stop on first error
  • --mode lenient: Skip invalid, continue with valid
  • --skip-validation: Skip all validation
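The rules and modes above can be sketched in a few lines of Python. The rule codes come from the table; everything else is an assumption for illustration: the dictionary shape of an entity, the field names (`kind`, `hub_type`, `hashkey`, `business_keys`, `hubs`, `parent`), and the helper names. YML_001 is left out because it concerns generated files rather than the model.

```python
def check_entity(entity):
    """Return the rule codes violated by a single entity (empty list if none)."""
    codes = []
    kind = entity["kind"]
    if kind == "hub":
        if entity.get("hub_type", "standard") == "standard" and not entity.get("hashkey"):
            codes.append("HUB_001")
        if len(entity.get("business_keys", [])) < 1:
            codes.append("HUB_002")
    elif kind == "link":
        if not entity.get("hashkey"):
            codes.append("LNK_001")
        if len(entity.get("hubs", [])) < 2:
            codes.append("LNK_002")
    elif kind == "satellite":
        if not entity.get("parent"):
            codes.append("SAT_001")
    return codes


def validate(entities, mode="strict"):
    """Strict mode raises on the first invalid entity; lenient mode skips it."""
    valid, skipped = [], []
    for entity in entities:
        codes = check_entity(entity)
        if not codes:
            valid.append(entity)
        elif mode == "strict":
            raise ValueError(f"{entity['name']}: {', '.join(codes)}")
        else:
            skipped.append((entity["name"], codes))
    return valid, skipped


hub = {"kind": "hub", "name": "customer_h", "hashkey": "hk_customer_h",
       "business_keys": ["customer_id"]}
bad_link = {"kind": "link", "name": "order_l", "hashkey": "hk_order_l",
            "hubs": ["customer_h"]}

valid, skipped = validate([hub, bad_link], mode="lenient")
print([e["name"] for e in valid], skipped)
```

Calling `validate([hub, bad_link])` with the default strict mode raises a `ValueError` naming `order_l` and `LNK_002` instead of returning.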

📤 Export Formats

JSON Export

# Export full Data Vault model as JSON
turbovault generate --type json --project my_project

# Custom output path
turbovault generate --type json --project my_project --json-output ./exports/model.json

# Compact format
turbovault generate --type json --project my_project --json-format compact

Exports the complete model to JSON, including:

  • Project metadata
  • All hubs, links, satellites
  • Stage definitions with hashkeys/hashdiffs
  • PITs and reference tables
  • Snapshot controls
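An exported file can be inspected with a few lines of Python. The sketch below counts the items in every list-valued top-level entry without assuming specific key names. The sample document and its keys (`hubs`, `links`) are hypothetical stand-ins, since the actual export structure is not documented here, and `summarize` is an illustrative helper.

```python
import json
import tempfile
from pathlib import Path


def summarize(path):
    """Count the items in every list-valued top-level entry of an exported model."""
    model = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: len(value) for key, value in model.items() if isinstance(value, list)}


# Stand-in file with hypothetical keys; a real export may be structured differently.
sample = {"project": {"name": "my_project"}, "hubs": [{}, {}], "links": [{}]}
with tempfile.TemporaryDirectory() as tmp:
    export_path = Path(tmp) / "model.json"
    export_path.write_text(json.dumps(sample), encoding="utf-8")
    counts = summarize(export_path)

print(counts)
```

Pointing `summarize` at the file written via `--json-output` gives a quick overview that can be compared between runs, assuming the export is a JSON object at the top level.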

DBML Export

# Export Data Vault model as a DBML diagram
turbovault generate --type dbml --project my_project

# Custom output path
turbovault generate --type dbml --project my_project --dbml-output ./exports/model.dbml

Exports the model as DBML (Database Markup Language), which can be rendered in dbdiagram.io to visualize entity relationships.

dbt Project

turbovault generate --project my_project

Generates a ready-to-use dbt project with:

  • SQL models using datavault4dbt macros
  • YAML schemas for all models
  • Complete folder structure
  • packages.yml with datavault4dbt dependency

🤝 Contributing

We welcome and appreciate community contributions! To keep the project sustainable while ensuring the software remains open and accessible, we follow a dual-licensing model.

📜 Licensing & Open Source

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0).

The AGPL is a "strong copyleft" license. If you modify this software and provide it as a service over a network (SaaS), you must make your modified source code available to your users under the same license.

✍️ Contributor License Agreement (CLA)

To contribute code, all contributors are required to sign our Contributor License Agreement (CLA).

  • Why? This ensures that you have the right to contribute the code and grants us the necessary rights to include your work in future versions of the project, including potential commercial or non-AGPL distributions.
  • How? FIXME

💼 Commercial Usage & Licensing

We understand that the AGPL-3.0 may not be suitable for every organization's internal policies or proprietary products.

If you wish to use this project in a commercial or proprietary setting without the "copyleft" requirements of the AGPL, we offer alternative commercial licenses. This allows you to:

  • Use the software without disclosing your own source code.
  • Receive dedicated support and enterprise-grade warranties.
  • Support the development team.

Please contact us at contact@scalefree.com to discuss a commercial license tailored to your needs.


📚 Documentation

Getting Started

Configuration

Concepts


📄 License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0) - see the LICENSE file for details.


🙏 Acknowledgements

Built with:


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

turbovault_engine-0.12.3.tar.gz (162.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

turbovault_engine-0.12.3-py3-none-any.whl (189.6 kB)

Uploaded Python 3

File details

Details for the file turbovault_engine-0.12.3.tar.gz.

File metadata

  • Download URL: turbovault_engine-0.12.3.tar.gz
  • Upload date:
  • Size: 162.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for turbovault_engine-0.12.3.tar.gz
Algorithm    Hash digest
SHA256       0c1eebb8217c410c1c0effe87a780679ce996840059e455964fd5f02ec7955bf
MD5          ea6560b79e877fb27a94e44d21f447e7
BLAKE2b-256  578565f91ac1a33c4b714ad9a7dd80247d5c4d622bc22a9122eb9f112725d999

See more details on using hashes here.

Provenance

The following attestation bundles were made for turbovault_engine-0.12.3.tar.gz:

Publisher: release.yml on ScalefreeCOM/turbovault-engine

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file turbovault_engine-0.12.3-py3-none-any.whl.

File hashes

Hashes for turbovault_engine-0.12.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       b6c8ec01a82851b18011f106a90093cd7ac5bd308ddbc2ca11a9c94042d49bfb
MD5          304f68b285a4068400573cddf85c07d4
BLAKE2b-256  2dcd2e5ca7f0b315042757e968b96faf81b868b164a5d1db1681b60eeb4cd612

See more details on using hashes here.

Provenance

The following attestation bundles were made for turbovault_engine-0.12.3-py3-none-any.whl:

Publisher: release.yml on ScalefreeCOM/turbovault-engine

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
