Metadata-Driven Data Vault Generator for dbt
metadv - Metadata-Driven Model Generator
metadv is a Python library for generating SQL models from a declarative YAML configuration. It supports multiple data modeling approaches including Data Vault 2.0, Anchor Modeling, and Dimensional Modeling, with template packages for popular dbt libraries.
Features
- Declarative Configuration: Define your data model structure in a single YAML file
- Multiple Modeling Approaches: Support for Data Vault 2.0, Anchor Modeling, and Dimensional Modeling
- Template Packages: Works with automate_dv, datavault4dbt, and dimensional templates
- Custom Templates: Add your own template packages for different frameworks
- Validation: Validates your configuration before generating models
- CLI & Library: Use as a command-line tool or import as a Python library
Installation
pip install metadv
Quick Start
- Create a `metadv.yml` file in your dbt project's `models/metadv/` folder
- Define your targets (entities and relations) and source mappings
- Run the generator to create SQL models
See sample_metadv.yml for a complete example configuration.
Usage
Command Line
# Generate Data Vault models using automate_dv
metadv /path/to/dbt/project --package datavault-uk/automate_dv
# Validate only (don't generate)
metadv /path/to/dbt/project --package datavault-uk/automate_dv --validate-only
# Generate to custom output directory
metadv /path/to/dbt/project --package datavault-uk/automate_dv --output ./output
# Show detailed output including warnings
metadv /path/to/dbt/project --package datavault-uk/automate_dv --verbose
# Output results as JSON
metadv /path/to/dbt/project --package datavault-uk/automate_dv --json
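When metadv is called from a script or CI job, the invocations above can be assembled programmatically. The following is a minimal sketch: `build_metadv_command` is a hypothetical helper (not part of metadv) that uses only the flags shown above, and it assumes those flags can be combined freely.

```python
from typing import List, Optional


def build_metadv_command(
    project_dir: str,
    package: str,
    validate_only: bool = False,
    output_dir: Optional[str] = None,
    verbose: bool = False,
    as_json: bool = False,
) -> List[str]:
    """Assemble a metadv command line from the documented flags.

    Hypothetical helper for illustration; it is not part of metadv.
    """
    cmd = ["metadv", project_dir, "--package", package]
    if validate_only:
        cmd.append("--validate-only")
    if output_dir is not None:
        cmd += ["--output", output_dir]
    if verbose:
        cmd.append("--verbose")
    if as_json:
        cmd.append("--json")
    return cmd


# Build (but do not run) a validation-only invocation.
command = build_metadv_command(
    "/path/to/dbt/project", "datavault-uk/automate_dv", validate_only=True
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command)` once metadv is installed.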
Python Library
from metadv import MetaDVGenerator
# Initialize generator with package name
generator = MetaDVGenerator('/path/to/dbt/project', 'datavault-uk/automate_dv')
# Validate configuration
result = generator.validate()
if result.errors:
    print("Validation errors:", [e.message for e in result.errors])
# Generate SQL models
success, error, files = generator.generate()
if success:
    print(f"Generated {len(files)} files")
else:
    print(f"Error: {error}")
Supported Packages
| Package | Description | Generated Models |
|---|---|---|
| `datavault-uk/automate_dv` | Data Vault 2.0 using automate_dv | Stage, Hub, Link, Satellite |
| `scalefreecom/datavault4dbt` | Data Vault 2.0 using datavault4dbt | Stage, Hub, Link, Satellite |
| `dimensional` | Dimensional Modeling | Dimension, Fact |
Configuration Reference
metadv.yml Structure
metadv:
  # Optional: custom templates directory (relative to project root or absolute)
  templates-dir: ./my-templates
  # Optional: custom validations directory (relative to project root or absolute)
  validations-dir: ./my-validations
  # Define your targets (entities and relations)
  targets:
    - name: customer
      type: entity
      description: Customer business entity
    - name: order
      type: entity
      description: Order business entity
    - name: customer_order
      type: relation
      description: Customer to order relationship
      entities:
        - customer
        - order
  # Define source models and their column mappings
  sources:
    - name: raw_customers
      columns:
        - name: customer_id
          target:
            - target_name: customer  # Entity key connection
        - name: customer_name
          target:
            - attribute_of: customer  # Attribute connection
    - name: raw_orders
      columns:
        - name: order_id
          target:
            - target_name: order
        - name: customer_id
          target:
            - target_name: customer_order
              entity_name: customer  # Which entity in the relation
        - name: order_date
          target:
            - attribute_of: order
              multiactive_key: true  # Mark as multiactive key
metadv Section Options
| Field | Description |
|---|---|
| `templates-dir` | Optional path to custom templates directory (relative to project root or absolute). Templates here take precedence over built-in templates. |
| `validations-dir` | Optional path to custom validations directory (relative to project root or absolute). Custom validators with the same class name as built-in ones will override them. |
| `targets` | Array of target definitions (entities and relations) |
| `sources` | Array of source model definitions with column mappings |
Target Types
| Type | Description | Data Vault Output | Dimensional Output |
|---|---|---|---|
| `entity` | A business entity (e.g., Customer, Product) | Hub + Satellite | Dimension |
| `relation` | A relationship between entities | Link + Satellite | Fact |
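The table above can be read as a simple lookup from target type to output models. The sketch below transcribes it into a dictionary; the approach labels `data_vault` and `dimensional` and the helper `expected_outputs` are illustrative names, not metadv identifiers.

```python
from typing import Dict, List

# Output model kinds per target type, transcribed from the table above.
# The approach keys are illustrative labels, not metadv option names.
OUTPUTS: Dict[str, Dict[str, List[str]]] = {
    "entity": {"data_vault": ["Hub", "Satellite"], "dimensional": ["Dimension"]},
    "relation": {"data_vault": ["Link", "Satellite"], "dimensional": ["Fact"]},
}


def expected_outputs(targets: List[dict], approach: str) -> Dict[str, List[str]]:
    """Map each target name to the model kinds the table lists for its type."""
    return {t["name"]: OUTPUTS[t["type"]][approach] for t in targets}


targets = [
    {"name": "customer", "type": "entity"},
    {"name": "customer_order", "type": "relation"},
]
print(expected_outputs(targets, "data_vault"))
```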
Column Target Array
Each column has a target array that can contain multiple connections:
| Field | Description |
|---|---|
| `target_name` | Target entity/relation this column identifies (creates key) |
| `entity_name` | For relation connections: which entity within the relation |
| `entity_index` | For self-referencing relations: entity position (0-indexed) |
| `attribute_of` | Target this column is an attribute of (satellite/dimension payload) |
| `target_attribute` | Custom display name for the attribute |
| `multiactive_key` | Mark as multiactive key column (useful for Data Vault) |
Connection Types
- Entity/Relation Key Connections (`target_name`): Link a source column to a target. The column becomes a business key.
- Attribute Connections (`attribute_of`): Link a source column as an attribute of a target. The column becomes part of the satellite or dimension payload.
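To make the two connection types concrete, the sketch below walks one source definition (written as a Python dict mirroring the `raw_orders` entry of the sample `metadv.yml`) and sorts its connections into keys and attributes. `classify_connections` is a hypothetical helper, not metadv's own parsing code.

```python
from typing import List, Tuple

Connection = Tuple[str, str]  # (column name, target name)


def classify_connections(source: dict) -> Tuple[List[Connection], List[Connection]]:
    """Split a source's column connections into key and attribute connections."""
    keys: List[Connection] = []
    attributes: List[Connection] = []
    for column in source.get("columns", []):
        for conn in column.get("target", []):
            if "target_name" in conn:
                # Key connection: the column identifies an entity or relation.
                keys.append((column["name"], conn["target_name"]))
            elif "attribute_of" in conn:
                # Attribute connection: the column is payload of the target.
                attributes.append((column["name"], conn["attribute_of"]))
    return keys, attributes


raw_orders = {
    "name": "raw_orders",
    "columns": [
        {"name": "order_id", "target": [{"target_name": "order"}]},
        {"name": "customer_id",
         "target": [{"target_name": "customer_order", "entity_name": "customer"}]},
        {"name": "order_date",
         "target": [{"attribute_of": "order", "multiactive_key": True}]},
    ],
}

keys, attributes = classify_connections(raw_orders)
print("keys:", keys)
print("attributes:", attributes)
```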
Multiactive Satellites (Data Vault)
For satellites with multiple active records per business key, mark one or more columns as multiactive keys:
- name: phone_number
  target:
    - attribute_of: customer
      multiactive_key: true  # This column distinguishes active records
Multiactive key columns:
- Identify unique records (they can serve as the child key within the satellite)
- Are excluded from the payload columns
- Cause `ma_sat_` models to be generated instead of `sat_` models, via a condition in templates.yml
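The behaviour described above can be sketched as a split of a target's attribute columns into child keys and payload. This is an illustration under stated assumptions, not metadv's implementation: `split_satellite_columns` and `satellite_prefix` are hypothetical helpers, and the sample source combines columns from the examples above for demonstration only.

```python
from typing import List, Tuple


def split_satellite_columns(source: dict, target_name: str) -> Tuple[List[str], List[str]]:
    """Return (multiactive child keys, payload columns) for one target."""
    child_keys: List[str] = []
    payload: List[str] = []
    for column in source.get("columns", []):
        for conn in column.get("target", []):
            if conn.get("attribute_of") != target_name:
                continue  # not an attribute of this target
            if conn.get("multiactive_key"):
                child_keys.append(column["name"])  # excluded from payload
            else:
                payload.append(column["name"])
    return child_keys, payload


def satellite_prefix(child_keys: List[str]) -> str:
    """Pick the model name prefix: multiactive satellites use ma_sat_."""
    return "ma_sat_" if child_keys else "sat_"


raw_customers = {
    "name": "raw_customers",
    "columns": [
        {"name": "customer_id", "target": [{"target_name": "customer"}]},
        {"name": "customer_name", "target": [{"attribute_of": "customer"}]},
        {"name": "phone_number",
         "target": [{"attribute_of": "customer", "multiactive_key": True}]},
    ],
}

child_keys, payload = split_satellite_columns(raw_customers, "customer")
print(satellite_prefix(child_keys), child_keys, payload)
```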
Validation
metadv validates your configuration and reports:
Errors (must be fixed before generating):
- Relations missing entity connections from sources
Warnings (recommendations):
- Entities without source connections
- Targets without descriptions
- Columns without any connections
Run with --validate-only to check your configuration without generating files.
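As a rough illustration of these checks, the sketch below runs them over a configuration held as a Python dict. It is not metadv's validator code, and the exact rules are assumptions: a relation is reported only when no source column names it in `target_name`, and an entity counts as connected when some column references it through `target_name` or `entity_name`. The `loaded_at` column is a made-up example of an unconnected column.

```python
from typing import Dict, List, Tuple


def validate_config(config: Dict) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the checks listed above (assumed rules)."""
    errors: List[str] = []
    warnings: List[str] = []
    keyed_targets = set()    # targets named by some column's target_name
    linked_entities = set()  # entities named by some column's entity_name

    for source in config.get("sources", []):
        for column in source.get("columns", []):
            connections = column.get("target") or []
            if not connections:
                warnings.append(
                    f"Column {source['name']}.{column['name']} has no connections"
                )
            for conn in connections:
                if "target_name" in conn:
                    keyed_targets.add(conn["target_name"])
                if "entity_name" in conn:
                    linked_entities.add(conn["entity_name"])

    for target in config.get("targets", []):
        name = target["name"]
        if not target.get("description"):
            warnings.append(f"Target {name} has no description")
        if target["type"] == "relation" and name not in keyed_targets:
            errors.append(f"Relation {name} has no entity connections from sources")
        if (
            target["type"] == "entity"
            and name not in keyed_targets
            and name not in linked_entities
        ):
            warnings.append(f"Entity {name} has no source connections")
    return errors, warnings


config = {
    "targets": [
        {"name": "customer", "type": "entity",
         "description": "Customer business entity"},
        {"name": "order", "type": "entity"},
        {"name": "customer_order", "type": "relation",
         "description": "Customer to order relationship",
         "entities": ["customer", "order"]},
    ],
    "sources": [
        {"name": "raw_customers", "columns": [
            {"name": "customer_id", "target": [{"target_name": "customer"}]},
            {"name": "loaded_at"},  # hypothetical column with no connections
        ]},
    ],
}

errors, warnings = validate_config(config)
print("errors:", errors)
print("warnings:", warnings)
```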
Custom Template Packages
You can create custom template packages by setting templates-dir in your metadv.yml to point to a directory containing your templates. This directory should contain package folders with:
- A `templates.yml` file defining template configurations
- SQL template files using Jinja2 and Python string.Template syntax
Templates in your custom directory take precedence over built-in templates with the same package name.
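The precedence rule can be pictured as a lookup that tries the custom directory first and falls back to the built-in one. The sketch below assumes a `<templates-dir>/<package>/templates.yml` layout as described above; `resolve_package_dir` and the `builtin_dir` argument are hypothetical and do not reflect where metadv actually stores its built-in templates.

```python
from pathlib import Path
from typing import List, Optional


def resolve_package_dir(
    package: str, custom_dir: Optional[Path], builtin_dir: Path
) -> Optional[Path]:
    """Return the first package folder that contains a templates.yml.

    The custom directory, when given, is checked before the built-in one.
    """
    candidates: List[Path] = []
    if custom_dir is not None:
        candidates.append(custom_dir / package)
    candidates.append(builtin_dir / package)
    for candidate in candidates:
        if (candidate / "templates.yml").is_file():
            return candidate
    return None
```

Calling `resolve_package_dir("dimensional", Path("./my-templates"), builtin_dir)` would return the custom folder if it defines a `dimensional` package, and the built-in folder otherwise.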
Custom Validation Packages
You can create custom validators by setting validations-dir in your metadv.yml to point to a directory containing your validation Python files.
Creating Custom Validators
- Create a Python file in your validations directory (e.g., `my_validation.py`)
- Import the base class from metadv
- Create a class that inherits from `BaseValidator`
- Implement the `validate()` method
# my_validation.py
from typing import List

from metadv.validations.base import BaseValidator, ValidationContext, ValidationMessage


class MyCustomValidator(BaseValidator):
    def validate(self, ctx: ValidationContext) -> List[ValidationMessage]:
        messages = []
        # Your validation logic here; replace this placeholder with a real check
        some_condition = False
        if some_condition:
            messages.append(ValidationMessage(
                type='error',  # or 'warning'
                code='my_error_code',
                message='Something is wrong'
            ))
        return messages
Override Built-in Validators
To override a built-in validator, create a custom validator with the same class name. For example, to override the EntityNoSourceValidator:
# entity_no_source.py (in your validations-dir)
from typing import List

from metadv.validations.base import BaseValidator, ValidationContext, ValidationMessage


class EntityNoSourceValidator(BaseValidator):
    """Custom implementation that overrides the built-in validator."""

    def validate(self, ctx: ValidationContext) -> List[ValidationMessage]:
        # Your custom logic here
        return []
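Overriding by class name behaves like a registry keyed on the name, where custom entries are applied last. The following sketch shows that idea with plain stand-in classes. `merge_validators`, `make_validator`, and `OtherValidator` are illustrative names, and metadv's real discovery mechanism may differ.

```python
from typing import Dict, Iterable


def merge_validators(builtin: Iterable[type], custom: Iterable[type]) -> Dict[str, type]:
    """Build a name-keyed registry in which custom classes replace built-ins."""
    registry = {cls.__name__: cls for cls in builtin}
    # Custom classes are added last, so a matching name wins over the built-in.
    registry.update({cls.__name__: cls for cls in custom})
    return registry


def make_validator(name: str, origin: str) -> type:
    """Create a stand-in validator class (for demonstration only)."""
    return type(name, (), {"origin": origin})


builtin = [
    make_validator("EntityNoSourceValidator", "builtin"),
    make_validator("OtherValidator", "builtin"),  # hypothetical second validator
]
custom = [make_validator("EntityNoSourceValidator", "custom")]

registry = merge_validators(builtin, custom)
for name, cls in registry.items():
    print(name, "->", cls.origin)
```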
templates.yml Structure
The templates.yml file defines which templates to generate for each domain (entity, relation, source):
# Templates for entity targets (e.g., Hub, Dimension)
entity:
  hub:  # Template key (arbitrary name)
    template: hub.sql  # Template file to use
    filename: "hub/hub_{entity_name}.sql"  # Output filename pattern
    scope: entity  # Generator scope (see below)
  sat:
    template: sat.sql
    filename: "sat/sat_{entity_name}__{source_name}.sql"
    scope: source  # One file per source-target pair
    condition: has_attributes  # Only generate if condition is true
  ma_sat:
    template: ma_sat.sql
    filename: "sat/ma_sat_{entity_name}__{source_name}.sql"
    scope: source
    condition: is_multiactive  # Only for multiactive satellites
# Templates for relation targets (e.g., Link, Fact)
relation:
  link:
    template: link.sql
    filename: "link/link_{relation_name}.sql"
    scope: relation
  sat:
    template: sat.sql
    filename: "sat/sat_{relation_name}__{source_name}.sql"
    scope: source
    condition: has_attributes
# Templates for source models (e.g., Stage)
source:
  stage:
    template: stage.sql
    filename: "stage/stg_{source_name}.sql"
    scope: source
Template Configuration Fields
| Field | Description |
|---|---|
| `template` | SQL template filename in the package folder |
| `filename` | Output path pattern with placeholders like `{entity_name}`, `{source_name}`, `{relation_name}` |
| `scope` | Determines generator type and context passed to template |
| `condition` | Optional condition that must be true to generate this template |
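The `filename` patterns use brace placeholders, so their expansion can be illustrated with Python's `str.format`. This is a sketch that assumes the placeholders behave like ordinary format fields; `render_filename` is a hypothetical helper rather than a metadv function.

```python
def render_filename(pattern: str, **names: str) -> str:
    """Fill the {placeholder} fields of an output filename pattern."""
    return pattern.format(**names)


print(render_filename("hub/hub_{entity_name}.sql", entity_name="customer"))
print(
    render_filename(
        "sat/sat_{entity_name}__{source_name}.sql",
        entity_name="customer",
        source_name="raw_customers",
    )
)
```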
Scope Types
| Scope | Generator | Description |
|---|---|---|
| `entity` | TargetGenerator | One file per entity target |
| `relation` | TargetGenerator | One file per relation target |
| `source` | SourceTargetGenerator | One file per source-target pair |
| `attribute` | AttributeGenerator | One file per individual attribute |
| `source` (in source domain) | SourceGenerator | One file per source model (for staging) |
Built-in Conditions
| Condition | True when |
|---|---|
| `has_attributes` | Source has attribute columns for this target |
| `is_multiactive` | Source has multiactive key columns for this target |
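One way to read the two conditions is as predicates over a source-target pair. The sketch below evaluates them against a source definition held as a Python dict; the function names and the `should_generate` helper are illustrative, and each condition is evaluated independently here, which may not match how metadv combines them.

```python
from typing import Callable, Dict


def has_attributes(source: dict, target_name: str) -> bool:
    """True when the source has at least one attribute column for the target."""
    return any(
        conn.get("attribute_of") == target_name
        for column in source.get("columns", [])
        for conn in column.get("target", [])
    )


def is_multiactive(source: dict, target_name: str) -> bool:
    """True when the source has a multiactive key column for the target."""
    return any(
        conn.get("attribute_of") == target_name and bool(conn.get("multiactive_key"))
        for column in source.get("columns", [])
        for conn in column.get("target", [])
    )


CONDITIONS: Dict[str, Callable[[dict, str], bool]] = {
    "has_attributes": has_attributes,
    "is_multiactive": is_multiactive,
}


def should_generate(template_cfg: dict, source: dict, target_name: str) -> bool:
    """Templates without a condition are always generated."""
    condition = template_cfg.get("condition")
    if condition is None:
        return True
    return CONDITIONS[condition](source, target_name)


orders_source = {
    "name": "raw_orders",
    "columns": [
        {"name": "order_id", "target": [{"target_name": "order"}]},
        {"name": "order_date",
         "target": [{"attribute_of": "order", "multiactive_key": True}]},
    ],
}

ma_sat_cfg = {"template": "ma_sat.sql", "scope": "source", "condition": "is_multiactive"}
print(should_generate(ma_sat_cfg, orders_source, "order"))
```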
Template Context Variables
Templates receive context variables based on their scope. Use Python ${variable} syntax for initial substitution, then Jinja2 {{ variable }} for dbt rendering:
Entity scope: entity_name, source_refs
Relation scope: relation_name, entities, source_refs, fk_columns
Source scope (source-target): source_name, source_model, entity_name/relation_name, attributes, key_column, columns
Attribute scope: entity_name/relation_name, source_name, source_model, attribute_name, column, key_column
Source scope (source): source_name, columns
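The two-stage substitution can be demonstrated with the standard library alone: `string.Template` fills the `${variable}` placeholders and leaves Jinja2 `{{ ... }}` expressions untouched for dbt to render later. The template text below is made up for illustration and is not one of metadv's built-in templates; `render_stage_one` is a hypothetical helper.

```python
from string import Template


def render_stage_one(template_text: str, **context: str) -> str:
    """Apply Python ${variable} substitution, leaving Jinja2 braces intact."""
    return Template(template_text).substitute(**context)


# Illustrative template text only.
TEMPLATE_TEXT = """{{ config(materialized='incremental') }}

-- hub for ${entity_name}
select *
from {{ ref('${source_model}') }}
"""

print(render_stage_one(
    TEMPLATE_TEXT, entity_name="customer", source_model="stg_raw_customers"
))
```

Because `$` is the `string.Template` delimiter, a literal dollar sign in the SQL has to be written as `$$` in such a template.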
License
MIT License