Production-ready Dataform SQLX templates for data warehouse patterns
Project description
Dataform Warehouse Blueprints
Production-ready Dataform SQLX templates for data warehouse patterns
Generate clean, maintainable Dataform SQLX files from simple YAML configurations. Supports staging, dimension, fact, and aggregate table patterns with best practices built-in.
🎯 Features
- Multiple Table Patterns: Staging, Dimension, Fact, Aggregate/Mart tables
- Best Practices: Partitioning, clustering, incremental loading built-in
- Simple YAML Config: Define tables without writing boilerplate SQL
- CLI & Python API: Use as command-line tool or Python library
- Validation: Catch configuration errors before generation
- Batch Generation: Process multiple configs at once
- Dataform-Ready: Generates valid SQLX for Dataform projects
📦 Installation
pip install dataform-warehouse-blueprints
🚀 Quick Start
CLI Usage
1. Initialize a configuration file:
dataform-blueprints init \
-t dim_employee \
-l warehouse \
--table-type dimension \
-o dim_employee_config.yaml
2. Edit the generated YAML:
table_name: dim_employee
layer: warehouse
table_type: dimension
source_table: ${ref('stg_employees')}
columns:
- employee_id
- first_name
- last_name
- email
partition_by: updated_at
cluster_by: [employee_id]
primary_keys: [employee_id]
incremental: true
tags: [dimension, hr]
3. Generate SQLX:
dataform-blueprints generate \
-c dim_employee_config.yaml \
-o definitions/dim_employee.sqlx
4. Generated SQLX output:
config {
type: "incremental",
schema: "warehouse",
name: "dim_employee",
bigquery: {
partitionBy: "updated_at",
clusterBy: ["employee_id"],
},
tags: ["dimension", "hr"],
}
-- Dimension Table: dim_employee
-- Layer: WAREHOUSE (Gold)
SELECT
employee_id,
first_name,
last_name,
email,
CURRENT_TIMESTAMP() AS _loaded_at
FROM ${ref('stg_employees')}
${ when(incremental(), `
WHERE updated_at > (SELECT MAX(updated_at) FROM ${self()})
`) }
Python API
from dataform_blueprints import TableConfig, BlueprintGenerator, LayerType, TableType
# Define configuration
config = TableConfig(
table_name="dim_employee",
layer=LayerType.WAREHOUSE,
table_type=TableType.DIMENSION,
source_table="${ref('stg_employees')}",
columns=["employee_id", "first_name", "last_name"],
primary_keys=["employee_id"],
partition_by="updated_at",
cluster_by=["employee_id"],
incremental=True,
tags=["dimension", "hr"]
)
# Generate SQLX
generator = BlueprintGenerator(config)
sqlx_content = generator.generate_sqlx()
# Write to file
generator.write_sqlx("definitions/dim_employee.sqlx")
📚 Table Patterns
1. Staging Tables (Raw → Staging)
Clean and standardize data from raw sources.
table_name: stg_employees
layer: staging
table_type: source
source_table: ${ref('raw_employees')}
columns:
- employee_id
- first_name
- last_name
incremental: true
2. Dimension Tables (SCD Type 1)
Business entities for analytics.
table_name: dim_employee
layer: warehouse
table_type: dimension
source_table: ${ref('stg_employees')}
columns:
- employee_id
- first_name
- last_name
primary_keys: [employee_id]
partition_by: updated_at
cluster_by: [employee_id]
3. Fact Tables
Transactional business events.
table_name: fact_payroll_run
layer: warehouse
table_type: fact
source_table: ${ref('stg_payroll')}
columns:
- payroll_id
- employee_id
- gross_pay
- net_pay
primary_keys: [payroll_id]
partition_by: pay_date
cluster_by: [employee_id, pay_date]
4. Aggregate/Mart Tables
Pre-aggregated data for reporting.
table_name: mart_payroll_summary
layer: marts
table_type: aggregate
source_table: ${ref('fact_payroll_run')}
columns:
- department
- SUM(gross_pay) as total_pay
- COUNT(*) as payroll_count
primary_keys: [department]
🛠️ CLI Commands
init - Create Configuration
dataform-blueprints init -t <table_name> -l <layer> --table-type <type>
Options:
-t, --table-name: Table name (required)-l, --layer: Layer (raw/staging/warehouse/marts) (required)--table-type: Pattern (source/dimension/fact/aggregate) (required)-o, --output: Output file path (default: table_config.yaml)
generate - Generate SQLX
dataform-blueprints generate -c <config.yaml> -o <output.sqlx>
Options:
-c, --config: YAML config file (required)-o, --output: Output SQLX file--validate-only: Only validate, don't generate
batch - Batch Generate
dataform-blueprints batch -d <configs_dir> -o <output_dir>
Options:
-d, --directory: Directory with YAML configs (required)-o, --output-dir: Output directory (default: definitions)
examples - Show Examples
dataform-blueprints examples
📋 Configuration Reference
Required Fields
table_name: string # Table name
layer: string # raw | staging | warehouse | marts
table_type: string # source | dimension | fact | aggregate | view
columns: list # Column names
Optional Fields
source_table: string # Source table reference (e.g., ${ref('table')})
partition_by: string # Partition column
cluster_by: list # Clustering columns
primary_keys: list # Primary key columns
description: string # Table description
tags: list # Tags for organization
incremental: bool # Enable incremental (default: true)
dataset_id: string # Override default dataset
dependencies: list # Explicit dependencies
assertions: list # Data quality assertions
🎯 Use Cases
Data Warehouse Modernization
Generate consistent SQLX files across your entire warehouse:
# Generate all staging tables
dataform-blueprints batch -d configs/staging -o definitions/staging
# Generate all dimensions
dataform-blueprints batch -d configs/dimensions -o definitions/warehouse
# Generate all facts
dataform-blueprints batch -d configs/facts -o definitions/warehouse
Rapid Prototyping
Quickly scaffold new tables:
dataform-blueprints init -t fact_sales -l warehouse --table-type fact
# Edit config...
dataform-blueprints generate -c fact_sales.yaml
CI/CD Integration
Validate configs in CI:
dataform-blueprints generate -c dim_employee.yaml --validate-only
🔧 Development
Setup
git clone https://github.com/yourusername/dataform-warehouse-blueprints.git
cd dataform-warehouse-blueprints
pip install -e ".[dev]"
Run Tests
pytest tests/ -v --cov=dataform_blueprints
Code Quality
black src/
flake8 src/
📖 Examples
See the examples/ directory for:
dim_employee_config.yaml- Dimension tablefact_payroll_config.yaml- Fact tablemart_payroll_summary_config.yaml- Aggregate table
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Submit a pull request
📄 License
MIT License - see LICENSE file for details
🔗 Links
- PyPI: https://pypi.org/project/dataform-warehouse-blueprints/
- GitHub: https://github.com/yourusername/dataform-warehouse-blueprints
- Documentation: https://github.com/yourusername/dataform-warehouse-blueprints#readme
💡 Why Dataform Blueprints?
- Consistency: Standardize table patterns across your warehouse
- Speed: Generate tables 10x faster than writing SQL by hand
- Best Practices: Partitioning, clustering, incremental built-in
- Maintainability: YAML configs easier to review than SQL
- Reusability: Share table patterns across projects/teams
🎓 Related Projects
This module is part of the Payroll & Workforce Analytics Modernization project:
synthetic-payroll-lab- Generate test datascd2-bq-engine- SCD Type 2 automationbq-finops-cli- BigQuery cost optimization
Built with ❤️ for data engineers who value clean, maintainable SQL.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dataform_warehouse_blueprints-0.1.0.tar.gz.
File metadata
- Download URL: dataform_warehouse_blueprints-0.1.0.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6d34623ad5874dcfd5a1d51d9d28b13bed619ddc9817115f6dfbf5cec4fda5c5
|
|
| MD5 |
281ec8bc6e64f0b31dd7f9c38fd0ee0e
|
|
| BLAKE2b-256 |
fe53d333d09e0327eded745622e71400fa385ff39f8a97d5a9915d5b46ebe0af
|
File details
Details for the file dataform_warehouse_blueprints-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dataform_warehouse_blueprints-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b6bc435def260979d21e6d88511d4ed77c0267b7e76566f4c352ddc277710087
|
|
| MD5 |
f73444379eb9d692df16e38a60ef16c3
|
|
| BLAKE2b-256 |
e177bee79c4440378edfc91e5464b4e1fa929b7c176f84ab25afb65542d1de7e
|