
Dataform Warehouse Blueprints

Production-ready Dataform SQLX templates for data warehouse patterns

Generate clean, maintainable Dataform SQLX files from simple YAML configurations. Supports staging, dimension, fact, and aggregate table patterns with best practices built-in.

PyPI version · Python 3.8+ · License: MIT

🎯 Features

  • Multiple Table Patterns: Staging, Dimension, Fact, Aggregate/Mart tables
  • Best Practices: Partitioning, clustering, incremental loading built-in
  • Simple YAML Config: Define tables without writing boilerplate SQL
  • CLI & Python API: Use as command-line tool or Python library
  • Validation: Catch configuration errors before generation
  • Batch Generation: Process multiple configs at once
  • Dataform-Ready: Generates valid SQLX for Dataform projects

📦 Installation

pip install dataform-warehouse-blueprints

🚀 Quick Start

CLI Usage

1. Initialize a configuration file:

dataform-blueprints init \
  -t dim_employee \
  -l warehouse \
  --table-type dimension \
  -o dim_employee_config.yaml

2. Edit the generated YAML:

table_name: dim_employee
layer: warehouse
table_type: dimension
source_table: ${ref('stg_employees')}
columns:
  - employee_id
  - first_name
  - last_name
  - email
partition_by: updated_at
cluster_by: [employee_id]
primary_keys: [employee_id]
incremental: true
tags: [dimension, hr]

3. Generate SQLX:

dataform-blueprints generate \
  -c dim_employee_config.yaml \
  -o definitions/dim_employee.sqlx

4. Generated SQLX output:

config {
  type: "incremental",
  schema: "warehouse",
  name: "dim_employee",
  bigquery: {
    partitionBy: "updated_at",
    clusterBy: ["employee_id"],
  },
  tags: ["dimension", "hr"],
}

-- Dimension Table: dim_employee
-- Layer: WAREHOUSE (Gold)

SELECT
  employee_id,
  first_name,
  last_name,
  email,
  CURRENT_TIMESTAMP() AS _loaded_at
FROM ${ref('stg_employees')}

${ when(incremental(), `
  WHERE updated_at > (SELECT MAX(updated_at) FROM ${self()})
`) }
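The `when(incremental(), …)` guard compiles to a high-watermark filter: on incremental runs, only source rows newer than the latest `updated_at` already in the target are selected. A minimal Python sketch of that logic (illustrative only — the data and function are made up, not part of the library):

```python
# Illustrative high-watermark filter: keep only source rows strictly newer
# than the maximum updated_at already present in the target table.

def incremental_filter(source_rows, target_rows, ts_key="updated_at"):
    """Return source rows newer than the target's high watermark."""
    if not target_rows:          # first (full-refresh) run: take everything
        return list(source_rows)
    watermark = max(row[ts_key] for row in target_rows)
    return [row for row in source_rows if row[ts_key] > watermark]

target = [{"employee_id": 1, "updated_at": "2024-01-01"}]
source = [
    {"employee_id": 1, "updated_at": "2024-01-01"},  # already loaded, skipped
    {"employee_id": 2, "updated_at": "2024-02-01"},  # new row, kept
]

print(incremental_filter(source, target))
```

On the first run the target is empty, so every source row loads; afterwards only rows past the watermark are appended, which is what keeps incremental runs cheap.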

Python API

from dataform_blueprints import TableConfig, BlueprintGenerator, LayerType, TableType

# Define configuration
config = TableConfig(
    table_name="dim_employee",
    layer=LayerType.WAREHOUSE,
    table_type=TableType.DIMENSION,
    source_table="${ref('stg_employees')}",
    columns=["employee_id", "first_name", "last_name"],
    primary_keys=["employee_id"],
    partition_by="updated_at",
    cluster_by=["employee_id"],
    incremental=True,
    tags=["dimension", "hr"]
)

# Generate SQLX
generator = BlueprintGenerator(config)
sqlx_content = generator.generate_sqlx()

# Write to file
generator.write_sqlx("definitions/dim_employee.sqlx")

📚 Table Patterns

1. Staging Tables (Raw → Staging)

Clean and standardize data from raw sources.

table_name: stg_employees
layer: staging
table_type: source
source_table: ${ref('raw_employees')}
columns:
  - employee_id
  - first_name
  - last_name
incremental: true

2. Dimension Tables (SCD Type 1)

Business entities for analytics.

table_name: dim_employee
layer: warehouse
table_type: dimension
source_table: ${ref('stg_employees')}
columns:
  - employee_id
  - first_name
  - last_name
primary_keys: [employee_id]
partition_by: updated_at
cluster_by: [employee_id]

3. Fact Tables

Transactional business events.

table_name: fact_payroll_run
layer: warehouse
table_type: fact
source_table: ${ref('stg_payroll')}
columns:
  - payroll_id
  - employee_id
  - gross_pay
  - net_pay
primary_keys: [payroll_id]
partition_by: pay_date
cluster_by: [employee_id, pay_date]

4. Aggregate/Mart Tables

Pre-aggregated data for reporting.

table_name: mart_payroll_summary
layer: marts
table_type: aggregate
source_table: ${ref('fact_payroll_run')}
columns:
  - department
  - SUM(gross_pay) as total_pay
  - COUNT(*) as payroll_count
primary_keys: [department]

🛠️ CLI Commands

init - Create Configuration

dataform-blueprints init -t <table_name> -l <layer> --table-type <type>

Options:

  • -t, --table-name: Table name (required)
  • -l, --layer: Layer (raw/staging/warehouse/marts) (required)
  • --table-type: Pattern (source/dimension/fact/aggregate) (required)
  • -o, --output: Output file path (default: table_config.yaml)

generate - Generate SQLX

dataform-blueprints generate -c <config.yaml> -o <output.sqlx>

Options:

  • -c, --config: YAML config file (required)
  • -o, --output: Output SQLX file
  • --validate-only: Only validate, don't generate

batch - Batch Generate

dataform-blueprints batch -d <configs_dir> -o <output_dir>

Options:

  • -d, --directory: Directory with YAML configs (required)
  • -o, --output-dir: Output directory (default: definitions)

examples - Show Examples

dataform-blueprints examples

📋 Configuration Reference

Required Fields

table_name: string       # Table name
layer: string           # raw | staging | warehouse | marts
table_type: string      # source | dimension | fact | aggregate | view
columns: list           # Column names

Optional Fields

source_table: string         # Source table reference (e.g., ${ref('table')})
partition_by: string         # Partition column
cluster_by: list            # Clustering columns
primary_keys: list          # Primary key columns
description: string         # Table description
tags: list                  # Tags for organization
incremental: bool           # Enable incremental loading (default: true)
dataset_id: string          # Override default dataset
dependencies: list          # Explicit dependencies
assertions: list            # Data quality assertions
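The kind of checking that `--validate-only` performs can be approximated from this reference. A minimal sketch using the field names and allowed values above (the function itself is hypothetical, not the library's API):

```python
# Hypothetical config validator mirroring the reference above.
REQUIRED = {"table_name", "layer", "table_type", "columns"}
LAYERS = {"raw", "staging", "warehouse", "marts"}
TABLE_TYPES = {"source", "dimension", "fact", "aggregate", "view"}

def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing required field: {f}" for f in sorted(REQUIRED - cfg.keys())]
    if "layer" in cfg and cfg["layer"] not in LAYERS:
        errors.append(f"unknown layer: {cfg['layer']!r}")
    if "table_type" in cfg and cfg["table_type"] not in TABLE_TYPES:
        errors.append(f"unknown table_type: {cfg['table_type']!r}")
    return errors

cfg = {"table_name": "dim_employee", "layer": "warehouse",
       "table_type": "dimension", "columns": ["employee_id"]}
print(validate_config(cfg))          # valid config -> no errors
print(validate_config({"layer": "gold"}))
```

Running validation like this before generation is what lets configuration mistakes surface in review or CI rather than at Dataform compile time.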

🎯 Use Cases

Data Warehouse Modernization

Generate consistent SQLX files across your entire warehouse:

# Generate all staging tables
dataform-blueprints batch -d configs/staging -o definitions/staging

# Generate all dimensions
dataform-blueprints batch -d configs/dimensions -o definitions/warehouse

# Generate all facts
dataform-blueprints batch -d configs/facts -o definitions/warehouse
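A repository layout that fits this batch workflow might look like the following (directory and file names are illustrative, not prescribed by the tool):

```
configs/
  staging/
    stg_employees.yaml
  dimensions/
    dim_employee.yaml
  facts/
    fact_payroll_run.yaml
definitions/
  staging/
  warehouse/
```

Keeping one YAML file per table under `configs/` makes each `batch` invocation map cleanly onto one layer of the warehouse.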

Rapid Prototyping

Quickly scaffold new tables:

dataform-blueprints init -t fact_sales -l warehouse --table-type fact
# Edit config...
dataform-blueprints generate -c fact_sales.yaml

CI/CD Integration

Validate configs in CI:

dataform-blueprints generate -c dim_employee.yaml --validate-only

🔧 Development

Setup

git clone https://github.com/yourusername/dataform-warehouse-blueprints.git
cd dataform-warehouse-blueprints
pip install -e ".[dev]"

Run Tests

pytest tests/ -v --cov=dataform_blueprints

Code Quality

black src/
flake8 src/

📖 Examples

See the examples/ directory for:

  • dim_employee_config.yaml - Dimension table
  • fact_payroll_config.yaml - Fact table
  • mart_payroll_summary_config.yaml - Aggregate table

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details


💡 Why Dataform Blueprints?

  • Consistency: Standardize table patterns across your warehouse
  • Speed: Scaffold new tables in minutes instead of hand-writing boilerplate SQL
  • Best Practices: Partitioning, clustering, and incremental loading built in
  • Maintainability: YAML configs are easier to review than hand-written SQL
  • Reusability: Share table patterns across projects/teams

🎓 Related Projects

This module is part of the Payroll & Workforce Analytics Modernization project:

  • synthetic-payroll-lab - Generate test data
  • scd2-bq-engine - SCD Type 2 automation
  • bq-finops-cli - BigQuery cost optimization

Built with ❤️ for data engineers who value clean, maintainable SQL.
