BigQuery dbt model scaffolder from YAML data contracts. Generates Data Products with auto introspection.

👻 wraith-modelgen

Sovereign dbt Scaffolding for the Ghost Stack

wraith-modelgen is a BigQuery dbt model scaffolder. Feed it a YAML data contract, get back a pair of sovereign data products: an Origin (raw) layer that passes everything through, and a Consumption (staging) layer that holds the stable line for downstream consumers.

Source schemas change. Columns get renamed. Types get tightened. New fields appear without ceremony. In most analytics codebases this is a downstream catastrophe: dashboards break, FDPs fail, the analytics team gets paged at 03:00, and someone files a ticket asking whether dbt itself is broken.

wraith-modelgen makes the data contract the change-management mechanism. The source team updates it as part of their release. wraith-modelgen regenerates the models. Downstream consumers keep seeing the staging columns the contract declares. Nothing breaks unless a column the contract depends on disappears, in which case generation fails and names the missing column.


🚀 Installation

uv tool install wraith-modelgen

You'll also need Application Default Credentials for BigQuery introspection:

gcloud auth application-default login

⚙️ Workflow

modelgen gen contract.yml --raw     -o ./models/raw
modelgen gen contract.yml --staging -o ./models/staging

One layer per invocation. --raw and --staging are mutually exclusive.

modelgen validate contract.yml      # validate YAML without hitting BigQuery
modelgen gen contract.yml --raw --dry-run  # preview without writing
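
Because each invocation handles one layer, a small wrapper can keep the sequence in one place. The following is a minimal sketch, not part of wraith-modelgen: the helper names `modelgen_commands` and `run_all` are hypothetical, and it assumes that running validate, then raw, then staging is the order you want. It only uses the subcommands and flags shown above, and it omits -o in dry-run mode to mirror the example.

```python
import subprocess


def modelgen_commands(contract: str, models_dir: str, dry_run: bool = False) -> list[list[str]]:
    """Build the validate call plus one gen call per layer, in run order."""
    commands = [["modelgen", "validate", contract]]
    for layer in ("raw", "staging"):
        cmd = ["modelgen", "gen", contract, f"--{layer}"]
        if dry_run:
            # Mirrors the documented preview call, which passes no -o.
            cmd.append("--dry-run")
        else:
            cmd += ["-o", f"{models_dir}/{layer}"]
        commands.append(cmd)
    return commands


def run_all(contract: str, models_dir: str, dry_run: bool = False) -> None:
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in modelgen_commands(contract, models_dir, dry_run):
        subprocess.run(cmd, check=True)
```

Calling `run_all("contract.yml", "./models")` would then issue the same three commands you would otherwise type by hand.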

📄 Contract anatomy

version: "1"

event:
  name: user_signed_up                 # the entity
  unique_key: EVENT_ID                 # SOURCE column name (composite: [a, b])
  loaded_at_field: RECEIVED_AT         # SOURCE column name

  source:
    project: my-gcp-project
    dataset: landing
    table: user_signups_raw

  raw:
    dataset: raw
    incremental_strategy: merge
    dedup: true                        # row_number() partition by unique_key
    partition_by:
      field: RECEIVED_AT
      data_type: timestamp
      granularity: day
    cluster_by: [USER_ID]

  staging:
    dataset: staging
    incremental_strategy: merge
    partition_by:
      field: received_at               # staging-side name (post-rename)
      data_type: timestamp
      granularity: day
    cluster_by: [user_id]

    columns:
      - source: EVENT_ID               # name in raw (== name in source)
        name: event_id                 # name in staging
        type: STRING                   # cast target
        description: "..."
        tests: [not_null, unique]
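
The comments in the contract distinguish source-side names (unique_key, loaded_at_field, the raw block) from staging-side names (the staging partition_by and cluster_by, which use post-rename names). As a rough illustration of that rule, the sketch below checks that the staging partition and cluster fields refer to declared staging column names. The function `staging_name_problems` is hypothetical, and whether modelgen validate performs this exact check is an assumption; the contract is represented as an already-parsed dict.

```python
def staging_name_problems(event: dict) -> list[str]:
    """Return messages for staging fields that are not declared staging column names."""
    staging = event["staging"]
    staging_names = {col["name"] for col in staging["columns"]}
    problems = []

    part_field = staging.get("partition_by", {}).get("field")
    if part_field is not None and part_field not in staging_names:
        problems.append(f"partition_by.field {part_field!r} is not a staging column name")

    for field in staging.get("cluster_by", []):
        if field not in staging_names:
            problems.append(f"cluster_by entry {field!r} is not a staging column name")
    return problems


# Illustrative contract fragment: the partition field uses the SOURCE spelling by mistake.
example_event = {
    "staging": {
        "partition_by": {"field": "RECEIVED_AT", "data_type": "timestamp", "granularity": "day"},
        "cluster_by": ["user_id"],
        "columns": [
            {"source": "EVENT_ID", "name": "event_id", "type": "STRING"},
            {"source": "RECEIVED_AT", "name": "received_at", "type": "TIMESTAMP"},
            {"source": "USER_ID", "name": "user_id", "type": "STRING"},
        ],
    }
}
print(staging_name_problems(example_event))
```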

🔄 Schema evolution

| Source change | What wraith-modelgen does | What you do |
| --- | --- | --- |
| Adds a column | Flows into raw via select *. Invisible in staging. | Add it to staging when consumers need it. |
| Renames a column | Validation fails: column not found in source. | Update the source: field on that column. Also unique_key / loaded_at_field if applicable. |
| Retypes a column | Existing CAST in staging absorbs it (or fails loudly at query time if values are incompatible). | Update type: if the canonical type should change too. |
| Drops a column staging uses | Validation fails: column not found. | Either restore upstream or remove from staging contract. |

Validation runs as part of modelgen gen --staging. If it passes, the generated staging model still presents the same contract to downstream consumers.
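
The missing-column rule can be pictured as a set comparison between the names the contract references and the names the source currently has. This is a minimal sketch under stated assumptions, not the tool's actual code: `missing_source_columns` is a hypothetical helper, the contract is an already-parsed dict, and names are compared as exact strings (BigQuery treats column names case-insensitively, so a real check might normalise case first).

```python
def missing_source_columns(event: dict, source_columns: set[str]) -> list[str]:
    """List contract-referenced source columns that the source no longer has."""
    key = event["unique_key"]
    referenced = list(key) if isinstance(key, list) else [key]  # composite keys are lists
    referenced.append(event["loaded_at_field"])
    referenced += [col["source"] for col in event["staging"]["columns"]]

    missing = []
    for name in referenced:
        if name not in source_columns and name not in missing:
            missing.append(name)
    return missing


contract_event = {
    "unique_key": "EVENT_ID",
    "loaded_at_field": "RECEIVED_AT",
    "staging": {
        "columns": [
            {"source": "EVENT_ID", "name": "event_id", "type": "STRING"},
            {"source": "USER_ID", "name": "user_id", "type": "STRING"},
        ]
    },
}

# A new source column is harmless; renaming RECEIVED_AT is reported by name.
print(missing_source_columns(contract_event, {"EVENT_ID", "RECEIVED_AT", "USER_ID", "PLAN"}))
print(missing_source_columns(contract_event, {"EVENT_ID", "INGESTED_AT", "USER_ID"}))
```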


🏗️ What gets generated

For --raw:

  • raw__event.sql: dbt incremental model with {{ source(...) }}, optional dedup window, partition and cluster config.
  • raw__event.yml: dbt sources entry plus model definition. Columns mirror the introspected source schema.
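
To make the optional dedup window concrete, here is a rough sketch of the kind of SQL it could produce. This is not the generator's real template: `render_dedup_select` is a hypothetical function, and ordering by loaded_at_field descending (keep the most recently loaded row per key) is an assumption, since the contract comment only states that row_number() is partitioned by unique_key.

```python
def render_dedup_select(source_ref: str, unique_key, loaded_at_field: str) -> str:
    """Render a select that keeps one row per unique key (latest by loaded_at_field)."""
    keys = unique_key if isinstance(unique_key, list) else [unique_key]
    partition = ", ".join(keys)
    return (
        "select * except (_rn)\n"
        "from (\n"
        "    select\n"
        "        *,\n"
        f"        row_number() over (partition by {partition} order by {loaded_at_field} desc) as _rn\n"
        f"    from {source_ref}\n"
        ")\n"
        "where _rn = 1"
    )


print(render_dedup_select("{{ source('landing', 'user_signups_raw') }}", "EVENT_ID", "RECEIVED_AT"))
```

A composite key passed as a list, such as ["TENANT_ID", "EVENT_ID"], simply becomes a comma-separated partition clause.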

For --staging:

  • stg__event.sql: dbt incremental model with {{ ref('raw__event') }}. Casts and renames applied.
  • stg__event.yml: model definition with column tests from the contract.
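
The "casts and renames applied" step maps each contract column to one select expression. The sketch below shows that mapping in its simplest form; `render_staging_select` is a hypothetical function, the exact formatting of the real output is not shown in this README, and the sketch ignores incremental config and nested REPEATED/RECORD types.

```python
def render_staging_select(columns: list[dict], raw_model: str = "raw__event") -> str:
    """Render one cast-and-rename expression per contract column."""
    lines = [
        f"    cast({col['source']} as {col['type']}) as {col['name']}"
        for col in columns
    ]
    # Plain concatenation keeps the Jinja braces out of f-string escaping.
    return "select\n" + ",\n".join(lines) + "\nfrom {{ ref('" + raw_model + "') }}"


staging_columns = [
    {"source": "EVENT_ID", "name": "event_id", "type": "STRING"},
    {"source": "RECEIVED_AT", "name": "received_at", "type": "TIMESTAMP"},
]
print(render_staging_select(staging_columns))
# select
#     cast(EVENT_ID as STRING) as event_id,
#     cast(RECEIVED_AT as TIMESTAMP) as received_at
# from {{ ref('raw__event') }}
```

Because staging selects named columns rather than select *, a column added upstream stays invisible here until the contract lists it.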

🧪 Developer Quality Gate

# Clone and set up
git clone https://git.thomaspeoples.com/thomaspeoples/wraith-modelgen
cd wraith-modelgen
uv run poe setup        # syncs deps + installs pre-commit hooks

# The quality gate
uv run poe test         # pytest (interactive)
uv run poe test-ci      # pytest with coverage enforcement (≥80%)
uv run poe lint         # ruff check
uv run poe format       # ruff format

The test suite uses FakeIntrospector and runs without BigQuery credentials. It covers contract parsing, both layers, schema evolution scenarios, composite keys, REPEATED/RECORD types, determinism, and error surfaces.
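
The idea behind a fake introspector is that generation code depends on a small interface rather than on BigQuery itself. The sketch below illustrates the pattern only: the `Introspector` protocol, its `columns` method, and this `FakeIntrospector` constructor are assumptions made for illustration, and the real classes in wraith-modelgen may look different.

```python
from typing import Protocol


class Introspector(Protocol):
    """Hypothetical interface: return {column_name: type} for a table."""

    def columns(self, project: str, dataset: str, table: str) -> dict[str, str]: ...


class FakeIntrospector:
    """In-memory stand-in that serves canned schemas, so tests need no credentials."""

    def __init__(self, tables: dict[tuple[str, str, str], dict[str, str]]) -> None:
        self._tables = tables

    def columns(self, project: str, dataset: str, table: str) -> dict[str, str]:
        # Return a copy so callers cannot mutate the canned schema.
        return dict(self._tables[(project, dataset, table)])


def source_column_names(introspector: Introspector, project: str, dataset: str, table: str) -> list[str]:
    """Example consumer: works with any object that satisfies the protocol."""
    return sorted(introspector.columns(project, dataset, table))


fake = FakeIntrospector(
    {("my-gcp-project", "landing", "user_signups_raw"): {"EVENT_ID": "STRING", "RECEIVED_AT": "TIMESTAMP"}}
)
print(source_column_names(fake, "my-gcp-project", "landing", "user_signups_raw"))
```

A schema-evolution test can then swap in a second fake with a renamed or dropped column and assert on the resulting error.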

Committing

All commits go through commitizen with the Ghost Stack convention:

👻 <type>/<ticket>: <message>

uv run cz commit

📜 Sovereign Principles

  1. One layer per invocation. Layers have different lifecycles; conflating them makes things harder to reason about.
  2. Introspection at gen time. The warehouse is the source of truth for column names and types. The raw schema is read from the warehouse rather than copied into the contract, so there is no second copy to drift out of sync.
  3. Deterministic output. Same contract + same source schema = byte-identical files. CI can diff against committed output to catch drift.
  4. Strict failure on missing columns. No silent passes. If the contract references a column the source no longer has, generation fails with the column name.
  5. BigQuery only. The introspection module is BQ-native. Add a different Introspector implementation if you need another warehouse.
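
Principle 3 implies a simple CI check: regenerate into a scratch directory and compare it byte for byte with the committed models. The following is a minimal sketch of such a comparison, assuming you arrange the regeneration step yourself; `tree_digests` and `drifted_files` are hypothetical helpers, not part of wraith-modelgen.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root onto the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def drifted_files(committed: Path, regenerated: Path) -> list[str]:
    """Return files that were added, removed, or changed between the two trees."""
    old, new = tree_digests(committed), tree_digests(regenerated)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

In CI, a non-empty result from `drifted_files(Path("models/staging"), Path("/tmp/regen/staging"))` would mean the committed output no longer matches what the current contract and source schema produce.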

Part of the Ghost Stack. Sovereign. Self-hosted. No nonsense.
