
BigQuery dbt model scaffolder from YAML data contracts. Generates Data Products with auto introspection.


👻 wraith-modelgen

Sovereign dbt Scaffolding for the Ghost Stack

wraith-modelgen is a BigQuery dbt model scaffolder. Feed it a YAML data contract, get back a pair of sovereign data products: an Origin (raw) layer that passes everything through, and a Consumption (staging) layer that holds the stable line for downstream consumers.

Source schemas change. Columns get renamed. Types get tightened. New fields appear without ceremony. In most analytics codebases this is a downstream catastrophe: dashboards break, FDPs fail, the analytics team gets paged at 03:00, and someone files a ticket asking whether dbt itself is broken.

wraith-modelgen makes the data contract the change-management mechanism. The source team updates the contract as part of their release, and wraith-modelgen regenerates the models. Downstream consumers keep seeing the columns the staging layer declares. Nothing breaks unless something they actually depended on disappeared, at which point you want to know.


🚀 Installation

uv tool install wraith-modelgen

You'll also need Application Default Credentials for BigQuery introspection:

gcloud auth application-default login

⚙️ Workflow

modelgen gen contract.yml --raw     -o ./models/raw
modelgen gen contract.yml --staging -o ./models/staging

One layer per invocation. --raw and --staging are mutually exclusive.

modelgen validate contract.yml      # validate YAML without hitting BigQuery
modelgen gen contract.yml --raw --dry-run  # preview without writing
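
Because each invocation generates exactly one layer, a small wrapper can run both in sequence. The following is a minimal sketch, assuming the `modelgen` executable is on `PATH`. The helper names `build_commands` and `generate_all` are hypothetical and not part of wraith-modelgen; only the subcommands and flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_commands(contract: str, models_dir: str) -> list[list[str]]:
    """Return one `modelgen gen` argv per layer, raw first."""
    root = Path(models_dir)
    return [
        ["modelgen", "gen", contract, f"--{layer}", "-o", str(root / layer)]
        for layer in ("raw", "staging")
    ]


def generate_all(contract: str, models_dir: str = "./models") -> None:
    """Validate the contract, then generate each layer in its own invocation."""
    subprocess.run(["modelgen", "validate", contract], check=True)
    for argv in build_commands(contract, models_dir):
        subprocess.run(argv, check=True)  # stop at the first failing layer
```

Keeping the argv construction separate from the `subprocess` calls means the command list can be inspected or tested without invoking the tool.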

📄 Contract anatomy

version: "1"

event:
  name: user_signed_up                 # the entity
  unique_key: EVENT_ID                 # SOURCE column name (composite: [a, b])
  loaded_at_field: RECEIVED_AT         # SOURCE column name

  source:
    project: my-gcp-project
    dataset: landing
    table: user_signups_raw

  raw:
    dataset: raw
    incremental_strategy: merge
    dedup: true                        # row_number() partition by unique_key
    partition_by:
      field: RECEIVED_AT
      data_type: timestamp
      granularity: day
    cluster_by: [USER_ID]

  staging:
    dataset: staging
    incremental_strategy: merge
    partition_by:
      field: received_at               # staging-side name (post-rename)
      data_type: timestamp
      granularity: day
    cluster_by: [user_id]

    columns:
      - source: EVENT_ID               # name in raw (== name in source)
        name: event_id                 # name in staging
        type: STRING                   # cast target
        description: "..."
        tests: [not_null, unique]
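
The contract uses source-side names in `raw` and post-rename names in `staging`, which is easy to mix up. The sketch below illustrates that naming rule on a plain dict: `partition_by.field` and `cluster_by` under `staging` should refer to a column's `name`, not its `source`. The function `check_staging_names` is hypothetical, the `RECEIVED_AT` and `USER_ID` column entries are assumed for illustration, and whether wraith-modelgen performs this check itself is not stated.

```python
def check_staging_names(staging: dict) -> list[str]:
    """Return problems where staging config refers to a name that is not a staging column."""
    names = {col["name"] for col in staging.get("columns", [])}
    problems = []
    field = staging.get("partition_by", {}).get("field")
    if field is not None and field not in names:
        problems.append(f"partition_by.field {field!r} is not a staging column name")
    for col in staging.get("cluster_by", []):
        if col not in names:
            problems.append(f"cluster_by entry {col!r} is not a staging column name")
    return problems


# Assumed example data: only the EVENT_ID column appears in the contract above.
staging = {
    "partition_by": {"field": "received_at", "data_type": "timestamp", "granularity": "day"},
    "cluster_by": ["user_id"],
    "columns": [
        {"source": "EVENT_ID", "name": "event_id", "type": "STRING"},
        {"source": "RECEIVED_AT", "name": "received_at", "type": "TIMESTAMP"},
        {"source": "USER_ID", "name": "user_id", "type": "STRING"},
    ],
}
print(check_staging_names(staging))  # prints []
```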

🔄 Schema evolution

| Source change | What wraith-modelgen does | What you do |
| --- | --- | --- |
| Adds a column | Flows into raw via select *. Invisible in staging. | Add it to staging when consumers need it. |
| Renames a column | Validation fails: column not found in source. | Update the source: field on that column. Also unique_key / loaded_at_field if applicable. |
| Retypes a column | Existing CAST in staging absorbs it (or fails loudly at query time if values are incompatible). | Update type: if the canonical type should change too. |
| Drops a column staging uses | Validation fails: column not found. | Either restore upstream or remove from staging contract. |

Validation runs as part of modelgen gen --staging. If it passes, the generated staging model still presents the same contract to downstream consumers.
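
The rename and drop cases both reduce to one check: every source column the contract references must exist in the introspected schema. The following is a minimal sketch of that check, not wraith-modelgen's actual implementation. The function name `missing_source_columns` and the dict layout are assumptions that mirror the contract keys shown earlier.

```python
def missing_source_columns(contract_event: dict, source_columns: set[str]) -> list[str]:
    """Return contract-referenced source columns that the source table lacks."""
    referenced = []
    key = contract_event["unique_key"]
    referenced.extend(key if isinstance(key, list) else [key])  # composite keys are lists
    referenced.append(contract_event["loaded_at_field"])
    referenced.extend(col["source"] for col in contract_event["staging"]["columns"])
    # Preserve first-seen order and drop duplicates so the report is deterministic.
    seen = dict.fromkeys(referenced)
    return [name for name in seen if name not in source_columns]


event = {
    "unique_key": "EVENT_ID",
    "loaded_at_field": "RECEIVED_AT",
    "staging": {"columns": [{"source": "EVENT_ID", "name": "event_id", "type": "STRING"}]},
}

# Source added a column: nothing the contract references is missing.
print(missing_source_columns(event, {"EVENT_ID", "RECEIVED_AT", "USER_ID", "NEW_FIELD"}))  # prints []
# Source renamed EVENT_ID to ID: the contract's references no longer resolve.
print(missing_source_columns(event, {"ID", "RECEIVED_AT", "USER_ID"}))  # prints ['EVENT_ID']
```

An added column never appears in the result, which matches the first row of the table: additions are invisible to staging until the contract asks for them.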


🏗️ What gets generated

For --raw:

  • raw__event.sql — dbt incremental model with {{ source(...) }}, optional dedup window, partition and cluster config.
  • raw__event.yml — dbt sources entry plus model definition. Columns mirror the introspected source schema.

For --staging:

  • stg__event.sql — dbt incremental model with {{ ref('raw__event') }}. Casts and renames applied.
  • stg__event.yml — model definition with column tests from the contract.
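
To make "casts and renames applied" concrete, here is a sketch of how a contract's `columns` list could be turned into the select list of a staging model. This is an illustration under assumptions, not the generator's real template: `render_staging_select` is a hypothetical name, the SQL layout is invented, and the incremental, partition, and cluster config that the real model carries is omitted.

```python
def render_staging_select(raw_model: str, columns: list[dict]) -> str:
    """Render a SELECT that casts each raw column and renames it to its staging name."""
    lines = [
        f"    cast({col['source']} as {col['type']}) as {col['name']}"
        for col in columns
    ]
    body = ",\n".join(lines)
    # Quadruple braces in an f-string emit the literal `{{` and `}}` that dbt expects.
    return f"select\n{body}\nfrom {{{{ ref('{raw_model}') }}}}"


print(render_staging_select(
    "raw__event",
    [{"source": "EVENT_ID", "name": "event_id", "type": "STRING"}],
))
```

Because the output depends only on the column list and its order, the same input always renders the same text.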

🧪 Developer Quality Gate

# Clone and set up
git clone https://git.thomaspeoples.com/thomaspeoples/wraith-modelgen
cd wraith-modelgen
uv run poe setup        # syncs deps + installs pre-commit hooks

# The quality gate
uv run poe test         # pytest (interactive)
uv run poe test-ci      # pytest with coverage enforcement (≥80%)
uv run poe lint         # ruff check
uv run poe format       # ruff format

The test suite uses FakeIntrospector and runs without BigQuery credentials. It covers contract parsing, both layers, schema evolution scenarios, composite keys, REPEATED/RECORD types, determinism, and error surfaces.
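
The pattern behind a fake introspector can be sketched as follows. The class names echo the ones mentioned in this README, but the method name `columns`, its signature, and the helper `column_names` are assumptions made for illustration. The project's real interface may differ.

```python
from typing import Protocol


class Introspector(Protocol):
    """Hypothetical interface: return (column name, BigQuery type) pairs for a table."""

    def columns(self, project: str, dataset: str, table: str) -> list[tuple[str, str]]: ...


class FakeIntrospector:
    """In-memory stand-in, so tests need no BigQuery credentials."""

    def __init__(self, schemas: dict[tuple[str, str, str], list[tuple[str, str]]]) -> None:
        self._schemas = schemas

    def columns(self, project: str, dataset: str, table: str) -> list[tuple[str, str]]:
        try:
            return list(self._schemas[(project, dataset, table)])
        except KeyError:
            raise LookupError(f"table not found: {project}.{dataset}.{table}") from None


def column_names(introspector: Introspector, project: str, dataset: str, table: str) -> list[str]:
    """Code under test depends only on the interface, not on BigQuery."""
    return [name for name, _ in introspector.columns(project, dataset, table)]


fake = FakeIntrospector({
    ("my-gcp-project", "landing", "user_signups_raw"): [
        ("EVENT_ID", "STRING"),
        ("RECEIVED_AT", "TIMESTAMP"),
    ]
})
print(column_names(fake, "my-gcp-project", "landing", "user_signups_raw"))
# prints ['EVENT_ID', 'RECEIVED_AT']
```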

Committing

All commits go through commitizen with the Ghost Stack convention:

👻 <type>/<ticket>: <message>

uv run cz commit

📜 Sovereign Principles

  1. One layer per invocation. Layers have different lifecycles; conflating them makes things harder to reason about.
  2. Introspection at gen time. The warehouse is the source of truth for column names and types. The raw layer's schema is read from it rather than hand-copied, so there is no second copy to drift.
  3. Deterministic output. Same contract + same source schema = byte-identical files. CI can diff against committed output to catch drift.
  4. Strict failure on missing columns. No silent passes. If the contract references a column the source no longer has, generation fails with the column name.
  5. BigQuery only. The introspection module is BQ-native. Add a different Introspector implementation if you need another warehouse.
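
Principle 3 suggests a CI check: regenerate into a scratch directory and compare it byte-for-byte with the committed models. The following is a minimal sketch of the comparison step only, using the standard library. The function names `tree_digest` and `drifted_files` are hypothetical, and how the scratch directory gets populated (for example with `modelgen gen ... -o`) is left to the pipeline.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def drifted_files(committed: Path, regenerated: Path) -> list[str]:
    """Return relative paths that were added, removed, or changed."""
    old, new = tree_digest(committed), tree_digest(regenerated)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

A CI job could fail whenever `drifted_files(...)` returns a non-empty list and print the paths, which points the reviewer at exactly the files that no longer match the contract and source schema.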

Part of the Ghost Stack. Sovereign. Self-hosted. No nonsense.
