
Draft dbt schema.yml, tests, and docs with an LLM, then prune candidates against real warehouse data so only signal-bearing artifacts ship.


SignalForge

LLM-drafted dbt schema.yml, tests, and docs — pruned against real warehouse data so only signal-bearing tests ship.

Status: v0.1 alpha. Eleven issues shipped — single-model draft + warehouse prune, BigQuery adapter, signalforge CLI, signalforge init-demo for first-run UX. Designing in the open on the dev branch.

Why this exists

Authoring schema.yml, tests, and documentation is the most-cited drudgery in the dbt ecosystem. AI tools that generate them already exist — dbt Copilot, dbt-codegen, Paradime DinoAI, Altimate datapilot — but their output is consistently described the same way: noise. Hundreds of not_null and unique tests that always pass. Generic docstrings that paraphrase the column name. Schemas that drift from the SELECT.

SignalForge generates the same artifacts, then asks a different question: does this test produce signal? Every candidate test is run against your real warehouse data. Tests that always pass are dropped. Docs are graded against a project-specific rubric. Only signal-bearing artifacts are written to disk.

And you don't have to start from SignalForge's own drafts. Point it at a schema.yml that dbt Copilot, dbt-codegen, DinoAI, datapilot — or your own hands — already produced, and it prunes that: signalforge prune-existing <model> --schema <path> runs the same warehouse-backed prune over your existing tests, no LLM call required.

What it does

  • Drafts schema.yml from your model SQL using an LLM with project-aware context (manifest, sibling models, your team's terminology).
  • Generates tests: not_null, unique, accepted_values, relationships, plus dbt-expectations-style data tests where appropriate.
  • Prunes the noise. Each candidate test runs against warehouse samples; tests that pass on every row of historical data add no signal and are dropped before they reach your repo.
  • Generates documentation — column-level descriptions and model-level overviews — graded by an LLM-as-judge against a configurable rubric.
  • Reports what was kept and what was dropped, with a one-line "why" per artifact. No black-box generation.
  • Prunes tests you already have (v0.2). Point it at an existing schema.yml — from dbt-codegen, dbt Copilot, DinoAI, datapilot, or hand-written — and the warehouse tells you which of those tests add no signal. Same prune step, no LLM call (signalforge prune-existing).

How it works

┌──────────────┐    ┌─────────────┐    ┌──────────────┐    ┌──────────────┐
│ model.sql +  │ -> │ LLM drafts  │ -> │ Run tests    │ -> │ Quality-     │
│ manifest +   │    │ candidate   │    │ against the  │    │ graded YAML  │
│ project ctx  │    │ artifacts   │    │ warehouse    │    │ + diff       │
└──────────────┘    └─────────────┘    └──────────────┘    └──────────────┘
                                              │
                                              v
                                       Drop always-pass tests;
                                       drop tests that fail on
                                       known-clean data.

The grading layer reuses clauditor's LLM-as-judge methodology, applied to a new artifact class.
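To make the prune rule in the diagram concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for the warehouse. It counts failing rows the way a dbt-style not_null test does (rows where the column is null) and then applies the two drop conditions. The table, its values, and the not_null_failures / prune_decision helpers are hypothetical; SignalForge's real prune engine runs against your warehouse, and its exact decision logic may differ.

```python
import sqlite3


def not_null_failures(conn: sqlite3.Connection, table: str, column: str) -> int:
    # A dbt-style not_null test flags every row where the column IS NULL.
    # Identifiers are interpolated directly, so only pass trusted names.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def prune_decision(sample_failures: int, clean_failures: int = 0) -> str:
    # Hypothetical rule mirroring the two drop conditions in the diagram.
    if clean_failures > 0:
        return "dropped: fails on known-clean data"
    if sample_failures == 0:
        return "dropped: always-passes"
    return "kept"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_trips (trip_id INTEGER, region TEXT)")
# region is a constant, like a model that selects a literal 'austin' AS region.
conn.executemany(
    "INSERT INTO stg_trips VALUES (?, 'austin')",
    [(1,), (2,), (None,)],
)
for column in ("trip_id", "region"):
    failures = not_null_failures(conn, "stg_trips", column)
    print(column, failures, prune_decision(failures))
```

The not_null on region can never fail because every row holds the same literal, so it is dropped; the not_null on trip_id catches the one null row and is kept.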

There's a second entry point that skips the LLM entirely. If you already have a schema.yml (from another generator or written by hand), signalforge prune-existing reads its tests and runs them straight through the prune step:

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ existing     │ -> │ Run tests    │ -> │ diff: which  │
│ schema.yml   │    │ against the  │    │ tests add    │
│ + manifest   │    │ warehouse    │    │ signal       │
└──────────────┘    └──────────────┘    └──────────────┘

No draft, no grade, no LLM call — just "which of these tests earn their place?" Tests SignalForge can't evaluate (custom / dbt-expectations / namespaced generics) are reported as skipped, never silently dropped.

Status (v0.1): Live on PyPI — pip install signalforge-dbt. See Quick start.

Quick start

The wheel ships a minimal dbt demo project (Austin bikeshare staging model against the public bigquery-public-data.austin_bikeshare.bikeshare_trips dataset), copied out of the install via signalforge init-demo, so you can run signalforge end-to-end against a real warehouse with no infrastructure beyond a Google Cloud billing project and an Anthropic API key. A run scans ~200–500 MB of BigQuery (well under $0.01 at on-demand pricing) plus ~$0.13 of Anthropic spend (one draft call + ~84 grade calls on Sonnet 4.6); end-to-end wall-clock is roughly 5–6 minutes.
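The BigQuery part of that estimate is easy to sanity-check with arithmetic. This sketch assumes BigQuery's US on-demand rate of $6.25 per TiB scanned and ignores the free monthly tier and per-query minimum billing; confirm current pricing for your region before relying on the numbers.

```python
TIB = 2**40  # bytes in one tebibyte


def on_demand_cost_usd(bytes_scanned: int, usd_per_tib: float = 6.25) -> float:
    # Assumed US on-demand rate; free tier and minimum-bytes rules are ignored.
    return bytes_scanned / TIB * usd_per_tib


for mb in (200, 500):
    print(f"{mb} MB -> ${on_demand_cost_usd(mb * 10**6):.4f}")
```

Both ends of the ~200 to 500 MB range come out at roughly 0.1 to 0.3 cents, consistent with "well under $0.01".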

1. Install

SignalForge requires Python 3.11+.

pip install signalforge-dbt

Verify the CLI is on your PATH:

signalforge --version

The PyPI distribution name is signalforge-dbt (the bare signalforge name is held by an unrelated DSP package); the import package and CLI command are both signalforge.

Prefer an isolated CLI install? uv tool install signalforge-dbt (or pipx install signalforge-dbt) puts the signalforge command on your PATH without adding it to a project environment.

Working from a clone (contributing)? Install the dev toolchain with uv sync --dev — see CONTRIBUTING.md for the full workflow.

2. Authenticate to BigQuery and Anthropic

gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT=<your-billing-project>   # any GCP project you have query access to
export ANTHROPIC_API_KEY=sk-ant-...

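Use a fresh shell session (or unset ANTHROPIC_API_KEY after the run) so the key doesn't linger in your environment. The export line itself is still recorded in your shell history; to keep the key out of history, enter it with read -s ANTHROPIC_API_KEY && export ANTHROPIC_API_KEY instead.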

3. Minimum signalforge.yml

The fixture ships a working config; a minimum that exercises the full pipeline is:

# signalforge.yml — alongside dbt_project.yml
llm:
  model: claude-sonnet-4-6
safety:
  mode: aggregate-only   # schema-only is the default; aggregate-only sends column profiles, never row data
prune:
  sample_strategy: materialised   # v0.2 default; one temp-table CTAS feeds every per-test query
grade:
  min_pass_rate: 0.95
  min_mean_score: 0.95
  fail_on_below_threshold: false   # report-only; flip to true to exit 2 on flagged artifacts

Full reference: docs/safety-ops.md, docs/prune-ops.md, docs/grade-ops.md.

4. Prepare the fixture

Copy the bundled demo project to a writable directory; the following steps run signalforge against it:

signalforge init-demo /tmp/sf-austin

5. Pre-flight check (signalforge lint)

Before paying for an LLM call, run the pre-flight validator. It loads signalforge.yml (every per-stage block) and the dbt manifest, makes no warehouse, Anthropic, or other network calls, and reports every failure in one shot. It runs in under a second and catches two kinds of mistake that would otherwise surface late: typos such as safety: { mdoel: ... }, which the extra="forbid" config models would otherwise reject only after a billable generate run, and manifest schema-version mismatches (e.g. dbt 1.13 → v13, outside the supported v9–v12 range), which would otherwise surface mid-pipeline:

signalforge lint --project-dir /tmp/sf-austin

On success, stdout is silent (git-style) and the exit code is 0. Failures are listed on stderr with the offending block(s) named — single-failure runs use the ERROR: <message> shape; multi-failure runs emit a header + one bullet per block. Pass --model <name> to also confirm a specific model resolves in the manifest (accepts a bare name, a unique_id, or a file path). See docs/cli-ops.md § signalforge lint for the full contract.
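The core of a config lint is comparing each block's keys against a known set and collecting every problem before reporting, so a single run lists all failures. The stdlib sketch below illustrates that idea by hand; the ALLOWED_KEYS table is a hypothetical subset built from keys this page mentions (the minimum config above, plus safety.audit_path, prune.enabled, and grade.total_budget_seconds), not SignalForge's full schema.

```python
from typing import Any

# Hypothetical subset of allowed keys per block, drawn from this page.
ALLOWED_KEYS: dict[str, set[str]] = {
    "llm": {"model"},
    "safety": {"mode", "audit_path"},
    "prune": {"enabled", "sample_strategy"},
    "grade": {"min_pass_rate", "min_mean_score", "fail_on_below_threshold",
              "total_budget_seconds"},
}


def lint_config(config: dict[str, Any]) -> list[str]:
    """Return every problem found; an empty list means the config passed."""
    errors: list[str] = []
    for block, body in config.items():
        if block not in ALLOWED_KEYS:
            errors.append(f"unknown block: {block}")
            continue
        if not isinstance(body, dict):
            errors.append(f"{block}: expected a mapping")
            continue
        for key in sorted(set(body) - ALLOWED_KEYS[block]):
            errors.append(f"{block}: unknown key {key!r}")
    return errors


problems = lint_config({"llm": {"model": "claude-sonnet-4-6"},
                        "safety": {"mdoel": "schema-only"}})
print("\n".join(problems) or "ok")
```

Collecting errors into a list rather than raising on the first one is what makes the "every failure in one shot" behaviour possible.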

6. First run

signalforge generate models/staging/stg_bikeshare_trips.sql --project-dir /tmp/sf-austin

The bundled profiles.yml reads GOOGLE_CLOUD_PROJECT from your environment, so no profile editing is required. signalforge init-demo prints a next-steps message naming the env vars and the exact commands to run; pass --force to atomically replace a non-empty destination (refuses /, $HOME, and the current working directory as a blast-radius guard).

Want to preview cost first? signalforge generate --estimate <model> prints the projected USD + warehouse bytes without making any billable Anthropic or warehouse call (one count_tokens round-trip per prompt plus a single BigQuery dryRun). See docs/cli-ops.md § --estimate for the full contract.

7. Expected output

The diff lists drafted column descriptions and signal-bearing tests alongside dropped tests with a one-line "why". Every artifact lands in one of four tiers — kept (survived prune with positive evidence), kept-uncertain (kept, but the warehouse couldn't be reached to evaluate it — e.g. a budget or connectivity issue), dropped (prune found it adds no signal), and flagged (kept, but graded below the quality threshold). The table looks like this (truncated):

diff: model.austin.stg_bikeshare_trips  kept=8  kept-uncertain=0  dropped=2  flagged=1

TIER      ARTIFACT                      TEST            REASON                  SCORE    WHY
kept      column.trip_id.description                                            0.97     Description added; passed all grading criteria.
kept      test.column.trip_id.not_null  not_null                                —        Test returned non-zero failing rows on the warehouse sample.
dropped   test.column.region.not_null   not_null        always-passes           —        Test returned zero failing rows on the representative sample.
flagged   column.bike_id.description                                            0.45     Grading score 0.45 below threshold 0.95.
...

At least one dropped row with reason always-passes is all but guaranteed: the fixture's staging SQL aliases a literal 'austin' AS region, so any not_null test the LLM drafts on that column can never fail, and the prune engine drops it. The strict 0.95 grade thresholds in the fixture config typically surface at least one flagged artifact.

A high drop rate is the working state, not the failure state. A typical staging model drops ~60-80% of the LLM-drafted tests as always-passes: the LLM proposes broadly, and the prune layer trims every test that the warehouse data never violates. Internal testing on bigquery-public-data.austin_bikeshare.bikeshare_trips shows 5 of 8 drafted tests dropped (62.5%); see docs/prune-ops.md § Expected drop rates for the per-test-type breakdown.

Two durable artifacts land under /tmp/sf-austin/.signalforge/: grade.json (per-criterion LLM-judge scores) and diff.json (the full rendered diff). The committed .gitignore covers .signalforge/.

Troubleshooting

  • Symptom: User does not have bigquery.jobs.create permission in project bigquery-public-data
    Likely cause: GOOGLE_CLOUD_PROJECT is not set, so the SDK fell back to the source project.
    Fix: export GOOGLE_CLOUD_PROJECT=<billing-project>, pointing at a project where you have the BigQuery Job User role.
  • Symptom: Query exceeded max_bytes_billed (limit=100000000, ...)
    Likely cause: editing the profile dropped or lowered maximum_bytes_billed.
    Fix: keep maximum_bytes_billed: 1000000000 (1 GB). The bundled demo profiles.yml ships this cap intentionally so the materialised-sample scan clears the adapter's 100 MB default.
  • Symptom: Manifest not found / dbt_project.yml not found at ...
    Likely cause: the CLI walked up from the wrong cwd, or --project-dir doesn't directly contain dbt_project.yml.
    Fix: either cd into the project root, or pass --project-dir <abs-path> pointing at the directory holding dbt_project.yml.
  • Symptom: aggregate_complete=False in grade.json
    Likely cause: a network blip during a grade call exhausted retries.
    Fix: re-run; if it persists, raise grade.total_budget_seconds in signalforge.yml.
  • Symptom: LLM response did not match the CandidateSchema shape
    Likely cause: the Anthropic response shape drifted vs. the parser.
    Fix: set ANTHROPIC_LOG=info and inspect ~/.anthropic-debug/; file an issue.

Full per-flag reference, exit-code taxonomy, and environment variables: docs/cli-ops.md. For multi-model dbt projects, see Running across many models for the --select flag and shell-loop pattern. Maintainer-only walkthrough of the same flow as a gated test (pytest -m e2e --no-cov): docs/e2e-smoke-test.md.

Prune the tests you already have

If you already have a schema.yml — written by hand, or generated by dbt-codegen / dbt Copilot / DinoAI / datapilot — you don't need SignalForge to redraft it. Point prune-existing at it and the warehouse tells you which of those tests add signal. There's no LLM call, so the only requirement is warehouse access (a dbt profile).

Availability: prune-existing is a v0.2 feature, in development on the dev branch — it is not in the current pip install signalforge-dbt (v0.1) release. To use it now, install from source: pip install "signalforge-dbt @ git+https://github.com/wjduenow/SignalForge.git@dev".

# From inside your dbt project (with target/manifest.json present):
signalforge prune-existing customers --schema models/marts/schema.yml

What you get on stdout is a diff of your file: a kept / kept-uncertain / dropped table with a one-line "why" per test, plus a unified diff showing exactly which tests to remove. Tests SignalForge doesn't yet evaluate — custom generics, dbt_utils.*, dbt_expectations.* — are summarised on stderr as skipped (run with --verbose for the per-test breakdown), never silently dropped.

It is read-only by design: there is no --write flag, so your hand-authored file is never overwritten. The rendered diff goes to stdout and a machine-readable copy to .signalforge/diff.json (--dry-run suppresses even that). Apply the removals yourself from the diff. See docs/cli-ops.md § signalforge prune-existing for the full flag set and docs/ingest-ops.md for which dbt test shapes are supported vs. skipped.
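Applying the removals by hand is ordinary YAML editing. For illustration only (the schema text and the dropped test are made up), this uses Python's difflib to show what removing an always-passes not_null from a schema.yml looks like as a unified diff, the same diff format described above.

```python
import difflib

before = """\
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
      - name: region
        tests:
          - not_null
""".splitlines(keepends=True)

# Hypothetical outcome: prune marked region's not_null as always-passes,
# so its test entry (and the now-empty tests: key) is removed.
after = before[:-2]

diff_text = "".join(
    difflib.unified_diff(before, after, fromfile="a/schema.yml", tofile="b/schema.yml")
)
print(diff_text)
```

Removing the empty tests: key along with its last entry keeps the column definition valid instead of leaving a null tests value behind.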

CLI

The CLI exposes five subcommands (generate, init-demo, lint, and version ship in the v0.1 PyPI release; prune-existing is on the in-development v0.2 line; see Prune the tests you already have):

signalforge generate <model>                     # full draft -> prune -> grade -> diff pipeline for one model
signalforge prune-existing <model> --schema <p>  # prune an existing schema.yml's tests (ingest -> prune -> diff, no LLM) [v0.2]
signalforge init-demo [<dest>]                   # copy the bundled Austin demo project into <dest>
signalforge lint                                 # validate signalforge.yml config blocks (no LLM/warehouse calls)
signalforge version                              # print the SignalForge version

Key flags per subcommand:

  • generate: --project-dir, --manifest, --profiles-dir (point at the project, manifest, and profile); --mode {schema-only,aggregate-only,sample} and --min-score (pipeline behaviour); --write / --dry-run and --format {ansi,markdown,json} (output); --estimate (cost preview, no billable calls); --select <expr> (run across many models); --scope and --sample-strategy; and the --quiet / --verbose / --no-color observability triad.
  • prune-existing: the required --schema <path>, plus --project-dir, --manifest, --profiles-dir, --scope, --sample-strategy, --format {ansi,markdown,json}, --dry-run, and the --quiet / --verbose / --no-color triad. It is read-only by design (no --write) and makes no LLM call.
  • init-demo: --force.
  • lint: --config, --manifest, --model, --project-dir.

signalforge --help prints the top-level help; each subcommand has its own --help page. See docs/cli-ops.md for the full reference, exit-code taxonomy, and environment variables.

Configuration

Configuring the BigQuery adapter

SignalForge reads your dbt profile and instantiates a BigQueryAdapter via WarehouseAdapter.from_profile(profile). See docs/warehouse-adapter-ops.md for ADC setup, cost defaults, sampling strategy (and the TABLESAMPLE cost-asterisk), PartitionFilter use, and the typed-error reference.

Data safety

Schema-only is the default. The LLM never sees row data unless you explicitly opt in via safety.mode: sample in signalforge.yml (or the --mode sample CLI flag). Column names are protected too: names that match the built-in PII patterns (*email, *phone, *ssn), or that you flag via dbt tags: ["pii"] / meta.contains_pii: true / meta.signalforge.sample: false, are replaced with stable hashed placeholders (col_<8 hex>) before reaching the LLM.
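A minimal sketch of that redaction step, assuming glob-style suffix matching for the built-in patterns and a truncated SHA-256 digest for the placeholder. The exact patterns, hash function, and any salt SignalForge uses are not documented on this page, so the helper names and the flagged parameter (a stand-in for the tags/meta opt-outs) are hypothetical. What matters is that the same column name always maps to the same col_<8 hex> token, so the LLM can still refer to it consistently.

```python
import fnmatch
import hashlib

PII_PATTERNS = ("*email", "*phone", "*ssn")  # built-in patterns listed above


def is_pii(column: str, flagged: tuple[str, ...] = ()) -> bool:
    # flagged stands in for columns marked via dbt tags or meta in the manifest.
    name = column.lower()
    if name in {f.lower() for f in flagged}:
        return True
    return any(fnmatch.fnmatchcase(name, p) for p in PII_PATTERNS)


def placeholder(column: str) -> str:
    # Stable: the same input always yields the same 8-hex-digit token.
    digest = hashlib.sha256(column.encode("utf-8")).hexdigest()
    return f"col_{digest[:8]}"


def redact(columns: list[str], flagged: tuple[str, ...] = ()) -> list[str]:
    return [placeholder(c) if is_pii(c, flagged) else c for c in columns]


print(redact(["trip_id", "rider_email", "home_phone", "email_verified"]))
```

Note that "*email" is a suffix pattern, so email_verified passes through untouched. An unsalted hash of a common name like "email" is also easy to reverse by guessing, so a production implementation might key the hash; this sketch does not.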

Note: the prune step runs warehouse SQL on every invocation regardless of safety.mode. To skip prune entirely (no warehouse contact), set prune.enabled: false in signalforge.yml — see docs/prune-ops.md.

Every LLM call produces one structured record at .signalforge/audit.jsonl (default; configurable via safety.audit_path). The file contains plaintext column-name metadata and should be treated as sensitive: this repo's .gitignore already covers .signalforge/; the writer creates the directory at 0o700 and the audit file at 0o600. The audit writer is fail-closed — if the write fails, the LLM call is aborted (no silent drafts without an audit trail). See docs/safety-ops.md for the JSONL schema.
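The fail-closed pattern can be sketched with the standard library: create the directory with owner-only permissions, open the file with mode 0o600, append one JSON line, and let any OSError propagate so the caller never reaches the LLM call. The function names and record fields here are hypothetical; the real JSONL schema is in docs/safety-ops.md.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable


def write_audit_record(path: Path, record: dict[str, Any]) -> None:
    # The umask can only clear permission bits, so these modes are upper bounds.
    # With parents=True, missing intermediate parents get default permissions.
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # The 0o600 mode only applies when O_CREAT actually creates the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def call_llm_with_audit(
    path: Path, record: dict[str, Any], send: Callable[[], str]
) -> str:
    # Fail closed: any OSError from the audit write propagates,
    # so send() (the hypothetical LLM call) never runs unaudited.
    write_audit_record(path, record)
    return send()
```

Writing the record before calling send() is the whole point: there is no code path where the call happens and the audit does not.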

Full reference — mode semantics, the four opt-out signals and their precedence, the signalforge.yml schema, the audit schema, debugging, and the typed-error reference — is in docs/safety-ops.md.

LLM drafting

How drafting works

signalforge.draft.draft_schema takes a manifest model + warehouse adapter + safety policy and returns a DraftOutcome carrying the parsed CandidateSchema, the typed LLMRequest that was sent, and the LLMResult from the LLM. One LLM call per model; pre-send token counting, the full retry taxonomy, prompt caching, and a fail-closed response audit are all owned by the layer.

Manifest + Model + LLMRequest (from safety layer)
  -> render_prompt  (system + cached manifest summary + dynamic per-model SQL)
  -> call_anthropic (1 SDK seam, full retry taxonomy, prompt caching)
  -> parse_draft_response (JSON + anchor-contract validator)
  -> write_response_event (fail-closed JSONL audit)
  -> DraftOutcome(candidate, request, result)
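The retry taxonomy itself is documented in docs/draft-ops.md. As a rough illustration of how a retry loop can respect an overall time budget (the role grade.total_budget_seconds plays for grading, per the troubleshooting list), here is a sketch with exponential backoff. RetryableError, call_with_budget, and the delay values are all hypothetical; the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for transient failures (timeouts, 429s, 5xx)."""


def call_with_budget(
    fn: Callable[[], T],
    total_budget_s: float,
    base_delay_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    deadline = clock() + total_budget_s
    attempt = 0
    while True:
        try:
            return fn()
        except RetryableError:
            delay = base_delay_s * 2**attempt
            attempt += 1
            # Give up rather than sleep past the overall budget.
            if clock() + delay > deadline:
                raise
            sleep(delay)
```

Non-retryable exceptions are not caught, so they propagate immediately instead of burning the budget.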

Auditability

Two parallel audit streams sit under policy.audit_path.parent:

  • audit.jsonl (safety layer) records WHAT data went to the LLM — columns sent, redactions applied, sampling mode in effect.
  • llm_responses.jsonl (draft layer) records WHAT the LLM produced — hashes of the response text, the parsed schema, and the SQL sent; token usage including cache creation/read; the prompt_version.

Both streams are fail-closed: an audit-write failure aborts the call, the partial work is dropped, and an unaudited LLM call cannot silently happen. A reviewer correlates the two streams by model_unique_id + timestamp window. See docs/draft-ops.md for the response-audit schema, the retry taxonomy, the cache pre-send checks, and the typed-error reference.
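Correlating the two streams is a join on model_unique_id within a time window. The sketch below assumes each JSONL record has a model_unique_id field and an ISO-8601 timestamp field named ts, and that the response record is written after the request record; the ts name and the 60-second default window are assumptions, so check the schemas in docs/safety-ops.md and docs/draft-ops.md before using it.

```python
import json
from datetime import datetime, timedelta


def load_jsonl(text: str) -> list[dict]:
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def correlate(
    requests: list[dict],
    responses: list[dict],
    window: timedelta = timedelta(seconds=60),
) -> list[tuple[dict, dict]]:
    """Pair audit.jsonl records with llm_responses.jsonl records for the same model."""
    pairs = []
    for req in requests:
        req_ts = datetime.fromisoformat(req["ts"])  # "ts" is an assumed field name
        for resp in responses:
            if resp["model_unique_id"] != req["model_unique_id"]:
                continue
            delta = datetime.fromisoformat(resp["ts"]) - req_ts
            # Assume the response lands after the request, within the window.
            if timedelta(0) <= delta <= window:
                pairs.append((req, resp))
    return pairs
```

The nested loop is fine for a single run's audit files; for large histories, sorting both streams by timestamp first would avoid the quadratic scan.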

Roadmap

  • v0.1: Single-model draft + warehouse prune; first warehouse adapter (BigQuery); CLI only
  • v0.2: Prune externally-authored tests (prune-existing); additional warehouse adapters (Snowflake, Postgres); project-wide drift detection
  • v0.3: GitHub Action with PR comment integration
  • v0.4: Rubric customization; organization-wide style profiles
  • v1.0: dbt Fusion engine compatibility; dbt MCP server consumption

The architecture is warehouse-agnostic — adapters plug in behind a thin sampling/profiling interface. BigQuery is the v0.1 target because of its generous query-bytes pricing for sampled reads and its first-class INFORMATION_SCHEMA.JOBS history for downstream cost analysis. Snowflake, Databricks, Postgres, and Redshift are all on the roadmap; PRs welcome.
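One common way to build a thin, warehouse-agnostic interface is an abstract base class plus a registry keyed on the dbt profile's type field (bigquery, snowflake, postgres, ...). The sketch below shows that pattern. It reuses the WarehouseAdapter, BigQueryAdapter, and from_profile names from this page, but the registry, the count_failing_rows method, and everything else inside are hypothetical and not SignalForge's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, ClassVar


class WarehouseAdapter(ABC):
    # Maps a dbt profile "type" (e.g. "bigquery") to its adapter class.
    _registry: ClassVar[dict[str, type["WarehouseAdapter"]]] = {}
    profile_type: ClassVar[str]

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        WarehouseAdapter._registry[cls.profile_type] = cls

    def __init__(self, profile: dict[str, Any]) -> None:
        self.profile = profile

    @classmethod
    def from_profile(cls, profile: dict[str, Any]) -> "WarehouseAdapter":
        try:
            adapter_cls = cls._registry[profile["type"]]
        except KeyError:
            raise ValueError(
                f"no adapter for profile type {profile.get('type')!r}"
            ) from None
        return adapter_cls(profile)

    @abstractmethod
    def count_failing_rows(self, sql: str) -> int:
        """Run a compiled test query and return how many rows it flags."""


class BigQueryAdapter(WarehouseAdapter):
    profile_type = "bigquery"

    def count_failing_rows(self, sql: str) -> int:
        # A real adapter would call the BigQuery client here.
        raise NotImplementedError
```

With this shape, adding a warehouse means writing one subclass; from_profile picks it up automatically through __init_subclass__.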

Detail is tracked in GitHub Issues against this repo.

Design principles

  1. Signal over volume. A test that always passes is worse than no test — it consumes review attention without catching anything. SignalForge's job is to produce fewer, better artifacts.
  2. Evaluation in the loop. Generation without grading is what produced the current "AI-test fatigue." Every artifact SignalForge ships has been scored.
  3. OSS-first, Core-friendly. No dependency on dbt Cloud. Runs against any dbt-core project, locally or in CI.
  4. Explainable diffs. Every kept and dropped artifact has a one-line "why." Reviewers see what changed and what the tool's reasoning was.
  5. Permissive license. Apache-2.0. Use it commercially, vendor it, embed it.

Related projects

  • clauditor — the LLM-graded evaluation framework SignalForge's quality layer is built on.
  • dbt-codegen — the rule-based YAML scaffolder SignalForge complements (codegen scaffolds; SignalForge drafts, prunes, and grades).
  • dbt-osmosis — schema.yml management and propagation; orthogonal concern.
  • Recce — PR-time data diff for dbt; complementary, addresses a different pain point.

License

Apache-2.0. See LICENSE.

Contributing

Alpha: issues are welcome to shape the design. Open one against the dev branch describing the use case you'd like SignalForge to handle. For code contributions, see CONTRIBUTING.md for the workflow.

Download files

Source Distribution

signalforge_dbt-0.2.0.tar.gz (2.2 MB)

Uploaded Source

Built Distribution


signalforge_dbt-0.2.0-py3-none-any.whl (394.5 kB)

Uploaded Python 3

File details

Details for the file signalforge_dbt-0.2.0.tar.gz.

File metadata

  • Download URL: signalforge_dbt-0.2.0.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for signalforge_dbt-0.2.0.tar.gz
Algorithm Hash digest
SHA256 d8a149e321bc1bd135a29b54b33550eb01cdec0cfe8d240556ba1255e91e4e1d
MD5 d1c825d599c3886b1febdfeb79e7b314
BLAKE2b-256 1f40f1c6f5a51d2d0db43210f2e9c55e89dd6cf12f49ca41647d4de74b44f1d7


Provenance

The following attestation bundles were made for signalforge_dbt-0.2.0.tar.gz:

Publisher: publish.yml on wjduenow/SignalForge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file signalforge_dbt-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: signalforge_dbt-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 394.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for signalforge_dbt-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3dc63191cedc0c961ee0f04f50c95f67eaa2571b4791a3249e92d7ebfd4a13ef
MD5 026772fbe8aeedd29dd3c2ae53750c17
BLAKE2b-256 c310ca5c9070fab7d533798c589d42b825643f751c5d218f91981464e897d2a1


Provenance

The following attestation bundles were made for signalforge_dbt-0.2.0-py3-none-any.whl:

Publisher: publish.yml on wjduenow/SignalForge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
