Read-only BigQuery cost-audit tool — single-user, gcloud ADC only, no GCS / no GitHub / no dbt installation.
Project description
governor-audit
Read-only BigQuery cost-audit tool for single-user production audits. v0.4.15.
Posture: Single-user. gcloud ADC only. No GCS, no GitHub, no service-account JSON, no dbt installation, no shadow validation. The only thing it talks to over the network is BigQuery — and only to query
INFORMATION_SCHEMA.JOBS_BY_PROJECT,INFORMATION_SCHEMA.COLUMNS,INFORMATION_SCHEMA.TABLE_STORAGE, andINFORMATION_SCHEMA.SCHEMATA_OPTIONS.
When to use this vs. the other governor packages
governor-audit(this package): you have read access to a prod BigQuery project. You want a fast cost audit + detection findings without touching the dbt source code, running dbt, or setting up cloud infrastructure.governor-cli: you have the dbt project source on your machine and want to run dbt + propose fixes locally.governor-web: you operate the platform; you want shared infrastructure (GCS-backed manifests, GitHub PRs, scheduled syncs) for a team.
What you get
- Setup wizard — first run walks you through ADC sign-in (
gcloud auth application-default login) → pick BigQuery project + region + lookback (defaults to 24 hours) → first scan kicks off in the background. You land on the configurations page with a spinner card and an auto-refresh, then auto-redirect to the dashboard once the scan completes. - Permission-gated project picker — the picker auto-filters to projects where the active gcloud principal has both
bigquery.jobs.listAll(forINFORMATION_SCHEMA.JOBS_BY_PROJECT) andbigquery.tables.list(forINFORMATION_SCHEMA.TABLE_STORAGE). Probes run in parallel viaProjectsClient.test_iam_permissions(~5–10s for 700-project orgs on first render) and cache per-principal in-process, so org admins aren't drowning in unauditable entries. - Dashboard — Total / Build / Consumption / Flagged spend KPI cards, top-20 spenders bar chart (click a bar to open its top opportunity), and a Top Cost Drivers with Issues and Suggestions Found table with click-to-sort columns plus per-row issue and suggestion counts. Plus a Storage Optimization panel listing per-dataset physical-billing opportunities with copyable
ALTER SCHEMAactions. - Detection engine — every enabled rule from
governor_core.opportunities.rulesruns against each cached job. Each detection candidate persists as its ownOpportunityrow, deduped by(rule_type, affected_table):- Issues (real cost / performance problems):
slot_contention,join_explosion,partition_pruning,shuffle_spill,storage_billing_optimization. Also collect SQL-rewrite suggestions that fire on the same destination table and surface them inline as additional cards. - Suggestions (code-quality SQL rewrites):
dead_cte,dead_column,dead_window_expression,unused_aggregation_output,redundant_order_by,unused_join,select_star,cross_join_unaggregated,self_join_anti_pattern. Suggestions also surface as standalone opportunities so models with no underlying issue still appear in/opportunities.
- Issues (real cost / performance problems):
- Opportunities workspace (
/opportunities) — listing aggregated to one row peraffected_table(one entry per object, not per rule). Filter row: Search, Dataset, Author, Issue (issue rule types only), binary Workload (Build / Consumption), binary Suggestions (With / Without). The Issue column lists every rule that fired on the table when ≤ 3 rules; for 4+ it shows the first three plus+ N morewith the full list in the tooltip. Improvement-only rows show "—" in the Issue column. Click any row to open the detail page. - Job history view (
/jobs/{project.dataset.table}) — every cached execution of a recurring job rolled up into one page. Cost-trend line chart (one point per UTC day with ≥ 1 execution; days without are skipped) with markers showing when the query body changed mid-window. Query versions section lists each distinct query body chronologically with a unified diff against the previous one. Available improvements surface only the LATEST version's suggestions; older versions get aresolvedchip when the latest version no longer triggers a rule, and anew in this versionchip when a later version regresses. Drill into a single execution via any chart point. See spec 146. - Opportunity detail — one consistent layout regardless of whether the row is an issue or a standalone suggestion. Header strip with
N issues/N suggestionspills + author + dataset, then a 4-up stat strip (Issues found with rule names listed, Cost (this execution), Bytes Processed, Slot Time), a collapsible Cost trend for this table chart (linking out to/jobs/{table}for the full cohort view), an Issues found section with one bordered card per issue rule paired side-by-side with its Why this fired / Recommended fix description, an Available improvements section with each deterministic SQL rewrite as a tabbed Diff / Original SQL card (rendered fromgovernor_core.solutions.templates), and finally Evidence + BigQuery query lookup panels at the bottom. No severity scores anywhere — the audit dropped them as noise. - Settings — three rule pages plus account / appearance / LLM:
- Issues (
/admin/settings/issues) — toggle the four query-side issue rules (slot_contention,join_explosion,partition_pruning,shuffle_spill). - Suggestions (
/admin/settings/suggestions) — toggle the suggestion rules. - Storage Billing (
/admin/settings/storage-billing) — toggle thestorage_billing_optimizationrule and tweak its pricing parameters. - Plus Account (gcloud principal + ADC probe), Appearance (light / dark / system), AI / LLM (Gemini API key, optional — reviewer code lands in a future release).
- Audit owns its own enable defaults: every rule is on unless you've explicitly toggled it off. The cloud catalog's per-rule defaults are intentionally ignored here.
- Issues (
- Scan query preview — the configurations page shows the exact
INFORMATION_SCHEMASQL the next scan will run, with resolved timestamps, on a Jobs / Columns / Storage tabbed panel and a copy button. - Scan history — every scan run logged with timestamp, status (running / succeeded / failed), project, region, lookback, and job count. Failed scans carry a tooltip with the failure reason.
- Background scans — clicking Run scan on the configurations page (or submitting the setup wizard) kicks off the scan in a daemon thread and lands you back on the page immediately with a spinner card and a 5-second meta-refresh until completion. Storage and column syncs are wrapped in outer guards: if
INFORMATION_SCHEMA.TABLE_STORAGEis empty or the principal lacksbigquery.tables.listfor that step, the scan logs a warning and continues — the dashboard storage panel just renders empty. - Optimised scan pipeline (spec 147) — the three independent
INFORMATION_SCHEMAqueries (JOBS_BY_PROJECT,COLUMNS,TABLE_STORAGE+SCHEMATA_OPTIONS) run concurrently in a 3-thread pool so total network wall-time is the slowest of the three rather than their sum. Detection wraps the rule loop in a sqlglot AST cache so identical query bodies (the common case for recurring dbt rebuilds — every job sharing aquery_hashes.normalized_literals) parse exactly once across all rules instead of once per (rule, job). Every scan emits ascan timing: jobs_fetch=… columns_fetch=… storage_fetch=… persist=… detection=… total=…INFO line on completion so the operator can see where the time went.
Quickstart
gcloud auth application-default login
uv tool install governor-audit
governor-audit start
# open http://localhost:8765 — first run goes through the setup wizard
The wizard saves config + auto-runs your first scan; you don't need to call init or scan manually unless you prefer the CLI.
CLI alternatives:
governor-audit init --project prod-warehouse-123 --region us
governor-audit scan # uses config defaults (24-hour lookback)
governor-audit scan --days 7 # override the lookback for this run
governor-audit status # connection / cache snapshot
governor-audit reset -y # wipe cached scan data; keep config
governor-audit reset-config -y # drop config.json so next start re-enters setup
governor-audit stop # terminate the managed web server
See the spec quickstarts for the full first-audit walkthrough:
- spec 141 quickstart — original audit MVP
- spec 144 — query-only rule catalog and synthetic manifest
- spec 145 — TABLE_STORAGE ingestion + dashboard storage panel
- spec 146 —
/jobs/{table}cohort view with cost-trend chart and per-version detection - spec 147 — parallel
INFORMATION_SCHEMAfetch, shared sqlglot AST cache, phase-by-phase timing
Architecture
- Storage: SQLite at
~/.governor-audit/state.dbviagovernor_core.db.sqlite_compat. The connect-time hook bumpsSQLITE_LIMIT_VARIABLE_NUMBERfrom the default 999 to 100_000 soWHERE col IN (?, ?, …)queries past the cap (busy 24h scans routinely exceed 999 jobs) don't tripOperationalError: too many SQL variables. Persisted shapes:BigQueryJob— raw INFORMATION_SCHEMA rows.TableColumnMetadata— column lists forSELECT *expansion.TableStorageMetric— per-table byte counts + per-dataset billing model — feeds the storage-billing rule.Opportunity— detection findings.ScanRun(audit-only) — every scan attempt with status / timing / project / region / lookback. Drives the Scan history table and the auto-refresh state.AuditOpportunityMetadata(audit-only) — dbt attribution + query hash sidecar tied toOpportunity.
- Scan replaces previous data per (project, region): each new scan wipes the prior scan's
BigQueryJob/Opportunity/TableColumnMetadata/TableStorageMetricrows for the sameconfig_idbefore persisting fresh ones.ScanRunhistory is preserved (it's the audit log). - Project switching:
(project, region)hashes to a deterministicconfig_id(governor_audit.scan.sentinels.config_id_for). Every dashboard / opportunities / job-detail read is scoped to the active config_id, so switching projects in the configurations form shows only the new project's data. - Auth: gcloud Application Default Credentials only —
google.auth.default(). No service-account JSON. No browser OAuth. The configurations route additionally probesbigquery.jobs.listAll+bigquery.tables.listper project via Resource Manager'stest_iam_permissionsand only lists projects where both are granted; results cache per-principal in-process so the probe runs once per session. - Workload classification: manifest-free heuristic — dbt-originated CTAS / MERGE / INSERT / UPDATE / DELETE →
build; non-dbt SELECT →consumption; ambiguous →other. Driven by the/* {"app": "dbt"comment-prefix the dbt-bigquery adapter prepends. - Synthetic manifest (spec 144): audit reuses every
governor_corerule unmodified by building a synthetic dbt-shaped manifest fromBigQueryJobrows. Each row becomes a model node keyed by destination table; CTAS / MERGE wrappers are stripped so manifest-driven analyzers see the innerSELECT. The same pattern feedsgovernor_core.solutions.templatesfor deterministic before/after SQL diffs. - Non-fatal storage step:
INFORMATION_SCHEMA.TABLE_STORAGErequiresbigquery.tables.list(granted byroles/bigquery.metadataViewerorroles/bigquery.dataViewer), which is a separate role from theroles/bigquery.resourceViewerthat powers the cost scan. The picker filter normally drops projects missing it, but the orchestrator also wrapssync_table_storageandsync_table_columnsin an outer try/except: any failure (empty result, permission denied, region typo, network blip) logs a warning and continues — the rest of the scan still completes and the dashboard storage panel renders empty. - Config migrations:
~/.governor-audit/config.jsoncarries aschema_versionand runs forward-only migrations on every load (currently v1 → v5). Each migration is a no-op when the payload is already at-or-above its target. The current migration set drops the deprecated v1manifestfield, resets stale per-rule overrides, and force-enables suggestion rules.governor-audit reset-configis the escape hatch when you want a totally fresh config. - Loopback only: the FastAPI app rejects any request whose
Host:header isn't a localhost variant. Not a public service.
Versioning
governor-audit ships on its own version track, decoupled from the cloud bundle (governor-core / governor-web / governor-cli / governor-bq). Audit v0.4.x and cloud v0.7.x coexist. See scripts/release-audit.sh for the release flow.
License
MIT.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file governor_audit-0.4.15.tar.gz.
File metadata
- Download URL: governor_audit-0.4.15.tar.gz
- Upload date:
- Size: 185.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6e2438d4e0f20755da87bf8435214409c1fd4e529daa69eb20a9b096a32407ae
|
|
| MD5 |
ca37db3aba867b93bbaf88b85355c9f6
|
|
| BLAKE2b-256 |
e380e413e6b507f5453d79782432387292fdb52a4839dfc212b84e132b925136
|
File details
Details for the file governor_audit-0.4.15-py3-none-any.whl.
File metadata
- Download URL: governor_audit-0.4.15-py3-none-any.whl
- Upload date:
- Size: 229.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
194ab5f1b9840234d3feb069b71c913cd2864efd113d41d63237bbe4c8b22d99
|
|
| MD5 |
ce4fcd42cf0b20396ef0cbbccae057af
|
|
| BLAKE2b-256 |
94c547700b6ae198b49eda267d2d4f908ecf1197549276ed08d9de1e69afaf11
|