Read-only cross-cutting analysis, metrics, and reporting engine for the Golden Suite

GoldenAnalysis

Measure and report across the Golden Suite. A read-only, cross-cutting analysis / metrics / reporting engine: it consumes any stage's typed artifacts (or a raw DataFrame) and emits a unified, exportable AnalysisReport.

GoldenAnalysis ships the generic frame path plus suite adapters (GoldenMatch / GoldenCheck / GoldenFlow / GoldenPipe), cross-run trend + regression detection, an edge-safe TypeScript port (optional WASM), and an optional Rust accelerator for the heavy aggregation primitives — all documented below. See docs/superpowers/specs/2026-06-08-goldenanalysis-cross-cutting-analysis-engine-design.md for the design rationale.

Install

pip install goldenanalysis

Zero suite dependencies for the generic path — it works on any polars DataFrame even with no other Golden package installed.

Quickstart

import polars as pl
import goldenanalysis as ga

df = pl.read_parquet("customers.parquet")

report = ga.analyze(df, analyzers=["frame.summary"])
print(report.to_markdown())

report.to_json("report.json")
report.to_parquet("report.parquet")   # long-form metric frame + table sidecars

CLI:

goldenanalysis report customers.parquet --analyzers frame.summary --format markdown
goldenanalysis report report.json --format markdown      # re-render a saved report

The trend and regressions subcommands operate over a saved run history (see Cross-run below).

Over the suite

With the relevant extra installed (pip install goldenanalysis[match,check,flow,pipe]):

# A GoldenMatch dedupe result -> match.rates + cluster.distribution
report = ga.analyze_match(dedupe_result)

# A whole-pipeline manifest -> every analyzer whose artifacts are present
report = ga.analyze_pipeline(pipe_result)

match.rates emits match.recall_estimate when GoldenMatch ran dedupe_df(..., certify=True), which attaches an unsupervised RecallEstimate. It emits match.recall_safe_bound only when you pass an audit-calibrated certificate (analyze_match(result, certificate=...)): the safe bound needs a labelled sample, so it can't be computed automatically. Both degrade silently when absent.

Cross-run — trend + regression detection

Store reports over time, then trend a metric or detect regressions without ground truth:

hist = ga.ReportHistory(backend="jsonl", path=".golden/analysis.jsonl")  # or backend="sqlite"
hist.append(report)                                  # keyed by (dataset, run_id)

hist.trend("cluster.singleton_ratio", "customers")   # -> TrendSeries

policy = ga.RegressionPolicy(default_pct=10.0, per_metric={"match.recall_safe_bound": 2.0})
regs = hist.detect_regressions("customers", baseline="rolling_median", policy=policy)
print(report.to_markdown(regs))                      # callout + Δ-vs-baseline column
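
To make the history model concrete, here is a minimal sketch of an append-only JSONL store keyed by (dataset, run_id) and a trend lookup over it. It is not the ReportHistory implementation: the record layout, the append_run and trend helper names, and the example values are all hypothetical, chosen only to illustrate the idea.

```python
import json
import tempfile
from pathlib import Path


def append_run(path, dataset, run_id, metrics):
    """Append one run as a single JSON line (hypothetical record layout)."""
    record = {"dataset": dataset, "run_id": run_id, "metrics": metrics}
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def trend(path, metric, dataset):
    """Return (run_id, value) pairs for one metric of one dataset, in file order."""
    series = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if record["dataset"] == dataset and metric in record["metrics"]:
                series.append((record["run_id"], record["metrics"][metric]))
    return series


# Illustrative values only; they are not measurements from any real run.
with tempfile.TemporaryDirectory() as tmp:
    history_path = Path(tmp) / "analysis.jsonl"
    append_run(history_path, "customers", "run-1", {"cluster.singleton_ratio": 0.62})
    append_run(history_path, "orders", "run-1", {"cluster.singleton_ratio": 0.30})
    append_run(history_path, "customers", "run-2", {"cluster.singleton_ratio": 0.64})
    points = trend(history_path, "cluster.singleton_ratio", "customers")

print(points)
```

Because each run is one line, appending never rewrites earlier runs, and a trend is a single filtered pass over the file.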

The baseline is a strategy: rolling_median (the default, immune to one noisy night), previous, or last_known_good. RegressionPolicy thresholds are per-metric and respect each metric's direction (a higher_better metric flags only on a drop). CLI:

goldenanalysis trend --metric cluster.singleton_ratio --dataset customers --history .golden/analysis.jsonl
goldenanalysis regressions --dataset customers --history .golden/analysis.jsonl \
  --policy "match.recall_safe_bound=2" --fail-on-regression   # exit 1 on a flagged regression (CI gate)
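
As a rough illustration of the baseline and policy semantics described above, the sketch below computes a rolling-median baseline and applies a direction-aware percentage threshold in plain Python. It does not use or reproduce GoldenAnalysis internals: the function names, the window size of 5, the "lower_better" label, and the handling of a zero baseline are all assumptions made for illustration.

```python
from statistics import median


def rolling_median_baseline(history, window=5):
    """Median of the most recent `window` values (window size is an assumption)."""
    return median(history[-window:])


def threshold_for(metric, default_pct, per_metric):
    """Per-metric threshold, falling back to the default percentage."""
    return per_metric.get(metric, default_pct)


def is_regression(current, baseline, threshold_pct, direction):
    """Flag only when the metric moves the wrong way by more than threshold_pct."""
    if baseline == 0:
        return False  # percentage change is undefined; this sketch skips the check
    delta_pct = (current - baseline) / abs(baseline) * 100.0
    if direction == "higher_better":
        return delta_pct < -threshold_pct
    if direction == "lower_better":  # hypothetical label, mirroring higher_better
        return delta_pct > threshold_pct
    raise ValueError(f"unknown direction: {direction!r}")


# Illustrative history with one noisy night (0.40) among stable values.
past = [0.90, 0.91, 0.40, 0.92, 0.90]
baseline = rolling_median_baseline(past)  # the outlier does not move the median

per_metric = {"match.recall_safe_bound": 2.0}
tight = threshold_for("match.recall_safe_bound", 10.0, per_metric)
loose = threshold_for("cluster.singleton_ratio", 10.0, per_metric)

# A drop from 0.90 to 0.85 is roughly -5.6%.
flag_tight = is_regression(0.85, baseline, tight, "higher_better")
flag_loose = is_regression(0.85, baseline, loose, "higher_better")
# A rise never flags a higher_better metric.
flag_rise = is_regression(0.95, baseline, tight, "higher_better")

print(baseline, flag_tight, flag_loose, flag_rise)
```

The median is what gives the baseline its robustness: a single outlier in the window changes a mean but usually leaves the middle value where it was.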

GoldenCheck vs GoldenAnalysis

They are easy to confuse and are deliberately distinct:

|              | GoldenCheck                               | GoldenAnalysis                                        |
|--------------|-------------------------------------------|-------------------------------------------------------|
| Scope        | Profiles a single input dataset at ingest | Cross-cutting over any stage's outputs                |
| Direction    | A producer of artifacts (scan findings)   | A consumer of artifacts (incl. GoldenCheck's)         |
| Across runs? | No: one dataset, one scan                 | Yes: trend / drift / regression over a run history    |
| Writes data? | Suggests/applies fixes                    | Never: read-only by construction                      |

The hard line: GoldenAnalysis depends on other packages' types; never the reverse. It sits beside the pipeline as a reporting step, consuming GoldenCheck / GoldenFlow / GoldenMatch / GoldenPipe / InferMap outputs — it does not replace GoldenCheck's ingest-time profiling, and GoldenCheck does not import GoldenAnalysis.

Native accelerator (optional, goldenanalysis[native])

An optional Rust accelerator for the heavy aggregation primitives, gated exactly like goldenmatch[native] / goldencheck[native]:

pip install goldenanalysis[native]   # pulls the separate goldenanalysis-native wheel

The pure-Python path stays the default and the byte-identical reference. The compiled kernel (analysis-core pyo3-free + analysis-native abi3 wheel) mirrors core/aggregate.py's histogram / quantile value-for-value, reading input as a Float64 Arrow array (zero-copy).

The loader gate (core/_native_loader.py, GOLDENANALYSIS_NATIVE=auto|0|1) uses a primitive only once it's in _GATED_ON, which holds histogram and quantile. Both proved byte-identical (tests/core/test_native_parity.py) and measured 5.8–9.9x faster than the pure-Python loop on Linux x86_64 at 1M–10M rows, including the list→Arrow conversion the dispatch pays (benchmarks/aggregate_benchmark.py + bench-analysis-native.yml). A new primitive joins only after the same two gates clear: "it's Rust" is never enough (the goldencheck composite-key kernel was 2.5x slower until the gate caught it).

With goldenanalysis[native] installed, the auto default uses the native path automatically; GOLDENANALYSIS_NATIVE=0 forces pure Python. In-tree dev build: uv run python scripts/build_analysis_native.py.
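
The gating logic described above can be pictured with a small dispatch sketch. This is not the contents of core/_native_loader.py. The should_use_native function and its native_available parameter are hypothetical. The behaviour of GOLDENANALYSIS_NATIVE=1 (require the native wheel, still subject to the gate) is an assumption, since the text above only spells out auto and 0.

```python
import os

# Primitives cleared for the native path; the README names these two.
GATED_ON = frozenset({"histogram", "quantile"})


def should_use_native(primitive, native_available, env=None):
    """Decide whether a primitive dispatches to the compiled kernel (sketch)."""
    env = os.environ if env is None else env
    mode = env.get("GOLDENANALYSIS_NATIVE", "auto")

    if mode == "0":
        return False  # forced pure Python
    if primitive not in GATED_ON:
        return False  # never dispatch a primitive that has not cleared the gate
    if mode == "auto":
        return native_available  # use native only if the wheel is importable
    if mode == "1":
        # Assumption: "1" means the native wheel is required, not merely preferred.
        if not native_available:
            raise RuntimeError("GOLDENANALYSIS_NATIVE=1 but native wheel not found")
        return True
    raise ValueError(f"unsupported GOLDENANALYSIS_NATIVE value: {mode!r}")


print(should_use_native("histogram", native_available=True, env={}))
print(should_use_native("histogram", True, env={"GOLDENANALYSIS_NATIVE": "0"}))
print(should_use_native("composite_key", True, env={}))
```

Keeping the allow-list separate from the environment switch means a compiled primitive can ship in the wheel without being used until its parity and benchmark checks pass.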

License

MIT.
