A statistically rigorous CI gate for AI: treats model outputs as distributions, penalizes unreliable judges, and decides ship / hold / regression.

regression-substrate

A statistically rigorous CI gate for AI systems. It treats model outputs as distributions, penalizes unreliable judges, and returns a SHIP / HOLD / REGRESSION verdict you can block a pull request on.

Install

pip install regression-substrate            # core (numpy, scipy)
pip install "regression-substrate[clustering]"   # + auto_cluster (scikit-learn)
pip install "regression-substrate[langsmith]"    # + LangSmith adapter

For development (editable install with test dependencies):

git clone <repo-url>
cd regression-substrate
pip install -e ".[dev]"

CLI (drop into CI)

regsub --data evals.csv --gold gold.jsonl --version-a v1 --version-b v2 --out out/
# exit 0 = SHIP / SHIP_WITH_FLAGS ; 1 = REGRESSION / HOLD ; 2 = JUDGE_INADMISSIBLE

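Because every verdict other than SHIP and SHIP_WITH_FLAGS exits nonzero, this one line in your CI pipeline blocks the PR on a regression, a hold, or an inadmissible judge.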
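If your CI system needs more than a pass/fail signal, the exit codes can be mapped back to verdict names. The following is a minimal sketch that relies only on the exit-code table in the comment above; `describe_exit`, `should_block`, and `run_gate` are hypothetical helper names and are not part of regression-substrate.

```python
import subprocess

# Exit-code meanings as documented in the CLI comment above.
EXIT_VERDICTS = {
    0: ("SHIP", "SHIP_WITH_FLAGS"),
    1: ("REGRESSION", "HOLD"),
    2: ("JUDGE_INADMISSIBLE",),
}


def describe_exit(code: int) -> str:
    """Return the verdict names associated with a regsub exit code."""
    verdicts = EXIT_VERDICTS.get(code)
    if verdicts is None:
        return f"unexpected exit code {code}"
    return " / ".join(verdicts)


def should_block(code: int) -> bool:
    """Block the pull request on anything other than exit code 0."""
    return code != 0


def run_gate(argv: list[str]) -> int:
    """Run a full regsub command line (hypothetical wrapper) and report it."""
    result = subprocess.run(argv)
    print(f"exit {result.returncode}: {describe_exit(result.returncode)}")
    return result.returncode
```

Here `run_gate` expects the complete command as a list, for example the `regsub` invocation shown above split into arguments. Treating unknown exit codes as blocking is a conservative choice for a gate.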

Library

from regression_substrate import gate, load_from_csv, Judge

judge = Judge(my_llm_scorer)            # any (input, response) -> [0,1]
cal = judge.calibrate(gold_records)     # -> kappa, error_sd
sa, sb, cids, meta = load_from_csv("evals.csv", "v1", "v2")
decision = gate(sa, sb, cids, judge_error_sd=cal["error_sd"], kappa=cal["kappa"])
print(decision.verdict)
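The snippet above leaves `my_llm_scorer` undefined. Going by the inline comment, it is any callable that takes an input and a response and returns a score in [0, 1]. A toy stand-in, useful for wiring things up before a real LLM judge is available, might look like the sketch below. `token_overlap_scorer` is a hypothetical name, and token overlap is only a placeholder metric, not something the library prescribes.

```python
def token_overlap_scorer(prompt: str, response: str) -> float:
    """Toy scorer: fraction of distinct prompt tokens that reappear in the response.

    The result is always in [0, 1], matching the (input, response) -> [0, 1]
    shape described in the library example. This is a placeholder, not a
    meaningful quality metric.
    """
    prompt_tokens = set(prompt.lower().split())
    if not prompt_tokens:
        # No prompt tokens to match against; score the pair as 0.
        return 0.0
    response_tokens = set(response.lower().split())
    return len(prompt_tokens & response_tokens) / len(prompt_tokens)
```

Assuming `Judge` accepts any such callable, as the comment in the example suggests, this function could be passed in place of `my_llm_scorer`.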

What's inside

| Module | Purpose |
| --- | --- |
| diff_engine | Offline gate: variance components, bootstrap CI, cluster scan, BH/e-BH |
| ingest | Loaders (JSONL, CSV), judge harness, auto-clustering |
| sequential_gate | Always-valid martingale monitor for continuous deployment |
| gold | Rolling gold set, drift detection, forced sampling for labeling |
| adapters | Vendor flatteners (LangSmith preset) |
| otel_exporter | OTel-aligned span capture path |
| cli | The regsub console command |
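For readers unfamiliar with two of the techniques named for diff_engine, here is a standard-library sketch of a percentile bootstrap confidence interval on paired score differences and of the Benjamini-Hochberg (BH) step-up procedure. It illustrates the general methods only; it is not the package's implementation, the function names are hypothetical, and e-BH is not shown.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (b - a).

    Returns (point_estimate, lower, upper).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return mean(diffs), boot_means[lo_idx], boot_means[hi_idx]


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where BH rejects at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

In a gate like this one, an interval for the paired difference that lies entirely below zero would suggest version B scores worse than version A, while BH is one common way to control the false discovery rate when many clusters are tested at once. How regression-substrate actually combines these is described in its own documentation, not here.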

Running tests

pip install -e ".[dev]"
pytest

See examples/ for a runnable dataset and CHANGES.md for design decisions.


Source Distribution

regression_substrate-0.1.0.tar.gz (50.0 kB view details)

Uploaded Source

Built Distribution

regression_substrate-0.1.0-py3-none-any.whl (29.2 kB view details)

Uploaded Python 3

File details

Details for the file regression_substrate-0.1.0.tar.gz.

File metadata

  • Download URL: regression_substrate-0.1.0.tar.gz
  • Size: 50.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for regression_substrate-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 11e7aafb7084bf85972e35ebbd63fb77847fa5ef7bbe7a1fa406597423799ee4 |
| MD5 | 9296fbe52329476eafbd27513865a80e |
| BLAKE2b-256 | 4edb2800b7c9235aaebce953db590c6f3fd6b795f9e1a5e7539ef60a0174c734 |

File details

Details for the file regression_substrate-0.1.0-py3-none-any.whl.

File hashes

Hashes for regression_substrate-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 032c18c1b350a64f981a871ac4a00a23fa053708abebd31b21763e70eba2645c |
| MD5 | 48b292e8f7661f41d8e316aeb5a092f4 |
| BLAKE2b-256 | 04fd1bedd6b7a0b6f7c3498b4515027439d62c526b9d465800626cb16615a085 |
