
Audit gate for tuned candidates: stress boundaries, hard constraints, walk-forward validation, and append-only trails.


omega-lock

The best score is lying to you — and your optimizer can't catch it. omega-lock is the gate that runs after your tuner, takes its "winning" candidate, and tells you whether that score is real or just luck — before it ships.

PyPI Python License

pip install omega-lock
omega-lock demo   # 60s, offline: watch a "winning" score collapse -74% on held-out data

Keywords: hyperparameter overfitting · eval / prompt regression testing · walk-forward validation · validate an Optuna study · holdout transfer check in CI.


The 30-second version

You ran a hyperparameter sweep, a prompt search, or a threshold tuner. It came back proud and pointed at the winner — the highest score on the data you tuned against.

That is exactly the number you can't trust. When you try hundreds of candidates and keep only the single best one, you don't just keep the most skillful one — you keep the luckiest one. And luck doesn't repeat. The moment you test that winner on data it has never seen, the lucky streak is gone:

on the data it was picked from   →   5.967   (real skill  +  a lucky streak)
on brand-new, held-out data      →   1.527   (only the real skill that was left)   ▼ -74.4%

This is overfitting from selection, and no optimizer protects you from it — finding the max is its whole job. omega-lock is your second opinion. It re-tests the winner on a slice the search never touched and returns a flat verdict: PASS (ship it) or FAIL (block it).
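The selection effect described above can be reproduced with a small simulation. This is a minimal sketch, not omega-lock's demo: the Gaussian skill and noise distributions and the function names are assumptions made for illustration, and only the trial count of 125 is borrowed from the demo output on this page.

```python
import random


def simulate_selection(n_candidates=125, skill_sd=0.5, noise_sd=1.0, seed=0):
    """Pick the best-by-train candidate and re-score it on held-out data.

    Each candidate has a fixed true skill. Train and holdout scores are that
    skill plus independent noise (an assumed model, for illustration only).
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, skill_sd) for _ in range(n_candidates)]
    train = [s + rng.gauss(0.0, noise_sd) for s in skills]
    holdout = [s + rng.gauss(0.0, noise_sd) for s in skills]
    # The optimizer's view: keep the single highest train score.
    winner = max(range(n_candidates), key=train.__getitem__)
    return train[winner], holdout[winner]


def average_gap(n_runs=200):
    """Mean (train - holdout) score of the selected winner over many searches."""
    gaps = []
    for seed in range(n_runs):
        train_score, holdout_score = simulate_selection(seed=seed)
        gaps.append(train_score - holdout_score)
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    print(f"average train-minus-holdout gap of the winner: {average_gap():.3f}")
```

The gap stays positive on average because taking the maximum selects for favorable noise as well as for skill, and the noise is redrawn on the held-out slice.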


See it fail a lucky winner — 60 seconds, nothing to set up

omega-lock demo

A fully offline case study: a search picks a candidate that looks brilliant in training, then omega-lock re-scores it on a held-out slice.

candidate: best-by-score (selected from 125 trials)
  train score    5.967
  holdout score  1.527     ▼ -74.4%
  walk-forward transfer gate ............ FAIL   (train↔holdout correlation 0.179 < 0.3)
  hard-constraint feasibility ........... FAIL   (best_feasible ≠ best_any)

VERDICT: BLOCK — the winning score did not transfer. Selection concentrated luck.

The optimizer was thrilled with 5.967. The reality was 1.527. omega-lock stamps FAIL and your pipeline stops the deploy. That collapse is the whole product in one screen.


Drop it into CI

Point omega-lock at two score files — the scores your optimizer reported on the data it tuned against, and the scores of the same candidates re-evaluated on a held-out slice. It exits 0 (ship) or 1 (block):

omega-lock gate --train train_scores.json --holdout holdout_scores.json

# .github/workflows/overfit-gate.yml
name: overfit-gate
on: [pull_request]

jobs:
  guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install omega-lock

      # your tuner runs here and writes train_scores.json + holdout_scores.json
      - run: python tune.py

      # the gate: a non-zero exit fails the check and blocks the merge
      - run: omega-lock gate --train train_scores.json --holdout holdout_scores.json

When the held-out score doesn't track the tuned score, the step fails red and the PR can't merge. Every run also writes an append-only audit trail, so you can prove later exactly what was gated, when, and why.
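One common way to make an append-only trail tamper-evident is to chain each record to the hash of the previous one. The sketch below illustrates that general technique only; it is not omega-lock's actual trail format, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON (sorted keys) so the same record always hashes the same.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)})
    return log


def verify_chain(log):
    """Return True only if no entry was edited, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    trail = []
    # Hypothetical fields: what was gated, against which threshold, and the verdict.
    append_record(trail, {"candidate": "trial-17", "min_corr": 0.3, "verdict": "FAIL"})
    append_record(trail, {"candidate": "trial-42", "min_corr": 0.3, "verdict": "PASS"})
    print("chain intact:", verify_chain(trail))
```

Because every hash depends on all earlier entries, rewriting an old verdict invalidates the rest of the chain, which is what lets a later reader trust the recorded inputs and thresholds.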

Prefer Python? The same decision is one call:

from omega_lock.simple import gate_scores

result = gate_scores(train="train_scores.json", holdout="holdout_scores.json")
assert result.passed, result.reason   # fail your test suite on a bad candidate

Already have an Optuna study? Gate it in 3 lines

import optuna
from omega_lock import audit_optuna_study

study  = optuna.load_study(study_name="my-sweep", storage="sqlite:///sweep.db")
# score_on_holdout is your own function that re-scores a candidate on held-out data
report = audit_optuna_study(study, holdout_evaluate=score_on_holdout)  # walk-forward + feasibility on study.best_trial
print(report.passed, report.gated_best)   # False, and the candidate it WILL certify (or None)

No new study, no rewrite of your objective, no config DSL. It also works on bare lists (Ax, Ray Tune, Hyperopt, GridSearchCV, or a hand-rolled sweep) — any leaderboard is enough.


What the gate actually checks

Two independent checks on the candidate your search already chose, plus an audit record of the decision. Either check can block it.

| Check | Plain English | Blocks when |
| --- | --- | --- |
| Walk-forward transfer gate | Does the score earned on the tuned data carry over to a held-out slice it never saw? | The held-out result decorrelates from the tuned ranking: the winner was a fluke. |
| Hard-constraint feasibility | Is the highest-scoring candidate also a valid one (passes your latency / cost / risk limits), or did you win on a config you can't run? | best_feasible ≠ best_any: the top score violates a constraint you declared. |
| Append-only audit trail | Can you reconstruct the decision months later? | Never blocks. It always records the verdict, inputs, and thresholds in a tamper-evident trail. |
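The feasibility comparison (best_feasible versus best_any) can be sketched in a few lines. This is an illustrative example under stated assumptions, not omega-lock's implementation: the `Candidate` type, the single latency constraint, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float
    latency_ms: float  # hypothetical constraint metric


def feasibility_check(candidates, max_latency_ms):
    """Pass only when the top-scoring candidate also satisfies the constraint."""
    best_any = max(candidates, key=lambda c: c.score)
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    best_feasible = max(feasible, key=lambda c: c.score) if feasible else None
    passed = best_feasible is not None and best_feasible == best_any
    return passed, best_any, best_feasible


if __name__ == "__main__":
    board = [
        Candidate("fast", score=0.81, latency_ms=120.0),
        Candidate("slow", score=0.93, latency_ms=900.0),
    ]
    passed, best_any, best_feasible = feasibility_check(board, max_latency_ms=200.0)
    print(passed, best_any.name, best_feasible.name if best_feasible else None)
```

When the check fails, `best_feasible` is the candidate a gate could still consider certifying, since it is the highest scorer that respects the declared limit.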

Core insight: the highest score is the most suspicious number you own. A real edge survives a slice it was never shown. Luck does not.
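To make "survives a slice it was never shown" concrete, here is a minimal sketch of a transfer check based on rank correlation between train and held-out scores. It assumes a Spearman-style correlation and reuses the 0.3 threshold shown in the demo output; the statistic omega-lock actually computes may differ, and all function names here are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no ranking information
    return cov / (var_x * var_y) ** 0.5


def transfer_check(train_scores, holdout_scores, min_corr=0.3):
    """Return (passed, correlation) for paired scores of the same candidates."""
    if len(train_scores) != len(holdout_scores) or len(train_scores) < 2:
        raise ValueError("need two equally long score lists with at least 2 entries")
    corr = pearson(average_ranks(train_scores), average_ranks(holdout_scores))
    return corr >= min_corr, corr


if __name__ == "__main__":
    train = [0.91, 0.85, 0.80, 0.72, 0.60]
    holdout = [0.55, 0.70, 0.52, 0.66, 0.58]
    print(transfer_check(train, holdout))
```

A ranking that transfers keeps roughly the same order on both slices, so the correlation stays high; a leaderboard driven by luck reshuffles on held-out data and falls below the threshold.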


omega-lock is NOT another optimizer

It does not search, sample, or propose anything. It is the gate you bolt onto the search you already have — keep Optuna, keep your sweep, keep your eval loop, and let omega-lock judge the output.

| | Your optimizer (Optuna / Ax / sweep) | omega-lock |
| --- | --- | --- |
| Job | Finds the best score | Tells you if that score deploys |
| Runs | during the search | after it, on the result |
| Looks at | the data the search consumed | a held-out slice it never saw |
| Output | a leaderboard + a winner | PASS / FAIL + the certified candidate |

Where it sits next to the tools you know

| Tool | Its job | Overlap with omega-lock |
| --- | --- | --- |
| Optuna / Ax / Ray Tune | search the space, return a winner (constrained optimization) | none: omega-lock audits their winner |
| MLflow / Weights & Biases | track what you ran | none: omega-lock is a pass/fail gate, not a tracker |
| promptfoo / DSPy / your eval harness | score prompt & model outputs | none: omega-lock catches the prompt that aced the eval but won't generalize |

The empty seat omega-lock fills: an output-side overfit gate. The rule of thumb — if a number was chosen by trying many options and keeping the best, it belongs behind this gate. For where omega-lock does and does not fit the wider toolbox, see docs/TOOLKIT_POSITIONING.md.


Install

pip install omega-lock

omega-lock demo                 # 60s offline walkthrough — watch a lucky winner collapse
omega-lock gate --help          # the CI gate (exit 0 = ship, 1 = block)

Generate a shareable dark-themed scorecard from any gate run with render_html — attach it to a PR or archive it.


READMEs: Easy / plain-English README · Korean README · Easy Korean README

Docs: How the transfer gate works · Power API for integrators · Trust & audit model · Toolkit positioning · CHANGELOG

Same idea, other surfaces (the Hibou04 toolkit — a held-out gate where AI workflows skip one): omegaprompt gates an overfit prompt · antemortem gates an AI agent's code plan with verified file:line citations.

Badge and download analytics boundaries. The badges above are static or registry-served links; they do not prove release readiness, correctness, trustworthiness, adoption, or package quality. Downloads or stars may indicate visibility, not skill — stars/downloads must not be used as audit evidence or release approval. No PyPI or GitHub download analytics are asserted here. Only the gate's PASS/FAIL on held-out data is evidence.

Note on terms. This page uses plain language; the public Python API keeps its established symbols for backward compatibility (other repos import them). In code you may see: run_p1 / P1Config (run the gate + its config), check_kc4 / KCThresholds (the walk-forward transfer check + its pass thresholds, e.g. minimum transfer correlation), measure_stress (rank parameters by perturbation sensitivity), ParamSpec (a tunable parameter's range), EvalResult (one scored candidate). You never need these to use omega-lock demo, omega-lock gate, or omega_lock.simple.gate_scores(). Full reference in docs/API.md.
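As a rough illustration of what "rank parameters by perturbation sensitivity" can mean, the sketch below nudges one parameter at a time and records how much the score moves. It is not the `measure_stress` API: the function name, the signature, and the 5% step size are assumptions made for this example.

```python
def rank_by_sensitivity(evaluate, params, ranges, rel_step=0.05):
    """Rank parameters by the largest score change under a small nudge.

    evaluate: callable taking a dict of parameter values, returning a score.
    params:   the candidate's parameter values.
    ranges:   {name: (low, high)} bounds for each tunable parameter.
    """
    base = evaluate(params)
    sensitivity = {}
    for name, (low, high) in ranges.items():
        step = rel_step * (high - low)
        changes = []
        for direction in (-1.0, 1.0):
            nudged = dict(params)
            # Clamp so the perturbed value stays inside the declared range.
            nudged[name] = min(high, max(low, params[name] + direction * step))
            changes.append(abs(evaluate(nudged) - base))
        sensitivity[name] = max(changes)
    return sorted(sensitivity.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    def toy_score(p):
        # Hypothetical objective: steep in "lr", nearly flat in "dropout".
        return 10.0 * p["lr"] + 0.1 * p["dropout"]

    ranking = rank_by_sensitivity(
        toy_score,
        params={"lr": 0.5, "dropout": 0.5},
        ranges={"lr": (0.0, 1.0), "dropout": (0.0, 1.0)},
    )
    print(ranking)
```

Parameters at the top of the ranking are the ones where a small drift changes the score most, which makes them the natural place to look when a tuned result seems fragile.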
