
mini-omega-lock

Empirical preflight probes for omegaprompt calibration. Measures judge consistency, endpoint schema reliability, context-budget margin, latency, and noise floor — emits PreflightReport records that omegaprompt's derive_adaptation_plan consumes.


Part of the omegaprompt toolkit: omegaprompt (calibration engine) · omega-lock (audit framework) · antemortem-cli (pre-implementation recon CLI) · mini-omega-lock (empirical preflight, this repo) · mini-antemortem-cli (analytical preflight) · Antemortem (methodology). Cross-toolkit cookbook: AGENT_TRIGGERS.md.

pip install omegaprompt mini-omega-lock

MCP server. This package also exposes its five probes (empirical_preflight, measure_judge_consistency, compute_context_margin, noise_floor_estimate, project_performance) as agent-callable MCP tools. Run pip install "mini-omega-lock[mcp]", then start the server with python -m mini_omega_lock.mcp (stdio transport, the default for Claude Code). See AGENT_TRIGGERS.md, scenario 2.


TL;DR

omegaprompt ships a plugin interface for preflight probes (omegaprompt.preflight.contracts + omegaprompt.preflight.adaptation) but no probe implementation. This package fills that gap with five empirical measurements, then hands the result to omegaprompt's adaptation layer:

  • Judge consistency — same (response, rubric) scored N times → 1 - CV. Low = noisy judge, need rescore_count > 1.
  • Schema reliability — STRICT_SCHEMA probe success rate. < 0.9 triggers JSON_OBJECT fallback automatically.
  • Context budget margin — 1 - (longest_call_tokens / context_window). Negative = guaranteed overflow.
  • Performance projection — probe latency × calibration scale → wall-time estimate before launching.
  • Noise floor — fitness stdev under identical params → adaptive min_kc4 threshold.

One call (empirical_preflight()) returns the three measurement records omegaprompt's derive_adaptation_plan() consumes, plus a warnings list naming every field that fell back to a fail-closed default (e.g. schema_reliability=0.0 when the strict-schema probe was not supplied).
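
For intuition, here is a toy calculation of two of these quantities (the numbers are invented for illustration; this is not package code):

# Judge consistency = 1 - CV over repeated scores of the same (response, rubric)
scores = [0.80, 0.78, 0.82]                        # e.g. consistency_repeats=3
mean = sum(scores) / len(scores)
stdev = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
consistency = 1 - stdev / mean                     # ~0.98 here: a stable judge

# Context budget margin = 1 - (longest_call_tokens / context_window)
longest_call_tokens, context_window = 150_000, 200_000
margin = 1 - longest_call_tokens / context_window  # 0.25; negative would mean guaranteed overflow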

Looking for the analytical (no-API, deterministic) preflight? See sibling tool mini-antemortem-cli — same plugin interface, deterministic rule-based classifier instead of LLM probes.


Quick start (3-minute)

from omegaprompt import make_provider, PreflightReport, derive_adaptation_plan
from omegaprompt.domain.dataset import DatasetItem
from omegaprompt.domain.judge import Dimension, JudgeRubric
from omegaprompt.judges.llm_judge import LLMJudge
from mini_omega_lock import empirical_preflight

judge_provider = make_provider("anthropic")
judge = LLMJudge(provider=judge_provider)
rubric = JudgeRubric(dimensions=[Dimension(name="accuracy", description="x", weight=1.0)])
probe_item = DatasetItem(id="probe", input="2+2", reference="4")

# One call → five measurements → adaptation plan
judge_quality, endpoint, performance, warnings = empirical_preflight(
    judge=judge, rubric=rubric, probe_item=probe_item,
    probe_response="4", consistency_repeats=3,
)
for w in warnings:
    print(f"[mini-omega-lock] {w}")

report = PreflightReport(judge_quality=judge_quality, endpoint=endpoint, performance=performance)
plan = derive_adaptation_plan(report)
print(plan.recommendations)

👋 Simpler intro: EASY_README.md (English) · EASY_README_KR.md (Korean)


Why this is separate from omegaprompt

omegaprompt ships a plugin interface (omegaprompt.preflight.contracts + omegaprompt.preflight.adaptation) but no probe code. Standalone users do not need preflight probes — they run calibration with declared defaults. Users who want adaptive thresholds tuned to their actual infrastructure install this package alongside:

pip install omegaprompt mini-omega-lock

What it measures

Measurement | Function | What it tells you
Judge consistency | measure_judge_consistency | Same (response, rubric) scored N times; 1 - CV. Low = noisy judge, need rescore_count > 1.
Endpoint schema reliability | probe_strict_schema | STRICT_SCHEMA probe success fraction. < 0.9 triggers JSON_OBJECT fallback.
Context budget margin | compute_context_margin | 1 - (longest_call_tokens / context_window). Negative = overflow.
Performance projection | project_performance | Mean probe latency → projected calibration wall time.
Noise floor | noise_floor_estimate | Stdev of fitness under identical parameters. Sets adaptive min_kc4.

The composite entry point is empirical_preflight(), which runs all five in one call and returns a 4-tuple — three measurement records omegaprompt's adaptation layer consumes plus a warnings list. Any unmeasured field is fail-closed (e.g. schema_reliability=0.0 rather than 1.0) and named in the warnings; CI gates should treat the warnings list as load-bearing, not cosmetic.
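
A minimal sketch of such a gate, reusing the quick-start call above; the hard-fail policy is an illustration of that advice, not behavior the package enforces:

judge_quality, endpoint, performance, warnings = empirical_preflight(
    judge=judge, rubric=rubric, probe_item=probe_item,
    probe_response="4", consistency_repeats=3,
)
if warnings:
    # Every warning names a field that fell back to a fail-closed default.
    for w in warnings:
        print(f"[mini-omega-lock] {w}")
    raise SystemExit("preflight used fail-closed defaults; refusing to start calibration")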

Usage

from omegaprompt import make_provider, PreflightReport, derive_adaptation_plan
from omegaprompt.domain.dataset import DatasetItem
from omegaprompt.domain.judge import Dimension, JudgeRubric
from omegaprompt.judges.llm_judge import LLMJudge
from mini_omega_lock import empirical_preflight

judge_provider = make_provider("anthropic")
judge = LLMJudge(provider=judge_provider)
rubric = JudgeRubric(dimensions=[Dimension(name="accuracy", description="x", weight=1.0)])
probe_item = DatasetItem(id="probe", input="2+2", reference="4")

judge_quality, endpoint, performance, warnings = empirical_preflight(
    judge=judge,
    rubric=rubric,
    probe_item=probe_item,
    probe_response="4",
    consistency_repeats=3,
    dataset_size_hint=10,
    candidates_expected=20,
)

# Surface fail-closed warnings before trusting the measurements.
for w in warnings:
    print(f"[mini-omega-lock] {w}")

report = PreflightReport(
    judge_quality=judge_quality,
    endpoint=endpoint,
    performance=performance,
)
plan = derive_adaptation_plan(report=report)
# plan.min_kc4_override, plan.rescore_count, etc.

Design principles

  • No fabricated success. Unmeasured fields fail closed (schema_reliability=0.0, noise_floor=0.0, scale_monotonic=False) and emit explicit warnings — agents can tell "measured zero" from "we never ran the probe". The context-margin probe runs as a length-based projection by default (compute_context_margin, chars_per_token=3.8); pass real texts and a token_counter to upgrade to a tokenizer-exact measurement (compute_context_margin_from_texts), as sketched after this list.
  • Minimal probe budget. Default 3 consistency repeats + 3 schema probes + 1 context-margin compute = 7-10 API calls per preflight, typically under $0.01 at frontier-model pricing.
  • Protocol-conformant output. Emits omegaprompt.preflight.contracts.JudgeQualityMeasurement / EndpointMeasurement / PerformanceMeasurement exactly. No shape drift.
  • Composable. Can run alongside mini-antemortem-cli (analytical preflight) into the same PreflightReport.
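
Sketch referenced in the first principle above, showing how the tokenizer-exact variant might be called; the parameter names (texts, token_counter, context_window) and the stand-in counter are assumptions inferred from that bullet, not a verified signature:

from mini_omega_lock import compute_context_margin_from_texts

def approx_token_counter(text: str) -> int:
    # Stand-in for the sketch; in practice pass your provider's real tokenizer.
    return max(1, len(text) // 4)

longest_prompt = "..."  # the longest prompt you expect calibration to send

margin = compute_context_margin_from_texts(
    texts=[longest_prompt],
    token_counter=approx_token_counter,
    context_window=200_000,
)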

Validation

All adapter tests mock the provider SDK; no network, no API credits, fully offline. Run with pytest -q.

Relation to the family

  • omega-lock — parameter-calibration framework. The naming "mini-omega-lock" echoes this family; the sensitivity + walk-forward + KC-4 discipline comes from there.
  • omegaprompt — prompt calibration engine. This package feeds its preflight plugin interface.
  • mini-antemortem-cli — analytical sibling. Runs deterministic trap classification over config before calibration.

License

Apache 2.0. See LICENSE.

License history. PyPI distributions of version 0.1.0 were shipped with an MIT LICENSE file. The repository was relicensed to Apache 2.0 on 2026-04-22 (commit ff489a9); 0.2.0 (2026-04-28) and all later versions ship under Apache 2.0. Anyone who installed 0.1.0 holds an MIT license to that copy — license changes do not apply retroactively.
