
Analytical preflight for omegaprompt calibration: deterministic classifier over eleven source-backed calibration trap patterns. Emits AnalyticalFinding records the omegaprompt pipeline consumes via derive_adaptation_plan.

Project description

mini-antemortem-cli

A deterministic linter that catches the silent traps in your prompt-eval setup — train/test leakage, judge bias, homogeneous variants — before they fake a passing score. It reads omegaprompt calibration config inputs, classifies 11 source-backed built-in trap patterns, and emits AnalyticalFinding records without provider calls or network access. Works with omegaprompt; useful standalone as a config linter.


pip install mini-antemortem-cli

Repository: hibou04-ops/mini-antemortem-cli · PyPI: mini-antemortem-cli · import: mini_antemortem_cli · CLI: mini-antemortem-cli · MCP: mini-antemortem-cli-mcp with mini-antemortem-cli[mcp]

What's New in 0.10.0

  • Two new trap rules — the count is now 11 (was 9). few_shot_leakage_into_test flags few-shot examples (baked into every prompt variant) whose input/output text matches a held-out test or train item — the model is handed the answer at inference time, so the score reflects memorisation, not generalisation (REAL/HIGH on test overlap, REAL/MEDIUM on train-only). rubric_dead_weight_dimension flags a rubric dimension with zero weight while another dimension is weighted — the dead axis is still sent to the judge (spending tokens and attention) but contributes nothing to fitness (REAL/MEDIUM). Both are deterministic and reachable through the real omegaprompt domain objects; the false-positive corpus grew to 53 cases (0/53) and golden cases cover both new traps.
  • Claim-drift correction (fix). The package previously described its trap count inconsistently — pyproject said nine while the GitHub repository description said seven. The source of truth is analytical_traps(), which now returns 11; every reference (pyproject, all four READMEs, the __init__ docstring, the MCP server instructions, the generated claim docs) is regenerated from that single source, and scripts/check_repo_consistency.py fails the build on any future drift. The GitHub repository description is also corrected to match.
  • Version-agnostic publish workflow (fix). .github/workflows/publish.yml now reads the version from pyproject.toml (via tomllib) and asserts that the release tag and __init__.__version__ match it before building, so a tag/metadata mismatch fails fast instead of silently publishing the wrong version.
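
The few_shot_leakage_into_test rule described above can be illustrated with a small standalone sketch. It is not the package's implementation. The Item and FewShot shapes, the whitespace/case normalisation, and the choice to compare few-shot text against each item's input and reference are all assumptions made for illustration. The sketch does reproduce the stated severities: a test match is REAL/HIGH and a train-only match is REAL/MEDIUM.

```python
from dataclasses import dataclass


# Hypothetical record shapes for illustration; not omegaprompt's Dataset schema.
@dataclass(frozen=True)
class Item:
    id: str
    input_text: str
    reference: str = ""


@dataclass(frozen=True)
class FewShot:
    input_text: str
    output_text: str


def _norm(text: str) -> str:
    # Assumed normalisation: collapse whitespace and ignore case.
    return " ".join(text.split()).casefold()


def few_shot_leakage(few_shots, train, test):
    """Return (status, severity, leaked_ids), or None if nothing matches."""
    shot_texts = set()
    for shot in few_shots:
        shot_texts.add(_norm(shot.input_text))
        shot_texts.add(_norm(shot.output_text))
    shot_texts.discard("")  # empty fields must never count as a match

    def hits(items):
        return sorted(
            item.id
            for item in items
            if _norm(item.input_text) in shot_texts or _norm(item.reference) in shot_texts
        )

    test_hits = hits(test)
    if test_hits:
        # The held-out answer is baked into every prompt variant.
        return ("REAL", "high", test_hits)
    train_hits = hits(train)
    if train_hits:
        return ("REAL", "medium", train_hits)
    return None
```

Because the few-shot block is shared by every variant, a single match inflates every variant's score equally, which is why the check runs once per config rather than per variant.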

What's New in 0.9.1

  • One-line config-load errors (fix). A bad-schema --train / --test row made Dataset.from_jsonl raise a multi-line pydantic ValueError, so the stderr message spanned several lines — but the docs promised one line. It is now truncated to a single line. Exit code is unchanged (2); a regression test locks the single-line behavior.
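
A minimal sketch of the one-line error behavior, assuming a plain JSON loader. The message format and the function names are hypothetical. What the sketch does share with the changelog is the contract: one stderr line that names the file and the error class, followed by exit code 2.

```python
import json
import sys
from pathlib import Path


def one_line_load_error(path: Path, exc: Exception) -> str:
    """Format a load failure as one stderr line naming the file and error class."""
    # pydantic and json error messages can span several lines; keep only the first.
    lines = str(exc).splitlines()
    first = lines[0] if lines else ""
    return f"error: cannot load {path}: {type(exc).__name__}: {first}"


def load_json(path: Path):
    """Return (data, None) on success, or (None, 2) after printing one stderr line."""
    try:
        return json.loads(path.read_text(encoding="utf-8")), None
    except (OSError, ValueError) as exc:
        # OSError: missing or unreadable file. ValueError: malformed JSON
        # (pydantic's ValidationError is also a ValueError subclass).
        print(one_line_load_error(path, exc), file=sys.stderr)
        return None, 2  # config error, distinct from the policy-gate exit code 1
```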

What's New in 0.9.0

  • Text-mode verdict line (C1): the default (text) output of check now leads with a single grep-friendly Summary: line that surfaces the existing native 5-level status (PASS / ADVISORY / HOLD / BLOCK / NEEDS_MORE_EVIDENCE). The core verdict, previously visible only to --json consumers, now appears in the default output, so ... check | head -1 becomes the CI signal.
  • Config-referenced citations (C2): high-signal findings now carry, in the cite field, the computed value that made the trap fire (overlapping ids, dominant rubric dimension and weight, max pairwise variant Jaccard, undersized test slice). These citations reference the supplied calibration config, not on-disk source files.
  • Clean input errors (C3): input-file load failures (missing file, malformed JSON, schema mismatch) are now reported as a one-line stderr message naming the file and the error class, then exit code 2 (config error, distinct from the policy-gate exit 1) — no more raw tracebacks.
  • list-traps --json (H1): emit the trap registry as a {id, hypothesis} JSON array. Text remains the default.
  • Exact train/test ID overlap = BLOCKER (H2, behavior change): exact train/test ID overlap now fires at BLOCKER severity (was high) — the held-out set is not held out, a hard leak. Within-slice duplicates stay medium. This is a public severity-contract change; --fail-on-severity high still catches BLOCKER, so no CI gate regresses.
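
The severity split in H2 can be sketched as follows. This is an illustrative reimplementation, not the package's code. The (severity, cite) tuple shape and the cite wording are assumptions. The rules come from the changelog: cross-slice overlap is a blocker, and within-slice duplicates stay medium.

```python
from collections import Counter


def train_test_id_findings(train_ids, test_ids):
    """Sketch: (severity, cite) pairs for the train_test_id_overlap trap."""
    findings = []
    overlap = sorted(set(train_ids) & set(test_ids))
    if overlap:
        # The held-out set is not held out: hard leak, blocker since 0.9.0.
        findings.append(("blocker", f"overlapping ids: {overlap}"))
    for name, ids in (("train", train_ids), ("test", test_ids)):
        # Duplicates inside one slice break per-item correlation but are not a leak.
        dupes = sorted(i for i, n in Counter(ids).items() if n > 1)
        if dupes:
            findings.append(("medium", f"duplicate {name} ids: {dupes}"))
    return findings
```

Putting the overlapping ids in the cite string mirrors C2: the finding shows the exact values it fired on, so a reviewer can fix the split without re-deriving it.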

This release moves the project to Development Status :: 4 - Beta, with a commitment to additive-only changes to the CLI / JSON / MCP surface going forward.

Trust / Verification Links

Use It When

  • You are about to run an omegaprompt calibration and want a deterministic structural check first.
  • You want CI to flag calibration configs with same-vendor judge bias, weak held-out power, train/test leakage, or opaque routed-provider family risk.
  • You need machine-readable AnalyticalFinding output that can feed derive_adaptation_plan.

Verification Loop

python scripts/generate_readme_claims.py --check
python scripts/check_repo_consistency.py
python examples/demo_replay.py
python scripts/run_golden_cases.py --check
python scripts/run_false_positive_audit.py --check
python scripts/verify_fixture_integrity.py

These commands are no-network by design. They verify that public claims, generated docs, demo fixtures, golden cases, false-positive corpus, and artifact digests still match local source of truth.

False-Positive Audit

benchmarks/false_positive/benign_cases.json carries a labeled corpus of configurations (53 cases across all 11 traps, with both nominal and boundary inputs) that the analytical preflight must not flag. scripts/run_false_positive_audit.py replays the corpus through the same deterministic classifier and reports the per-trap false-positive rate; the same script runs as a CI gate, so a regression that flips a benign case to REAL / NEW / UNRESOLVED fails the build. As of 0.10.0 the measured rate is 0/53 (0.00%). Known classifier limitations can be recorded in the manifest's acknowledged_false_positives block so the gate distinguishes regressions from documented behavior.
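
The audit logic above reduces to counting, per trap, how many benign cases the classifier flags, and treating any flag that is not acknowledged as a regression. The sketch below assumes a hypothetical case shape ({"id", "trap_id", ...}) and takes the classifier as a parameter. It does not read the real benign_cases.json manifest format.

```python
from collections import defaultdict

# Statuses that count as a false positive when they appear on a benign case.
FLAGGING = {"REAL", "NEW", "UNRESOLVED"}


def audit(cases, classify, acknowledged=frozenset()):
    """Return (per-trap false-positive rate, ids of unacknowledged regressions)."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    regressions = []
    for case in cases:
        trap = case["trap_id"]
        totals[trap] += 1
        if classify(case) in FLAGGING:
            flagged[trap] += 1
            # Documented limitations are still counted in the rate but do not fail CI.
            if case["id"] not in acknowledged:
                regressions.append(case["id"])
    rates = {trap: flagged[trap] / totals[trap] for trap in totals}
    return rates, regressions
```

A CI gate built on this fails whenever the regression list is non-empty, which matches the documented behavior of a benign case flipping to REAL / NEW / UNRESOLVED.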

Deterministic Demo

python examples/demo_replay.py

The demo loads JSONL/JSON fixtures from examples/demo_config/, runs mini-antemortem-cli check in text and JSON modes, and compares the replay against examples/_demo_output.txt. It uses no API keys and makes no network calls.
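
The replay comparison can be pictured as a line-based diff between the fresh transcript and the stored expected output. This is a hedged sketch using the standard-library difflib. The function name and the line-ending normalisation are assumptions, and examples/demo_replay.py may compare differently.

```python
import difflib


def replay_matches(actual: str, expected: str) -> tuple[bool, str]:
    """Compare a replayed transcript with stored expected output; return (ok, diff)."""
    # Normalise line endings so a Windows checkout does not produce spurious diffs.
    exp_lines = expected.replace("\r\n", "\n").splitlines(keepends=True)
    act_lines = actual.replace("\r\n", "\n").splitlines(keepends=True)
    diff = "".join(
        difflib.unified_diff(exp_lines, act_lines, fromfile="expected", tofile="replay")
    )
    return diff == "", diff
```

Returning the diff text alongside the boolean lets a failing replay print exactly which output lines drifted.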

How Is This Different?

Dimension | mini-antemortem-cli | mini-omega-lock | antemortem-cli | omegaprompt default path | Ad-hoc review prompts
Core role | Deterministic analytical preflight over calibration config. | Empirical preflight over live or mocked provider behavior. | Broader pre-diff recon and implementation-risk CLI. | Calibration engine that consumes preflight outputs. | Free-form human/LLM review of a config.
Deterministic no-network behavior | Yes by default. | Mock mode can be deterministic; live mode is provider-dependent. | Not the default when provider recon is enabled. | Core calibration can call configured providers. | No guarantee.
Trap classification | Yes, over built-in calibration traps. | Measures endpoint/judge behavior rather than this static trap registry. | Can reason over broader risk lists. | Consumes PreflightReport; does not ship this classifier. | Prompt-dependent.
Explicit trap IDs | Yes: each finding has a stable trap_id. | Not this trap ID registry. | Uses its own evidence/recon structures. | Preserves supplied analytical findings. | Usually absent unless manually requested.
Source-backed trap count | Yes, generated from analytical_traps(). | Not applicable to this trap registry. | Not applicable to this mini package. | Not applicable. | No.
Train/test split discipline | Flags missing held-out slice and train/test ID overlap. | Can probe empirical behavior but does not replace split integrity checks. | Can inspect source/artifacts when configured. | Uses whatever datasets the caller supplies. | Usually easy to miss.
Routed-provider opacity | Flags routed-provider family ambiguity as UNRESOLVED. | Can probe actual endpoint behavior when live calls are allowed. | Can gather external evidence when configured. | Does not infer provider family. | Often hidden by provider labels.
Same-vendor judge bias | Flags same-family target/judge pairs. | Can measure judge consistency but does not make this static config claim. | Can analyze broader judge-risk context. | Consumes findings if provided. | Often subjective.
CLI/MCP availability | CLI: mini-antemortem-cli; MCP: mini-antemortem-cli-mcp via [mcp]. | Separate sibling package. | Separate broader CLI. | Library API. | None unless built by the user.
Reads source files | No. It reads calibration input files only. | No by default. | Yes, for disk-backed recon and citations. | No source recon by default. | Only if pasted or tool-enabled.
Live empirical probes | No. | Yes in live mode. | Yes when configured. | Provider calls during calibration. | Maybe, but not reproducible by default.
Disk-verified file:line citations | No; only fixture integrity is disk-verified. As of 0.9.0, findings instead carry config-referenced citations (the computed value the trap fired on, e.g. the overlapping ids or dominant rubric dimension) in the cite field; see the next row. | No. | Yes, where that tool implements evidence-bound citations. | No. | No.
Config-referenced citations (cite field) | Yes, as of 0.9.0, for the high-signal traps where the firing value is computed (overlapping ids, dominant rubric dimension and weight, max pairwise variant Jaccard, undersized test slice). These reference the calibration config the user supplied, not on-disk source files. | No. | Distinct mechanism (disk-backed). | No. | No.
What it does not prove | It does not prove provider quality, prompt superiority, statistical validity, production adoption, or external validation. | It does not prove analytical trap absence. | It does not prove this mini package's trap count. | It does not perform this preflight unless supplied. | It proves nothing mechanically.

Built-In Trap Patterns

Source of truth: src/mini_antemortem_cli/traps.py via analytical_traps().

Trap ID | What it checks
self_agreement_bias | Target and judge share vendor family or exact model, creating self-agreement risk.
small_sample_kc4_power | Held-out sample size is too small for KC-4/Pearson signal to carry useful power.
variants_homogeneous | Prompt variants are too similar to create meaningful sensitivity signal.
rubric_weight_concentration | One rubric dimension dominates the weighted fitness.
judge_budget_too_small | SMALL judge output budget is likely insufficient for rubric dimensions and gates.
empty_reference_with_strict_rubric | Rubric implies ground-truth comparison but dataset references are absent.
no_held_out_slice | No test slice is provided, so walk-forward validation cannot run.
train_test_id_overlap | Train/test IDs overlap or duplicate IDs make per-item correlation unreliable.
routed_provider_opaque_family | A routed provider obscures the underlying served-model family.
few_shot_leakage_into_test | A baked-in few-shot example shares its input/output with a held-out item, inflating the score by memorisation.
rubric_dead_weight_dimension | A rubric dimension carries zero weight while another is weighted, so it is judged but never counted.
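
Three of the table's structural checks (rubric_weight_concentration, rubric_dead_weight_dimension, and the max pairwise Jaccard behind variants_homogeneous) reduce to small arithmetic over the config. The sketch below is illustrative only. The 0.6 dominance cutoff and the whitespace token-set tokenisation are hypothetical choices, not the package's values.

```python
from itertools import combinations


def rubric_findings(weights: dict[str, float], dominance_threshold: float = 0.6):
    """Sketch of the two rubric traps; dominance_threshold is a hypothetical cutoff."""
    findings = []
    total = sum(weights.values())
    if total <= 0:
        return findings  # nothing is weighted, so neither trap applies
    top, top_w = max(weights.items(), key=lambda kv: kv[1])
    if top_w / total >= dominance_threshold:
        findings.append(("rubric_weight_concentration", f"{top}={top_w / total:.2f}"))
    # total > 0 guarantees some other dimension carries weight.
    dead = sorted(name for name, w in weights.items() if w == 0)
    if dead:
        findings.append(("rubric_dead_weight_dimension", f"zero-weight: {dead}"))
    return findings


def max_pairwise_jaccard(variants: list[str]) -> float:
    """Highest token-set Jaccard similarity over all pairs of prompt variants."""
    token_sets = [set(v.lower().split()) for v in variants]
    best = 0.0
    for a, b in combinations(token_sets, 2):
        union = a | b
        if union:
            best = max(best, len(a & b) / len(union))
    return best
```

A high maximum Jaccard means at least two variants are near-copies, so the calibration has less real prompt variation to measure than the variant count suggests.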

Each finding is one of REAL, GHOST, NEW, or UNRESOLVED and carries severity blocker, high, medium, or low.

CLI

mini-antemortem-cli list-traps
mini-antemortem-cli check \
  --target-provider openai \
  --target-model gpt-4o \
  --judge-provider anthropic \
  --judge-model claude-opus-4-7 \
  --train examples/demo_config/train.jsonl \
  --test examples/demo_config/test.jsonl \
  --rubric examples/demo_config/rubric.json \
  --variants examples/demo_config/variants.json \
  --judge-output-budget small

Use --json for machine-readable output. Use --fail-on-severity high when CI should fail on REAL or UNRESOLVED findings of high severity or worse (this also catches BLOCKER). The deprecated --fail-on-blocker alias remains for backward compatibility; since 0.9.0 it can actually trip on a real failure, because exact train/test ID overlap now emits BLOCKER. list-traps accepts --json for a machine-readable {id, hypothesis} array.
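
The --fail-on-severity semantics can be sketched as a ranked comparison. This is a hypothetical helper, not the CLI's code. The finding dictionaries use assumed status and severity keys, and the real --json field names may differ. It shows why a high threshold also catches BLOCKER, and why GHOST findings never gate.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "blocker": 3}
GATING_STATUSES = {"REAL", "UNRESOLVED"}  # GHOST and NEW findings do not gate here


def should_fail(findings, fail_on_severity: str) -> bool:
    """True if any REAL/UNRESOLVED finding is at or above the threshold severity."""
    threshold = SEVERITY_RANK[fail_on_severity]
    return any(
        f["status"] in GATING_STATUSES and SEVERITY_RANK[f["severity"]] >= threshold
        for f in findings
    )
```

In a CI wrapper this would map to exit code 1 (the policy gate), which stays distinct from exit code 2 for config-load errors.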

Python API

from mini_antemortem_cli import analytical_preflight, analytical_traps

analytical_preflight(...) returns omegaprompt.preflight.contracts.AnalyticalFinding objects. The output is compatible with omegaprompt.preflight.PreflightReport and derive_adaptation_plan.

MCP

pip install "mini-antemortem-cli[mcp]"
mini-antemortem-cli-mcp
# or
python -m mini_antemortem_cli.mcp

The MCP server exposes analytical_preflight and list_traps. Path inputs are bounded by MINI_ANTEMORTEM_WORKSPACE_ROOT or the current working directory; inline JSON objects do not touch the filesystem.
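
Bounding path inputs to a workspace root is a standard containment check: resolve both paths, then require that the candidate lies under the root. This is a sketch of the general technique using pathlib (Python 3.9+ for is_relative_to), not the MCP server's actual code. The function name and the PermissionError choice are assumptions.

```python
import os
from pathlib import Path


def resolve_within_workspace(user_path: str, env=os.environ) -> Path:
    """Resolve user_path and refuse anything outside the workspace root."""
    # Root comes from MINI_ANTEMORTEM_WORKSPACE_ROOT, else the current directory.
    root = Path(env.get("MINI_ANTEMORTEM_WORKSPACE_ROOT") or Path.cwd()).resolve()
    # resolve() collapses ".." segments and symlinks before the containment test.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{user_path!r} escapes workspace root {root}")
    return candidate
```

Checking after resolve() matters: a naive string-prefix test on the raw input would accept "data/../../etc/passwd".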

Release Hygiene

python scripts/release_audit.py --no-network
python -m build
python scripts/wheel_smoke_install.py dist/*.whl
python scripts/publish_readiness.py --no-network

These scripts do not publish, tag, or create GitHub releases. Publishing is only wired through .github/workflows/publish.yml on v*.*.* tags or manual dispatch, using PyPI Trusted Publishing / GitHub OIDC with no token secret. Setup and sequencing are documented in docs/release_checklist.md.

License

Apache 2.0. See LICENSE.

License history: PyPI distributions of version 0.1.0 were shipped with an MIT LICENSE file. The repository was relicensed to Apache 2.0 on 2026-04-22 (commit d2d7eb7); 0.2.0 and later versions ship under Apache 2.0. Anyone who installed 0.1.0 holds an MIT license to that copy; license changes do not apply retroactively.


Download files


Source Distribution

mini_antemortem_cli-0.10.0.tar.gz (88.8 kB)

Uploaded Source

Built Distribution


mini_antemortem_cli-0.10.0-py3-none-any.whl (40.3 kB)

Uploaded Python 3

File details

Details for the file mini_antemortem_cli-0.10.0.tar.gz.

File metadata

  • Download URL: mini_antemortem_cli-0.10.0.tar.gz
  • Upload date:
  • Size: 88.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mini_antemortem_cli-0.10.0.tar.gz
Algorithm Hash digest
SHA256 11d4f2c15f7c590b39add280671dd8ebfd095975d9646042c45183d02b4c2109
MD5 8fd884ea27ec7ac9980f52fc25a7375a
BLAKE2b-256 e77ca9724036a81e2c13674d1e2d9df776f88279a67f24e8b55abcba55bb3c66


Provenance

The following attestation bundles were made for mini_antemortem_cli-0.10.0.tar.gz:

Publisher: publish.yml on hibou04-ops/mini-antemortem-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mini_antemortem_cli-0.10.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mini_antemortem_cli-0.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 63fc6d99cb0f09827d51f2d4755de03ff14482fe25e95bf6ec71b0a7e2828bcf
MD5 811a52b3f7ad97f1cef96c39fe8ae5df
BLAKE2b-256 99045027c35b816cf0387e99f105a218f032754c637cc5e12df6617f8e810456


Provenance

The following attestation bundles were made for mini_antemortem_cli-0.10.0-py3-none-any.whl:

Publisher: publish.yml on hibou04-ops/mini-antemortem-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
