An agent regression firewall: replay saved agent traces and flag regressions by checking requirements, not text diffs. PASS / FAIL / UNCERTAIN.
reqfence
An agent regression firewall. When you change a prompt, model, or tool,
reqfence replays saved agent traces and flags regressions by checking whether outputs still satisfy their requirements, not by text-diffing. Every check returns PASS / FAIL / UNCERTAIN.
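Text-diffing is the wrong signal because similarity and correctness diverge. The illustration below uses Python's standard-library difflib (not part of reqfence) on invented example strings: a correct reword scores as *less* similar to the baseline than a confidently wrong answer that changes a single fact.

```python
from difflib import SequenceMatcher

# Invented example strings, not taken from reqfence's fixtures.
baseline = "The capital of Australia is Canberra."
reword = "Australia's capital is Canberra."   # harmless: still correct
wrong = "The capital of Australia is Sydney."  # confidently wrong


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], the kind of score a diff gate uses."""
    return SequenceMatcher(None, a, b).ratio()


reword_score = similarity(baseline, reword)
wrong_score = similarity(baseline, wrong)

# The wrong answer shares a long prefix with the baseline, so it looks closer.
print(f"reword: {reword_score:.2f}  wrong: {wrong_score:.2f}")
# prints: reword: 0.64  wrong: 0.86
```

Any similarity threshold loose enough to accept the reword also accepts the wrong answer, which is the gap a requirement check closes.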
Standalone package: it does not depend on ariadx or fie-sdk, because the requirement
critic and trace schema are vendored and dependency-cleaned. Status: Milestone 1 (CLI).
Why two tiers
Validated by the Milestone 0 derisking experiment
(🟢 GREEN): text-diff can't tell a harmless reword from a confidently-wrong
answer. reqfence uses two tiers over developer-declared requirements:
Each declared requirement has exactly one owner, decided by decidability:
| Tier | Owns | Role |
|---|---|---|
| Deterministic (checks.py) | every checkable item (JSON-valid, field-present, tool-called, word-count, …) | Primary hard gate, ~100% precision by construction |
| Semantic (semantic.py) | only the uncheckable items (factual correctness) | Catches confidently-wrong outputs; abstains (UNCERTAIN) when the judge isn't unanimous |
The semantic judge is never asked to grade a checkable item — that alone removed the false alarms an earlier "grade everything" design produced (the LLM can't reliably count words). See RESULTS.md.
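The one-owner rule can be pictured as a small routing step. This is a hypothetical sketch, not reqfence's engine.py: it assumes each requirement has the `{"id", "desc", "check": {"type": ...}}` shape shown in the `record` example, and that `semantic` is the only check type handed to the LLM judge.

```python
# Check types named in the catalog; every type except "semantic" is deterministic.
DETERMINISTIC_TYPES = frozenset({
    "valid_json", "contains_substring", "max_words", "contains_field",
    "tool_called", "no_tool_error",
    "min_words", "min_sources", "json_array_len", "file_written",
})


def owner(requirement: dict) -> str:
    """Return the single tier that owns a requirement (hypothetical helper)."""
    check_type = requirement["check"]["type"]
    if check_type == "semantic":
        return "semantic"
    if check_type in DETERMINISTIC_TYPES:
        return "deterministic"
    raise ValueError(f"unknown check type: {check_type!r}")


def split_by_owner(requirements: list[dict]) -> dict[str, list[str]]:
    """Group requirement ids by owning tier; only the semantic group reaches the judge."""
    groups: dict[str, list[str]] = {"deterministic": [], "semantic": []}
    for req in requirements:
        groups[owner(req)].append(req["id"])
    return groups


# Check parameters (e.g. the word limit) are omitted; only the type matters for routing.
reqs = [
    {"id": "json", "desc": "valid JSON", "check": {"type": "valid_json"}},
    {"id": "short", "desc": "hypothetical word limit", "check": {"type": "max_words"}},
    {"id": "facts", "desc": "hypothetical factual claim", "check": {"type": "semantic"}},
]
print(split_by_owner(reqs))
# {'deterministic': ['json', 'short'], 'semantic': ['facts']}
```

Because routing happens before any judging, a word-count requirement can never be graded by the LLM, which is the property the paragraph above credits with removing the false alarms.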
Final verdict (engine.py, schema.combine): each requirement resolves to
one PASS/FAIL/UNCERTAIN; the candidate FAILs if any requirement fails,
PASSes iff all pass, else UNCERTAIN. A semantic UNCERTAIN never fails the
build; a deterministic FAIL always does.
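The combination rule above fits in a few lines. The sketch below is illustrative and is not the actual `schema.combine`; the `Verdict` enum and the exit code of 1 (standing in for "non-zero") are assumptions.

```python
from enum import Enum


class Verdict(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNCERTAIN = "UNCERTAIN"


def combine(verdicts: list[Verdict]) -> Verdict:
    """Candidate verdict: FAIL if any FAIL, PASS iff all PASS, else UNCERTAIN."""
    if any(v is Verdict.FAIL for v in verdicts):
        return Verdict.FAIL
    if all(v is Verdict.PASS for v in verdicts):
        return Verdict.PASS
    return Verdict.UNCERTAIN


def exit_code(verdict: Verdict) -> int:
    """Only FAIL breaks the build; UNCERTAIN exits 0, like PASS."""
    return 1 if verdict is Verdict.FAIL else 0


# One requirement abstained (e.g. a non-unanimous semantic judge), none failed.
overall = combine([Verdict.PASS, Verdict.UNCERTAIN, Verdict.PASS])
print(overall.value, exit_code(overall))  # UNCERTAIN 0
```

Note that with this formulation an empty checklist combines to PASS, since `all()` of an empty sequence is true.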
Install
pip install -e ".[groq]" # or ".[anthropic]"; core installs with just pydantic+click
Python ≥ 3.11 (uses stdlib tomllib).
The three commands
reqfence init
Scaffolds reqfence.toml + empty fixtures.jsonl / candidates.jsonl.
reqfence record — save a baseline
Stores a frozen baseline trace + its developer-declared requirement checklist. Ingests an already-captured trace (it does not execute an agent):
# requirements.json: [{"id":"json","desc":"valid JSON","check":{"type":"valid_json"}}, ...]
reqfence record --id weather --task "Return weather as JSON" \
--requirements requirements.json --from-trace baseline_trace.json
# or convert a framework trace:
reqfence record --id t1 --task "..." --requirements reqs.json --from-langgraph messages.json
reqfence record --id t1 --task "..." --requirements reqs.json --from-openai steps.json --openai-format run_steps
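Before recording, it can help to sanity-check requirements.json. The sketch below checks only the structure visible in the example comment above (`id`, `desc`, `check.type`); per-type parameters are not documented here, so it does not validate them. The rule that ids must be unique is an assumption.

```python
import json


def validate_requirements(reqs: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the checklist is well-formed."""
    problems: list[str] = []
    seen: set[str] = set()
    for i, req in enumerate(reqs):
        rid = req.get("id")
        if not isinstance(rid, str) or not rid:
            problems.append(f"#{i}: missing id")
        elif rid in seen:
            problems.append(f"#{i}: duplicate id {rid!r}")  # assumed to be an error
        else:
            seen.add(rid)
        if not isinstance(req.get("desc"), str):
            problems.append(f"#{i}: missing desc")
        check = req.get("check")
        if not isinstance(check, dict) or not isinstance(check.get("type"), str):
            problems.append(f"#{i}: check.type must be a string")
    return problems


requirements = [
    {"id": "json", "desc": "valid JSON", "check": {"type": "valid_json"}},
]
problems = validate_requirements(requirements)
# Serialize only a clean checklist; write `payload` to requirements.json.
payload = json.dumps(requirements, indent=2) if not problems else None
print(problems)  # []
```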
reqfence check — gate a change
Replays candidate traces against baselines, runs both tiers, prints a per-requirement table, and exits non-zero if any FAIL (UNCERTAIN does not):
reqfence check # uses paths from reqfence.toml
reqfence check --no-semantic # deterministic gate only (no API key needed)
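If a pipeline step is driven from Python, the exit status is the whole signal: non-zero on any FAIL, zero when results are only PASS or UNCERTAIN. A minimal wrapper sketch follows; `gate` and `REQFENCE_GATE` are hypothetical names, and in a CI script you would call `sys.exit(0 if gate(REQFENCE_GATE) else 1)`.

```python
import subprocess
import sys

# The deterministic-only gate from the examples above; needs no API key.
REQFENCE_GATE = ["reqfence", "check", "--no-semantic"]


def gate(cmd: list[str]) -> bool:
    """Run a gate command; True means it exited 0 (no requirement FAILed)."""
    completed = subprocess.run(cmd, check=False)
    return completed.returncode == 0


# Stand-in commands so this sketch runs without reqfence installed:
clean_change = gate([sys.executable, "-c", "raise SystemExit(0)"])
regression = gate([sys.executable, "-c", "raise SystemExit(1)"])
print(clean_change, regression)  # True False
```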
The semantic tier runs only when it is enabled and a key is present in the environment
(GROQ_API_KEY / ANTHROPIC_API_KEY). Keys come from the environment; for convenience,
check also reads a nearby .env file, but never prints or writes it.
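A sketch of that key handling, under stated assumptions: the parsing rules, the choice that existing environment variables win over .env values, and the helper names `load_dotenv` and `semantic_enabled` are all hypothetical, not reqfence's code. The loader returns only the names it added, never the values, so callers can log safely.

```python
import tempfile
from pathlib import Path

KEY_NAMES = ("GROQ_API_KEY", "ANTHROPIC_API_KEY")


def load_dotenv(path: Path, env: dict[str, str]) -> list[str]:
    """Merge KEY=VALUE lines into env without overriding; return added names only."""
    added: list[str] = []
    if not path.is_file():
        return added
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, _, value = line.partition("=")
        name = name.strip()
        value = value.strip().strip('"').strip("'")
        if name and name not in env:  # assumption: the real environment wins
            env[name] = value
            added.append(name)
    return added


def semantic_enabled(env: dict[str, str], enabled: bool = True) -> bool:
    """The semantic tier runs only if it is enabled and some provider key is set."""
    return enabled and any(env.get(k) for k in KEY_NAMES)


with tempfile.TemporaryDirectory() as tmp:
    dotenv = Path(tmp) / ".env"
    dotenv.write_text('# local secrets\nGROQ_API_KEY="gsk-example"\n', encoding="utf-8")
    env: dict[str, str] = {}  # in practice, a copy of os.environ
    added = load_dotenv(dotenv, env)

print(added, semantic_enabled(env))  # ['GROQ_API_KEY'] True
```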
Requirement checks (catalog)
Core six (the reliable gate, unit-tested for precision):
valid_json, contains_substring (+ regex), max_words, contains_field,
tool_called, no_tool_error.
Extended (thin, tested): min_words, min_sources, json_array_len, file_written.
Special: semantic. The deterministic tier always abstains (UNCERTAIN) on it; only the LLM tier judges it.
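To make the catalog concrete, here is an illustrative sketch of the core six plus the semantic abstention. It is not reqfence's checks.py: the parameter names, the whitespace definition of a "word", and the trace shape (a list of tool-call dicts with `name` and `error` keys) are all assumptions.

```python
import json
import re

# Verdicts as plain strings: "PASS" | "FAIL" | "UNCERTAIN".


def valid_json(output: str) -> str:
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return "FAIL"
    return "PASS"


def contains_substring(output: str, needle: str, regex: bool = False) -> str:
    found = (re.search(needle, output) is not None) if regex else (needle in output)
    return "PASS" if found else "FAIL"


def max_words(output: str, limit: int) -> str:
    # Assumption: words are whitespace-separated tokens.
    return "PASS" if len(output.split()) <= limit else "FAIL"


def contains_field(output: str, field: str) -> str:
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return "FAIL"
    return "PASS" if isinstance(data, dict) and field in data else "FAIL"


def tool_called(tool_calls: list[dict], name: str) -> str:
    return "PASS" if any(c.get("name") == name for c in tool_calls) else "FAIL"


def no_tool_error(tool_calls: list[dict]) -> str:
    return "FAIL" if any(c.get("error") for c in tool_calls) else "PASS"


def semantic(_output: str) -> str:
    # The deterministic tier never judges semantic requirements.
    return "UNCERTAIN"


output = '{"city": "Paris", "temp_c": 18}'
calls = [{"name": "get_weather", "error": None}]  # hypothetical trace shape
results = {
    "valid_json": valid_json(output),
    "contains_field": contains_field(output, "temp_c"),
    "contains_substring": contains_substring(output, r'"temp_c":\s*\d+', regex=True),
    "max_words": max_words(output, 10),
    "tool_called": tool_called(calls, "get_weather"),
    "no_tool_error": no_tool_error(calls),
    "semantic": semantic(output),
}
print(results)
```

Each check is a pure function of the output or trace, which is why this tier can be unit-tested for precision without any model in the loop.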
Fixtures format
Versioned JSONL, one record per line (fixtures.jsonl = baselines + checklists,
candidates.jsonl = labeled candidate traces). The Milestone 0 benchmark is
migrated into fixtures/ by running python scripts/migrate_m0.py.
The format is a first-class artifact designed to grow.
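Reading a versioned JSONL file is a line-by-line loop. In this sketch the field name `version`, the supported value `1`, and the `id`/`task` record fields are hypothetical; the real fixture schema may differ. Rejecting unknown versions early is one way to let the format grow without silently misreading old or new records.

```python
import json
from collections.abc import Iterable, Iterator

# Hypothetical: the field name "version" and its supported values are assumptions.
SUPPORTED_VERSIONS = {1}


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-blank line, rejecting unknown format versions."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        version = record.get("version")
        if version not in SUPPORTED_VERSIONS:
            raise ValueError(f"line {lineno}: unsupported format version {version!r}")
        yield record


# A file handle works too: with open("fixtures.jsonl", encoding="utf-8") as f: ...
sample = [
    '{"version": 1, "id": "weather", "task": "Return weather as JSON"}',
    "",
    '{"version": 1, "id": "t2", "task": "Summarize the report"}',
]
print([r["id"] for r in read_jsonl(sample)])  # ['weather', 't2']
```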
Tests
pip install -e ".[dev]" && pytest # 26 tests: checks, union/abstention, fixtures, CLI
Download files
File details
Details for the file reqfence-0.1.0.tar.gz.
File metadata
- Download URL: reqfence-0.1.0.tar.gz
- Upload date:
- Size: 29.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0119255eca8dd9480ecc3e8f7ce30b5b9d6f77673278c0343e33da779b261528 |
| MD5 | 6271e34eaede0ffd80bc7505348b40c2 |
| BLAKE2b-256 | cd090cc4663a28982c54e18514fc8dca085e8160e1415652606348ec2cc689e0 |
File details
Details for the file reqfence-0.1.0-py3-none-any.whl.
File metadata
- Download URL: reqfence-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 397c16bbceed2ff7191355499df7e01485e13721a46cb9491d8b8330360065fc |
| MD5 | e515c97ee8641a0683dae58fac7085c2 |
| BLAKE2b-256 | 7a46e82bd8ff7a330e4ba33c754b93f887e92e778366953fe1cb12d39c65275c |