Decision-contracts library: deterministic verdict/gate/failure/promotion logic for autoresearch loops.
autoresearch-core
A tiny, pure-Python decision-contracts library for autoresearch / agentic loops. It provides a deterministic verdict (metric / comparator / target), failure classification, gates, promotion record shapes, and, since 0.4.3, life-harness round contracts (an evidence-driven self-improvement policy). It is the disciplined decision core, with zero runtime dependencies and no I/O.
You bring the loop, the retrieval, the runner, and the storage; you bind them to
the library's Protocols and call measure / decide / should_promote_dead_end
at your decision points. The verdict logic is parity-tested against the
GRD autoresearch loop.
Why
Agentic research loops fail in a predictable way: the model grades its own
homework. An LLM proposes a hypothesis, runs an experiment, then judges
whether the result supports the hypothesis — and judgment drifts.
autoresearch-core removes the judge from the control path:
- Every hypothesis must carry a machine-readable contract (MetricSpec: which metric, which comparator, which target).
- Experiments report results through a machine-readable line (`__RESULT__ {"accuracy": 0.93}` on stdout).
- The verdict is computed, not judged: metric vs target → supported / refuted / inconclusive.
- Only a deterministic refutation may auto-promote a dead-end. Anything judged by an LLM or inferred from an exit code is advisory.
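The contract above can be sketched with the standard library alone. This is a minimal stand-in, not the library's implementation: the helper names `parse_result_line` and `compute_verdict`, the comparator table, and the rule that a missing or non-numeric metric yields `inconclusive` are all assumptions made for illustration.

```python
import json
import operator

# Hypothetical comparator table; the library's accepted comparators may differ.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}

RESULT_PREFIX = "__RESULT__ "


def parse_result_line(stdout: str) -> dict:
    """Return the metrics from the last `__RESULT__ {json}` line, or {} if none."""
    metrics: dict = {}
    for line in stdout.splitlines():
        if line.startswith(RESULT_PREFIX):
            try:
                parsed = json.loads(line[len(RESULT_PREFIX):])
            except json.JSONDecodeError:
                continue  # malformed line: skip it rather than guess
            if isinstance(parsed, dict):
                metrics = parsed
    return metrics


def compute_verdict(metrics: dict, metric_key: str, comparator: str, target: float) -> str:
    """Compare the reported metric to the target; no judgment is involved."""
    value = metrics.get(metric_key)
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "inconclusive"  # assumption: no usable number means no verdict
    return "supported" if COMPARATORS[comparator](value, target) else "refuted"


stdout = 'training...\n__RESULT__ {"accuracy": 0.93}\n'
print(compute_verdict(parse_result_line(stdout), "accuracy", ">=", 0.9))
```

The point of the shape is that the same inputs always give the same verdict, so the result can be replayed and audited without asking a model.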
Install
pip install autoresearch-core
Requires Python 3.11+. No runtime dependencies. Fully typed (py.typed).
Quickstart
from autoresearch_core import (
    MetricSpec, ExperimentResult, measure, parse_metrics_line, should_promote_dead_end,
)
spec = MetricSpec(metric_key="recall_at_10", comparator=">=", target=0.8)
# An experiment prints `__RESULT__ {"recall_at_10": 0.83}` on stdout.
# Here `stdout` is a stand-in for the text captured from that run:
stdout = '__RESULT__ {"recall_at_10": 0.83}'
metrics = parse_metrics_line(stdout)  # -> {"recall_at_10": 0.83}
verdict = measure(spec, ExperimentResult(metrics=metrics, exit_code=0))
verdict.verdict         # "supported" | "refuted" | "inconclusive" (deterministic)
verdict.evidence_level  # "deterministic"
should_promote_dead_end(verdict)  # True only for a deterministic refutation
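On the experiment side, the only obligation is to print one line in the `__RESULT__ {json}` shape. A minimal sketch of an emitting script follows, assuming JSON-serialisable numeric metrics. `format_result_line` and `emit_result` are hypothetical helpers, not part of the library.

```python
import json
import sys


def format_result_line(metrics: dict[str, float]) -> str:
    """Build the machine-readable result line: prefix, a space, then JSON."""
    return "__RESULT__ " + json.dumps(metrics, sort_keys=True)


def emit_result(metrics: dict[str, float]) -> None:
    """Print the result line on stdout and flush so a parent process sees it."""
    print(format_result_line(metrics), file=sys.stdout, flush=True)


if __name__ == "__main__":
    # Stand-in number; a real experiment would compute this.
    emit_result({"recall_at_10": 0.83})
```

Keeping the line on stdout and everything else (logs, progress bars) on stderr is one way to keep the result easy to find, though the source does not require it.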
Documentation
- QUICKSTART — zero to a working deterministic verdict in five minutes, with a complete runnable script.
- TUTORIAL — build a full hypothesis → experiment → measure → learn loop on top of the library: contracts, failure classes, gates, dead-end promotion, infrastructure ports, and custom verdict strategies.
- RELEASING — ownership, versioning policy, release procedure, and the shared parity-vector contract with GRD.
- CHANGELOG
What it owns (and what it doesn't)
Owns — the decision discipline:
| Module | Public surface | Job |
|---|---|---|
| types | MetricSpec, ExperimentResult, VerdictRecord, Hypothesis, Takeaway, GateState, GateCheck | Frozen dataclasses; pure data, no logic |
| contract | parse_metrics_line, validate_metric_spec | The `__RESULT__ {json}` experiment-result contract |
| verdict | compare, DeterministicVerdict, VerdictStrategy | Metric vs target → supported / refuted / inconclusive |
| failures | classify_run_failure | stderr → H2 (missing dep) / H3 (missing file / permission) / H4 (timeout / runtime) / none |
| gates | resolve_gates, check_gate | Approval gates (execute, kg_write) resolved from config |
| policy | measure, decide, decide_branch, should_terminate, detect_plateau, should_promote_dead_end | The loop's branch / terminate / promote decisions |
| promote | DeadEndRecord, KnowhowRecord, approach_hash, build_dead_end_record, should_skip | Promotion record shapes + approach dedupe |
| rounds | resolve_autonomy, select_evidence, validate_round_patch, patch_hash, should_apply, decide_round (+ Finding, PatchEntry, RoundPatch, EvalReport, AutonomyState, RoundRecord in types) | Life-harness rounds: the policy for patching a harness's own primitives from session evidence |
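The failure classes in the table (H2, H3, H4, none) can be pictured as an ordered pattern match over stderr. The sketch below is an illustration only: the function name `classify_stderr` and every pattern in it are assumptions, and the library's `classify_run_failure` may use different rules and a different signature.

```python
import re

# Hypothetical patterns, checked in order so that a specific cause
# (missing dependency, missing file) wins over the generic runtime class.
FAILURE_PATTERNS = [
    ("H2", re.compile(r"ModuleNotFoundError|ImportError|No module named")),
    ("H3", re.compile(
        r"FileNotFoundError|No such file or directory|PermissionError|Permission denied"
    )),
    ("H4", re.compile(r"TimeoutError|timed out|Traceback \(most recent call last\)")),
]


def classify_stderr(stderr: str) -> str:
    """Return the first matching failure class, or "none" if nothing matches."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(stderr):
            return label
    return "none"


print(classify_stderr("ModuleNotFoundError: No module named 'numpy'"))
```

Ordering matters here: a Python import failure also prints a traceback, so the generic runtime pattern has to come last.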
Doesn't own (bind these via ports.py Protocols to your own infra):
Spawn (LLM call), Retriever, KnowledgeGraph, ExperimentRunner, Store,
and for rounds: FindingsSource, PatchProposer, RoundEvaluator, Applier,
RoundStore. No implementations ship in this package; the tutorial shows
minimal bindings.
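Binding a port means supplying any object whose methods match the Protocol; no inheritance is needed. The sketch below shows the pattern with a locally defined stand-in. The `run` method, its signature, and the `RunOutput` shape are assumptions for illustration; the real definitions live in ports.py and may differ.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RunOutput:
    """Hypothetical result shape: what a runner hands back to the loop."""
    stdout: str
    stderr: str
    exit_code: int


class ExperimentRunner(Protocol):
    """Stand-in for a port; the real method name and signature may differ."""

    def run(self, script: str) -> RunOutput: ...


class CannedRunner:
    """A test double that satisfies the protocol structurally."""

    def __init__(self, outputs: dict[str, RunOutput]) -> None:
        self._outputs = outputs

    def run(self, script: str) -> RunOutput:
        return self._outputs[script]


def run_once(runner: ExperimentRunner, script: str) -> RunOutput:
    """Loop code depends only on the protocol, never on a concrete runner."""
    return runner.run(script)


runner = CannedRunner({"exp.py": RunOutput('__RESULT__ {"recall_at_10": 0.83}', "", 0)})
print(run_once(runner, "exp.py").stdout)
```

Swapping `CannedRunner` for a subprocess-backed or remote runner would not change `run_once`, which is the property the ports are meant to give.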
Verdict authority
DeterministicVerdict is the default and the reason this package exists. Other
strategies (an LLM judge, an exit-code check) can be plugged in via the
VerdictStrategy protocol, but only a deterministic refutation auto-promotes a
dead-end — non-deterministic verdicts are advisory. Every verdict records its
strategy and evidence_level, so the decision trail stays auditable.
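The promotion rule reduces to a two-condition check. The sketch below uses a stand-in record whose field names follow the prose above; the `"advisory"` evidence level and the strategy labels are hypothetical values chosen for the example, not the library's vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Stand-in record; not the library's VerdictRecord."""
    verdict: str         # "supported" | "refuted" | "inconclusive"
    strategy: str        # which strategy produced it, kept for the audit trail
    evidence_level: str  # "deterministic", or a hypothetical "advisory"


def may_auto_promote_dead_end(v: Verdict) -> bool:
    """Only a refutation backed by deterministic evidence promotes a dead-end."""
    return v.verdict == "refuted" and v.evidence_level == "deterministic"


computed = Verdict("refuted", "deterministic_compare", "deterministic")
judged = Verdict("refuted", "llm_judge", "advisory")
print(may_auto_promote_dead_end(computed), may_auto_promote_dead_end(judged))
```

An LLM judge can still say "refuted" under this rule; it just cannot write that conclusion into long-lived memory on its own.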
Life-harness rounds (0.4.3)
The same discipline, pointed at the harness itself: a round turns session
evidence (e.g. Tesserae-compiled takeaways) into one eval-gated, reversible
patch to the host's own primitives. The kernel owns the decisions — evidence
selection, patch validation (path guards + a self-protection deny-list: a round
can never patch its own driver or autonomy config), dedupe, and the apply gate
(kill switch > eval > review-mode > confidence). Hosts bind the I/O through
five protocols and keep the forge in git: one commit per round, revert =
git revert.
from autoresearch_core import (
    AutonomyState, EvalCheck, EvalReport, PatchEntry, RoundPatch, decide_round,
)
patch = RoundPatch(round_id="r1", entries=(
    PatchEntry(path="commands/execute-phase.md", kind="markdown", op="modify",
               content="...", rationale="executor keeps forgetting to commit"),
), summary="commit reminder in executor prompt", confidence=0.9)
decide_round(patch, AutonomyState(), set(), EvalReport(checks=(EvalCheck("lint", 0),)))
# -> ('evaluated', 'awaiting review (harness_review)'); review mode is the default
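The apply-gate ordering (kill switch > eval > review-mode > confidence) reads naturally as a chain of early returns, where each check is reached only if every higher-priority check passed. The sketch below illustrates that precedence only. `GateInputs`, `apply_gate`, the 0.8 threshold, and the reason strings are all hypothetical and do not reflect what `decide_round` or `should_apply` actually return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical inputs to the gate; not the library's AutonomyState."""
    kill_switch: bool
    eval_passed: bool
    review_mode: bool
    confidence: float
    min_confidence: float = 0.8  # assumed threshold, for illustration


def apply_gate(g: GateInputs) -> tuple[bool, str]:
    """Return (apply?, reason), checking the highest-priority blocker first."""
    if g.kill_switch:
        return False, "kill switch engaged"
    if not g.eval_passed:
        return False, "eval failed"
    if g.review_mode:
        return False, "awaiting review"
    if g.confidence < g.min_confidence:
        return False, "confidence below threshold"
    return True, "apply"


print(apply_gate(GateInputs(kill_switch=False, eval_passed=True,
                            review_mode=True, confidence=0.9)))
```

With this ordering, high confidence can never override a failed eval, and nothing overrides the kill switch.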
Reference host: GRD's gd harness round. Design:
life-harness rounds spec.
Note: the package version is locked to GRD's version line from 0.4.3 onward
(see RELEASING).
Development
pip install -e ".[dev]"
pytest -q --cov=autoresearch_core
The test suite includes a parity suite (tests/test_parity.py) that pins
behaviour to the GRD TypeScript implementation.
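A parity suite of this kind is usually driven by shared test vectors: both implementations read the same inputs and must produce the same verdicts. The sketch below shows the idea with an invented vector format and a local stand-in for the verdict function. Neither reflects the actual contents of tests/test_parity.py or the shared vector file.

```python
import json
import operator

# Hypothetical vector format; the real shared contract may look different.
VECTORS_JSON = """
[
  {"value": 0.83, "comparator": ">=", "target": 0.8, "expected": "supported"},
  {"value": 0.79, "comparator": ">=", "target": 0.8, "expected": "refuted"},
  {"value": null, "comparator": ">=", "target": 0.8, "expected": "inconclusive"}
]
"""

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def verdict_for(value, comparator: str, target: float) -> str:
    """Local stand-in for the implementation under test."""
    if value is None:
        return "inconclusive"
    return "supported" if OPS[comparator](value, target) else "refuted"


def run_parity(vectors_json: str) -> list[str]:
    """Return mismatch descriptions; an empty list means parity holds."""
    mismatches = []
    for i, vec in enumerate(json.loads(vectors_json)):
        got = verdict_for(vec["value"], vec["comparator"], vec["target"])
        if got != vec["expected"]:
            mismatches.append(f"vector {i}: expected {vec['expected']}, got {got}")
    return mismatches


print(run_parity(VECTORS_JSON))
```

Because the vectors are plain JSON, the same file can be consumed from a TypeScript test runner without translation.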
License
MIT © Cameleon X — see LICENSE.