
Decision-contracts library: deterministic verdict/gate/failure/promotion logic for autoresearch loops.


autoresearch-core


A tiny, pure-Python decision-contracts library for autoresearch / agentic loops: a deterministic verdict (metric / comparator / target), failure classification, gates, and promotion record shapes — the disciplined decision core, with zero runtime dependencies and no I/O.

You bring the loop, the retrieval, the runner, and the storage; you bind them to the library's Protocols and call measure / decide / should_promote_dead_end at your decision points. The verdict logic is parity-tested against the GRD autoresearch loop.

Why

Agentic research loops fail in a predictable way: the model grades its own homework. An LLM proposes a hypothesis, runs an experiment, then judges whether the result supports the hypothesis — and judgment drifts. autoresearch-core removes the judge from the control path:

  1. Every hypothesis must carry a machine-readable contract (MetricSpec: which metric, which comparator, which target).
  2. Experiments report results through a machine-readable line (__RESULT__ {"accuracy": 0.93} on stdout).
  3. The verdict is computed, not judged: metric vs target → supported / refuted / inconclusive.
  4. Only a deterministic refutation may auto-promote a dead-end. Anything judged by an LLM or inferred from an exit code is advisory.
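Step 3 is the heart of the idea. The standard-library function below is a rough sketch of what "computed, not judged" means: it compares one metric against a target. It is not the library's implementation. `toy_verdict`, the comparator table, and the rule that a missing, non-numeric, or NaN metric yields "inconclusive" are all assumptions made for illustration.

```python
import math
import operator
from typing import Literal

Verdict = Literal["supported", "refuted", "inconclusive"]

# Hypothetical comparator table; the library's accepted comparators may differ.
_COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def toy_verdict(metrics: dict[str, object], metric_key: str,
                comparator: str, target: float) -> Verdict:
    """Compute (not judge) a verdict for one metric contract. Illustrative only."""
    value = metrics.get(metric_key)
    # Assumption: a missing or non-numeric metric is "no evidence", not a refutation.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "inconclusive"
    # Assumption: NaN carries no evidence either way.
    if math.isnan(value):
        return "inconclusive"
    compare = _COMPARATORS[comparator]
    return "supported" if compare(value, target) else "refuted"


print(toy_verdict({"accuracy": 0.93}, "accuracy", ">=", 0.9))  # supported
print(toy_verdict({"accuracy": 0.85}, "accuracy", ">=", 0.9))  # refuted
print(toy_verdict({"loss": 0.4}, "accuracy", ">=", 0.9))       # inconclusive
```

The function reads only the metrics dict and the contract, so the same inputs always produce the same verdict. That is what makes the result safe to act on without a judge.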

Install

pip install autoresearch-core

Requires Python 3.11+. No runtime dependencies. Fully typed (py.typed).

Quickstart

from autoresearch_core import (
    MetricSpec, ExperimentResult, measure, parse_metrics_line, should_promote_dead_end,
)

spec = MetricSpec(metric_key="recall_at_10", comparator=">=", target=0.8)

# Captured stdout from an experiment that printed its __RESULT__ line:
stdout = '__RESULT__ {"recall_at_10": 0.83}'
metrics = parse_metrics_line(stdout)        # -> {"recall_at_10": 0.83}
verdict = measure(spec, ExperimentResult(metrics=metrics, exit_code=0))

verdict.verdict          # "supported" | "refuted" | "inconclusive"  (deterministic)
verdict.evidence_level   # "deterministic"
should_promote_dead_end(verdict)            # True only for a deterministic refutation
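On the experiment side, the only obligation is to print the result line. Here is a minimal sketch, assuming the line is `__RESULT__` followed by a single space and one JSON object (the shape shown above). `format_result_line` is a hypothetical helper, not part of autoresearch-core.

```python
import json
import math

# Hypothetical experiment-side helper; not part of autoresearch-core.
RESULT_PREFIX = "__RESULT__"


def format_result_line(metrics: dict[str, float]) -> str:
    """Render metrics as one `__RESULT__ {json}` line."""
    for key, value in metrics.items():
        # Refuse NaN/inf up front: json.dumps would otherwise emit
        # non-standard tokens (NaN, Infinity) that strict JSON parsers reject.
        if not math.isfinite(value):
            raise ValueError(f"metric {key!r} is not finite: {value!r}")
    return f"{RESULT_PREFIX} {json.dumps(metrics, sort_keys=True)}"


if __name__ == "__main__":
    # At the end of an experiment script, print exactly one result line.
    recall = 0.83  # stand-in for a real evaluation
    print(format_result_line({"recall_at_10": recall}), flush=True)
```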

Documentation

  • QUICKSTART — zero to a working deterministic verdict in five minutes, with a complete runnable script.
  • TUTORIAL — build a full hypothesis → experiment → measure → learn loop on top of the library: contracts, failure classes, gates, dead-end promotion, infrastructure ports, and custom verdict strategies.
  • RELEASING — ownership, versioning policy, release procedure, and the shared parity-vector contract with GRD.
  • CHANGELOG

What it owns (and what it doesn't)

Owns — the decision discipline:

| Module | Public surface | Job |
|---|---|---|
| types | MetricSpec, ExperimentResult, VerdictRecord, Hypothesis, Takeaway, GateState, GateCheck | Frozen dataclasses; pure data, no logic |
| contract | parse_metrics_line, validate_metric_spec | The __RESULT__ {json} experiment-result contract |
| verdict | compare, DeterministicVerdict, VerdictStrategy | Metric vs target → supported / refuted / inconclusive |
| failures | classify_run_failure | stderr → H2 (missing dependency) / H3 (missing file / permission) / H4 (timeout / runtime) / none |
| gates | resolve_gates, check_gate | Approval gates (execute, kg_write) resolved from config |
| policy | measure, decide, decide_branch, should_terminate, detect_plateau, should_promote_dead_end | The loop's branch / terminate / promote decisions |
| promote | DeadEndRecord, KnowhowRecord, approach_hash, build_dead_end_record, should_skip | Promotion record shapes + approach dedupe |
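To make the failure classes concrete, here is a toy stderr classifier in the spirit of classify_run_failure. The regular expressions, their ordering, and the plain-string return values are assumptions for illustration. The library's actual rules and return type may differ.

```python
import re
from typing import Literal

FailureClass = Literal["H2", "H3", "H4", "none"]

# Hypothetical rules, checked in order; the first match wins. H2 is checked
# before H4 so a ModuleNotFoundError traceback is not filed as a generic crash.
_RULES: list[tuple[FailureClass, re.Pattern[str]]] = [
    ("H2", re.compile(r"ModuleNotFoundError|ImportError|No module named")),
    ("H3", re.compile(r"FileNotFoundError|PermissionError|"
                      r"No such file or directory|Permission denied")),
    ("H4", re.compile(r"TimeoutError|timed out|Traceback \(most recent call last\)")),
]


def toy_classify(stderr: str) -> FailureClass:
    """Map stderr text to the first matching failure class, else 'none'."""
    for label, pattern in _RULES:
        if pattern.search(stderr):
            return label
    return "none"
```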

Doesn't own — bind these via ports.py Protocols to your own infra: Spawn (LLM call), Retriever, KnowledgeGraph, ExperimentRunner, Store. No implementations ship in this package; the tutorial shows minimal bindings.
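A port binding can be as small as one class. The sketch below assumes a hypothetical `Runner` protocol with a `run(argv, timeout_s)` method and a hypothetical `RunOutcome` record. The real `ExperimentRunner` Protocol in ports.py may use different names and signatures, so treat this as the shape of a binding rather than a drop-in one.

```python
import json
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class RunOutcome:
    # Hypothetical result shape; the library's ExperimentResult may differ.
    metrics: dict[str, float] = field(default_factory=dict)
    exit_code: int = 0
    stderr: str = ""


class Runner(Protocol):
    # Hypothetical port signature, assumed for this sketch only.
    def run(self, argv: list[str], timeout_s: float) -> RunOutcome: ...


class SubprocessRunner:
    """Runs a command locally; the last __RESULT__ line on stdout wins."""

    def run(self, argv: list[str], timeout_s: float) -> RunOutcome:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return RunOutcome(exit_code=-1, stderr="TimeoutError: run timed out")
        metrics: dict[str, float] = {}
        for line in proc.stdout.splitlines():
            if line.startswith("__RESULT__ "):
                metrics = json.loads(line[len("__RESULT__ "):])
        return RunOutcome(metrics=metrics, exit_code=proc.returncode,
                          stderr=proc.stderr)


def run_with(runner: Runner) -> RunOutcome:
    # A tiny inline "experiment" that prints one result line.
    script = "print('__RESULT__ {\"accuracy\": 0.93}')"
    return runner.run([sys.executable, "-c", script], timeout_s=30)
```

Because `typing.Protocol` is structural, `SubprocessRunner` satisfies `Runner` without inheriting from it. Swapping in a container or remote runner only requires another class with the same `run` method.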

Verdict authority

DeterministicVerdict is the default and the reason this package exists. Other strategies (an LLM judge, an exit-code check) can be plugged in via the VerdictStrategy protocol, but only a deterministic refutation auto-promotes a dead-end — non-deterministic verdicts are advisory. Every verdict records its strategy and evidence_level, so the decision trail stays auditable.
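The promotion rule fits in a few lines. In this sketch, `ToyVerdict`, `may_auto_promote`, and the strategy and evidence labels "llm_judge", "exit_code", and "advisory" are hypothetical. Only the `verdict` and `evidence_level` field names come from the Quickstart above.

```python
from dataclasses import dataclass
from typing import Literal


# Hypothetical stand-in for a verdict record; the real type may differ.
@dataclass(frozen=True)
class ToyVerdict:
    verdict: Literal["supported", "refuted", "inconclusive"]
    strategy: str        # e.g. "deterministic", "llm_judge", "exit_code" (assumed labels)
    evidence_level: str  # "deterministic", or an advisory level (assumed label)


def may_auto_promote(v: ToyVerdict) -> bool:
    """Only a computed refutation may promote a dead-end automatically."""
    return v.verdict == "refuted" and v.evidence_level == "deterministic"
```

An LLM judge that says "refuted" still returns False here. Its verdict can be logged and reviewed, but it never closes off an approach on its own.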

Development

pip install -e ".[dev]"
pytest -q --cov=autoresearch_core

The test suite includes a parity suite (tests/test_parity.py) that pins behaviour to the GRD TypeScript implementation.

License

MIT © Cameleon X — see LICENSE.



Download files


Source Distribution

autoresearch_core-0.4.3.tar.gz (42.9 kB)

Uploaded Source

Built Distribution


autoresearch_core-0.4.3-py3-none-any.whl (15.7 kB)

Uploaded Python 3

File details

Details for the file autoresearch_core-0.4.3.tar.gz.

File metadata

  • Download URL: autoresearch_core-0.4.3.tar.gz
  • Upload date:
  • Size: 42.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for autoresearch_core-0.4.3.tar.gz
Algorithm Hash digest
SHA256 a3ecbd90d0ec84573251c974c3420661ee01da52fe773af9b292afafe5fb495a
MD5 163963538a6d0998fcb984992635b439
BLAKE2b-256 a910b4693f8e0a71d55d48ca12fe2e25a196743f10a86ea9a862ffd7bc183616


Provenance

The following attestation bundles were made for autoresearch_core-0.4.3.tar.gz:

Publisher: publish.yml on ca1773130n/autoresearch-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file autoresearch_core-0.4.3-py3-none-any.whl.

File metadata

File hashes

Hashes for autoresearch_core-0.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 cb73356ae0b0589a12d0b8bd55271fdf9abaed4e8a41afec4545ec33f95b2117
MD5 96929a8ea5de28d03760c1d34c84447d
BLAKE2b-256 b3dc0dd7e9a83f55d3eb778642f877e236f04ddd835ef005098c10df5e8865b8


Provenance

The following attestation bundles were made for autoresearch_core-0.4.3-py3-none-any.whl:

Publisher: publish.yml on ca1773130n/autoresearch-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
