
Structural reliability critics for the OpenHands Agent SDK — certified stagnation detection, backed by the Operon categorical framework.


operon-openhands-gates

In-loop structural reliability critics for the OpenHands Agent SDK — drop-in, cert-emitting.

OpenHands' own docs flag an architectural gap in iterative refinement:

"the current implementation relies solely on threshold/iteration limits rather than monitoring improvement velocity or convergence rates — suggesting this is an architectural gap where monitoring logic could plug in." (https://docs.openhands.dev/sdk/guides/iterative-refinement)

This package ships the missing monitor as a CriticBase subclass. It replaces an LLM-judged success score with a Bayesian stagnation signal computed over the conversation's message history. When the agent goes in circles, the critic's score drops below threshold, iterative refinement terminates, and a replayable behavioral_stability_windowed certificate is emitted.

At a glance:

  • OperonStagnationCritic: epiplexic_integral-based detection (Paper 4 §4.3, 0.960 convergence accuracy with real embeddings) that plugs directly into Agent(critic=...).
  • One certificate per detection transition, self-verifiable via certificate.verify().
  • Zero-dep NGramEmbedder default — bring your own neural embedder for paraphrase-robust detection.
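To illustrate why a dependency-free n-gram embedder is sensitive to paraphrase, here is a minimal sketch of the general technique: hash character n-grams into a fixed-size count vector and compare messages by cosine similarity. The names ngram_embed and cosine, the hashing scheme, and the parameters n and dim are all hypothetical and chosen for illustration; this is not the package's NGramEmbedder and does not reflect its interface.

```python
import math
import zlib


def ngram_embed(text, n=3, dim=64):
    """Hash character n-grams into a fixed-size count vector.

    Hypothetical helper for illustration only; not the package's NGramEmbedder.
    """
    vec = [0.0] * dim
    text = text.lower()
    for i in range(max(len(text) - n + 1, 0)):
        gram = text[i:i + n]
        # crc32 is deterministic across runs, unlike the built-in hash() on str.
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


repeat = cosine(ngram_embed("run the tests again"), ngram_embed("run the tests again"))
paraphrase = cosine(
    ngram_embed("run the tests again"),
    ngram_embed("re-execute the suite once more"),
)
```

A verbatim repeat scores at the maximum similarity, while a paraphrase with the same intent shares few character n-grams and scores lower. A surface-level embedder can therefore miss an agent that loops while rewording itself, which is the case a neural embedder is meant to cover.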

Install

pip install operon-openhands-gates

Requires operon-ai>=0.34.4 and openhands-sdk>=1.15.

Quickstart

from openhands.sdk import Agent, Conversation, LLM
from openhands.sdk.critic.base import IterativeRefinementConfig
from operon_openhands_gates import OperonStagnationCritic

critic = OperonStagnationCritic(
    threshold=0.2,
    critical_duration=3,
    iterative_refinement=IterativeRefinementConfig(
        success_threshold=0.2,  # match the critic's threshold
        max_iterations=5,
    ),
)

# tools=[...] and `workspace` are placeholders: supply your own tool list and workspace.
agent = Agent(llm=LLM(model="anthropic/claude-sonnet-4-5"), tools=[...], critic=critic)
conversation = Conversation(agent=agent, workspace=workspace)
conversation.send_message("Fix the failing test in ...")
conversation.run()  # iterative refinement terminates on sustained stagnation

if critic.certificate is not None:
    # Replayable evidence of what the gate saw.
    verification = critic.certificate.verify()
    assert verification.holds

Why the non-default success_threshold

OpenHands' default success_threshold=0.6 is tuned for LLM probability-of-success scores. OperonStagnationCritic returns the epiplexic_integral directly — in [0, 1] where low = stagnant. Paper 4 §4.3 uses δ=0.2 as the stagnation threshold, so match it on the refinement config.
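As a rough illustration of how threshold=0.2 and critical_duration=3 from the Quickstart might interact, the sketch below scans a sequence of scores in [0, 1] and reports the first step at which the score has stayed below the threshold for critical_duration consecutive steps. The function name first_sustained_stagnation, the strict comparison, and the consecutive-run counting are assumptions made for illustration; the package's actual detection logic may differ.

```python
def first_sustained_stagnation(scores, threshold=0.2, critical_duration=3):
    """Return the index at which `scores` has been below `threshold` for
    `critical_duration` consecutive steps, or None if that never happens.

    Hypothetical helper: a sketch of "sustained stagnation", not the package's code.
    """
    run = 0
    for i, score in enumerate(scores):
        # Low score = stagnant; any score at or above the threshold resets the run.
        run = run + 1 if score < threshold else 0
        if run >= critical_duration:
            return i
    return None


# A single dip at index 1 is ignored; the run at indices 3-5 is sustained.
scores = [0.8, 0.15, 0.5, 0.1, 0.1, 0.05, 0.6]
hit = first_sustained_stagnation(scores)
```

Under these assumptions, one isolated low score does not trigger detection; only a run of three does, which is why the threshold and the duration are configured together.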

Sibling package

  • operon-langgraph-gates — same Paper 4 substrate, same behavioral_stability_windowed certificate, targeted at LangGraph's StateGraph with .wrap() / .edge() node APIs. Two packages, one core — this is the framework-portability claim from Paper 5 §3 in code.

Certificate theorem name and verification

Certificates emitted by this package carry the theorem name behavioral_stability_windowed (not the core's shared behavioral_stability). The two differ in how they verify:

  • behavioral_stability (shared core): mean(severities) < threshold. Loses the per-window structure that rolling-integral detection operates on.
  • behavioral_stability_windowed (shared core, since operon-ai 0.36.0): max(per_window_severity_means) <= stability_threshold. Mirrors detection exactly.

Both verifiers are registered in operon_ai.core.certificate._THEOREM_FN_PATHS, so deserialized certificates resolve through _resolve_verify_fn without this package needing to be imported. Any consumer with operon-ai>=0.36.0 can round-trip a behavioral_stability_windowed certificate correctly.
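The difference between the two verification rules can be seen on a small worked example. The sketch below implements the two inequalities as stated above, on made-up severity values; the function names flat_mean_holds and windowed_holds are hypothetical, and how the real verifiers build windows from a certificate's evidence is not shown here.

```python
from statistics import mean


def flat_mean_holds(severities, threshold):
    """Flat rule: mean(severities) < threshold."""
    return mean(severities) < threshold


def windowed_holds(windows, stability_threshold):
    """Windowed rule: max(per-window severity means) <= stability_threshold."""
    return max(mean(window) for window in windows) <= stability_threshold


# Made-up data: three calm windows and one window with a severity spike.
windows = [
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1],
    [0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1],
]
severities = [s for window in windows for s in window]
threshold = 0.5

flat_result = flat_mean_holds(severities, threshold)  # overall mean is about 0.3
windowed_result = windowed_holds(windows, threshold)  # worst window mean is about 0.9
```

On this data the flat rule holds while the windowed rule does not: averaging over the whole history dilutes the one bad window. This is the per-window structure that the flat mean loses.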

Breaking change from pre-alpha prototypes

Earlier pre-release builds emitted certificates with theorem name behavioral_stability (the shared core name), bound to a locally-attached _verify_fn. That shape was semantically wrong: the shared verifier is flat-mean-based, so any certificate round-tripped through serialization would silently revert to the wrong replay logic. Consumers that key on certificate.theorem == "behavioral_stability" or metadata["certificate_theorem"] == "behavioral_stability" must update to "behavioral_stability_windowed". No migration path is provided, because the package is still in alpha.

Citations

Backed by Paper 4 §4.3: convergence/false-stagnation accuracy 0.960 with real sentence embeddings (all-MiniLM-L6-v2, N = 300 trials). Full numbers and reproduction commands in the Operon repo at eval/results/benchmarks_real_embeddings/multi_model_summary.json. Paper 5 §3 establishes the preservation-under-compilation framework that the certificate follows.

Status

Alpha. API may change before 0.1.0 stable. Feedback welcome via Issues.

License

MIT — see LICENSE.
