Structural reliability critics for the OpenHands Agent SDK — certified stagnation detection, backed by the Operon categorical framework.

operon-openhands-gates

In-loop structural reliability critics for the OpenHands Agent SDK — drop-in, cert-emitting.

OpenHands' own docs flag an architectural gap in iterative refinement:

"the current implementation relies solely on threshold/iteration limits rather than monitoring improvement velocity or convergence rates — suggesting this is an architectural gap where monitoring logic could plug in." (https://docs.openhands.dev/sdk/guides/iterative-refinement)

This package ships the missing monitor as a CriticBase subclass. It replaces an LLM-judged success score with a Bayesian stagnation signal computed over the conversation's message history. When the agent goes in circles, the critic's score drops below the threshold, iterative refinement terminates, and a replayable behavioral_stability_windowed certificate is emitted.

At a glance:

  • OperonStagnationCritic: epiplexic_integral-based detection (Paper 4 §4.3, 0.960 convergence accuracy with real embeddings) that plugs directly into Agent(critic=...).
  • One certificate per detection transition, self-verifiable via certificate.verify().
  • Zero-dep NGramEmbedder default — bring your own neural embedder for paraphrase-robust detection.
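
To illustrate why an n-gram embedder is dependency-free but not paraphrase-robust, here is a minimal sketch of a hashed character-trigram embedding with cosine similarity. It is not the package's NGramEmbedder; the names ngram_embed and cosine, the trigram size, and the 64-bucket dimension are all assumptions chosen for illustration.

```python
import math
import zlib


def ngram_embed(text, n=3, dim=64):
    """Hash character n-grams into a fixed-size, L2-normalized count vector.

    Hypothetical helper: standard library only, so no extra dependencies.
    """
    vec = [0.0] * dim
    lowered = text.lower()
    for i in range(len(lowered) - n + 1):
        gram = lowered[i:i + n]
        # crc32 is deterministic across processes, unlike the built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


repeat_a = ngram_embed("The test still fails with the same error")
repeat_b = ngram_embed("The test still fails with the same error")
paraphrase = ngram_embed("Unit check remains broken, identical traceback")

same_score = cosine(repeat_a, repeat_b)
paraphrase_score = cosine(repeat_a, paraphrase)
```

A verbatim repeat scores as essentially identical, while a paraphrase that shares few character trigrams scores much lower. That is the gap a neural sentence embedder is meant to close.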

Install

pip install operon-openhands-gates

Requires operon-ai>=0.34.4 and openhands-sdk>=1.15.

Quickstart

from openhands.sdk import Agent, Conversation, LLM
from openhands.sdk.critic.base import IterativeRefinementConfig
from operon_openhands_gates import OperonStagnationCritic

critic = OperonStagnationCritic(
    threshold=0.2,
    critical_duration=3,
    iterative_refinement=IterativeRefinementConfig(
        success_threshold=0.2,  # match the critic's threshold
        max_iterations=5,
    ),
)

# `tools=[...]` stands in for your own tool list; `workspace` is the working directory you supply.
agent = Agent(llm=LLM(model="anthropic/claude-sonnet-4-5"), tools=[...], critic=critic)
conversation = Conversation(agent=agent, workspace=workspace)
conversation.send_message("Fix the failing test in ...")
conversation.run()  # iterative refinement terminates on sustained stagnation

if critic.certificate is not None:
    # Replayable evidence of what the gate saw.
    verification = critic.certificate.verify()
    assert verification.holds

Why the non-default success_threshold

OpenHands' default success_threshold=0.6 is tuned for LLM probability-of-success scores. OperonStagnationCritic returns the epiplexic_integral directly — in [0, 1] where low = stagnant. Paper 4 §4.3 uses δ=0.2 as the stagnation threshold, so match it on the refinement config.
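
As a rough illustration of how a low-is-stagnant score could combine with critical_duration, the sketch below flags the first step at which a score series has stayed under the threshold for a given number of consecutive observations. This reading of critical_duration is an assumption, and first_sustained_stagnation is a hypothetical helper, not part of the package.

```python
def first_sustained_stagnation(scores, threshold=0.2, critical_duration=3):
    """Return the index where `scores` has been below `threshold` for
    `critical_duration` consecutive steps, or None if that never happens.

    Assumed semantics for illustration only; the package's detector may differ.
    """
    run = 0
    for index, score in enumerate(scores):
        # A score at or above the threshold resets the stagnation run.
        run = run + 1 if score < threshold else 0
        if run >= critical_duration:
            return index
    return None


# A brief dip (index 2) does not trigger; the sustained dip (indices 4 to 6) does.
scores = [0.8, 0.5, 0.15, 0.3, 0.1, 0.1, 0.05, 0.4]
detected_at = first_sustained_stagnation(scores)
```

Under these assumptions, a single low score is treated as noise, and only a sustained run below δ = 0.2 counts as stagnation.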

Sibling package

  • operon-langgraph-gates — same Paper 4 substrate, same behavioral_stability_windowed certificate, targeted at LangGraph's StateGraph with .wrap() / .edge() node APIs. Two packages, one core — this is the framework-portability claim from Paper 5 §3 in code.

Certificate theorem name and verification

Certificates emitted by this package carry the theorem name behavioral_stability_windowed (not the core's shared behavioral_stability). The two differ in how they verify:

  • behavioral_stability (shared core): mean(severities) < threshold. Loses the per-window structure that rolling-integral detection operates on.
  • behavioral_stability_windowed (shared core, since operon-ai 0.36.0): max(per_window_severity_means) <= stability_threshold. Mirrors detection exactly.
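
The difference between the two predicates can be sketched in a few lines. The function names and the way severities are grouped into windows are assumptions for illustration; only the two inequalities come from the descriptions above.

```python
from statistics import fmean


def flat_mean_holds(severities, threshold):
    """behavioral_stability-style check: one mean over all severities."""
    return fmean(severities) < threshold


def windowed_holds(windows, stability_threshold):
    """behavioral_stability_windowed-style check: the worst window mean must pass."""
    return max(fmean(window) for window in windows) <= stability_threshold


# One unstable window sits between two calm ones.
windows = [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]
flat = [severity for window in windows for severity in window]

flat_result = flat_mean_holds(flat, 0.5)         # overall mean is about 0.37
windowed_result = windowed_holds(windows, 0.5)   # worst window mean is about 0.9
```

With these numbers the flat mean passes while the windowed check fails. This is the per-window structure that the flat verifier loses.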

Both verifiers are registered in operon_ai.core.certificate._THEOREM_FN_PATHS, so deserialized certificates resolve through _resolve_verify_fn without this package needing to be imported. Any consumer with operon-ai>=0.36.0 can round-trip a behavioral_stability_windowed certificate correctly.

Breaking change from pre-alpha prototypes

Earlier pre-release builds emitted certificates with theorem name behavioral_stability (the shared core name), bound to a locally attached _verify_fn. That shape was semantically wrong: the shared verifier is flat-mean-based, so any certificate round-tripped through serialization would silently revert to the wrong replay logic. Consumers that key on certificate.theorem == "behavioral_stability" or metadata["certificate_theorem"] == "behavioral_stability" must update to "behavioral_stability_windowed". No migration path is provided, since the package is still in alpha.
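
A consumer-side check against the new theorem name might look like the sketch below. The helper is_windowed_stability_cert is hypothetical, and SimpleNamespace objects stand in for real certificates so the snippet runs on its own; only the theorem attribute and the two theorem names come from the text above.

```python
from types import SimpleNamespace

WINDOWED_THEOREM = "behavioral_stability_windowed"


def is_windowed_stability_cert(certificate):
    """True only for certificates carrying the windowed theorem name."""
    return getattr(certificate, "theorem", None) == WINDOWED_THEOREM


# Stand-ins for certificates from a pre-alpha prototype and a current build.
old_cert = SimpleNamespace(theorem="behavioral_stability")
new_cert = SimpleNamespace(theorem="behavioral_stability_windowed")

accepted = [is_windowed_stability_cert(c) for c in (old_cert, new_cert)]
```

Keeping the name in one constant makes it easier to spot code that still keys on the old string.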

Citations

Backed by Paper 4 §4.3: convergence/false-stagnation accuracy 0.960 with real sentence embeddings (all-MiniLM-L6-v2, N = 300 trials). Full numbers and reproduction commands in the Operon repo at eval/results/benchmarks_real_embeddings/multi_model_summary.json. Paper 5 §3 establishes the preservation-under-compilation framework that the certificate follows.

Status

Alpha. API may change before 0.1.0 stable. Feedback welcome via Issues.

License

MIT — see LICENSE.
