Structural reliability critics for the OpenHands Agent SDK — certified stagnation detection, backed by the Operon categorical framework.

operon-openhands-gates

In-loop structural reliability critics for the OpenHands Agent SDK — drop-in, cert-emitting.

OpenHands' own docs flag an architectural gap in iterative refinement:

"the current implementation relies solely on threshold/iteration limits rather than monitoring improvement velocity or convergence rates — suggesting this is an architectural gap where monitoring logic could plug in."
Source: https://docs.openhands.dev/sdk/guides/iterative-refinement

This package ships the missing monitor as a CriticBase subclass. It replaces an LLM-judged success score with a Bayesian stagnation signal computed over the conversation's message history. When the agent goes in circles, the critic's score drops below the threshold, iterative refinement terminates, and the critic emits a replayable behavioral_stability_windowed certificate.

At a glance:

  • OperonStagnationCritic: epiplexic_integral-based detection (Paper 4 §4.3, 0.960 convergence accuracy with real embeddings) that plugs directly into Agent(critic=...).
  • One certificate per detection transition, self-verifiable via certificate.verify().
  • Zero-dep NGramEmbedder default — bring your own neural embedder for paraphrase-robust detection.
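To make the n-gram idea concrete, here is a minimal sketch of a dependency-free similarity signal over message history. It is not the package's NGramEmbedder or its epiplexic_integral computation. The function names, the character-trigram choice, and the "novelty = 1 - cosine similarity" score are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(max(len(normalized) - n + 1, 0)))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_scores(messages: list[str]) -> list[float]:
    """Novelty of each message relative to its predecessor: 1 - cosine similarity."""
    vectors = [ngram_counts(m) for m in messages]
    return [1.0 - cosine_similarity(prev, cur) for prev, cur in zip(vectors, vectors[1:])]


# Illustrative history (made up): the agent repeats itself once, then makes progress.
history = [
    "Running pytest: test_parse fails with KeyError 'id'",
    "Running pytest: test_parse fails with KeyError 'id'",
    "Patched parser to default the id field; rerunning the suite",
]
scores = novelty_scores(history)
```

A verbatim repeat scores near zero novelty, while a genuinely different message scores higher. A paraphrase of the same failed step would still look novel to character n-grams, which is the gap a neural embedder is meant to close.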

Install

pip install operon-openhands-gates

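Requires operon-ai>=0.34.4 and openhands-sdk>=1.15. Round-tripping a serialized behavioral_stability_windowed certificate additionally needs operon-ai>=0.36.0, the release that registers that verifier in the shared core.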

Quickstart

from openhands.sdk import Agent, Conversation, LLM
from openhands.sdk.critic.base import IterativeRefinementConfig
from operon_openhands_gates import OperonStagnationCritic

critic = OperonStagnationCritic(
    threshold=0.2,
    critical_duration=3,
    iterative_refinement=IterativeRefinementConfig(
        success_threshold=0.2,  # match the critic's threshold
        max_iterations=5,
    ),
)

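# tools=[...] is elided here; pass the tools your agent needs.
agent = Agent(llm=LLM(model="anthropic/claude-sonnet-4-5"), tools=[...], critic=critic)
workspace = "."  # assumption for this example: use the current directory as the workspace
conversation = Conversation(agent=agent, workspace=workspace)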
conversation.send_message("Fix the failing test in ...")
conversation.run()  # iterative refinement terminates on sustained stagnation

if critic.certificate is not None:
    # Replayable evidence of what the gate saw.
    verification = critic.certificate.verify()
    assert verification.holds

Why the non-default success_threshold

OpenHands' default success_threshold=0.6 is tuned for LLM probability-of-success scores. OperonStagnationCritic returns the epiplexic_integral directly: a value in [0, 1] where low means stagnant. Paper 4 §4.3 uses δ=0.2 as the stagnation threshold, so set success_threshold to the same value on the refinement config.
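The interplay of threshold and critical_duration can be sketched as a run-length check. This is a hedged illustration, not the package's detector: it assumes critical_duration means "consecutive scores strictly below the threshold", and the helper name and the score sequences are invented for the example.

```python
from typing import Optional, Sequence


def first_sustained_stagnation(
    scores: Sequence[float],
    threshold: float = 0.2,
    critical_duration: int = 3,
) -> Optional[int]:
    """Return the index at which `scores` has stayed below `threshold` for
    `critical_duration` consecutive steps, or None if that never happens."""
    run = 0
    for index, score in enumerate(scores):
        # A score at or above the threshold resets the run (assumed semantics).
        run = run + 1 if score < threshold else 0
        if run >= critical_duration:
            return index
    return None


# Illustrative sequences (made up): isolated dips versus a sustained collapse.
healthy = [0.71, 0.55, 0.18, 0.62, 0.15, 0.44]
looping = [0.64, 0.31, 0.12, 0.08, 0.05, 0.04]

healthy_hit = first_sustained_stagnation(healthy)
looping_hit = first_sustained_stagnation(looping)
```

Under these assumptions, a single low score does not fire the gate; only a sustained run does. With the 0.6 default, scores such as 0.31 or 0.44 would already count as failures, which is why the threshold is lowered to δ=0.2 for this signal.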

Sibling package

  • operon-langgraph-gates — same Paper 4 substrate, same behavioral_stability_windowed certificate, targeted at LangGraph's StateGraph with .wrap() / .edge() node APIs. Two packages, one core — this is the framework-portability claim from Paper 5 §3 in code.

Certificate theorem name and verification

Certificates emitted by this package carry the theorem name behavioral_stability_windowed, not behavioral_stability. Both theorems live in the shared operon-ai core, but they verify differently:

  • behavioral_stability (shared core): mean(severities) < threshold. Loses the per-window structure that rolling-integral detection operates on.
  • behavioral_stability_windowed (shared core, since operon-ai 0.36.0): max(per_window_severity_means) <= stability_threshold. Mirrors detection exactly.
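A small worked example shows why the two checks can disagree. The two predicates below follow the formulas in the list above. The sliding fixed-size window, the window length of 3, the function names, and the severity values are assumptions for illustration, not the operon-ai implementation.

```python
from statistics import mean
from typing import Sequence


def flat_mean_holds(severities: Sequence[float], threshold: float) -> bool:
    """behavioral_stability-style check: mean(severities) < threshold."""
    return mean(severities) < threshold


def windowed_holds(severities: Sequence[float], stability_threshold: float, window: int) -> bool:
    """behavioral_stability_windowed-style check:
    max(per_window_severity_means) <= stability_threshold."""
    if len(severities) <= window:
        # Too short to slide: treat the whole sequence as one window (assumption).
        window_means = [mean(severities)]
    else:
        window_means = [
            mean(severities[start:start + window])
            for start in range(len(severities) - window + 1)
        ]
    return max(window_means) <= stability_threshold


# Illustrative severities (made up): a short unstable burst inside a calm run.
severities = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]

flat_result = flat_mean_holds(severities, threshold=0.5)
windowed_result = windowed_holds(severities, stability_threshold=0.5, window=3)
```

The overall mean here is 0.34, so the flat check passes, while the worst three-step window averages 0.9 and the windowed check fails. Averaging over the full run hides exactly the burst that rolling detection reacts to.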

Both verifiers are registered in operon_ai.core.certificate._THEOREM_FN_PATHS, so deserialized certificates resolve through _resolve_verify_fn without this package needing to be imported. Any consumer with operon-ai>=0.36.0 can round-trip a behavioral_stability_windowed certificate correctly.
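The name-based lookup pattern described here can be sketched with the standard library alone. The registry contents, the "module:attribute" path format, and the function names below are hypothetical stand-ins; the real _THEOREM_FN_PATHS and _resolve_verify_fn in operon-ai may be shaped differently.

```python
import importlib
import json
from typing import Callable

# Hypothetical registry: theorem name -> "module:attribute" import path.
# Standard-library callables stand in for real verifier functions.
THEOREM_FN_PATHS = {
    "flat_mean": "statistics:fmean",
    "worst_case": "builtins:max",
}


def resolve_verify_fn(theorem: str) -> Callable:
    """Import and return the callable registered for `theorem`.

    Raises KeyError for an unregistered theorem name."""
    module_name, _, attribute = THEOREM_FN_PATHS[theorem].partition(":")
    return getattr(importlib.import_module(module_name), attribute)


# Only the theorem name and the evidence are serialized, never the function object.
payload = json.dumps({"theorem": "worst_case", "severities": [0.1, 0.7, 0.3]})

restored = json.loads(payload)
verify_fn = resolve_verify_fn(restored["theorem"])
result = verify_fn(restored["severities"])
```

Because the serialized form stores only a name, the consumer needs the registry (here, the shared core) but not the package that emitted the certificate. It also shows why emitting the wrong theorem name matters: the name alone decides which replay logic runs after deserialization.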

Breaking change from pre-alpha prototypes

Earlier pre-release builds emitted certificates with theorem name behavioral_stability (the shared core name), bound to a locally attached _verify_fn. That shape was semantically wrong: the shared verifier is flat-mean-based, so any certificate round-tripped through serialization would silently revert to the wrong replay logic. Consumers that key on certificate.theorem == "behavioral_stability" or metadata["certificate_theorem"] == "behavioral_stability" must update to "behavioral_stability_windowed". No migration path is provided, because the package is still in alpha.

Citations

Backed by Paper 4 §4.3: convergence/false-stagnation accuracy 0.960 with real sentence embeddings (all-MiniLM-L6-v2, N = 300 trials). Full numbers and reproduction commands in the Operon repo at eval/results/benchmarks_real_embeddings/multi_model_summary.json. Paper 5 §3 establishes the preservation-under-compilation framework that the certificate follows.

Status

Alpha. API may change before 0.1.0 stable. Feedback welcome via Issues.

License

MIT — see LICENSE.
