Structural reliability critics for the OpenHands Agent SDK — certified stagnation detection, backed by the Operon categorical framework.
operon-openhands-gates
In-loop structural reliability critics for the OpenHands Agent SDK — drop-in, cert-emitting.
OpenHands' own docs flag an architectural gap in iterative refinement:
"the current implementation relies solely on threshold/iteration limits rather than monitoring improvement velocity or convergence rates — suggesting this is an architectural gap where monitoring logic could plug in." — https://docs.openhands.dev/sdk/guides/iterative-refinement
This package ships the missing monitor as a CriticBase subclass. It replaces an LLM-judged success score with a Bayesian stagnation signal computed over the conversation's message history. When the agent goes in circles, the critic's score drops below threshold, iterative refinement terminates, and a replayable behavioral_stability certificate is emitted.
At a glance:
- OperonStagnationCritic: epiplexic_integral-based detection (Paper 4 §4.3, 0.960 convergence accuracy with real embeddings) that plugs directly into Agent(critic=...).
- One certificate per detection transition, self-verifiable via certificate.verify().
- Zero-dependency NGramEmbedder default; bring your own neural embedder for paraphrase-robust detection.
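To make the n-gram idea concrete, here is a minimal sketch of how a character n-gram embedding can flag repeated messages. It is an illustration only, not the package's NGramEmbedder or its epiplexic_integral; the names ngram_vector, cosine_similarity, and novelty_scores are hypothetical.

```python
import math
from collections import Counter


def ngram_vector(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams (toy stand-in for an embedder; hypothetical)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_scores(messages: list[str]) -> list[float]:
    """Novelty of each message relative to its predecessor, clamped to [0, 1]."""
    vectors = [ngram_vector(m) for m in messages]
    return [
        min(1.0, max(0.0, 1.0 - cosine_similarity(prev, cur)))
        for prev, cur in zip(vectors, vectors[1:])
    ]


history = ["retry the build", "retry the build", "zzzz qqqq"]
scores = novelty_scores(history)  # near 0 for the repeat, 1.0 for the unrelated message
```

Surface n-grams only catch near-verbatim repetition. Two paraphrases of the same step share few n-grams and look novel, which is the case a neural embedder is meant to cover.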
Install
pip install operon-openhands-gates
Requires operon-ai>=0.34.4 and openhands-sdk>=1.15.
Quickstart
from openhands.sdk import Agent, Conversation, LLM
from openhands.sdk.critic.base import IterativeRefinementConfig
from operon_openhands_gates import OperonStagnationCritic

critic = OperonStagnationCritic(
    threshold=0.2,
    critical_duration=3,
    iterative_refinement=IterativeRefinementConfig(
        success_threshold=0.2,  # match the critic's threshold
        max_iterations=5,
    ),
)

agent = Agent(llm=LLM(model="anthropic/claude-sonnet-4-5"), tools=[...], critic=critic)
conversation = Conversation(agent=agent, workspace=workspace)
conversation.send_message("Fix the failing test in ...")
conversation.run()  # iterative refinement terminates on sustained stagnation

if critic.certificate is not None:
    # Replayable evidence of what the gate saw.
    verification = critic.certificate.verify()
    assert verification.holds
Why the non-default success_threshold
OpenHands' default success_threshold=0.6 is tuned for LLM probability-of-success scores. OperonStagnationCritic returns the epiplexic_integral directly — in [0, 1] where low = stagnant. Paper 4 §4.3 uses δ=0.2 as the stagnation threshold, so match it on the refinement config.
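As a rough illustration of how threshold and critical_duration could interact, the sketch below scans a sequence of scores in [0, 1] and reports the first step at which the score has stayed under the threshold for critical_duration consecutive steps. The strict less-than comparison, the consecutive-run rule, and the function name first_sustained_stagnation are assumptions made for this example, not the package's documented behavior.

```python
from typing import Optional, Sequence


def first_sustained_stagnation(
    scores: Sequence[float],
    threshold: float = 0.2,
    critical_duration: int = 3,
) -> Optional[int]:
    """Index at which `scores` has been below `threshold` for
    `critical_duration` consecutive steps, or None if that never happens.

    Assumption: a single score at or above the threshold resets the run.
    """
    run = 0
    for index, score in enumerate(scores):
        run = run + 1 if score < threshold else 0
        if run >= critical_duration:
            return index
    return None


# A brief dip (index 1) is ignored; the sustained dip ending at index 5 is flagged.
trace = [0.8, 0.1, 0.5, 0.1, 0.15, 0.05, 0.9]
detected_at = first_sustained_stagnation(trace)
```

Requiring a sustained run rather than a single low score keeps one repetitive step from ending refinement early.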
Sibling package
operon-langgraph-gates: same Paper 4 substrate, same behavioral_stability_windowed certificate, targeted at LangGraph's StateGraph with .wrap() / .edge() node APIs. Two packages, one core: this is the framework-portability claim from Paper 5 §3 in code.
Certificate theorem name and verification
Certificates emitted by this package carry the theorem name behavioral_stability_windowed (not the core's shared behavioral_stability). The two differ in how they verify:
- behavioral_stability (shared core): mean(severities) < threshold. Loses the per-window structure that rolling-integral detection operates on.
- behavioral_stability_windowed (shared core, since operon-ai 0.36.0): max(per_window_severity_means) <= stability_threshold. Mirrors detection exactly.
Both verifiers are registered in operon_ai.core.certificate._THEOREM_FN_PATHS, so deserialized certificates resolve through _resolve_verify_fn without this package needing to be imported. Any consumer with operon-ai>=0.36.0 can round-trip a behavioral_stability_windowed certificate correctly.
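The difference between the two predicates is easiest to see on data where they disagree. The sketch below implements only the two inequalities quoted above; the function names are hypothetical, the windows are non-overlapping purely for simplicity, and none of this is the operon-ai verifier code.

```python
from statistics import mean
from typing import Sequence


def flat_mean_holds(severities: Sequence[float], threshold: float) -> bool:
    """Flat predicate: mean(severities) < threshold."""
    return mean(severities) < threshold


def windowed_max_holds(
    windows: Sequence[Sequence[float]], stability_threshold: float
) -> bool:
    """Windowed predicate: max(per_window_severity_means) <= stability_threshold."""
    return max(mean(window) for window in windows) <= stability_threshold


# Two quiet windows followed by one severe window (illustrative values).
windows = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.4, 0.5, 0.3]]
flat = [severity for window in windows for severity in window]

flat_result = flat_mean_holds(flat, threshold=0.2)          # quiet windows dilute the spike
windowed_result = windowed_max_holds(windows, stability_threshold=0.2)  # spike is visible
```

Here the flat mean is pulled under 0.2 by the two quiet windows, while the last window's mean of 0.4 exceeds the threshold, so only the windowed predicate reports the instability.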
Breaking change from pre-alpha prototypes
Earlier pre-release builds emitted certificates with theorem name behavioral_stability (the shared core name), bound to a locally attached _verify_fn. That shape was semantically wrong: the shared verifier is flat-mean-based, so any certificate round-tripped through serialization would silently revert to the wrong replay logic. Consumers that key on certificate.theorem == "behavioral_stability" or metadata["certificate_theorem"] == "behavioral_stability" must update to "behavioral_stability_windowed". No migration path is provided while the package is in alpha.
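For consumers updating their checks, a minimal sketch of the new comparison follows. It assumes the metadata is available as a plain dict with the certificate_theorem key mentioned above; the helper name is_windowed_gate_certificate is hypothetical.

```python
OLD_THEOREM = "behavioral_stability"
NEW_THEOREM = "behavioral_stability_windowed"


def is_windowed_gate_certificate(metadata: dict) -> bool:
    """True only for certificates carrying the windowed theorem name.

    Uses exact equality on purpose: NEW_THEOREM starts with OLD_THEOREM,
    so a prefix or substring check would accept both names.
    """
    return metadata.get("certificate_theorem") == NEW_THEOREM


current = is_windowed_gate_certificate({"certificate_theorem": NEW_THEOREM})
legacy = is_windowed_gate_certificate({"certificate_theorem": OLD_THEOREM})
```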
Citations
Backed by Paper 4 §4.3: convergence/false-stagnation accuracy 0.960 with real sentence embeddings (all-MiniLM-L6-v2, N = 300 trials). Full numbers and reproduction commands in the Operon repo at eval/results/benchmarks_real_embeddings/multi_model_summary.json. Paper 5 §3 establishes the preservation-under-compilation framework that the certificate follows.
Status
Alpha. API may change before 0.1.0 stable. Feedback welcome via Issues.
License
MIT — see LICENSE.
File details
Details for the file operon_openhands_gates-0.1.0a2.tar.gz.
File metadata
- Download URL: operon_openhands_gates-0.1.0a2.tar.gz
- Upload date:
- Size: 47.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 072127b702b9fa1a5709e486a4da3f9f7994e0f21998cccfd38ee126cec1912c |
| MD5 | 58fceace0ae4eb7892cd547b623777e0 |
| BLAKE2b-256 | 16844444745bdffe897088a688e086239ab4aac9b839dea437ea227128892337 |
Provenance
The following attestation bundles were made for operon_openhands_gates-0.1.0a2.tar.gz:
Publisher: publish.yml on coredipper/operon-openhands-gates
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: operon_openhands_gates-0.1.0a2.tar.gz
- Subject digest: 072127b702b9fa1a5709e486a4da3f9f7994e0f21998cccfd38ee126cec1912c
- Sigstore transparency entry: 1343427388
- Sigstore integration time:
- Permalink: coredipper/operon-openhands-gates@034adb4b48a103613e434a0088daf8f96fe5a4a6
- Branch / Tag: refs/tags/v0.1.0a2
- Owner: https://github.com/coredipper
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@034adb4b48a103613e434a0088daf8f96fe5a4a6
- Trigger Event: release
File details
Details for the file operon_openhands_gates-0.1.0a2-py3-none-any.whl.
File metadata
- Download URL: operon_openhands_gates-0.1.0a2-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0f915bd15ec9ca870a7fb40358f47018390a0574c6661e1306d62fd7fb9b8c4 |
| MD5 | 9a4f11805e4df77d48316d6df2d3bd87 |
| BLAKE2b-256 | b563f6d0a56fb6cfe5c7c9db5c6142b27c0d4441a2c7bb88540858c02c743dc6 |
Provenance
The following attestation bundles were made for operon_openhands_gates-0.1.0a2-py3-none-any.whl:
Publisher: publish.yml on coredipper/operon-openhands-gates
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: operon_openhands_gates-0.1.0a2-py3-none-any.whl
- Subject digest: b0f915bd15ec9ca870a7fb40358f47018390a0574c6661e1306d62fd7fb9b8c4
- Sigstore transparency entry: 1343427394
- Sigstore integration time:
- Permalink: coredipper/operon-openhands-gates@034adb4b48a103613e434a0088daf8f96fe5a4a6
- Branch / Tag: refs/tags/v0.1.0a2
- Owner: https://github.com/coredipper
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@034adb4b48a103613e434a0088daf8f96fe5a4a6
- Trigger Event: release