Drop-in agent observability + steering for LangChain, LangGraph, OpenAI Agents SDK, Claude Agent SDK, and the Anthropic Messages API.
Project description
ReasonBlocks SDK
Python SDK that integrates ReasonBlocks into LangChain 1.0 agents as middleware. Scores each agent step, injects steering signals and E-traces, and optionally switches models based on difficulty.
Installation
pip install reasonblocks
Quick Start
from reasonblocks import ReasonBlocks
rb = ReasonBlocks(api_key="rb_live_...")
agent = create_agent(
model="anthropic:claude-sonnet-4-20250514",
tools=[...],
system_prompt="...",
middleware=[rb.middleware()],
)
One import. One init. One middleware addition.
Tagging runs for the dashboard
rb.middleware() accepts identifying metadata that lands on the dashboard runs row:
agent = create_agent(
...,
middleware=[rb.middleware(
org_id="6d3f...", # uuid; "default" if omitted
project_id="a91b...", # uuid; "default" if omitted
run_id="my-run-1", # auto-generated if omitted
agent_name="bugfixer", # free-form filter key
task="fix the TypeError",
model="claude-sonnet-4-20250514",
framework="langchain",
codebase_id="myrepo@sha:abc123",
)],
)
The dashboard's Quickstart page (/platform/dashboard/quickstart) shows the user's actual org_id / project_id next to a copy-pasteable snippet — the easiest way to get the values without hardcoding.
Configuration
rb = ReasonBlocks(
api_key="rb-...",
token_budget=100_000,
monitor_names=["loop", "confidence", "evidence", "budget", "strategy_exhaustion"],
fsm_thresholds={
"fast_threshold": 0.2,
"slow_threshold": 0.6,
"skip_threshold": 0.85,
},
model_routing={
"FAST": "anthropic:claude-haiku-4-5-20251001",
"SLOW": "anthropic:claude-sonnet-4-20250514",
},
e_traces_enabled=True,
)
Validated modes
Presets that wire one of the configurations measured on a real benchmark. Each mode is a one-liner that mounts the exact middleware stack that produced the published number — same thresholds, same rule pack, same priority order.
mode="code_review"
The SWE-bench Pro D-arm stack — code-review reactive monitor (7 rules), tool-output compression (head+tail at 1800 chars, keep most-recent 2), early-exit nudge.
Validated headline (paired n=75, claude-sonnet-4-6, real Docker
grading; see swebench-pro-bench/results/compare_a_cv1_d.json):
| arm | pass rate | mean input tokens | vs baseline |
|---|---|---|---|
| baseline | 25.3% | 1,257,316 | — |
mode="code_review" (D arm) |
25.4% | 606,212 | −51.8% tokens, flat accuracy |
(alt) enable_general_monitor=True (C_v1) |
36.0% | 1,136,946 | +10.7pp accuracy, −9.6% tokens |
Use mode="code_review" when you want the maximum cost cut at unchanged
success; use the C_v1 monitor when you want the accuracy lift.
One-liner install — drop into any LangChain agent:
from reasonblocks import for_code_review
agent = create_agent(
model="claude-sonnet-4-6",
tools=[bash_tool, ...],
middleware=for_code_review(
fail_to_pass_tests=task.fail_to_pass, # SWE-bench Pro metadata
max_tool_calls=50, # the budget the monitor reasons about
),
)
Or via the unified config (composes with E-traces, routing, etc.):
from reasonblocks import ReasonBlocksConfig, build_middleware
cfg = ReasonBlocksConfig.from_mode(
"code_review",
cr_fail_to_pass_tests=task.fail_to_pass,
cr_max_tool_calls=50,
)
middleware = build_middleware(cfg)
What the stack does on each step:
- Detects the v1 failure patterns (semantic loop, repeated error, edits
without test, edits inside
site-packages, missed FAIL_TO_PASS tests, half-budget no-edits, final-10% over-editing) and injects a short corrective hint when one fires. - Compresses any
ToolMessagewhose content exceeds 1800 characters using head+tail truncation; leaves the most-recent 2 tool messages untouched so the agent keeps full visibility into the step it's actively reasoning about. - Once past call index 40, if the monitor signals overthinking (rules 1/3/4), prepends a "stop investigating, submit your best answer" nudge.
Bash-tool name autodetection covers bash, shell, run_command,
execute, and run_bash out of the box; pass
bash_tool_names=("your_tool",) to override.
Architecture
The middleware hooks into two points in the LangChain agent loop:
- before_model -- scores the agent's last reasoning step, updates the FSM state, runs all monitors, retrieves E-traces, and injects steering signals as a system message.
- wrap_model_call -- overrides the model based on FSM state (if routing is configured) and tracks token usage.
Development
pip install -e ".[dev]"
pytest
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file reasonblocks-0.2.5.tar.gz.
File metadata
- Download URL: reasonblocks-0.2.5.tar.gz
- Upload date:
- Size: 230.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aad38541af11063e08f17cefcb0047e78c6888a9f13b936e1e0d3c5ae49e0398
|
|
| MD5 |
e9346b1d337436fd9c36a099d735f44a
|
|
| BLAKE2b-256 |
bb858207c0635fd215d0cabe477f0e1e9b859fa6dda987aed695ec71a064c3a9
|
File details
Details for the file reasonblocks-0.2.5-py3-none-any.whl.
File metadata
- Download URL: reasonblocks-0.2.5-py3-none-any.whl
- Upload date:
- Size: 280.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9b51d27a3137f1bff0b009be8c8285370f869f39dc1d2b620f2c866c1dc7457a
|
|
| MD5 |
4dc85f9292807703ecd722e8c521ea89
|
|
| BLAKE2b-256 |
6f9212cceb77b4da1bbccc66e32b3e6309ceb96aa15e72f7df74c05cf2204389
|