Skip to main content

Drop-in agent observability + steering for LangChain, LangGraph, OpenAI Agents SDK, Claude Agent SDK, and the Anthropic Messages API.

Project description

ReasonBlocks SDK

Python SDK that integrates ReasonBlocks into LangChain 1.0 agents as middleware. Scores each agent step, injects steering signals and E-traces, and optionally switches models based on difficulty.

Installation

pip install reasonblocks

Quick Start

from reasonblocks import ReasonBlocks

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[...],
    system_prompt="...",
    middleware=[rb.middleware()],
)

One import. One init. One middleware addition.

Tagging runs for the dashboard

rb.middleware() accepts identifying metadata that lands on the dashboard runs row:

agent = create_agent(
    ...,
    middleware=[rb.middleware(
        org_id="6d3f...",         # uuid; "default" if omitted
        project_id="a91b...",     # uuid; "default" if omitted
        run_id="my-run-1",        # auto-generated if omitted
        agent_name="bugfixer",    # free-form filter key
        task="fix the TypeError",
        model="claude-sonnet-4-20250514",
        framework="langchain",
        codebase_id="myrepo@sha:abc123",
    )],
)

The dashboard's Quickstart page (/platform/dashboard/quickstart) shows the user's actual org_id / project_id next to a copy-pasteable snippet — the easiest way to get the values without hardcoding.

Configuration

rb = ReasonBlocks(
    api_key="rb-...",
    token_budget=100_000,
    monitor_names=["loop", "confidence", "evidence", "budget", "strategy_exhaustion"],
    fsm_thresholds={
        "fast_threshold": 0.2,
        "slow_threshold": 0.6,
        "skip_threshold": 0.85,
    },
    model_routing={
        "FAST": "anthropic:claude-haiku-4-5-20251001",
        "SLOW": "anthropic:claude-sonnet-4-20250514",
    },
    e_traces_enabled=True,
)

Validated modes

Presets that wire one of the configurations measured on a real benchmark. Each mode is a one-liner that mounts the exact middleware stack that produced the published number — same thresholds, same rule pack, same priority order.

mode="code_review"

The SWE-bench Pro D-arm stack — code-review reactive monitor (7 rules), tool-output compression (head+tail at 1800 chars, keep most-recent 2), early-exit nudge.

Validated headline (paired n=75, claude-sonnet-4-6, real Docker grading; see swebench-pro-bench/results/compare_a_cv1_d.json):

arm pass rate mean input tokens vs baseline
baseline 25.3% 1,257,316
mode="code_review" (D arm) 25.4% 606,212 −51.8% tokens, flat accuracy
(alt) enable_general_monitor=True (C_v1) 36.0% 1,136,946 +10.7pp accuracy, −9.6% tokens

Use mode="code_review" when you want the maximum cost cut at unchanged success; use the C_v1 monitor when you want the accuracy lift.

One-liner install — drop into any LangChain agent:

from reasonblocks import for_code_review

agent = create_agent(
    model="claude-sonnet-4-6",
    tools=[bash_tool, ...],
    middleware=for_code_review(
        fail_to_pass_tests=task.fail_to_pass,  # SWE-bench Pro metadata
        max_tool_calls=50,                     # the budget the monitor reasons about
    ),
)

Or via the unified config (composes with E-traces, routing, etc.):

from reasonblocks import ReasonBlocksConfig, build_middleware

cfg = ReasonBlocksConfig.from_mode(
    "code_review",
    cr_fail_to_pass_tests=task.fail_to_pass,
    cr_max_tool_calls=50,
)
middleware = build_middleware(cfg)

What the stack does on each step:

  1. Detects the v1 failure patterns (semantic loop, repeated error, edits without test, edits inside site-packages, missed FAIL_TO_PASS tests, half-budget no-edits, final-10% over-editing) and injects a short corrective hint when one fires.
  2. Compresses any ToolMessage whose content exceeds 1800 characters using head+tail truncation; leaves the most-recent 2 tool messages untouched so the agent keeps full visibility into the step it's actively reasoning about.
  3. Once past call index 40, if the monitor signals overthinking (rules 1/3/4), prepends a "stop investigating, submit your best answer" nudge.

Bash-tool name autodetection covers bash, shell, run_command, execute, and run_bash out of the box; pass bash_tool_names=("your_tool",) to override.

Architecture

The middleware hooks into two points in the LangChain agent loop:

  • before_model -- scores the agent's last reasoning step, updates the FSM state, runs all monitors, retrieves E-traces, and injects steering signals as a system message.
  • wrap_model_call -- overrides the model based on FSM state (if routing is configured) and tracks token usage.

Development

pip install -e ".[dev]"
pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

reasonblocks-0.2.5.tar.gz (230.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

reasonblocks-0.2.5-py3-none-any.whl (280.5 kB view details)

Uploaded Python 3

File details

Details for the file reasonblocks-0.2.5.tar.gz.

File metadata

  • Download URL: reasonblocks-0.2.5.tar.gz
  • Upload date:
  • Size: 230.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for reasonblocks-0.2.5.tar.gz
Algorithm Hash digest
SHA256 aad38541af11063e08f17cefcb0047e78c6888a9f13b936e1e0d3c5ae49e0398
MD5 e9346b1d337436fd9c36a099d735f44a
BLAKE2b-256 bb858207c0635fd215d0cabe477f0e1e9b859fa6dda987aed695ec71a064c3a9

See more details on using hashes here.

File details

Details for the file reasonblocks-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: reasonblocks-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 280.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for reasonblocks-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9b51d27a3137f1bff0b009be8c8285370f869f39dc1d2b620f2c866c1dc7457a
MD5 4dc85f9292807703ecd722e8c521ea89
BLAKE2b-256 6f9212cceb77b4da1bbccc66e32b3e6309ceb96aa15e72f7df74c05cf2204389

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page