Drop-in agent observability + steering for LangChain, LangGraph, OpenAI Agents SDK, Claude Agent SDK, and the Anthropic Messages API.

ReasonBlocks SDK

Python SDK that integrates ReasonBlocks into LangChain 1.0 agents as middleware. Scores each agent step, injects steering signals and E-traces, and optionally switches models based on difficulty.

Installation

pip install reasonblocks

Quick Start

from langchain.agents import create_agent  # LangChain 1.0 agent factory
from reasonblocks import ReasonBlocks

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[...],
    system_prompt="...",
    middleware=[rb.middleware()],
)

One import. One init. One middleware addition.

Tagging runs for the dashboard

rb.middleware() accepts identifying metadata that is recorded on the run's row in the dashboard:

agent = create_agent(
    ...,
    middleware=[rb.middleware(
        org_id="6d3f...",         # uuid; "default" if omitted
        project_id="a91b...",     # uuid; "default" if omitted
        run_id="my-run-1",        # auto-generated if omitted
        agent_name="bugfixer",    # free-form filter key
        task="fix the TypeError",
        model="claude-sonnet-4-20250514",
        framework="langchain",
        codebase_id="myrepo@sha:abc123",
    )],
)

The dashboard's Quickstart page (/platform/dashboard/quickstart) shows the user's actual org_id / project_id next to a copy-pasteable snippet — the easiest way to get the values without hardcoding.
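
One way to keep these identifiers out of source code is to read them from the environment and pass along only the ones that are set. The helper below is a sketch under stated assumptions, not part of the SDK: the function name tagging_kwargs and the RB_ORG_ID, RB_PROJECT_ID, and RB_AGENT_NAME variable names are hypothetical; only the keyword names (org_id, project_id, agent_name) come from the snippet above.

```python
import os

# Hypothetical mapping from environment variable to rb.middleware() keyword.
# The variable names are illustrative; the SDK does not define them.
_ENV_TO_KWARG = {
    "RB_ORG_ID": "org_id",
    "RB_PROJECT_ID": "project_id",
    "RB_AGENT_NAME": "agent_name",
}


def tagging_kwargs(environ=None, **overrides):
    """Collect tagging metadata from the environment, skipping unset values."""
    env = os.environ if environ is None else environ
    kwargs = {
        kwarg: env[var]
        for var, kwarg in _ENV_TO_KWARG.items()
        if env.get(var)  # omit empty or missing values so the defaults apply
    }
    kwargs.update(overrides)  # explicit arguments win over the environment
    return kwargs
```

It would then be used as rb.middleware(**tagging_kwargs(task="fix the TypeError")), so omitted values fall back to the defaults noted in the comments above.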

Configuration

rb = ReasonBlocks(
    api_key="rb-...",
    token_budget=100_000,
    monitor_names=["loop", "confidence", "evidence", "budget", "strategy_exhaustion"],
    fsm_thresholds={
        "fast_threshold": 0.2,
        "slow_threshold": 0.6,
        "skip_threshold": 0.85,
    },
    model_routing={
        "FAST": "anthropic:claude-haiku-4-5-20251001",
        "SLOW": "anthropic:claude-sonnet-4-20250514",
    },
    e_traces_enabled=True,
)

Validated modes

Validated modes are presets that reproduce a configuration measured on a real benchmark. Each mode is a one-liner that mounts the exact middleware stack that produced the published number: same thresholds, same rule pack, same priority order.

mode="code_review"

The SWE-bench Pro D-arm stack — code-review reactive monitor (7 rules), tool-output compression (head+tail at 1800 chars, keep most-recent 2), early-exit nudge.

Validated headline (paired n=75, claude-sonnet-4-6, real Docker grading; see swebench-pro-bench/results/compare_a_cv1_d.json):

arm                                        pass rate   mean input tokens   vs baseline
baseline                                   25.3%       1,257,316
mode="code_review" (D arm)                 25.3%       606,212             −51.8% tokens, flat accuracy
(alt) enable_general_monitor=True (C_v1)   36.0%       1,136,946           +10.7pp accuracy, −9.6% tokens

Use mode="code_review" when you want the maximum cost cut at unchanged success; use the C_v1 monitor when you want the accuracy lift.
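
The "vs baseline" figures can be rechecked from the raw numbers in the table. The short calculation below is only an arithmetic check; the helper name pct_change is illustrative and not part of the SDK.

```python
def pct_change(new, baseline):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Mean input tokens per arm, copied from the table above.
BASELINE_TOKENS = 1_257_316
D_ARM_TOKENS = 606_212
C_V1_TOKENS = 1_136_946

d_tokens = round(pct_change(D_ARM_TOKENS, BASELINE_TOKENS), 1)
c_tokens = round(pct_change(C_V1_TOKENS, BASELINE_TOKENS), 1)

# Accuracy lift is a difference of pass rates, so it is in percentage points.
c_lift_pp = round(36.0 - 25.3, 1)

print(d_tokens, c_tokens, c_lift_pp)  # -51.8 -9.6 10.7
```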

One-liner install — drop into any LangChain agent:

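from langchain.agents import create_agent  # LangChain 1.0 agent factory
from reasonblocks import for_code_review

# `task` and `bash_tool` are assumed to be defined by the surrounding harness.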

agent = create_agent(
    model="claude-sonnet-4-6",
    tools=[bash_tool, ...],
    middleware=for_code_review(
        fail_to_pass_tests=task.fail_to_pass,  # SWE-bench Pro metadata
        max_tool_calls=50,                     # the budget the monitor reasons about
    ),
)

Or via the unified config (composes with E-traces, routing, etc.):

from reasonblocks import ReasonBlocksConfig, build_middleware

cfg = ReasonBlocksConfig.from_mode(
    "code_review",
    cr_fail_to_pass_tests=task.fail_to_pass,
    cr_max_tool_calls=50,
)
middleware = build_middleware(cfg)

What the stack does on each step:

  1. Detects the v1 failure patterns (semantic loop, repeated error, edits without test, edits inside site-packages, missed FAIL_TO_PASS tests, half-budget no-edits, final-10% over-editing) and injects a short corrective hint when one fires.
  2. Compresses any ToolMessage whose content exceeds 1800 characters using head+tail truncation; leaves the most-recent 2 tool messages untouched so the agent keeps full visibility into the step it's actively reasoning about.
  3. Once past call index 40, if the monitor signals overthinking (rules 1/3/4), prepends a "stop investigating, submit your best answer" nudge.
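
Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SDK's implementation: the function names are hypothetical, messages are modeled as dicts instead of LangChain message objects, the head and tail are assumed to split the 1800-character budget equally, and "past call index 40" is read as a strict greater-than.

```python
def head_tail(text, limit=1800, marker="\n...[truncated]...\n"):
    """Keep the first and last parts of `text` when it exceeds `limit` chars."""
    if len(text) <= limit:
        return text
    half = limit // 2  # assumption: head and tail share the budget equally
    return text[:half] + marker + text[-half:]


def compress_tool_messages(messages, limit=1800, keep_recent=2):
    """Return a copy of `messages` with older tool outputs truncated.

    `messages` is a list of dicts with "role" and "content" keys, standing in
    for LangChain message objects to keep the sketch dependency-free.
    """
    tool_positions = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # The most recent `keep_recent` tool messages are left untouched.
    protected = set(tool_positions[-keep_recent:]) if keep_recent > 0 else set()
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and i not in protected:
            m = {**m, "content": head_tail(m["content"], limit)}
        out.append(m)
    return out


def should_nudge(call_index, fired_rules, after_index=40, overthinking_rules=(1, 3, 4)):
    """True when the early-exit nudge would apply under the assumptions above."""
    return call_index > after_index and any(r in overthinking_rules for r in fired_rules)
```

Returning new dicts rather than mutating the input keeps the full tool output available to anything else that holds a reference to the original history.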

Bash-tool name autodetection covers bash, shell, run_command, execute, and run_bash out of the box; pass bash_tool_names=("your_tool",) to override.
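
The name matching described above might look like the following. This is a hedged sketch: find_bash_tools is a hypothetical helper, and exact string matching is an assumption, since the text does not say whether the SDK matches case-insensitively or by substring.

```python
# Default names listed in the text above.
DEFAULT_BASH_TOOL_NAMES = ("bash", "shell", "run_command", "execute", "run_bash")


def find_bash_tools(tool_names, bash_tool_names=DEFAULT_BASH_TOOL_NAMES):
    """Return the tool names treated as shell tools, preserving input order."""
    wanted = set(bash_tool_names)  # assumption: exact-match comparison
    return [name for name in tool_names if name in wanted]
```

Passing bash_tool_names=("your_tool",) replaces the default list in this sketch, mirroring the override described above.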

Architecture

The middleware hooks into two points in the LangChain agent loop:

  • before_model -- scores the agent's last reasoning step, updates the FSM state, runs all monitors, retrieves E-traces, and injects steering signals as a system message.
  • wrap_model_call -- overrides the model based on FSM state (if routing is configured) and tracks token usage.
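
The routing and token-tracking half of this can be illustrated without LangChain. The sketch below is an assumption-heavy stand-in, not the SDK's code: select_model and TokenBudget are hypothetical names, and the fallback to the agent's own model for unrouted states is a guess at what "if routing is configured" implies.

```python
from dataclasses import dataclass


def select_model(fsm_state, model_routing, default_model):
    """Pick the model for the current FSM state, falling back to the default."""
    if not model_routing:  # routing not configured: never override
        return default_model
    return model_routing.get(fsm_state, default_model)


@dataclass
class TokenBudget:
    """Running token counter against a fixed budget (hypothetical helper)."""

    limit: int
    used: int = 0

    def record(self, input_tokens, output_tokens):
        """Add one model call's usage to the running total."""
        self.used += input_tokens + output_tokens

    @property
    def remaining(self):
        """Tokens left before the budget is exhausted, never negative."""
        return max(self.limit - self.used, 0)


routing = {
    "FAST": "anthropic:claude-haiku-4-5-20251001",
    "SLOW": "anthropic:claude-sonnet-4-20250514",
}
budget = TokenBudget(limit=100_000)
budget.record(input_tokens=1_200, output_tokens=300)
```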

Development

pip install -e ".[dev]"
pytest
