Skip to main content

Drop-in agent observability + steering for LangChain, LangGraph, OpenAI Agents SDK, Claude Agent SDK, and the Anthropic Messages API.

Project description

ReasonBlocks SDK

Python SDK that integrates ReasonBlocks into LangChain 1.0 agents as middleware. Scores each agent step, injects steering signals and E-traces, and optionally switches models based on difficulty.

Installation

pip install reasonblocks

Quick Start

from reasonblocks import ReasonBlocks

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[...],
    system_prompt="...",
    middleware=[rb.middleware()],
)

One import. One init. One middleware addition.

Tagging runs for the dashboard

rb.middleware() accepts identifying metadata that lands on the dashboard runs row:

agent = create_agent(
    ...,
    middleware=[rb.middleware(
        org_id="6d3f...",         # uuid; "default" if omitted
        project_id="a91b...",     # uuid; "default" if omitted
        run_id="my-run-1",        # auto-generated if omitted
        agent_name="bugfixer",    # free-form filter key
        task="fix the TypeError",
        model="claude-sonnet-4-20250514",
        framework="langchain",
        codebase_id="myrepo@sha:abc123",
    )],
)

The dashboard's Quickstart page (/platform/dashboard/quickstart) shows the user's actual org_id / project_id next to a copy-pasteable snippet — the easiest way to get the values without hardcoding.

Configuration

rb = ReasonBlocks(
    api_key="rb-...",
    token_budget=100_000,
    monitor_names=["loop", "confidence", "evidence", "budget", "strategy_exhaustion"],
    fsm_thresholds={
        "fast_threshold": 0.2,
        "slow_threshold": 0.6,
        "skip_threshold": 0.85,
    },
    model_routing={
        "FAST": "anthropic:claude-haiku-4-5-20251001",
        "SLOW": "anthropic:claude-sonnet-4-20250514",
    },
    e_traces_enabled=True,
)

Validated modes

Presets that wire one of the configurations measured on a real benchmark. Each mode is a one-liner that mounts the exact middleware stack that produced the published number — same thresholds, same rule pack, same priority order.

mode="code_review"

The SWE-bench Pro D-arm stack — code-review reactive monitor (7 rules), tool-output compression (head+tail at 1800 chars, keep most-recent 2), early-exit nudge.

Validated headline (paired n=75, claude-sonnet-4-6, real Docker grading; see swebench-pro-bench/results/compare_a_cv1_d.json):

arm pass rate mean input tokens vs baseline
baseline 25.3% 1,257,316
mode="code_review" (D arm) 25.4% 606,212 −51.8% tokens, flat accuracy
(alt) enable_general_monitor=True (C_v1) 36.0% 1,136,946 +10.7pp accuracy, −9.6% tokens

Use mode="code_review" when you want the maximum cost cut at unchanged success; use the C_v1 monitor when you want the accuracy lift.

One-liner install — drop into any LangChain agent:

from reasonblocks import for_code_review

agent = create_agent(
    model="claude-sonnet-4-6",
    tools=[bash_tool, ...],
    middleware=for_code_review(
        fail_to_pass_tests=task.fail_to_pass,  # SWE-bench Pro metadata
        max_tool_calls=50,                     # the budget the monitor reasons about
    ),
)

Or via the unified config (composes with E-traces, routing, etc.):

from reasonblocks import ReasonBlocksConfig, build_middleware

cfg = ReasonBlocksConfig.from_mode(
    "code_review",
    cr_fail_to_pass_tests=task.fail_to_pass,
    cr_max_tool_calls=50,
)
middleware = build_middleware(cfg)

What the stack does on each step:

  1. Detects the v1 failure patterns (semantic loop, repeated error, edits without test, edits inside site-packages, missed FAIL_TO_PASS tests, half-budget no-edits, final-10% over-editing) and injects a short corrective hint when one fires.
  2. Compresses any ToolMessage whose content exceeds 1800 characters using head+tail truncation; leaves the most-recent 2 tool messages untouched so the agent keeps full visibility into the step it's actively reasoning about.
  3. Once past call index 40, if the monitor signals overthinking (rules 1/3/4), prepends a "stop investigating, submit your best answer" nudge.

Bash-tool name autodetection covers bash, shell, run_command, execute, and run_bash out of the box; pass bash_tool_names=("your_tool",) to override.

Architecture

The middleware hooks into two points in the LangChain agent loop:

  • before_model -- scores the agent's last reasoning step, updates the FSM state, runs all monitors, retrieves E-traces, and injects steering signals as a system message.
  • wrap_model_call -- overrides the model based on FSM state (if routing is configured) and tracks token usage.

Development

pip install -e ".[dev]"
pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

reasonblocks-0.2.3.tar.gz (147.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

reasonblocks-0.2.3-py3-none-any.whl (175.7 kB view details)

Uploaded Python 3

File details

Details for the file reasonblocks-0.2.3.tar.gz.

File metadata

  • Download URL: reasonblocks-0.2.3.tar.gz
  • Upload date:
  • Size: 147.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for reasonblocks-0.2.3.tar.gz
Algorithm Hash digest
SHA256 7ed8a7b261bd0aa9ab67b1be65e8e45c4c8bb0a09f8fa34b248710df4e936acb
MD5 1171bd7a93505a4570a55b970d6a28f0
BLAKE2b-256 61297f447398e571e80c4aafa2be73ab6ed7b3575c9234eb7fe658969694b2c0

See more details on using hashes here.

File details

Details for the file reasonblocks-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: reasonblocks-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 175.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for reasonblocks-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f041ff365f3c708c5fb08cf8321e2adf289f14c578f82154c15d9df0c512f7f0
MD5 0e8c15f1911b5ed792919bebab928c5f
BLAKE2b-256 c527f891692f0ed072d276f0b09ceb21bc0ac0e3fc37990825b184656e6df526

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page