Skip to main content

Anthropic's Advisor Strategy as a drop-in DeepAgents middleware — pair a powerful advisor with a fast executor

Project description


advisor-middleware

Anthropic's Advisor Strategy as a drop-in DeepAgents middleware.

PyPI License Python DeepAgents

ProblemHow it worksQuick startConfigurationBenchmark


Open-source implementation of Anthropic's Advisor Strategy — a pattern that pairs a fast, cheap executor model with a powerful advisor model. The executor runs end-to-end; the advisor is consulted only on critical decisions. Result: better performance at lower cost.

The Advisor Strategy

advisor-middleware makes this a single import for DeepAgents. It handles provider detection, native API routing, fallback invocation, cost guardrails, and context curation — so you just plug it in and your agents get smarter.


The Problem

Traditional sub-agent pattern Advisor Strategy
Large orchestrator decomposes work into tasks Small executor drives end-to-end
Expensive model runs every turn Expensive model consulted only when needed
Worker pools + orchestration overhead Zero orchestration — just a tool call
Hard to predict costs max_uses guardrail caps advisor spend

How It Works

flowchart TD
    A["Executor (Sonnet/Haiku)"] -->|"Runs every turn"| B{"Stuck on a\nhard decision?"}
    B -->|No| C["Continue executing\n(read, write, search, execute)"]
    C --> A
    B -->|Yes| D["Call advisor tool"]
    D --> E["Advisor (Opus)\nReviews shared context"]
    E -->|"Returns plan/correction/stop"| F["Executor resumes\nwith guidance"]
    F --> A

    style A fill:#fff,stroke:#333,color:#333
    style E fill:#c0392b,color:#fff
    style C fill:#2d6a4f,color:#fff
    style F fill:#2d6a4f,color:#fff

The middleware operates in two modes depending on the executor's provider:

  • Native mode (Anthropic executor): Injects the advisor_20260301 server-side tool spec. The API handles everything internally — zero extra round-trips, zero overhead on simple turns.
  • Fallback mode (any provider): Exposes an advisor tool backed by a direct LLM call to the advisor model. Works with any executor/advisor combination.

Quick Start

Install

pip install advisor-middleware

# Or from source
pip install git+https://github.com/emanueleielo/advisor-middleware.git

Minimal — zero config

from deepagents import create_deep_agent
from advisor_middleware import AdvisorMiddleware

mw = AdvisorMiddleware(advisor_model="claude-opus-4-6")

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    system_prompt="You are a senior software engineer.",
    backend=backend,
    middleware=[mw],
)

That's it. Sonnet executes, Opus advises. The middleware auto-detects Anthropic and uses the native API tool — no extra configuration needed.

Cross-provider

from advisor_middleware import AdvisorMiddleware, AdvisorConfig

mw = AdvisorMiddleware(
    config=AdvisorConfig(
        advisor_model="anthropic:claude-opus-4-6",
        prefer_native=False,  # force fallback mode
        max_uses_per_turn=2,
    ),
)

agent = create_deep_agent(
    model="openai:gpt-4o",  # any provider as executor
    middleware=[mw],
)

With compact-middleware

from advisor_middleware import AdvisorMiddleware
from compact_middleware import CompactionMiddleware, CompactionToolMiddleware

advisor_mw = AdvisorMiddleware(advisor_model="claude-opus-4-6")
compact_mw = CompactionMiddleware(model="anthropic:claude-sonnet-4-6", backend=backend)
compact_tool_mw = CompactionToolMiddleware(compact_mw)

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    backend=backend,
    middleware=[advisor_mw, compact_mw, compact_tool_mw],
)

Configuration

AdvisorConfig

Parameter Type Default Description
advisor_model str | BaseChatModel "claude-opus-4-6" Advisor model ID or resolved instance
max_uses_per_turn int 3 Max advisor calls per agent turn
max_uses_per_session int | None None Lifetime cap (None = unlimited)
prefer_native bool True Use native advisor_20260301 when possible
max_tokens int 1024 Max tokens the advisor can generate per consultation
temperature float 1.0 Advisor temperature (fallback mode only)
advisor_system_prompt str | None None Override advisor prompt (fallback only)
context ContextCurationConfig (see below) Controls context forwarded to advisor

ContextCurationConfig

Parameter Type Default Description
include_system_prompt bool True Forward executor's system prompt
include_tool_results bool True Include tool results in context
max_context_messages int | None None Limit messages sent to advisor
max_context_chars int | None None Hard character budget for context

Cost control example

config = AdvisorConfig(
    max_uses_per_turn=2,        # max 2 consultations per turn
    max_uses_per_session=10,    # max 10 total in the session
    context=ContextCurationConfig(
        max_context_messages=10,  # only last 10 messages
        include_tool_results=False,  # skip bulky tool outputs
    ),
)

Native vs Fallback

Native (advisor_20260301) Fallback (LLM call)
When Anthropic executor + prefer_native=True Any other executor, or prefer_native=False
How Server-side tool spec injected into API call Direct LLM call to advisor model
Round-trips 0 extra (handled by API) 1 per consultation
Overhead Zero on simple turns Minimal (only when called)
Model freedom Anthropic advisor only Any model as advisor
Context curation Handled by API Configurable via ContextCurationConfig

The middleware auto-detects the executor's provider and routes accordingly. You can force fallback mode with prefer_native=False for full control over context curation.


Benchmark

We tested with a real debugging task: a 4-file async task queue system (connection pool + circuit breaker + rate limiter + retry logic) with interacting bugs that cause tasks to be silently dropped under load. The agent must read all files, diagnose cross-component interactions, and fix every bug.

python examples/benchmark.py

Results: Haiku solo vs Haiku + Opus Advisor

Haiku Solo Haiku + Opus Advisor
Tests passing 11/12 12/12
Turns 11 6
File writes 7 (3 rewrites) 3 (all correct first try)
Advisor calls 0 1
Duration 210.7s 90.3s

What happened: Haiku solo rewrote connection.py four times, going in circles trying to fix the semaphore leak. It never solved the circuit breaker recovery issue.

Haiku + Advisor consulted Opus once after reading all files. Opus confirmed the bug diagnosis, corrected a proposed fix, and flagged an issue Haiku missed. Haiku then wrote all three fixes correctly on the first attempt.

Why it works

The advisor doesn't help on simple tasks — Haiku handles routine reads, writes, and obvious fixes alone. The value shows on cross-file reasoning where Haiku gets stuck in trial-and-error loops:

  • Opus identified that a semaphore release was needed for each discarded connection, not just one
  • Opus correctly noted the circuit breaker race condition is a non-issue under asyncio's GIL (avoiding an unnecessary lock)
  • Opus flagged that rate-limit timeouts should requeue tasks without consuming retry attempts

One well-timed consultation eliminated multiple cycles of incorrect rewrites.

Anthropic's benchmarks

From the original blog post:

Config SWE-bench Multilingual Cost per task
Sonnet + Opus Advisor +2.7pp vs Sonnet solo -11.9%
Haiku + Opus Advisor (BrowseComp) 41.2% vs 19.7% solo 85% cheaper than Sonnet

Introspection

Track advisor usage programmatically:

mw = AdvisorMiddleware(advisor_model="claude-opus-4-6")

# ... after agent execution ...

print(f"Total consultations: {mw.get_total_uses()}")
print(f"Total advisor tokens: {mw.get_total_advisor_tokens()}")

for event in mw.get_events():
    print(f"  Turn {event['turn']}: {event['strategy']}{event['advisor_tokens']} tokens")
    print(f"    Q: {event['question'][:80]}...")
    print(f"    A: {event['advice'][:80]}...")

Architecture

advisor_middleware/
├── __init__.py        # Public API: AdvisorMiddleware, AdvisorConfig, ...
├── middleware.py       # Core middleware — dual-mode wrap_model_call
├── config.py          # AdvisorConfig + ContextCurationConfig dataclasses
├── state.py           # AdvisorState + AdvisorEvent TypedDicts
├── prompts.py         # Executor + advisor system prompts
├── providers.py       # Provider detection, native spec, fallback invocation
└── py.typed           # PEP 561 type marker

Development

# Install with dev dependencies
pip install -e ".[dev,deepagents,anthropic]"

# Run tests
pytest

# Lint
ruff check advisor_middleware/

# Type check
mypy advisor_middleware/

License

MIT — Emanuele Ielo

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

advisor_middleware-0.1.1.tar.gz (187.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

advisor_middleware-0.1.1-py3-none-any.whl (18.7 kB view details)

Uploaded Python 3

File details

Details for the file advisor_middleware-0.1.1.tar.gz.

File metadata

  • Download URL: advisor_middleware-0.1.1.tar.gz
  • Upload date:
  • Size: 187.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for advisor_middleware-0.1.1.tar.gz
Algorithm Hash digest
SHA256 5b419d99da6cb62c0cfcf618a3de2095e9915f43e15a99929aa43428a1777dbf
MD5 f509400b43e7e74339fa85013ee78862
BLAKE2b-256 c6b4078e8f82d1b6e86b6db37f900339ec9418f341f7f2a7e4df32b0dbeab34e

See more details on using hashes here.

File details

Details for the file advisor_middleware-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for advisor_middleware-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 59c9cc7180e976db8c3e6fe08a55d496f87b4aba99c236472856b0aaabd662de
MD5 c1fc4f8cadd55b32e7b2ae8f8e8f92fb
BLAKE2b-256 9714d1a16981094426e612873737819fb153c3914641dfddcd172bee03cefef6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page