Anthropic's Advisor Strategy as a drop-in DeepAgents middleware — pair a powerful advisor with a fast executor
Project description
advisor-middleware
Anthropic's Advisor Strategy as a drop-in DeepAgents middleware.
Problem • How it works • Quick start • Configuration • Benchmark
Open-source implementation of Anthropic's Advisor Strategy — a pattern that pairs a fast, cheap executor model with a powerful advisor model. The executor runs end-to-end; the advisor is consulted only on critical decisions. Result: better performance at lower cost.
advisor-middleware makes this a single import for DeepAgents. It handles provider detection, native API routing, fallback invocation, cost guardrails, and context curation — so you just plug it in and your agents get smarter.
The Problem
| Traditional sub-agent pattern | Advisor Strategy |
|---|---|
| Large orchestrator decomposes work into tasks | Small executor drives end-to-end |
| Expensive model runs every turn | Expensive model consulted only when needed |
| Worker pools + orchestration overhead | Zero orchestration — just a tool call |
| Hard to predict costs | max_uses guardrail caps advisor spend |
"It makes better architectural decisions on complex tasks while adding no overhead on simple ones. The plans and trajectories are night and day different." — Eric Simmons, CEO and Founder
How It Works
flowchart TD
A["Executor (Sonnet/Haiku)"] -->|"Runs every turn"| B{"Stuck on a\nhard decision?"}
B -->|No| C["Continue executing\n(read, write, search, execute)"]
C --> A
B -->|Yes| D["Call advisor tool"]
D --> E["Advisor (Opus)\nReviews shared context"]
E -->|"Returns plan/correction/stop"| F["Executor resumes\nwith guidance"]
F --> A
style A fill:#fff,stroke:#333,color:#333
style E fill:#c0392b,color:#fff
style C fill:#2d6a4f,color:#fff
style F fill:#2d6a4f,color:#fff
The middleware operates in two modes depending on the executor's provider:
- Native mode (Anthropic executor): Injects the
advisor_20260301server-side tool spec. The API handles everything internally — zero extra round-trips, zero overhead on simple turns. - Fallback mode (any provider): Exposes an
advisortool backed by a direct LLM call to the advisor model. Works with any executor/advisor combination.
Quick Start
Install
pip install advisor-middleware
# Or from source
pip install git+https://github.com/emanueleielo/advisor-middleware.git
Minimal — zero config
from deepagents import create_deep_agent
from advisor_middleware import AdvisorMiddleware
mw = AdvisorMiddleware(advisor_model="claude-opus-4-6")
agent = create_deep_agent(
model="anthropic:claude-sonnet-4-6",
system_prompt="You are a senior software engineer.",
backend=backend,
middleware=[mw],
)
That's it. Sonnet executes, Opus advises. The middleware auto-detects Anthropic and uses the native API tool — no extra configuration needed.
Cross-provider
from advisor_middleware import AdvisorMiddleware, AdvisorConfig
mw = AdvisorMiddleware(
config=AdvisorConfig(
advisor_model="anthropic:claude-opus-4-6",
prefer_native=False, # force fallback mode
max_uses_per_turn=2,
),
)
agent = create_deep_agent(
model="openai:gpt-4o", # any provider as executor
middleware=[mw],
)
With compact-middleware
from advisor_middleware import AdvisorMiddleware
from compact_middleware import CompactionMiddleware, CompactionToolMiddleware
advisor_mw = AdvisorMiddleware(advisor_model="claude-opus-4-6")
compact_mw = CompactionMiddleware(model="anthropic:claude-sonnet-4-6", backend=backend)
compact_tool_mw = CompactionToolMiddleware(compact_mw)
agent = create_deep_agent(
model="anthropic:claude-sonnet-4-6",
backend=backend,
middleware=[advisor_mw, compact_mw, compact_tool_mw],
)
Configuration
AdvisorConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
advisor_model |
str | BaseChatModel |
"claude-opus-4-6" |
Advisor model ID or resolved instance |
max_uses_per_turn |
int |
3 |
Max advisor calls per agent turn |
max_uses_per_session |
int | None |
None |
Lifetime cap (None = unlimited) |
prefer_native |
bool |
True |
Use native advisor_20260301 when possible |
max_tokens |
int |
1024 |
Max tokens the advisor can generate per consultation |
temperature |
float |
1.0 |
Advisor temperature (fallback mode only) |
advisor_system_prompt |
str | None |
None |
Override advisor prompt (fallback only) |
context |
ContextCurationConfig |
(see below) | Controls context forwarded to advisor |
ContextCurationConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
include_system_prompt |
bool |
True |
Forward executor's system prompt |
include_tool_results |
bool |
True |
Include tool results in context |
max_context_messages |
int | None |
None |
Limit messages sent to advisor |
max_context_chars |
int | None |
None |
Hard character budget for context |
Cost control example
config = AdvisorConfig(
max_uses_per_turn=2, # max 2 consultations per turn
max_uses_per_session=10, # max 10 total in the session
context=ContextCurationConfig(
max_context_messages=10, # only last 10 messages
include_tool_results=False, # skip bulky tool outputs
),
)
Native vs Fallback
Native (advisor_20260301) |
Fallback (LLM call) | |
|---|---|---|
| When | Anthropic executor + prefer_native=True |
Any other executor, or prefer_native=False |
| How | Server-side tool spec injected into API call | Direct LLM call to advisor model |
| Round-trips | 0 extra (handled by API) | 1 per consultation |
| Overhead | Zero on simple turns | Minimal (only when called) |
| Model freedom | Anthropic advisor only | Any model as advisor |
| Context curation | Handled by API | Configurable via ContextCurationConfig |
The middleware auto-detects the executor's provider and routes accordingly. You can force fallback mode with prefer_native=False for full control over context curation.
Benchmark
We tested with a real debugging task: a 4-file async task queue system (connection pool + circuit breaker + rate limiter + retry logic) with interacting bugs that cause tasks to be silently dropped under load. The agent must read all files, diagnose cross-component interactions, and fix every bug.
python examples/benchmark.py
Results: Haiku solo vs Haiku + Opus Advisor
| Haiku Solo | Haiku + Opus Advisor | |
|---|---|---|
| Tests passing | 11/12 | 12/12 |
| Turns | 11 | 6 |
| File writes | 7 (3 rewrites) | 3 (all correct first try) |
| Advisor calls | 0 | 1 |
| Duration | 210.7s | 90.3s |
What happened: Haiku solo rewrote connection.py four times, going in circles trying to fix the semaphore leak. It never solved the circuit breaker recovery issue.
Haiku + Advisor consulted Opus once after reading all files. Opus confirmed the bug diagnosis, corrected a proposed fix, and flagged an issue Haiku missed. Haiku then wrote all three fixes correctly on the first attempt.
Why it works
The advisor doesn't help on simple tasks — Haiku handles routine reads, writes, and obvious fixes alone. The value shows on cross-file reasoning where Haiku gets stuck in trial-and-error loops:
- Opus identified that a semaphore release was needed for each discarded connection, not just one
- Opus correctly noted the circuit breaker race condition is a non-issue under asyncio's GIL (avoiding an unnecessary lock)
- Opus flagged that rate-limit timeouts should requeue tasks without consuming retry attempts
One well-timed consultation eliminated multiple cycles of incorrect rewrites.
Anthropic's benchmarks
From the original blog post:
| Config | SWE-bench Multilingual | Cost per task |
|---|---|---|
| Sonnet + Opus Advisor | +2.7pp vs Sonnet solo | -11.9% |
| Haiku + Opus Advisor (BrowseComp) | 41.2% vs 19.7% solo | 85% cheaper than Sonnet |
Introspection
Track advisor usage programmatically:
mw = AdvisorMiddleware(advisor_model="claude-opus-4-6")
# ... after agent execution ...
print(f"Total consultations: {mw.get_total_uses()}")
print(f"Total advisor tokens: {mw.get_total_advisor_tokens()}")
for event in mw.get_events():
print(f" Turn {event['turn']}: {event['strategy']} — {event['advisor_tokens']} tokens")
print(f" Q: {event['question'][:80]}...")
print(f" A: {event['advice'][:80]}...")
Architecture
advisor_middleware/
├── __init__.py # Public API: AdvisorMiddleware, AdvisorConfig, ...
├── middleware.py # Core middleware — dual-mode wrap_model_call
├── config.py # AdvisorConfig + ContextCurationConfig dataclasses
├── state.py # AdvisorState + AdvisorEvent TypedDicts
├── prompts.py # Executor + advisor system prompts
├── providers.py # Provider detection, native spec, fallback invocation
└── py.typed # PEP 561 type marker
Development
# Install with dev dependencies
pip install -e ".[dev,deepagents,anthropic]"
# Run tests
pytest
# Lint
ruff check advisor_middleware/
# Type check
mypy advisor_middleware/
License
MIT — Emanuele Ielo
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file advisor_middleware-0.1.0.tar.gz.
File metadata
- Download URL: advisor_middleware-0.1.0.tar.gz
- Upload date:
- Size: 187.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ec77cfb122b01aa977ed3560e53d919d83b5b4570aa7e90aa21ba16b8bfeafd2
|
|
| MD5 |
0757fe12193e21bea78400a3898cd0a7
|
|
| BLAKE2b-256 |
d724e0e36282fb58ee11a70078959d9a2a309f5f423b414840121d50527ac323
|
File details
Details for the file advisor_middleware-0.1.0-py3-none-any.whl.
File metadata
- Download URL: advisor_middleware-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c3866d06717c66e85269c07a80e39f2e331f59be37c1acba98935b0902ac5475
|
|
| MD5 |
5da9dc53cbc659639e93e6780e740e56
|
|
| BLAKE2b-256 |
539cc2ae5f3f64f754de260b049d5a2fa91704be04c6cb4e53bec1db44689cdb
|