Drop-in open-source agent SDK. Multi-model, streaming, MCP, sub-agents.
Project description
any-agent-sdk
Claude Agent SDK for open-source models. Drop-in compatible with claude-agent-sdk — swap the import, keep your code — but the agent loop runs against Llama, Qwen, DeepSeek, Mixtral, Phi, Gemma, or anything you serve through Ollama, vLLM, llama.cpp, TGI, Together, Fireworks, Groq, or OpenRouter.
# Before
from claude_agent_sdk import query, ClaudeAgentOptions, tool
# After
from any_agent_sdk import query, ClaudeAgentOptions, tool
That's it. Every canonical Claude SDK example runs verbatim. The wire format underneath is OpenAI-compat or Ollama; the surface above is Anthropic-shaped.
Quick start
pip install any-agent-sdk
any-agent setup-local # installs Ollama if missing, pulls qwen2.5:1.5b, verifies
import asyncio
from any_agent_sdk import query, ClaudeAgentOptions, tool, AssistantMessage
@tool
async def get_weather(city: str) -> str:
"""Get the current weather for a city."""
return f"{city}: 67°F"
async def main():
async for msg in query(
prompt="What's the weather in SF?",
options=ClaudeAgentOptions(
model="qwen2.5:1.5b", # routes to local Ollama automatically
tools=[get_weather],
max_turns=5,
),
):
if isinstance(msg, AssistantMessage):
for block in msg.content:
if hasattr(block, "text"):
print(block.text)
asyncio.run(main())
Same script against Together AI — change one line:
options = ClaudeAgentOptions(
model="Qwen/Qwen2.5-72B-Instruct-Turbo", # routes to Together automatically (uses $TOGETHER_API_KEY)
tools=[get_weather],
max_turns=5,
)
Same script against Fireworks, vLLM, llama.cpp, Groq — just change model. The backend URL is inferred from the model name shape; pass backend= explicitly to override.
Custom backend — point at any OpenAI-compatible server
Auto-routing covers the well-known providers from the model name. For everything else — your own vLLM on a private GPU box, LM Studio on a custom port, a corporate proxy, OpenRouter, Groq, an internal inference cluster — pass backend= explicitly. The URL wins over inference.
# Self-hosted vLLM on a private GPU box
options = ClaudeAgentOptions(
model="Qwen/Qwen2.5-72B-Instruct",
backend="https://gpu-box.internal:8000/v1",
api_key=os.environ["INTERNAL_KEY"],
tools=[get_weather],
)
# LM Studio on a non-standard port
options = ClaudeAgentOptions(
model="qwen2.5:7b",
backend="http://localhost:1234/v1",
tools=[get_weather],
)
# Groq (blazing fast llama / mixtral)
options = ClaudeAgentOptions(
model="llama-3.3-70b-versatile",
backend="https://api.groq.com/openai/v1",
api_key=os.environ["GROQ_API_KEY"],
)
# OpenRouter aggregator (200+ models behind one API)
options = ClaudeAgentOptions(
model="anthropic/claude-3.5-sonnet", # OpenRouter proxies even Anthropic
backend="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"],
)
Or set it once for the whole process via env:
export ANY_AGENT_BASE_URL=https://gpu-box.internal:8000/v1
export ANY_AGENT_API_KEY=...
python my_agent.py
Precedence: explicit backend= > $ANY_AGENT_BASE_URL > model-name inference > Ollama default.
Models — ranked, picked by where they run
Ranked by current OSS leaderboards (Arena Elo · GPQA · SWE-bench, May 2026). Pick the highest-ranked model that fits your hardware.
| # | Model | Runs | model= |
Notable |
|---|---|---|---|---|
| 1 | Kimi K2.6 | cloud | moonshotai/Kimi-K2.6-Instruct |
#1 open-weights GPQA (90.5%) |
| 2 | Qwen3 235B-A22B | cloud · 64 GB+ local | Qwen/Qwen3-235B-A22B-Instruct-Turbo |
Broadest benchmark leader · Apache 2.0 |
| 3 | GLM-5 | cloud | zai-org/GLM-5 |
Best Arena Elo among open (1451) |
| 4 | MiniMax M2.5 | cloud | minimaxai/MiniMax-M2.5 |
80.2% SWE-bench · ties Claude Opus 4.6 on code |
| 5 | DeepSeek-V3.2 | cloud · 80 GB+ local | deepseek-ai/DeepSeek-V3.2 |
Top general-purpose OSS |
| 6 | Llama 4 Maverick | cloud · 72 GB local | meta-llama/Llama-4-Maverick-17B-128E |
Meta's flagship 2025 MoE |
| 7 | gpt-oss-120b | cloud · 80 GB local | gpt-oss:120b |
OpenAI's open release · ~o4-mini class |
| 8 | DeepSeek-R1 | cloud · 48 GB+ local | deepseek-r1:70b / deepseek-ai/... |
Reasoning · emits <think> blocks |
| 9 | Llama 4 Scout | 24 GB local · cloud | llama4:scout |
10M context window · fits a 24 GB GPU |
| 10 | Hermes 4 70B | 48 GB local · cloud | hermes4:70b |
Nous — tool-use + reasoning tuned |
| 11 | DeepSeek-R1 32B | 24 GB local | deepseek-r1:32b |
Reasoning, fits a big-laptop GPU |
| 12 | Qwen3 32B | 24 GB local | qwen3:32b |
Strong general-purpose |
| 13 | Llama 3.3 70B | 48 GB local · cloud | llama3.3:70b |
Stable, well-supported |
| 14 | gpt-oss-20b | 16 GB local | gpt-oss:20b |
OpenAI open · runs on a laptop |
| 15 | Phi 4 medium | 16 GB local | phi4:medium |
MS — strong reasoning for size |
| 16 | Gemma 3 27B | 16 GB local | gemma3:27b |
Google's latest |
| 17 | Qwen3 14B / 8B | 8–12 GB local | qwen3:14b / qwen3:8b |
Mid-tier all-rounder |
| 18 | Llama 3.1 8B | 8 GB local | llama3.1:8b |
Mainstream baseline |
| 19 | Phi 4 small | 8 GB local | phi4:small |
Compact reasoning |
| 20 | DeepSeek-R1 8B/14B | 8–12 GB local | deepseek-r1:8b / :14b |
Reasoning on a mainstream laptop |
CPU-laptop tier (no GPU, ≤ 8 GB RAM) — any-agent setup-local picks from this list:
| # | Tag | Params | RAM | Tools | Reasoning | Notes |
|---|---|---|---|---|---|---|
| C1 | qwen2.5:1.5b |
1.5B | 4 GB | yes | no | Default — best 1.5B for agents |
| C2 | deepseek-r1:1.5b |
1.5B | 4 GB | yes | yes | Reasoning, emits <think> |
| C3 | llama3.2:3b |
3.2B | 6 GB | yes | no | Best 3B for 8 GB laptops |
| C4 | qwen2.5:3b |
3B | 6 GB | yes | no | Same class as Llama 3.2 3B |
| C5 | phi3.5:3.8b |
3.8B | 6 GB | yes | no | Punches above its weight |
| C6 | llama3.2:1b |
1.2B | 4 GB | yes | no | Sharper than 0.5B Qwen |
| C7 | qwen2.5:0.5b |
0.5B | 2 GB | yes | no | Smallest with tool calls |
| C8 | gemma2:2b |
2B | 4 GB | no | no | Chat only, polished prose |
| C9 | tinyllama:1.1b |
1.1B | 2 GB | no | no | RAM-constrained pick |
| C10 | smollm2:135m |
135M | 2 GB | no | no | Tiny — sanity-check install |
any-agent setup-local # one command — installs Ollama if missing, pulls C1, smoke tests
any-agent setup-local --list # see the catalog
any-agent setup-local --model qwen2.5:3b
How to actually call them
Auto-routing reads the model name shape (see any_agent_sdk/routing.py):
| Shape | Backend it routes to | Env to set |
|---|---|---|
name:tag (e.g. qwen3:8b) |
Ollama (http://localhost:11434) |
— |
org/repo (e.g. Qwen/Qwen3-235B-...) |
Together AI | TOGETHER_API_KEY |
accounts/fireworks/models/... |
Fireworks AI | FIREWORKS_API_KEY |
gpt-*, o1-*, o3-*, o4-* |
OpenAI native | OPENAI_API_KEY |
gemini-* |
Google Gen-Lang (OpenAI-compat) | GEMINI_API_KEY |
claude-* |
refused — use the real claude-agent-sdk |
— |
| anything else | Ollama default | — |
For Groq, Moonshot (Kimi native), DeepSeek native, OpenRouter, Cerebras, DeepInfra, Anyscale, LM Studio, self-hosted vLLM / llama.cpp / TGI — pass backend= explicitly or set ANY_AGENT_BASE_URL (see Custom backend above). The pattern is the same: it's an OpenAI-compatible URL plus an API key.
Why this exists
The Claude Agent SDK is the best-designed agent runtime in the open. Streaming tool dispatch, 28-event hook system, permission rules per source, MCP across four transports, sub-agents, sessions with fork/resume, auto-compaction — none of the OSS alternatives ship the whole set. LangGraph is too heavy and skips MCP. smolagents is too small. llama-stack is tightly scoped. The Anthropic and OpenAI agent SDKs are bound to their hosted APIs.
any-agent-sdk is the same surface, model-agnostic underneath. You write to Anthropic's design; you run it on whatever you can serve.
Plus the OSS-specific bits the hosted SDKs don't need to think about:
- Universal tool use — Path A (native via OpenAI-compat
tools[]) when supported; Path B (prompt-engineered<tool_call>XML) when not; Path C (grammar-constrained JSON) when the server can enforce it. Capability-table-driven, automatic per model. - Universal thinking — handles inline
<think>tags (R1, QwQ, Marco-o1, R1-Distill) and out-of-band thinking blocks. Zero cost when the model doesn't emit thinking. - Backend agnosticism — same agent code, one env var or one kwarg between Ollama at
localhost:11434and Fireworks atapi.fireworks.ai.
The acceptance test
v1.0 ships when this is true on a fresh machine:
pip install any-agent-sdk
any-agent setup-local
# ...10-line script with 2 tools + 5-turn agent task...
python my_agent.py # Just Works on the first try
Then the same script works against Together, Fireworks, vLLM, llama.cpp, Groq just by changing model. Today: DeepSeek-R1 1.5B on local Ollama runs six of Anthropic's own canonical examples verbatim. Suite at 202 tests. The acceptance test passes on Ollama; provider matrix expansion is the remaining work.
Roadmap
What's shipped — and what's still ahead. Check our progress.
Drop-in surface (Claude SDK parity)
-
query()yielding flat-shapeAssistantMessage/UserMessage/SystemMessage/ResultMessage -
ClaudeAgentOptionswith model, backend, tools, system_prompt, max_turns, max_tokens, temperature, hooks, can_use_tool, permissions, mcp_servers, plugins, agents, max_budget_usd, setting_sources, allowed_tools, disallowed_tools, cwd, session_id, persist, stderr -
ClaudeSDKClient— streaming async context manager -
@tooldecorator (Claude-shaped positional signature) -
AgentDefinitionfor sub-agents -
Plugin(tools=, system_prompt_addition=, hooks=)— merges at session start -
PermissionResultAllow(updated_input=...)rewriting tool args before dispatch -
PermissionResultDenysurfacing throughResultMessage.permission_denials -
HookMatcherfor 28 hook events (PreToolUse, PostToolUse, SessionStart, SessionEnd, Stop, ...) -
ToolPermissionContextpassed tocan_use_tool -
create_sdk_mcp_server(name, version, tools=) -
WebFetch/WebSearchbuilt-in tools (Exa-backed) -
CLIConnectionError,ClaudeSDKError -
ToolPermissionContext.signalfor cancellation (anyio.Event, fired byAgent.cancel()) -
setting_sourcesactually loading and persisting per source - Streaming-mode
client.query()with mid-stream tool dispatch
Backends
- Ollama (native API + auto-routing from tag form)
- OpenAI-compat (vLLM, Together, Fireworks, Groq, OpenRouter, Cerebras)
- llama.cpp (via
--jinja) - TGI (HuggingFace text-generation-inference)
- OpenAI native (
gpt-*,o1/o3/o4) - Gemini OpenAI-compat endpoint
- Mock provider for tests
- Auto-route from model name shape — no
backend=needed - Modal serverless adapter
- Anthropic via separate
anthropic_passthrough(for parity testing only)
Tool use
- Path A: native via OpenAI-compat
tools[] - Path B: prompt-engineered
<tool_call>XML (for Llama 2, Mistral 7B, older Qwens) - Path C: grammar-constrained JSON
- Capability-table-driven path selection (30+ models)
- Parallel tool dispatch
- Tool result threading
- Streaming tool dispatch (start tool execution mid-stream, not after
MessageStop)
Thinking / reasoning
- Inline
<think>blocks (DeepSeek-R1, QwQ, Marco-o1, R1-distill family) - Out-of-band thinking blocks (DeepSeek API)
-
ThinkingBlockinAssistantMessage.content
MCP
- In-process MCP server via
create_sdk_mcp_server - stdio transport
- sse transport
- http transport
- Elicitation (server prompts user mid-session)
- Sampling (server calls back into the agent's model)
Sessions + state
- JSONL transcript persistence
-
~/.any-agent/directory + per-session paths - Memory entries + index
-
<system-reminder>+isMetainjection - Auto-compaction at token threshold
- Session fork
- Session resume from arbitrary checkpoint
Budget
- Per-model pricing table
-
max_usdceiling →BudgetExceededError -
total_cost_usdonResultMessage -
modelUsageper-model breakdown -
max_turnsceiling
Local install
-
any-agent setup-local— installs Ollama if missing, pulls a CPU-friendly model, smoke tests - 12-entry CPU-friendly catalog (135M → 8B params)
- Auto-install of Ollama on Linux/macOS via official script
- Windows installer wrapper
- llama.cpp
setup-localalternative for users who prefer it (any-agent setup-local-llamacpp)
Examples (run verbatim against DeepSeek-R1 1.5B on local Ollama)
-
quickstart.py -
ollama_local.py -
with_thinking.py -
tools_option.py -
mcp_calculator.py -
system_prompt.py -
fireworks_hosted.pyruns against live Fireworks -
vllm_self_hosted.pyruns against live vLLM -
multi_agent_research.pyend-to-end with sub-agents
1.0 prerequisites
- Streaming tool dispatch rewrite
- Mid-stream cancellation via
ToolPermissionContext.signal - All 16 examples verified against ≥ 3 backends
- Docs site (mkdocs-material)
- PyPI 1.0 release with semver guarantee
Drop-in compatibility — what works today
from any_agent_sdk import (
# Core
query, ClaudeAgentOptions, ClaudeSDKClient,
# Messages (flat shape, matches claude_agent_sdk)
AssistantMessage, UserMessage, SystemMessage, ResultMessage,
TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock,
# Tools
tool, Tool, ToolRegistry, create_sdk_mcp_server,
# Permissions
PermissionResultAllow, PermissionResultDeny, ToolPermissionContext,
# Hooks
HookMatcher, HookInput, HookJSONOutput, HookContext,
# Sub-agents
AgentDefinition,
# Plugins
Plugin,
# Built-in tools
WebFetch, WebSearch,
# Errors
ClaudeSDKError, CLIConnectionError,
)
Every name in that import block has a working implementation backed by tests. ClaudeSDKClient is a streaming async context manager. Plugin(tools=..., system_prompt_addition=..., hooks=...) merges into the agent at session start. PermissionResultAllow(updated_input={...}) rewrites tool args before dispatch. ResultMessage.permission_denials carries every rejected call.
License
Apache-2.0. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file oss_agent_sdk-0.1.0.tar.gz.
File metadata
- Download URL: oss_agent_sdk-0.1.0.tar.gz
- Upload date:
- Size: 195.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0ab421f4b9a50a8f984b6606c812d47971faf5a888bc26776a3f6092974c3b2f
|
|
| MD5 |
40d3cca64c8bbb84c1f05beae1111f18
|
|
| BLAKE2b-256 |
02923528bcf5e055ffa7d21b9ec1750bc723aaa87ac366b18d786d8058773cf7
|
File details
Details for the file oss_agent_sdk-0.1.0-py3-none-any.whl.
File metadata
- Download URL: oss_agent_sdk-0.1.0-py3-none-any.whl
- Upload date:
- Size: 243.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f8933e1ef17d60e49920dbc0d6969b63e238f4d828eaee9fbda670b5dafd4e1b
|
|
| MD5 |
3ce6c72c58cbe3bed1821ba1ed9b05dd
|
|
| BLAKE2b-256 |
99ca0c094d3dcfe0fc2ffb79d0dfca1f40c00272d9ffa1d7550ed13d740aafd7
|