Drop-in open-source agent SDK. Multi-model, streaming, MCP, sub-agents.

any-agent-sdk

Claude Agent SDK for open-source models. Drop-in compatible with claude-agent-sdk — swap the import, keep your code — but the agent loop runs against Llama, Qwen, DeepSeek, Mixtral, Phi, Gemma, or anything you serve through Ollama, vLLM, llama.cpp, TGI, Together, Fireworks, Groq, or OpenRouter.

```python
# Before
from claude_agent_sdk import query, ClaudeAgentOptions, tool

# After
from any_agent_sdk import query, ClaudeAgentOptions, tool
```

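That's it. The goal is for every canonical Claude SDK example to run verbatim; today six of Anthropic's canonical examples do, against DeepSeek-R1 1.5B on local Ollama. The wire format underneath is OpenAI-compatible or Ollama-native; the surface above is Anthropic-shaped.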

Quick start

```bash
pip install any-agent-sdk
any-agent setup-local         # installs Ollama if missing, pulls qwen2.5:1.5b, verifies
```

```python
import asyncio
from any_agent_sdk import query, ClaudeAgentOptions, tool, AssistantMessage

@tool
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"{city}: 67°F"

async def main():
    async for msg in query(
        prompt="What's the weather in SF?",
        options=ClaudeAgentOptions(
            model="qwen2.5:1.5b",   # routes to local Ollama automatically
            tools=[get_weather],
            max_turns=5,
        ),
    ):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if hasattr(block, "text"):
                    print(block.text)

asyncio.run(main())
```

Same script against Together AI — change one line:

```python
options = ClaudeAgentOptions(
    model="Qwen/Qwen2.5-72B-Instruct-Turbo",  # routes to Together automatically (uses $TOGETHER_API_KEY)
    tools=[get_weather],
    max_turns=5,
)
```

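Fireworks works the same way: use a model name of the form `accounts/fireworks/models/...` and the backend URL is inferred from the name shape. For vLLM, llama.cpp, Groq, and other OpenAI-compatible servers, change model and also pass backend= (see Custom backend below). Passing backend= always overrides name-based inference.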

Custom backend — point at any OpenAI-compatible server

Auto-routing infers the backend from the model name for the well-known providers in the routing table below. For everything else (your own vLLM on a private GPU box, LM Studio on a custom port, a corporate proxy, OpenRouter, Groq, an internal inference cluster), pass backend= explicitly. An explicit backend= always takes precedence over name-based inference.

```python
import os
from any_agent_sdk import ClaudeAgentOptions

# get_weather is the @tool defined in the quick start.

# Self-hosted vLLM on a private GPU box
options = ClaudeAgentOptions(
    model="Qwen/Qwen2.5-72B-Instruct",
    backend="https://gpu-box.internal:8000/v1",
    api_key=os.environ["INTERNAL_KEY"],
    tools=[get_weather],
)

# LM Studio on a non-standard port
options = ClaudeAgentOptions(
    model="qwen2.5:7b",
    backend="http://localhost:1234/v1",
    tools=[get_weather],
)

# Groq (fast hosted Llama / Mixtral)
options = ClaudeAgentOptions(
    model="llama-3.3-70b-versatile",
    backend="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

# OpenRouter aggregator (200+ models behind one API)
options = ClaudeAgentOptions(
    model="anthropic/claude-3.5-sonnet",  # OpenRouter proxies even Anthropic
    backend="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
```

Or set it once for the whole process via env:

```bash
export ANY_AGENT_BASE_URL=https://gpu-box.internal:8000/v1
export ANY_AGENT_API_KEY=...
python my_agent.py
```

Precedence: explicit backend= > $ANY_AGENT_BASE_URL > model-name inference > Ollama default.
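The precedence rule can be pictured as a small resolver. This is a minimal sketch, not the SDK's code: `resolve_backend` and its `infer` callback are hypothetical names, and name-based inference is left pluggable so the ordering itself is the only thing shown.

```python
import os
from typing import Callable, Optional

OLLAMA_DEFAULT = "http://localhost:11434"

def resolve_backend(
    model: str,
    backend: Optional[str] = None,
    infer: Callable[[str], Optional[str]] = lambda model: None,
) -> str:
    """Pick a base URL: backend= > $ANY_AGENT_BASE_URL > inference > Ollama."""
    # 1. An explicit backend= kwarg always wins.
    if backend:
        return backend
    # 2. Next, the process-wide environment variable.
    env_url = os.environ.get("ANY_AGENT_BASE_URL")
    if env_url:
        return env_url
    # 3. Next, whatever name-shape inference returns (None means "no match").
    inferred = infer(model)
    if inferred:
        return inferred
    # 4. Finally, fall back to local Ollama.
    return OLLAMA_DEFAULT
```

For example, with `infer=lambda m: "https://api.together.xyz/v1" if "/" in m else None`, an `org/repo` name resolves to Together unless `ANY_AGENT_BASE_URL` or backend= is set.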

Models — ranked, picked by where they run

Ranked by current OSS leaderboards (Arena Elo · GPQA · SWE-bench, May 2026). Pick the highest-ranked model that fits your hardware.

| # | Model | Runs | `model=` | Notable |
|---|---|---|---|---|
| 1 | Kimi K2.6 | cloud | `moonshotai/Kimi-K2.6-Instruct` | #1 open-weights GPQA (90.5%) |
| 2 | Qwen3 235B-A22B | cloud · 64 GB+ local | `Qwen/Qwen3-235B-A22B-Instruct-Turbo` | Broadest benchmark leader · Apache 2.0 |
| 3 | GLM-5 | cloud | `zai-org/GLM-5` | Best Arena Elo among open models (1451) |
| 4 | MiniMax M2.5 | cloud | `minimaxai/MiniMax-M2.5` | 80.2% SWE-bench · ties Claude Opus 4.6 on code |
| 5 | DeepSeek-V3.2 | cloud · 80 GB+ local | `deepseek-ai/DeepSeek-V3.2` | Top general-purpose OSS |
| 6 | Llama 4 Maverick | cloud · 72 GB local | `meta-llama/Llama-4-Maverick-17B-128E` | Meta's flagship 2025 MoE |
| 7 | gpt-oss-120b | cloud · 80 GB local | `gpt-oss:120b` | OpenAI's open release · ~o4-mini class |
| 8 | DeepSeek-R1 | cloud · 48 GB+ local | `deepseek-r1:70b` / `deepseek-ai/...` | Reasoning · emits `<think>` blocks |
| 9 | Llama 4 Scout | 24 GB local · cloud | `llama4:scout` | 10M context window · fits a 24 GB GPU |
| 10 | Hermes 4 70B | 48 GB local · cloud | `hermes4:70b` | Nous; tuned for tool use and reasoning |
| 11 | DeepSeek-R1 32B | 24 GB local | `deepseek-r1:32b` | Reasoning, fits a big-laptop GPU |
| 12 | Qwen3 32B | 24 GB local | `qwen3:32b` | Strong general-purpose |
| 13 | Llama 3.3 70B | 48 GB local · cloud | `llama3.3:70b` | Stable, well-supported |
| 14 | gpt-oss-20b | 16 GB local | `gpt-oss:20b` | OpenAI open · runs on a laptop |
| 15 | Phi 4 medium | 16 GB local | `phi4:medium` | Microsoft; strong reasoning for its size |
| 16 | Gemma 3 27B | 16 GB local | `gemma3:27b` | Google's latest |
| 17 | Qwen3 14B / 8B | 8–12 GB local | `qwen3:14b` / `qwen3:8b` | Mid-tier all-rounder |
| 18 | Llama 3.1 8B | 8 GB local | `llama3.1:8b` | Mainstream baseline |
| 19 | Phi 4 small | 8 GB local | `phi4:small` | Compact reasoning |
| 20 | DeepSeek-R1 8B/14B | 8–12 GB local | `deepseek-r1:8b` / `:14b` | Reasoning on a mainstream laptop |

CPU-laptop tier (no GPU, ≤ 8 GB RAM) — any-agent setup-local picks from this list:

| # | Tag | Params | RAM | Tools | Reasoning | Notes |
|---|---|---|---|---|---|---|
| C1 | `qwen2.5:1.5b` | 1.5B | 4 GB | yes | no | Default; best 1.5B for agents |
| C2 | `deepseek-r1:1.5b` | 1.5B | 4 GB | yes | yes | Reasoning, emits `<think>` |
| C3 | `llama3.2:3b` | 3.2B | 6 GB | yes | no | Best 3B for 8 GB laptops |
| C4 | `qwen2.5:3b` | 3B | 6 GB | yes | no | Same class as Llama 3.2 3B |
| C5 | `phi3.5:3.8b` | 3.8B | 6 GB | yes | no | Punches above its weight |
| C6 | `llama3.2:1b` | 1.2B | 4 GB | yes | no | Sharper than 0.5B Qwen |
| C7 | `qwen2.5:0.5b` | 0.5B | 2 GB | yes | no | Smallest with tool calls |
| C8 | `gemma2:2b` | 2B | 4 GB | no | no | Chat only, polished prose |
| C9 | `tinyllama:1.1b` | 1.1B | 2 GB | no | no | RAM-constrained pick |
| C10 | `smollm2:135m` | 135M | 2 GB | no | no | Tiny; for sanity-checking an install |

```bash
any-agent setup-local                      # installs Ollama if missing, pulls C1, runs a smoke test
any-agent setup-local --list               # see the catalog
any-agent setup-local --model qwen2.5:3b   # pull a specific catalog entry instead of C1
```

How to actually call them

Auto-routing reads the model name shape (see any_agent_sdk/routing.py):

| Shape | Backend it routes to | Env to set |
|---|---|---|
| `name:tag` (e.g. `qwen3:8b`) | Ollama (`http://localhost:11434`) | none |
| `org/repo` (e.g. `Qwen/Qwen3-235B-...`) | Together AI | `TOGETHER_API_KEY` |
| `accounts/fireworks/models/...` | Fireworks AI | `FIREWORKS_API_KEY` |
| `gpt-`, `o1-`, `o3-`, `o4-` | OpenAI native | `OPENAI_API_KEY` |
| `gemini-*` | Google Generative Language API (OpenAI-compatible endpoint) | `GEMINI_API_KEY` |
| `claude-*` | refused; use the real `claude-agent-sdk` | none |
| anything else | Ollama default | none |
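The shape matching in this table can be sketched as an ordered series of string checks. This is a hypothetical illustration, not the contents of routing.py: the function name `route`, the check order, and the base URLs are assumptions (the URLs are the providers' publicly documented OpenAI-compatible endpoints). The table does not say how an Ollama tag such as `gpt-oss:120b` is kept apart from the `gpt-` prefix; this sketch checks `name:tag` first so such tags stay local.

```python
from typing import Optional, Tuple

OLLAMA_URL = "http://localhost:11434"

def route(model: str) -> Tuple[str, Optional[str]]:
    """Map a model name to (base_url, api_key_env_var) by its shape."""
    if model.startswith("claude-"):
        # Refused: Claude models belong to the real claude-agent-sdk.
        raise ValueError(f"{model}: use claude-agent-sdk for Claude models")
    if model.startswith("accounts/fireworks/models/"):
        return "https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"
    if ":" in model and "/" not in model:
        # name:tag is an Ollama tag. Checked before vendor prefixes so tags
        # like gpt-oss:120b stay local (this ordering is an assumption).
        return OLLAMA_URL, None
    if model.startswith(("gpt-", "o1-", "o3-", "o4-")):
        return "https://api.openai.com/v1", "OPENAI_API_KEY"
    if model.startswith("gemini-"):
        return "https://generativelanguage.googleapis.com/v1beta/openai/", "GEMINI_API_KEY"
    if model.count("/") == 1:
        # org/repo, e.g. Qwen/Qwen3-235B-A22B-Instruct-Turbo
        return "https://api.together.xyz/v1", "TOGETHER_API_KEY"
    # Anything else falls through to the local Ollama default.
    return OLLAMA_URL, None
```

Putting the most specific patterns (the Fireworks prefix, then tags) ahead of the broad `org/repo` check is what keeps a Fireworks path, which also contains slashes, from being sent to Together.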

For Groq, Moonshot (Kimi native), DeepSeek native, OpenRouter, Cerebras, DeepInfra, Anyscale, LM Studio, self-hosted vLLM / llama.cpp / TGI — pass backend= explicitly or set ANY_AGENT_BASE_URL (see Custom backend above). The pattern is the same: it's an OpenAI-compatible URL plus an API key.

Why this exists

The Claude Agent SDK is the best-designed agent runtime in the open. Streaming tool dispatch, 28-event hook system, permission rules per source, MCP across four transports, sub-agents, sessions with fork/resume, auto-compaction — none of the OSS alternatives ship the whole set. LangGraph is too heavy and skips MCP. smolagents is too small. llama-stack is tightly scoped. The Anthropic and OpenAI agent SDKs are bound to their hosted APIs.

any-agent-sdk is the same surface, model-agnostic underneath. You write to Anthropic's design; you run it on whatever you can serve.

Plus the OSS-specific bits the hosted SDKs don't need to think about:

Universal tool use: Path A (native via OpenAI-compat tools[]) when supported; Path B (prompt-engineered <tool_call> XML) when not; Path C (grammar-constrained JSON) when the server can enforce it. Path selection is driven by a capability table and happens automatically per model.
Universal thinking: handles inline <think> tags (R1, QwQ, Marco-o1, R1-Distill) and out-of-band thinking blocks. Zero cost when the model doesn't emit thinking.
Backend agnosticism: the same agent code moves between Ollama at localhost:11434 and Fireworks at api.fireworks.ai by changing one env var or one kwarg.
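Inline thinking extraction can be sketched with a regular expression over a complete reply. This assumes the model wraps reasoning in literal `<think>...</think>` tags; `split_thinking` is a hypothetical helper, and the SDK presumably turns the first returned value into a ThinkingBlock. It also assumes, as with some R1-distill chat templates, that the template may pre-fill the opening tag so only `</think>` appears in the output. A streaming parser would additionally need to buffer tags split across chunks.

```python
import re
from typing import Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> Tuple[str, str]:
    """Return (thinking, visible_text) for one complete model reply."""
    if "<think>" not in text and "</think>" not in text:
        # No thinking emitted: hand the text back untouched (the zero-cost path).
        return "", text
    if "<think>" not in text:
        # Opening tag was pre-filled by the chat template; only the close appears.
        head, _, tail = text.partition("</think>")
        return head.strip(), tail.strip()
    # Collect every <think> span, then strip them all from the visible answer.
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(text))
    visible = THINK_RE.sub("", text).strip()
    return thinking, visible
```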

The acceptance test

v1.0 ships when this is true on a fresh machine:

```bash
pip install any-agent-sdk
any-agent setup-local
# ...10-line script with 2 tools + 5-turn agent task...
python my_agent.py   # Just Works on the first try
```

Then the same script works against Together, Fireworks, vLLM, llama.cpp, and Groq by changing only the model (plus backend= for servers that name-based routing does not cover). Today, DeepSeek-R1 1.5B on local Ollama runs six of Anthropic's own canonical examples verbatim, and the test suite has 202 tests. The acceptance test passes on Ollama; expanding the provider matrix is the remaining work.

Roadmap

What's shipped, and what's still ahead.

Drop-in surface (Claude SDK parity)

query() yielding flat-shape AssistantMessage / UserMessage / SystemMessage / ResultMessage
ClaudeAgentOptions with model, backend, tools, system_prompt, max_turns, max_tokens, temperature, hooks, can_use_tool, permissions, mcp_servers, plugins, agents, max_budget_usd, setting_sources, allowed_tools, disallowed_tools, cwd, session_id, persist, stderr
ClaudeSDKClient — streaming async context manager
@tool decorator (Claude-shaped positional signature)
AgentDefinition for sub-agents
Plugin(tools=, system_prompt_addition=, hooks=) — merges at session start
PermissionResultAllow(updated_input=...) rewriting tool args before dispatch
PermissionResultDeny surfacing through ResultMessage.permission_denials
HookMatcher for 28 hook events (PreToolUse, PostToolUse, SessionStart, SessionEnd, Stop, ...)
ToolPermissionContext passed to can_use_tool
create_sdk_mcp_server(name, version, tools=)
WebFetch / WebSearch built-in tools (Exa-backed)
CLIConnectionError, ClaudeSDKError
ToolPermissionContext.signal for cancellation (anyio.Event, fired by Agent.cancel())
setting_sources actually loading and persisting per source
Streaming-mode client.query() with mid-stream tool dispatch

Backends

Ollama (native API + auto-routing from tag form)
OpenAI-compat (vLLM, Together, Fireworks, Groq, OpenRouter, Cerebras)
llama.cpp (via --jinja)
TGI (HuggingFace text-generation-inference)
OpenAI native (gpt-*, o1/o3/o4)
Gemini OpenAI-compat endpoint
Mock provider for tests
Auto-route from model name shape — no backend= needed
Modal serverless adapter
Anthropic via separate anthropic_passthrough (for parity testing only)

Tool use

Path A: native via OpenAI-compat tools[]
Path B: prompt-engineered <tool_call> XML (for Llama 2, Mistral 7B, older Qwens)
Path C: grammar-constrained JSON
Capability-table-driven path selection (30+ models)
Parallel tool dispatch
Tool result threading
Streaming tool dispatch (start tool execution mid-stream, not after MessageStop)
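Path B plus parallel dispatch needs two pieces: pulling tool calls out of plain text, and running them concurrently. The sketch below assumes the prompt asks the model to emit `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` (the convention used by Hermes- and Qwen-style chat templates; the SDK's own Path B prompt format is not shown in this README) and uses `asyncio.gather` for parallelism. `parse_tool_calls` and `dispatch_parallel` are hypothetical names.

```python
import asyncio
import json
import re
from typing import Any, Awaitable, Callable, Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract JSON payloads from <tool_call>...</tool_call> spans; skip malformed ones."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a real runtime might re-prompt the model instead of dropping it
        if isinstance(payload, dict) and "name" in payload:
            calls.append({"name": payload["name"], "arguments": payload.get("arguments", {})})
    return calls

async def dispatch_parallel(
    calls: List[Dict[str, Any]],
    tools: Dict[str, Callable[..., Awaitable[str]]],
) -> List[str]:
    """Run every parsed call concurrently; results keep the order of `calls`."""
    async def run_one(call: Dict[str, Any]) -> str:
        fn = tools.get(call["name"])
        if fn is None:
            return f"error: unknown tool {call['name']}"
        try:
            return await fn(**call["arguments"])
        except Exception as exc:  # report tool failures to the model instead of crashing the loop
            return f"error: {exc}"
    return list(await asyncio.gather(*(run_one(c) for c in calls)))

if __name__ == "__main__":
    async def get_weather(city: str) -> str:
        return f"{city}: 67°F"

    reply = 'Checking.<tool_call>{"name": "get_weather", "arguments": {"city": "SF"}}</tool_call>'
    print(asyncio.run(dispatch_parallel(parse_tool_calls(reply), {"get_weather": get_weather})))
```

Each result string would then be threaded back to the model as a tool result on the next turn.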

Thinking / reasoning

Inline <think> blocks (DeepSeek-R1, QwQ, Marco-o1, R1-distill family)
Out-of-band thinking blocks (DeepSeek API)
ThinkingBlock in AssistantMessage.content

MCP

In-process MCP server via create_sdk_mcp_server
stdio transport
sse transport
http transport
Elicitation (server prompts user mid-session)
Sampling (server calls back into the agent's model)

Sessions + state

JSONL transcript persistence
~/.any-agent/ directory + per-session paths
Memory entries + index
<system-reminder> + isMeta injection
Auto-compaction at token threshold
Session fork
Session resume from arbitrary checkpoint

Budget

Per-model pricing table
max_usd ceiling → BudgetExceededError
total_cost_usd on ResultMessage
modelUsage per-model breakdown
max_turns ceiling
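The budget items compose roughly like this: look up per-token prices, accumulate a running total plus a per-model breakdown, and raise once the ceiling is crossed. This is a sketch under stated assumptions: the prices in `PRICING` are placeholders, not real provider rates, and `BudgetTracker` and the local `BudgetExceededError` are stand-ins for whatever the SDK actually defines.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

class BudgetExceededError(RuntimeError):
    """Local stand-in for the SDK's error of the same name."""

# Placeholder prices in USD per 1M tokens as (input, output). NOT real provider rates.
PRICING: Dict[str, Tuple[float, float]] = {
    "example-model-a": (0.50, 1.50),
    "example-model-b": (0.00, 0.00),  # e.g. a local model with no per-token cost
}

@dataclass
class BudgetTracker:
    max_usd: float
    total_cost_usd: float = 0.0
    model_usage: Dict[str, float] = field(default_factory=dict)  # per-model USD

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost; raise if the running total now exceeds max_usd."""
        # Unknown models are treated as free here; a stricter policy would refuse them.
        in_rate, out_rate = PRICING.get(model, (0.0, 0.0))
        cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        self.total_cost_usd += cost
        self.model_usage[model] = self.model_usage.get(model, 0.0) + cost
        if self.total_cost_usd > self.max_usd:
            raise BudgetExceededError(
                f"spent ${self.total_cost_usd:.4f}, ceiling ${self.max_usd:.4f}"
            )
        return cost
```

With these placeholder rates, 2,000 input and 1,000 output tokens on `example-model-a` cost (2000 × 0.50 + 1000 × 1.50) / 1,000,000 = $0.0025.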

Local install

any-agent setup-local — installs Ollama if missing, pulls a CPU-friendly model, smoke tests
12-entry CPU-friendly catalog (135M → 8B params)
Auto-install of Ollama on Linux/macOS via official script
Windows installer wrapper
llama.cpp setup-local alternative for users who prefer it (any-agent setup-local-llamacpp)

Examples (the first six run verbatim against DeepSeek-R1 1.5B on local Ollama)

quickstart.py
ollama_local.py
with_thinking.py
tools_option.py
mcp_calculator.py
system_prompt.py
fireworks_hosted.py: runs against live Fireworks
vllm_self_hosted.py: runs against live vLLM
multi_agent_research.py: end-to-end with sub-agents

1.0 prerequisites

Streaming tool dispatch rewrite
Mid-stream cancellation via ToolPermissionContext.signal
All 16 examples verified against ≥ 3 backends
Docs site (mkdocs-material)
PyPI 1.0 release with semver guarantee

Drop-in compatibility — what works today

```python
from any_agent_sdk import (
    # Core
    query, ClaudeAgentOptions, ClaudeSDKClient,

    # Messages (flat shape, matches claude_agent_sdk)
    AssistantMessage, UserMessage, SystemMessage, ResultMessage,
    TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock,

    # Tools
    tool, Tool, ToolRegistry, create_sdk_mcp_server,

    # Permissions
    PermissionResultAllow, PermissionResultDeny, ToolPermissionContext,

    # Hooks
    HookMatcher, HookInput, HookJSONOutput, HookContext,

    # Sub-agents
    AgentDefinition,

    # Plugins
    Plugin,

    # Built-in tools
    WebFetch, WebSearch,

    # Errors
    ClaudeSDKError, CLIConnectionError,
)
```

Every name in that import block has a working implementation backed by tests. ClaudeSDKClient is a streaming async context manager. Plugin(tools=..., system_prompt_addition=..., hooks=...) merges into the agent at session start. PermissionResultAllow(updated_input={...}) rewrites tool args before dispatch. ResultMessage.permission_denials carries every rejected call.
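The allow, rewrite, and deny flow described above reduces to a few lines. This self-contained sketch uses local stand-in classes `Allow` and `Deny` in place of PermissionResultAllow and PermissionResultDeny, and a simplified two-argument callback; the SDK's can_use_tool also receives a ToolPermissionContext. The `guarded_call` and `policy` names and the keys of the denial record are illustrative only.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional, Union

@dataclass
class Allow:
    """Stand-in for PermissionResultAllow."""
    updated_input: Optional[Dict[str, Any]] = None

@dataclass
class Deny:
    """Stand-in for PermissionResultDeny."""
    message: str = ""

Decision = Union[Allow, Deny]

async def guarded_call(
    name: str,
    args: Dict[str, Any],
    tool: Callable[..., Awaitable[str]],
    can_use_tool: Callable[[str, Dict[str, Any]], Awaitable[Decision]],
    denials: List[Dict[str, Any]],
) -> str:
    decision = await can_use_tool(name, args)
    if isinstance(decision, Deny):
        # Record the rejection; the SDK surfaces these on ResultMessage.permission_denials.
        denials.append({"tool_name": name, "tool_input": args, "message": decision.message})
        return f"denied: {decision.message}"
    # Allowed: dispatch with the rewritten arguments if the callback supplied any.
    final_args = args if decision.updated_input is None else decision.updated_input
    return await tool(**final_args)

async def policy(name: str, args: Dict[str, Any]) -> Decision:
    """Example callback: block deletes, normalize city names for get_weather."""
    if name == "delete_file":
        return Deny("deletes are not allowed")
    if name == "get_weather":
        return Allow(updated_input={**args, "city": args["city"].strip().title()})
    return Allow()
```

Rewriting arguments before dispatch means the tool itself never sees the unnormalized input, so validation lives in one place rather than in every tool.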

License

Apache-2.0. See LICENSE.

Project details

This version: oss-agent-sdk 0.1.0, released May 16, 2026

Download files

Source distribution

oss_agent_sdk-0.1.0.tar.gz (195.5 kB), uploaded May 16, 2026

Built distribution

oss_agent_sdk-0.1.0-py3-none-any.whl (243.7 kB), uploaded May 16, 2026, Python 3

File details

Details for the file oss_agent_sdk-0.1.0.tar.gz.

File metadata

Download URL: oss_agent_sdk-0.1.0.tar.gz
Upload date: May 16, 2026
Size: 195.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for oss_agent_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0ab421f4b9a50a8f984b6606c812d47971faf5a888bc26776a3f6092974c3b2f`
MD5	`40d3cca64c8bbb84c1f05beae1111f18`
BLAKE2b-256	`02923528bcf5e055ffa7d21b9ec1750bc723aaa87ac366b18d786d8058773cf7`

File details

Details for the file oss_agent_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: oss_agent_sdk-0.1.0-py3-none-any.whl
Upload date: May 16, 2026
Size: 243.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for oss_agent_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f8933e1ef17d60e49920dbc0d6969b63e238f4d828eaee9fbda670b5dafd4e1b`
MD5	`3ce6c72c58cbe3bed1821ba1ed9b05dd`
BLAKE2b-256	`99ca0c094d3dcfe0fc2ffb79d0dfca1f40c00272d9ffa1d7550ed13d740aafd7`
