Skip to main content

Drop-in open-source agent SDK. Multi-model, streaming, MCP, sub-agents.

Project description

mantis-agent-sdk

Claude Agent SDK for open-source models. Drop-in compatible with claude-agent-sdk — swap the import, keep your code — but the agent loop runs against Llama, Qwen, DeepSeek, Mixtral, Phi, Gemma, or anything you serve through Ollama, vLLM, llama.cpp, TGI, Together, Fireworks, Groq, or OpenRouter.

# Before
from claude_agent_sdk import query, ClaudeAgentOptions, tool

# After
from mantis_agent import query, ClaudeAgentOptions, tool

That's it. Every canonical Claude SDK example runs verbatim. The wire format underneath is OpenAI-compat or Ollama; the surface above is Anthropic-shaped.

There are two ways to use it: the mantis terminal (a Claude-Code-style agent TUI you run in any directory) and the Python library (the rest of this README).


The mantis terminal

A full-screen, Claude-Code-style agent terminal for local and hosted models — run it in any directory and chat with an agent that can read, write, edit, grep, and run shell commands on your machine.

pip install 'mantis-agent-sdk[cli]'   # the [cli] extra adds the rich terminal
mantis                                 # launch the agent terminal
            ▄▀▄▀
           ▄█▀                Mantis Code v1.3.0
        ▄██▀▀█▀               qwen2.5-7b-instruct  ·  Ollama (local)
    ▄█ ▄███▀▀                 ~/Documents/code/your-project
 ▄▄██▀▀██▀▀▀▀▀
 ▀▀ █  █▀ ▀▄
 ▄▄▀  ▄▀   ▀▄

› build me a fastapi todo app

⚒ Write app/main.py
  └ wrote 612 bytes (28 lines) to app/main.py
       1 + from fastapi import FastAPI
       2 + app = FastAPI()
       …

● Done — run it with `uvicorn app.main:app --reload`.

What you get:

  • Input pinned to the bottom, always visible — even while the agent is working. The conversation scrolls above it (full-screen mode). Set MANTIS_CLASSIC=1 for a plain scrolling REPL.
  • Markdown replies — syntax-highlighted code blocks, lists, tables, inline code.
  • Real edit diffsedit_file/write_file render as line-numbered green/red diffs.
  • Friendly tool calls⚒ Read foo.py, ⚒ Run <cmd>, ⚒ Edit foo.py with the result hugged underneath.
  • Animated thinking spinner with a live timer (✻ Undulating… (3s)).
  • Clipboard paste (Ctrl+V) — paste a copied image or file path straight into the prompt as an attachment.
  • Slash commands/model <id> to switch models live, /models to browse the local + hosted + self-host catalog, /clear, /cwd, /help, /exit.
  • Keys — Enter sends · Esc/Ctrl+C interrupts a running reply (Ctrl+C also quits when idle) · Ctrl+D quits · shift+tab cycles permission mode.

Configuration (same env vars the library uses):

Env var Meaning
MANTIS_AGENT_MODEL default model slug (else qwen2.5-7b-instruct)
MANTIS_AGENT_BASE_URL default backend (else Ollama at localhost:11434)
MANTIS_AGENT_API_KEY API key for hosted backends
MANTIS_CLASSIC=1 force the classic scrolling REPL instead of full-screen
mantis --model qwen2.5:7b                       # pick a model
MANTIS_AGENT_BASE_URL=https://gpu-box:8000/v1 mantis --model my-model   # your own server

There's also a tiny stdlib-only diagnostics CLI, mantis-agent (probe, list-models, run, chat, setup-local), with no extra dependencies — handy for smoke-testing a backend.


Quick start

pip install mantis-agent-sdk
mantis-agent setup-local         # installs Ollama if missing, pulls qwen2.5:1.5b, verifies
import asyncio
from mantis_agent import query, ClaudeAgentOptions, tool, AssistantMessage

@tool
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"{city}: 67°F"

async def main():
    async for msg in query(
        prompt="What's the weather in SF?",
        options=ClaudeAgentOptions(
            model="qwen2.5:1.5b",   # routes to local Ollama automatically
            tools=[get_weather],
            max_turns=5,
        ),
    ):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if hasattr(block, "text"):
                    print(block.text)

asyncio.run(main())

Same script against Together AI — change one line:

options = ClaudeAgentOptions(
    model="Qwen/Qwen2.5-72B-Instruct-Turbo",  # routes to Together automatically (uses $TOGETHER_API_KEY)
    tools=[get_weather],
    max_turns=5,
)

Same script against Fireworks, vLLM, llama.cpp, Groq — just change model. The backend URL is inferred from the model name shape; pass backend= explicitly to override.


Custom backend — point at any OpenAI-compatible server

Auto-routing covers the well-known providers from the model name. For everything else — your own vLLM on a private GPU box, LM Studio on a custom port, a corporate proxy, OpenRouter, Groq, an internal inference cluster — pass backend= explicitly. The URL wins over inference.

# Self-hosted vLLM on a private GPU box
options = ClaudeAgentOptions(
    model="Qwen/Qwen2.5-72B-Instruct",
    backend="https://gpu-box.internal:8000/v1",
    api_key=os.environ["INTERNAL_KEY"],
    tools=[get_weather],
)

# LM Studio on a non-standard port
options = ClaudeAgentOptions(
    model="qwen2.5:7b",
    backend="http://localhost:1234/v1",
    tools=[get_weather],
)

# Groq (blazing fast llama / mixtral)
options = ClaudeAgentOptions(
    model="llama-3.3-70b-versatile",
    backend="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

# OpenRouter aggregator (200+ models behind one API)
options = ClaudeAgentOptions(
    model="anthropic/claude-3.5-sonnet",  # OpenRouter proxies even Anthropic
    backend="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

Or set it once for the whole process via env:

export MANTIS_AGENT_BASE_URL=https://gpu-box.internal:8000/v1
export MANTIS_AGENT_API_KEY=...
python my_agent.py

Precedence: explicit backend= > $MANTIS_AGENT_BASE_URL > model-name inference > Ollama default.


Models — ranked, picked by where they run

Ranked by current OSS leaderboards (Arena Elo · GPQA · SWE-bench, May 2026). Pick the highest-ranked model that fits your hardware.

# Model Runs model= Notable
1 Kimi K2.6 cloud moonshotai/Kimi-K2.6-Instruct #1 open-weights GPQA (90.5%)
2 Qwen3 235B-A22B cloud · 64 GB+ local Qwen/Qwen3-235B-A22B-Instruct-Turbo Broadest benchmark leader · Apache 2.0
3 GLM-5 cloud zai-org/GLM-5 Best Arena Elo among open (1451)
4 MiniMax M2.5 cloud minimaxai/MiniMax-M2.5 80.2% SWE-bench · ties Claude Opus 4.6 on code
5 DeepSeek-V3.2 cloud · 80 GB+ local deepseek-ai/DeepSeek-V3.2 Top general-purpose OSS
6 Llama 4 Maverick cloud · 72 GB local meta-llama/Llama-4-Maverick-17B-128E Meta's flagship 2025 MoE
7 gpt-oss-120b cloud · 80 GB local gpt-oss:120b OpenAI's open release · ~o4-mini class
8 DeepSeek-R1 cloud · 48 GB+ local deepseek-r1:70b / deepseek-ai/... Reasoning · emits <think> blocks
9 Llama 4 Scout 24 GB local · cloud llama4:scout 10M context window · fits a 24 GB GPU
10 Hermes 4 70B 48 GB local · cloud hermes4:70b Nous — tool-use + reasoning tuned
11 DeepSeek-R1 32B 24 GB local deepseek-r1:32b Reasoning, fits a big-laptop GPU
12 Qwen3 32B 24 GB local qwen3:32b Strong general-purpose
13 Llama 3.3 70B 48 GB local · cloud llama3.3:70b Stable, well-supported
14 gpt-oss-20b 16 GB local gpt-oss:20b OpenAI open · runs on a laptop
15 Phi 4 medium 16 GB local phi4:medium MS — strong reasoning for size
16 Gemma 3 27B 16 GB local gemma3:27b Google's latest
17 Qwen3 14B / 8B 8–12 GB local qwen3:14b / qwen3:8b Mid-tier all-rounder
18 Llama 3.1 8B 8 GB local llama3.1:8b Mainstream baseline
19 Phi 4 small 8 GB local phi4:small Compact reasoning
20 DeepSeek-R1 8B/14B 8–12 GB local deepseek-r1:8b / :14b Reasoning on a mainstream laptop

CPU-laptop tier (no GPU, ≤ 8 GB RAM) — mantis-agent setup-local picks from this list:

# Tag Params RAM Tools Reasoning Notes
C1 qwen2.5:1.5b 1.5B 4 GB yes no Default — best 1.5B for agents
C2 deepseek-r1:1.5b 1.5B 4 GB yes yes Reasoning, emits <think>
C3 llama3.2:3b 3.2B 6 GB yes no Best 3B for 8 GB laptops
C4 qwen2.5:3b 3B 6 GB yes no Same class as Llama 3.2 3B
C5 phi3.5:3.8b 3.8B 6 GB yes no Punches above its weight
C6 llama3.2:1b 1.2B 4 GB yes no Sharper than 0.5B Qwen
C7 qwen2.5:0.5b 0.5B 2 GB yes no Smallest with tool calls
C8 gemma2:2b 2B 4 GB no no Chat only, polished prose
C9 tinyllama:1.1b 1.1B 2 GB no no RAM-constrained pick
C10 smollm2:135m 135M 2 GB no no Tiny — sanity-check install
mantis-agent setup-local           # one command — installs Ollama if missing, pulls C1, smoke tests
mantis-agent setup-local --list    # see the catalog
mantis-agent setup-local --model qwen2.5:3b

How to actually call them

Auto-routing reads the model name shape (see mantis_agent/routing.py):

Shape Backend it routes to Env to set
name:tag (e.g. qwen3:8b) Ollama (http://localhost:11434)
org/repo (e.g. Qwen/Qwen3-235B-...) Together AI TOGETHER_API_KEY
accounts/fireworks/models/... Fireworks AI FIREWORKS_API_KEY
gpt-*, o1-*, o3-*, o4-* OpenAI native OPENAI_API_KEY
gemini-* Google Gen-Lang (OpenAI-compat) GEMINI_API_KEY
claude-* refused — use the real claude-agent-sdk
anything else Ollama default

For Groq, Moonshot (Kimi native), DeepSeek native, OpenRouter, Cerebras, DeepInfra, Anyscale, LM Studio, self-hosted vLLM / llama.cpp / TGI — pass backend= explicitly or set MANTIS_AGENT_BASE_URL (see Custom backend above). The pattern is the same: it's an OpenAI-compatible URL plus an API key.


Why this exists

The Claude Agent SDK is the best-designed agent runtime in the open. Streaming tool dispatch, 28-event hook system, permission rules per source, MCP across four transports, sub-agents, sessions with fork/resume, auto-compaction — none of the OSS alternatives ship the whole set. LangGraph is too heavy and skips MCP. smolagents is too small. llama-stack is tightly scoped. The Anthropic and OpenAI agent SDKs are bound to their hosted APIs.

mantis-agent-sdk is the same surface, model-agnostic underneath. You write to Anthropic's design; you run it on whatever you can serve.

Plus the OSS-specific bits the hosted SDKs don't need to think about:

  • Universal tool use — Path A (native via OpenAI-compat tools[]) when supported; Path B (prompt-engineered <tool_call> XML) when not; Path C (grammar-constrained JSON) when the server can enforce it. Capability-table-driven, automatic per model.
  • Universal thinking — handles inline <think> tags (R1, QwQ, Marco-o1, R1-Distill) and out-of-band thinking blocks. Zero cost when the model doesn't emit thinking.
  • Backend agnosticism — same agent code, one env var or one kwarg between Ollama at localhost:11434 and Fireworks at api.fireworks.ai.
  • Tracing built inAgent(tracer=InMemoryTracer()) gives you a full span tree of every run (agent.runagent.turnllm.call + tool.call), with token / cost totals on the root span and tool.call spans that record input KEYS but never values. Swap in OTelTracer() to ship the same spans to Datadog / Honeycomb / Tempo / Jaeger with zero extra code. Anthropic's official SDK requires you to wire OpenTelemetry yourself; we ship it.

Observability

from mantis_agent import Agent, InMemoryTracer, UserMessage, TextBlock

tracer = InMemoryTracer()
agent = Agent(model="claude-sonnet-4.5", tools=[...], tracer=tracer)
await agent.run([UserMessage(content=[TextBlock(text="...")])])

# Flat list of every finished span, in end-time order.
for sp in tracer.spans:
    print(sp.name, sp.duration_ms, sp.attributes)

# Or the forest, with parent/child links restored.
import json; print(json.dumps(tracer.tree(), indent=2, default=str))

# Or per-span-name aggregates + run totals (turns / tokens / cost_usd).
print(tracer.summary())

# Or ship the trace to disk for offline analysis.
tracer.write_jsonl("trace.jsonl")

To push the same spans into an existing OpenTelemetry pipeline:

from mantis_agent import OTelTracer
tracer = OTelTracer(service_name="my-agent")          # requires opentelemetry-api
agent  = Agent(model="claude-sonnet-4.5", tracer=tracer)

OTelTracer uses your already-configured TracerProvider — point it at Datadog, Honeycomb, Tempo, Jaeger, or anything else that speaks OTLP. We don't ship an exporter; we ship spans that fit your existing one. Spans carry the same attributes whether you use InMemoryTracer or OTelTracer, so dashboards built against one work against both.

Privacy by default. Tool spans carry the sorted list of input keys but never input values — agent traces routinely get shipped to third-party SaaS and showed up in screenshots and tickets, so we made the safe choice the only choice. If you need values too, build your own Tracer impl in ~30 lines.

Live example you can run with no API key:

python -m mantis_agent.examples.with_tracing

The acceptance test

v1.0 ships when this is true on a fresh machine:

pip install mantis-agent-sdk
mantis-agent setup-local
# ...10-line script with 2 tools + 5-turn agent task...
python my_agent.py   # Just Works on the first try

Then the same script works against Together, Fireworks, vLLM, llama.cpp, Groq just by changing model. Today: DeepSeek-R1 1.5B on local Ollama runs six of Anthropic's own canonical examples verbatim. Suite at 202 tests. The acceptance test passes on Ollama; provider matrix expansion is the remaining work.


Roadmap

What's shipped — and what's still ahead. Check our progress.

Drop-in surface (Claude SDK parity)

  • query() yielding flat-shape AssistantMessage / UserMessage / SystemMessage / ResultMessage
  • ClaudeAgentOptions with model, backend, tools, system_prompt, max_turns, max_tokens, temperature, hooks, can_use_tool, permissions, mcp_servers, plugins, agents, max_budget_usd, setting_sources, allowed_tools, disallowed_tools, cwd, session_id, persist, stderr
  • ClaudeSDKClient — streaming async context manager
  • @tool decorator (Claude-shaped positional signature)
  • AgentDefinition for sub-agents
  • Plugin(tools=, system_prompt_addition=, hooks=) — merges at session start
  • PermissionResultAllow(updated_input=...) rewriting tool args before dispatch
  • PermissionResultDeny surfacing through ResultMessage.permission_denials
  • HookMatcher for 28 hook events (PreToolUse, PostToolUse, SessionStart, SessionEnd, Stop, ...)
  • ToolPermissionContext passed to can_use_tool
  • create_sdk_mcp_server(name, version, tools=)
  • WebFetch / WebSearch built-in tools (Exa-backed)
  • CLIConnectionError, ClaudeSDKError
  • ToolPermissionContext.signal for cancellation (anyio.Event, fired by Agent.cancel())
  • setting_sources actually loading and persisting per source
  • Streaming-mode client.query() with mid-stream tool dispatch

Backends

  • Ollama (native API + auto-routing from tag form)
  • OpenAI-compat (vLLM, Together, Fireworks, Groq, OpenRouter, Cerebras)
  • llama.cpp (via --jinja)
  • TGI (HuggingFace text-generation-inference)
  • OpenAI native (gpt-*, o1/o3/o4)
  • Gemini OpenAI-compat endpoint
  • Mock provider for tests
  • Auto-route from model name shape — no backend= needed
  • Modal serverless adapter
  • Anthropic via separate anthropic_passthrough (for parity testing only)

Tool use

  • Path A: native via OpenAI-compat tools[]
  • Path B: prompt-engineered <tool_call> XML (for Llama 2, Mistral 7B, older Qwens)
  • Path C: grammar-constrained JSON
  • Capability-table-driven path selection (30+ models)
  • Parallel tool dispatch
  • Tool result threading
  • Streaming tool dispatch (start tool execution mid-stream, not after MessageStop)

Thinking / reasoning

  • Inline <think> blocks (DeepSeek-R1, QwQ, Marco-o1, R1-distill family)
  • Out-of-band thinking blocks (DeepSeek API)
  • ThinkingBlock in AssistantMessage.content

MCP

  • In-process MCP server via create_sdk_mcp_server
  • stdio transport
  • sse transport
  • http transport
  • Elicitation (server prompts user mid-session)
  • Sampling (server calls back into the agent's model)

Sessions + state

  • JSONL transcript persistence
  • ~/.mantis-agent/ directory + per-session paths
  • Memory entries + index
  • <system-reminder> + isMeta injection
  • Auto-compaction at token threshold
  • Session fork
  • Session resume from arbitrary checkpoint

Structured output

  • response_format={"type": "json_object"} — free-form JSON mode
  • response_format={"type": "json_schema", "json_schema": {...}} — schema-constrained
  • Per-backend translation (OpenAI envelope / Ollama format / TGI grammar)
  • Loud rejection on backends without support (anthropic_passthrough)

Budget

  • Per-model pricing table
  • max_usd ceiling → BudgetExceededError
  • total_cost_usd on ResultMessage
  • modelUsage per-model breakdown
  • max_turns ceiling

Local install

  • mantis-agent setup-local — installs Ollama if missing, pulls a CPU-friendly model, smoke tests
  • 12-entry CPU-friendly catalog (135M → 8B params)
  • Auto-install of Ollama on Linux/macOS via official script
  • Windows installer wrapper
  • llama.cpp setup-local alternative for users who prefer it (mantis-agent setup-local-llamacpp)

Examples (run verbatim against DeepSeek-R1 1.5B on local Ollama)

  • quickstart.py
  • ollama_local.py
  • with_thinking.py
  • tools_option.py
  • mcp_calculator.py
  • system_prompt.py
  • fireworks_hosted.py runs against live Fireworks
  • vllm_self_hosted.py runs against live vLLM (+ MANTIS_AGENT_MOCK=1 offline mode)
  • multi_agent_research.py end-to-end with sub-agents

1.0 prerequisites

  • Streaming tool dispatch rewrite (iter_completions / wait_one — observe results in completion order, not batched on wait_all)
  • Mid-stream cancellation via ToolPermissionContext.signal
  • All 16 examples verified against ≥ 3 backends
  • Docs site (mkdocs-material)
  • PyPI 1.0 release with semver guarantee

Drop-in compatibility — what works today

from mantis_agent import (
    # Core
    query, ClaudeAgentOptions, ClaudeSDKClient,

    # Messages (flat shape, matches claude_agent_sdk)
    AssistantMessage, UserMessage, SystemMessage, ResultMessage,
    TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock,

    # Tools
    tool, Tool, ToolRegistry, create_sdk_mcp_server,

    # Permissions
    PermissionResultAllow, PermissionResultDeny, ToolPermissionContext,

    # Hooks
    HookMatcher, HookInput, HookJSONOutput, HookContext,

    # Sub-agents
    AgentDefinition,

    # Plugins
    Plugin,

    # Built-in tools
    WebFetch, WebSearch,

    # Errors
    ClaudeSDKError, CLIConnectionError,
)

Every name in that import block has a working implementation backed by tests. ClaudeSDKClient is a streaming async context manager. Plugin(tools=..., system_prompt_addition=..., hooks=...) merges into the agent at session start. PermissionResultAllow(updated_input={...}) rewrites tool args before dispatch. ResultMessage.permission_denials carries every rejected call.


License

Apache-2.0. See LICENSE.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mantis_agent_sdk-1.4.1.tar.gz (486.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mantis_agent_sdk-1.4.1-py3-none-any.whl (343.6 kB view details)

Uploaded Python 3

File details

Details for the file mantis_agent_sdk-1.4.1.tar.gz.

File metadata

  • Download URL: mantis_agent_sdk-1.4.1.tar.gz
  • Upload date:
  • Size: 486.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.13 {"installer":{"name":"uv","version":"0.9.13"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mantis_agent_sdk-1.4.1.tar.gz
Algorithm Hash digest
SHA256 838de0937a18948bf18852eb6dc866d5c13432b888c974db0b4433a68c51d9f9
MD5 c44e0a656b10eadfb89f519a0d52711f
BLAKE2b-256 8d236bd91fe9c853e279f638acb792a239a9758878d56e975ca7e63e43181bd7

See more details on using hashes here.

File details

Details for the file mantis_agent_sdk-1.4.1-py3-none-any.whl.

File metadata

  • Download URL: mantis_agent_sdk-1.4.1-py3-none-any.whl
  • Upload date:
  • Size: 343.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.13 {"installer":{"name":"uv","version":"0.9.13"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mantis_agent_sdk-1.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ee5f7bc09c47d2443afecb1193fed9f9117c52f515bbaab3d66c0263c05b43f8
MD5 5bf8cae3b8d69de236e404a722427d47
BLAKE2b-256 c66879596552dd11b97a7fa47a6bec1eb1576de157b966d225bf8b996963079e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page