# Chimera

Build, compose, and deploy coding agents from modular primitives.
AI that reads, writes, edits, and iterates on code with tests. Chimera is a Python library for building these tools yourself, plus a ready-to-run coding agent on top of it.
Status: Alpha — 7720 passing tests, 62 skipped (live integration tests excluded; v0.7.0 baseline measured 2026-05-09). Reproducible benchmarks with GLM-5.1: HumanEval 66.5% pass@1 (109/164), SWE-bench Lite 10% (2/20, top-20 smallest patches). Raw results in data/.
## Who This Is For
You build with CLI coding agents. You use terminal-native AI tools daily and you know what it feels like when an agent reads your codebase, edits files, and runs tests from your shell. You want to build your own — with your model, your tools, your rules — or take apart how these agents work to understand why they behave differently.
You're curious about coding agents. You've seen demos of AI writing entire apps. You want to understand what's actually happening — what the pieces are, how the loop works, why some agents are better at certain tasks. Chimera breaks it all down into parts you can inspect, modify, and run yourself.
## What It Does
A coding agent is an LLM connected to your filesystem. It reads code, decides what to change, edits files, runs tests, and repeats until the task is done.
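That read-edit-test loop can be sketched in a few lines. The following is a toy illustration with a scripted stand-in for the model and fake tools, not Chimera's actual AgentLoop; every name in it is invented for the example.

```python
# Toy agent loop: the "LLM" is a scripted function that picks the next
# action; a real loop would call a model and parse its tool request.
def scripted_llm(history):
    # Pretend the model reads the last observation and decides what to do.
    if not history:
        return ("read", "auth.py")
    if history[-1] == ("observe", "tests failed"):
        return ("edit", "auth.py")
    if history[-1] == ("observe", "tests passed"):
        return ("done", None)
    return ("run_tests", None)

def run_tool(action, arg, state):
    # Fake tools: the test run fails until an edit has landed.
    if action == "read":
        return "file contents"
    if action == "edit":
        state["patched"] = True
        return "edited"
    if action == "run_tests":
        return "tests passed" if state.get("patched") else "tests failed"

def agent_loop(max_steps=10):
    history, state = [], {}
    for _ in range(max_steps):          # bounded, like a --max-steps budget
        action, arg = scripted_llm(history)
        if action == "done":
            return history
        result = run_tool(action, arg, state)
        history.append(("observe", result))
    return history

trace = agent_loop()
print(trace[-1])  # the loop stops once the fake tests pass
```

The step budget is the same idea as the `--max-steps` flags the CLIs below expose: the loop always terminates, done or not.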
Chimera gives you two things:
- A coding-agent harness with a plugin system — codebase search, auto-testing, code review, and context management, exposed as hooks, MCP servers, and skills you can wire into any compatible host.
- A Python library for building your own coding agents from modular pieces — pick your LLM, pick your tools, pick your strategy, wire them together.
## Install
Latest release: v0.7.0 (release notes).
Not yet on PyPI. Install from source:
```bash
pip install "git+https://github.com/0bserver07/chimera.git@v0.7.0#egg=chimera-run[anthropic]"  # GLM-5 / Anthropic-compatible
pip install "git+https://github.com/0bserver07/chimera.git@v0.7.0#egg=chimera-run[openai]"     # GPT
pip install "git+https://github.com/0bserver07/chimera.git@v0.7.0#egg=chimera-run[all]"        # anthropic + openai + browser + remote
```
Requires Python 3.11+. A chimera-run PyPI release is planned post-alpha.
## Build Your Own Coding Agent
```python
import asyncio

from chimera.assembly.coding_agent import CodingAgent

# One line — full-featured coding agent with 24 tools.
# Requires ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic and ANTHROPIC_AUTH_TOKEN in env.
agent = CodingAgent(model="glm-5")

# Run a task
async def main():
    async for event in agent.run("Fix the bug in auth.py"):
        print(event.type.value, getattr(event.data, "content", "")[:100])

asyncio.run(main())
```
### Presets

| Preset | Tools | Features |
|---|---|---|
| `coding_agent` | 24 (bash, read, write, edit, search, git, test, agent, skill, ...) | Permissions, hooks, transcripts, compaction, streaming |
| `codex` | 24 | Permissions, transcripts (no hooks) |
| `minimal` | 4 (bash, read, write, edit) | No extras |
| `explore` | 3 (read, search, list) | Read-only |
```python
import os

# Codex-style agent
agent = CodingAgent.from_preset("codex", model="gpt-4o")

# Minimal agent for simple tasks
agent = CodingAgent.from_preset("minimal", model="claude-haiku-3.5")

# Custom API endpoint (any Anthropic-compatible API)
os.environ["ANTHROPIC_BASE_URL"] = "https://your-api.com/v1"
os.environ["ANTHROPIC_AUTH_TOKEN"] = "your-key"
agent = CodingAgent(model="your-model")
```
## Architecture

Chimera is modular — every component is replaceable:

```
CodingAgent
├── Provider        (Anthropic, OpenAI, Google, Ollama, or any compatible API)
├── Tools           (20+ built-in, plus custom tools, MCP servers, skills)
├── AgentLoop       (async generator with streaming, error recovery, abort)
├── Permissions     (multi-source rules, 6 modes, interactive prompts)
├── Hooks           (27 lifecycle events, shell/LLM/function hooks)
├── Commands        (slash commands, skills from .chimera/skills/)
├── Sub-Agents      (3-tier context isolation, background tasks)
├── State           (content replacement, file cache, session transcripts)
└── Infrastructure  (feature flags, analytics, memory, compaction)
```
See Architecture for the full module map.
## Run It Standalone
The Mink CLI ships a fully assembled coding agent with no extra setup:
```bash
chimera mink                           # interactive REPL on Ollama Kimi K2.6 by default
chimera mink -p "summarize this repo"  # one-shot, prints to stdout
chimera mink runs list                 # inspect every persisted run
chimera mink agents list               # show available agent presets
chimera code                           # legacy stack with slash commands and session save
```
`chimera mink` is the v0.3.0 coding REPL: streaming tool calls, hooks, permissions from `.claude/settings.json`, MCP, subagents, and a rich TUI on a TTY (auto-disabled when piping; force off with `--no-color`).

See the Mink quickstart for the walking skeleton, env vars, and the runs/agents subcommand surface, and docs/mink/providers.md for the full provider matrix (Ollama, Anthropic, OpenAI, Google, OpenAI-compat).
Hooks run automatically on every edit:
- Path validation — blocks edits to files that don't exist (no more hallucinated paths)
- Auto-test — finds and runs related tests after every file change
- Auto-lint — runs your linter after every edit
- Security scan — blocks dangerous bash commands
- Verify done — runs the full test suite before the agent can declare "done"
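The blocking hooks above follow a familiar convention: the hook receives a JSON description of the pending tool call, and a nonzero result blocks it. A minimal sketch of the path-validation idea, assuming a hypothetical payload shape (`tool_name`, `tool_input.file_path`); this is not Chimera's actual hook API:

```python
import os
import sys

def path_validation_hook(payload: dict) -> int:
    """Return 0 to allow the tool call, 2 to block it (shell-hook style:
    nonzero exit blocks). Blocks edits to paths that don't exist."""
    path = payload.get("tool_input", {}).get("file_path", "")
    if payload.get("tool_name") == "edit" and not os.path.exists(path):
        print(f"blocked: {path} does not exist", file=sys.stderr)
        return 2
    return 0

# Simulated invocations; a real hook script would read its payload
# from stdin with json.load(sys.stdin) and sys.exit() the result.
print(path_validation_hook({"tool_name": "edit",
                            "tool_input": {"file_path": "/no/such/file.py"}}))
print(path_validation_hook({"tool_name": "read",
                            "tool_input": {"file_path": "/no/such/file.py"}}))
```

Reads pass through untouched; only edits to nonexistent paths trip the guard, which is what makes this kind of hook cheap to run on every tool call.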
MCP servers give the agent new tools to call:
- `chimera-search` — semantic codebase search + symbol lookup
- `chimera-review` — multi-perspective code review (logic, security, tests, architecture, and 4 more)
- `chimera-testgen` — generate test skeletons from source analysis
- `chimera-migration` — scan for and apply code migrations (Python 2 to 3, CJS to ESM)
The plugin honors a settings.json schema for ecosystem interop, so the same hooks/MCP/skills also drop into any host that follows that convention.
Setup guide — install in 2 minutes.
Discoverability note: each of the 7 coding-agent CLIs has a purpose alias for tab-friendly invocation: `chimera tui` ≡ `chimera mink`, `chimera multi` ≡ `chimera otter`, `chimera sandbox` ≡ `chimera ferret`, `chimera mini` ≡ `chimera weasel`, `chimera tiny` ≡ `chimera shrew`, `chimera shell` ≡ `chimera stoat`, `chimera strict` ≡ `chimera badger`. Run `chimera agents` to list all seven with one-liner pitches and the upstream tool that inspired each. See docs/inspirations.md.
## Otter — server-first coding agent
chimera otter is the second coding-agent CLI. Where chimera mink mirrors a TUI-first agent, otter mirrors a server-first / multi-client open-source coding agent: a single ReAct loop you can drive from a one-shot CLI, an interactive REPL, an HTTP server with SSE streaming, or an ACP JSON-RPC transport — all backed by the same LoopConfig, tool registry, and event-sourced session store the rest of Chimera uses.
Quick install + first run with glm-5.1:cloud (via Ollama's Anthropic-compatible bridge):
```bash
uv sync --extra dev --extra anthropic
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
chimera otter --model glm-5.1:cloud -p "summarize this repo"  # one-shot
chimera otter --model glm-5.1:cloud                           # interactive REPL
chimera otter --model glm-5.1:cloud serve --port 5173         # HTTP + SSE
chimera otter --model glm-5.1:cloud serve --acp               # ACP over stdio
```
Three transports, one loop:

- REPL — streaming text + tool calls, mid-turn steering, Ctrl-C cancel, 26-entry slash-command palette (`/help`, `/share`, `/agent`, `/model`, `/sessions`, `/compact`, …).
- HTTP + SSE — `/sessions`, `/sessions/{id}`, `/sessions/{id}/turns`, `/sessions/{id}/events` (Server-Sent Events). Optional `OTTER_SERVER_TOKEN` Bearer auth.
- ACP — JSON-RPC over stdio for IDE clients that already speak the Agent Client Protocol.
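The events endpoint streams standard Server-Sent Events frames, which any client can consume with a few lines of parsing. A minimal SSE parser sketch; the `tool_call`/`text` event names in the sample are illustrative assumptions, not otter's documented event vocabulary:

```python
def parse_sse(stream_lines):
    """Minimal Server-Sent Events parser: yields (event, data) pairs.
    A blank line dispatches the pending event, per the SSE wire format."""
    event, data = "message", []
    for line in stream_lines:
        if line == "":                        # blank line terminates an event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []       # reset to the default event type
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())

sample = [
    "event: tool_call",
    'data: {"name": "read"}',
    "",
    "event: text",
    "data: done",
    "",
]
print(list(parse_sse(sample)))
# → [('tool_call', '{"name": "read"}'), ('text', 'done')]
```

In practice you would feed this the decoded line iterator of a streaming HTTP response to `/sessions/{id}/events`.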
Key flag surface:
```bash
chimera otter --model glm-5.1:cloud -p "..."  # pick the provider/model
chimera otter --no-mcp -p "..."               # disable MCP tool sources
chimera otter --no-rules -p "..."             # ignore project + user rules files
chimera otter --no-plugins -p "..."           # skip directory-loaded plugins
chimera otter --no-lsp -p "..."               # disable LSP-backed edit verification
```
Every persisted run lives under `~/.chimera/eventlog/otter-<utc>-<uuid>/` and is listable, showable, and shareable (`chimera otter sessions list | show | --since 7d`, `chimera otter share <id> --sink http|file|stdout --format html|md|json`).
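Given that directory naming scheme, a `--since`-style filter is simple to sketch. The timestamp format assumed below (`YYYYMMDDTHHMMSS`) is a guess for illustration; check the real directory names on disk before relying on it:

```python
from datetime import datetime, timedelta, timezone

def runs_since(dir_names, days, now=None):
    """Keep otter run directories newer than `days` days, assuming names
    shaped like otter-<utc>-<uuid> with a YYYYMMDDTHHMMSS timestamp."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    keep = []
    for name in dir_names:
        try:
            _, stamp, _ = name.split("-", 2)   # uuid may contain hyphens
            ts = datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)
        except ValueError:
            continue                            # skip non-conforming names
        if ts >= cutoff:
            keep.append(name)
    return keep

print(runs_since(["otter-20260509T120000-abc", "otter-20200101T000000-def", "junk"],
                 7, now=datetime(2026, 5, 10, tzinfo=timezone.utc)))
# → ['otter-20260509T120000-abc']
```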
See the Otter quickstart for the full walkthrough — provider resolution order, env vars, on-disk layout, and pointers to providers.md, models.md, sessions.md, share.md, and server.md.
## Ferret — sandbox-first IDE-flagship coding agent
chimera ferret is the third coding-agent CLI. Where mink is TUI-first and otter is server-first, ferret mirrors the upstream IDE-flagship posture: a sandbox-first runner with single-flag approval presets, an ACP JSON-RPC transport that ships as the default serve transport (HTTP is opt-in), and an optional cloud bridge so a local ferret session can be driven from a remote UI. The two headline guardrails compose: --sandbox blocks at the OS level, --approval blocks at the policy level, and a tool call has to pass both.
```bash
uv sync --extra dev --extra openai
export OPENAI_API_KEY=sk-...
chimera ferret -p "audit the repo"                                       # default: read-only sandbox + read-only approval
chimera ferret --sandbox workspace-write --approval auto -p "fix tests"  # writes inside cwd, no asks for safe ops
chimera ferret                                                           # interactive REPL
chimera ferret serve                                                     # ACP over stdio (the default)
chimera ferret serve --http --port 5173                                  # HTTP, opt-in
```
See the Ferret quickstart for the four entry points, sandbox modes, approval presets, IDE-bridge wiring, and cloud-bridge setup.
## Weasel — minimal harness with four operating modes
chimera weasel is the fourth coding-agent CLI. Where mink/otter/ferret each ship strong opinions, weasel mirrors the minimal harness posture: powerful defaults, no sub-agents, no plan mode, no built-in approval presets — just one ReAct loop reachable through four interchangeable I/O envelopes (interactive REPL, one-shot print, stdio JSON-RPC, embedded SDK), an auto-discovered .weasel/extensions/ directory, and an embeddable Agent class. If you want more, you build it (or install an extension); weasel will not get in the way.
```bash
uv sync --extra dev --extra anthropic
export ANTHROPIC_API_KEY=sk-ant-...
chimera weasel                                       # mode 1: interactive REPL
chimera weasel -p "summarize TODO comments in src/"  # mode 2: one-shot print (add --json for a single JSON blob)
chimera weasel --mode rpc < requests.jsonl           # mode 3: stdio JSON-RPC server
python -c "from chimera.weasel.sdk import Agent; print(Agent(model='claude-sonnet-4-6').run('list files').text)"  # mode 4: SDK
```
See the Weasel quickstart for the four modes in detail, the extension layout, and the SDK recipe.
## Shrew — coding agent tuned for small local models
chimera shrew is the fifth coding-agent CLI, explicitly tuned for small local models (Qwen3.5-9B, Qwen3.6-35B-A3B MoE, and friends). It is a thin layer on top of weasel — same four modes, same session schema, same extension surface — but with three small-model adjustments: the default model is qwen3.6-35b-a3b served by llama.cpp on 127.0.0.1:8888, --max-steps defaults to 30 (smaller than mink/otter's 50; small models don't benefit from long horizons), and the default --allowed-tools is the restricted Read,Write,Edit,Bash set so a 4-bit quantised model doesn't burn its context budget on tool selection. Cloud fallbacks (anthropic/claude-haiku-4-5, openai/gpt-4o-mini) work via --model vendor/name.
```bash
# Build llama.cpp and serve a GGUF on :8888 (see docs/shrew/small-model-setup.md)
./llama-server -m qwen3.6-35b-a3b.Q4_K_M.gguf --host 127.0.0.1 --port 8888 &
uv sync --extra dev
chimera shrew -p "explain this repo"                       # one-shot against the local llama.cpp server
chimera shrew                                              # interactive REPL
chimera shrew --list-models                                # known model identifiers
chimera shrew bench aider-polyglot --bench-limit 5         # small-model benchmark harness
chimera shrew --model anthropic/claude-haiku-4-5 -p "..."  # cloud fallback
```
See the Shrew quickstart for the small-model setup walkthrough and benchmark harness.
## Stoat — shell-mode-toggle coding agent
chimera stoat is the sixth coding-agent CLI. Where the first five each ship rich opinionated postures, stoat's distinguishing ergonomic is the shell-mode toggle: in the same REPL, each line either feeds the LLM agent or runs as a direct shell command, and the user flips between the two with /shell (or --shell-mode on boot). Stoat is for users who live in their terminal and want one buffer for both ls -la and "explain this repo". The provider chain is Kimi-first via $MOONSHOT_API_KEY (kimi-k2.6 on api.moonshot.ai/v1), with Anthropic / OpenAI / OpenRouter / Ollama fallthroughs.
```bash
uv sync --extra dev --extra anthropic
export MOONSHOT_API_KEY=...
chimera stoat -p "summarize TODO comments in src/"  # one-shot
chimera stoat                                       # interactive REPL — toggle with /shell
chimera stoat --shell-mode                          # boot directly into shell mode
```
In the REPL, `stoat>` is agent mode, `stoat$` is shell mode; `/shell` toggles. Mode-tagged history (`/history` renders `>` and `$` markers) keeps both modes visible inline. See the Stoat quickstart for the full shell-mode walkthrough.
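The toggle mechanic itself is just a one-bit dispatcher over incoming lines. A toy sketch of the idea (not stoat's actual implementation; the callables stand in for the agent and shell backends):

```python
def make_repl(agent, shell):
    """Stoat-style dispatcher: '/shell' flips modes, every other line goes
    to the agent or the shell depending on the current mode."""
    state = {"mode": "agent"}           # boot in agent mode, like stoat>
    def handle(line):
        if line.strip() == "/shell":    # the toggle command
            state["mode"] = "shell" if state["mode"] == "agent" else "agent"
            return f"mode: {state['mode']}"
        return agent(line) if state["mode"] == "agent" else shell(line)
    return handle

handle = make_repl(lambda s: f"agent<{s}>", lambda s: f"shell<{s}>")
print(handle("explain this repo"))  # → agent<explain this repo>
print(handle("/shell"))             # → mode: shell
print(handle("ls -la"))             # → shell<ls -la>
```

`--shell-mode` on boot would just flip the initial `state["mode"]`.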
## Badger — harness-rewrite coding agent
chimera badger is the seventh coding-agent CLI. Where stoat's headline is ergonomic, badger's is harness discipline: tighter step budget (--max-steps defaults to 25 vs 50 for the other six), rerun-on-failure as a first-class flag (--rerun-on-failure --max-reruns 2), and a parity-tracker subcommand (chimera badger parity --against PARITY.md) that diffs a declared schema against the live agent's defaults. The provider chain is Anthropic-first.
```bash
uv sync --extra dev --extra anthropic
export ANTHROPIC_API_KEY=sk-ant-...
chimera badger -p "Refactor src/util.py to remove duplicated string formatting"
chimera badger -p "fix the failing tests" --rerun-on-failure --max-reruns 3
chimera badger parity --against PARITY.json  # rc=0 on match, rc=1 on diff
```
The rerun-on-failure detector is a conservative marker list (pytest summaries, Python tracebacks, Rust E0xxx, syntax errors, explicit BUILD FAILED). When fired, the refined-prompt directive names the markers and asks the agent to verify before reporting done. See the Badger quickstart for the rationale and full surface.
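A conservative marker detector in that spirit can be sketched as a short list of regexes. The exact patterns badger uses are not published here, so the ones below are illustrative assumptions matching the categories named above:

```python
import re

# Illustrative failure markers (assumed, not badger's actual list).
FAILURE_MARKERS = [
    re.compile(r"=+ .*\d+ failed.* =+"),             # pytest summary line
    re.compile(r"Traceback \(most recent call last\):"),  # Python traceback
    re.compile(r"\berror\[E\d{4}\]"),                # Rust E0xxx diagnostics
    re.compile(r"SyntaxError:"),                     # syntax errors
    re.compile(r"BUILD FAILED"),                     # explicit build failure
]

def detect_failures(output: str):
    """Return the patterns that fired, so a refined prompt can name them."""
    return [m.pattern for m in FAILURE_MARKERS if m.search(output)]

print(detect_failures("=== 2 failed, 10 passed in 1.2s ==="))  # one marker fires
print(detect_failures("all tests green"))                      # → []
```

Being conservative matters: the detector only triggers on unambiguous markers, so a rerun is never burned on output that merely mentions the word "failed" in prose.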
## How It's Organized
Chimera is an 8-layer stack. Each layer has a documented API boundary; swap any provider, tool, env, or strategy without touching the rest.
```
What you run        CLI commands: chimera code / synthesize / eval / review / ci-fix / fs
─────────────────────────────────────────────────────────────────
Automated           CI repair, code review, research, migration planning, doc and
workflows           test generation — multi-step pipelines built on the agent layer
─────────────────────────────────────────────────────────────────
Iterating on code   Give it a spec and tests, it keeps trying until the tests pass.
                    Strategies: converge on tests, search a tree of approaches,
                    generate-then-verify (CEGIS), curriculum learning
─────────────────────────────────────────────────────────────────
Measuring quality   Run benchmarks (HumanEval, SWE-bench, AIMO, custom), collect
                    pass rates and costs, compare agent configurations
─────────────────────────────────────────────────────────────────
The agent itself    An LLM in a loop: think, call a tool, observe the result,
                    repeat. 24 built-in tools (read, write, edit, bash, search,
                    git, test, web fetch, etc). 4 loop strategies.
─────────────────────────────────────────────────────────────────
LLM providers       Anthropic, OpenAI, Google, Ollama, Modal, or any
                    OpenAI-compatible API. Streaming, async, cost tracking.
─────────────────────────────────────────────────────────────────
Plumbing            Auth, sessions (save/resume/fork), event bus, permissions,
                    context compaction, secrets, plugins, MCP, LSP
─────────────────────────────────────────────────────────────────
Where code runs     Your filesystem, a Docker container, a git branch,
                    a remote server, or a cloud sandbox
```
## Benchmarks
Reproducible runs with raw data in data/:
| Benchmark | GLM-5.1 | Raw data |
|---|---|---|
| HumanEval (164 problems) | 66.5% pass@1 (109/164) | data/humaneval-glm51-results.json |
| SWE-bench Lite (20 smallest patches) | 10% (2/20) | data/swebench-lite-glm51-results.jsonl |
Earlier GLM-5 runs (HumanEval, Terminal-Bench) exist in our notes but the raw result files were not preserved; we won't publish unverifiable numbers. Full transparency report — every benchmark has a status, methodology, and known gaps.
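As a sanity check on the table's arithmetic: with one sample per task, pass@1 reduces to the plain solve rate.

```python
def pass_at_1(solved: int, total: int) -> float:
    """pass@1 with a single sample per task is just solved / total."""
    return 100.0 * solved / total

print(round(pass_at_1(109, 164), 1))  # HumanEval row -> 66.5
print(round(pass_at_1(2, 20), 1))     # SWE-bench Lite row -> 10.0
```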
## Run It Free with Ollama
Chimera speaks Ollama's Anthropic-compatible API out of the box. You can run the full agent against kimi-k2.6:cloud, glm-5.1:cloud, or any local Qwen/Llama with zero code changes:
```bash
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
python examples/agent/ollama_coding_agent.py --model kimi-k2.6:cloud
```
Full Ollama setup guide — prerequisites, recommended models, context window notes, troubleshooting.
## When to Reach for Chimera
Use Chimera if you want to:
- Run a coding agent on your own model (local Ollama, GLM, GPT, Anthropic-compatible) with hooks, MCP, and skills wired in
- Build your own coding agent — different LLM, different tools, different strategy
- Understand how coding agents work — every major architecture decomposed into swappable pieces
- Research and benchmark — compare agent architectures with controlled experiments
## Links
- Quick Start — hooks, MCP servers, skills
- Coding Agents Overview — comparative tour of the seven-strong family (mink, otter, ferret, weasel, shrew, stoat, badger)
- Mink Quickstart — `chimera mink` REPL, runs/agents subcommands
- Mink Providers — backend matrix, env vars, troubleshooting
- Otter Quickstart — `chimera otter` one-shot, REPL, HTTP+SSE, ACP
- Ferret Quickstart — `chimera ferret` sandbox + approval presets, ACP-default `serve`, cloud bridge
- Weasel Quickstart — `chimera weasel` four modes (REPL / print / RPC / SDK), extensions
- Shrew Quickstart — `chimera shrew` small-model defaults, llama.cpp setup, benchmark harness
- Stoat Quickstart — `chimera stoat` shell-mode toggle, Kimi-first provider chain
- Badger Quickstart — `chimera badger` harness discipline, rerun-on-failure, parity-tracker
- Build Your Own Agent — full library guide
- All Playbooks — 13 guides covering every feature
- Examples — 28 curated runnable scripts across 7 categories
- Function Synthesis — compile specs into callable `.chi` bundles
  - 3 runtime backends (llama.cpp, transformers, ONNX), schema validation, streaming invoke
  - `LocalCompiler` for real PEFT fine-tuning; publish and fetch bundles via `chimera fs push | pull` (Hugging Face Hub + S3)
  - 10 CLI sub-verbs: `compile`, `run`, `list`, `rm`, `info`, `push`, `pull`, `import-peft`, `login`, `rename`
- Benchmarks — transparency framework
- Benchmark adapters — every adapter under `chimera/eval/benchmarks/`, status, and how to run
- Contributing — setup, workflow, code style
- Changelog — version history