Open-source AI agent runtime for any LLM — production-grade coding agent with multi-layer memory, multi-agent orchestration, and defense-in-depth security
llmcode
Python-native coding agent runtime tuned for local LLMs
5-layer memory · synthesis-first multi-agent · per-model prompts for Qwen / Llama / DeepSeek
Quick Start · Why llmcode · Features · vs Other Tools · Configuration · Docs
Why llmcode?
There are several great open-source AI coding agents now (opencode, Aider, Continue, etc). llmcode exists for a specific niche they don't fully serve:
You want a Claude Code-style coding agent that runs your own model on your own GPU, written in Python so it integrates with your existing Python LLM stack, with deep optimization for the smaller models you'll actually run locally.
If you check any of these boxes:
- You run vLLM, Ollama, or LM Studio with Qwen / Llama / DeepSeek locally
- You don't want another Node.js runtime in your stack (you already have Python)
- You've tried tools tuned for Claude/GPT and watched smaller models drown in the system prompt
- You need multi-agent coordination that doesn't over-spawn on local models
- You want persistent project memory that survives across sessions
- You care about CJK / multi-language terminal handling
then llmcode is for you.
If you mostly use cloud APIs and don't need any of the above, opencode is more mature and you should probably use it.
██╗ ██╗ ███╗ ███╗
██║ ██║ ████╗ ████║
██║ ██║ ██╔████╔██║
██║ ██║ ██║╚██╔╝██║
███████╗ ███████╗ ██║ ╚═╝ ██║
╚══════╝ ╚══════╝ ╚═╝ ╚═╝
██████╗ ██████╗ ██████╗ ███████╗
██╔════╝ ██╔═══██╗ ██╔══██╗ ██╔════╝
██║ ██║ ██║ ██║ ██║ █████╗
██║ ██║ ██║ ██║ ██║ ██╔══╝
╚██████╗ ╚██████╔╝ ██████╔╝ ███████╗
╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝
Quick Start
pip install llmcode-cli
`llmcode: command not found`? pip installs scripts to `~/.local/bin` (Linux/macOS) or `%APPDATA%\Python\Scripts` (Windows). Add it to your PATH: `export PATH="$HOME/.local/bin:$PATH"`
With a local model (zero cost, fully offline):
mkdir -p ~/.llmcode
cat > ~/.llmcode/config.json << 'EOF'
{
"model": "qwen3.5",
"provider": {
"base_url": "http://localhost:8000/v1"
}
}
EOF
llmcode
With a cloud API:
cat > ~/.llmcode/config.json << 'EOF'
{
"model": "claude-sonnet-4-6",
"provider": {
"base_url": "https://api.anthropic.com/v1",
"api_key_env": "ANTHROPIC_API_KEY"
}
}
EOF
llmcode
Docker (self-hosted):
docker pull ghcr.io/djfeu/llmcode:latest
docker run -it --rm \
-v "$PWD:/workspace" \
-v "$HOME/.llmcode:/home/llmcode/.llmcode" \
--network host \
ghcr.io/djfeu/llmcode
Modes
llmcode # Default fullscreen TUI
llmcode --provider ollama # Auto-detect Ollama + interactive model selector
llmcode --mode plan # Read-only mode, plan before execution
llmcode --yolo # Auto-accept all permissions (dangerous)
llmcode -x "find large files" # Shell assistant: translate to command + execute
llmcode -q "explain this" # Quick Q&A without TUI
llmcode --serve --port 8765 # Remote WebSocket server
llmcode --connect host:8765 # Connect to remote agent
llmcode --resume # Resume from checkpoint
How it compares
llmcode is deeply influenced by Claude Code's architecture, borrows proven patterns from opencode, and adopts ideas from Qwen Code (Alibaba's Gemini CLI fork for Qwen models).
| Feature | llmcode | Claude Code | Qwen Code | opencode |
|---|---|---|---|---|
| Open source | ✅ | ❌ | ✅ | ✅ |
| Language | Python | TypeScript | TypeScript | TypeScript |
| Local model first | ✅ | ❌ | ✅ | ⚠️ |
| Default model | any | Claude | Qwen3-Coder | any |
| Free tier | self-hosted | ❌ | 1000 req/day | self-hosted |
| Per-model system prompts | ✅ | N/A | ⚠️ | ✅ |
| Qwen / Llama / DeepSeek tuned prompts | ✅ | ❌ | ⚠️ | ❌ |
| Model profile system (TOML) | ✅ | ❌ | ❌ | ❌ |
| Skill router (auto match) | 3-tier | ❌ | manual | manual |
| Memory system | 5-layer | basic | basic | basic |
| Multi-agent coordinator | synthesis-first | ❌ | Arena pattern | task tool |
| Arena parallel agents | ✅ | ❌ | ✅ | ❌ |
| Specialist personas | ✅ | ❌ | ❌ | ⚠️ |
| Plan mode | ✅ | ❌ | ✅ | ❌ |
| Docker sandbox | ✅ | ❌ | ✅ | ❌ |
| PTY (interactive shell) | ✅ | ❌ | ✅ | ❌ |
| Context overlap detection | ✅ | ❌ | ❌ | ❌ |
| Diminishing returns auto-stop | ✅ | ❌ | ❌ | ❌ |
| Prompt caching (Anthropic) | ✅ | ✅ | ❌ | ❌ |
| Signed thinking round-trip | ✅ | ✅ | ❌ | ❌ |
| IDE extensions | ✅ | ✅ | ✅ | ❌ |
| i18n (UI level) | ⚠️ | ❌ | ✅ | ❌ |
| MCP servers | ✅ | ✅ | ✅ | ✅ |
| Plugin ecosystem | ✅ | ✅ | ✅ | ✅ |
| Voice input | ✅ | ❌ | ❌ | ❌ |
| Computer use | ✅ | ✅ | ❌ | ❌ |
| Notebook tools | ✅ | ❌ | ❌ | ❌ |
| YOLO mode | ✅ | ✅ | ✅ | ✅ |
Where each tool shines
llmcode — 5-layer memory, synthesis-first multi-agent, diminishing returns detection, per-model prompt tuning for 9 model families, Python-native integration, declarative model profiles with TOML overrides, Anthropic prompt caching + signed thinking.
Qwen Code — Best if you use Qwen models exclusively: free 1000 req/day via Qwen OAuth, IDE extensions (VS Code/Zed/JetBrains), messaging channel deployment (Telegram/WeChat/DingTalk), full i18n. Based on Google Gemini CLI.
opencode — Wider community, more mature, TypeScript ecosystem native.
Claude Code — Most polished UX, deepest Claude integration, but closed-source and cloud-only.
Features
Local-LLM optimization
This is llmcode's core focus. Local models behave very differently from Claude / GPT:
- They drown in big system prompts. llmcode's 3-tier skill router only injects skills that match the current intent — keyword match → TF-IDF similarity → optional LLM classifier. No more "all 28 skills loaded every turn".
- They follow instructions too literally. llmcode has separate per-model system prompts for Qwen, Llama, DeepSeek, Kimi, Codex, Gemini, GPT, and Claude — auto-selected from model name.
- They tend to repeat themselves. llmcode's diminishing returns detection auto-stops when continuation produces < 500 new tokens for 3+ iterations in a row.
- They over-spawn agents. llmcode's coordinator forces a synthesis step before delegation, asking "should I delegate at all?" before splitting work.
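The diminishing returns heuristic above can be sketched in a few lines. This is an illustrative reconstruction from the config keys documented below (`min_continuations`, `min_delta_tokens`), not llmcode's actual implementation:

```python
# Hypothetical sketch of the diminishing-returns auto-stop: halt when the
# last `min_continuations` continuations each added fewer than
# `min_delta_tokens` new tokens.

def should_stop(token_deltas, min_continuations=3, min_delta_tokens=500):
    """token_deltas: new-token counts per continuation, oldest first."""
    if len(token_deltas) < min_continuations:
        return False
    recent = token_deltas[-min_continuations:]
    return all(delta < min_delta_tokens for delta in recent)

# A run still producing content is allowed to continue...
assert should_stop([1200, 900, 450, 300]) is False  # only 2 low-delta turns
# ...but three low-yield continuations in a row trigger the stop.
assert should_stop([1200, 450, 300, 120]) is True
```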
Memory system (5 layers)
| Layer | Purpose | Lifetime |
|---|---|---|
| L0 Governance | Project rules from CLAUDE.md / AGENTS.md / .llmcode/governance.md | Permanent, always loaded |
| L1 Working | Current task scratch space | Ephemeral |
| L2 Project | Long-term project knowledge with 4-type taxonomy (user/feedback/project/reference) | Persistent, DreamTask consolidates |
| L3 Task | Multi-session task state machine (PLAN→DO→VERIFY→CLOSE→DONE) | Cross-session |
| L4 Summary | Past session summaries | Persistent |
Plus typed memory with MEMORY.md index, 25KB hard limit, and content validation that rejects derivable content (git logs, code dumps, file path lists).
See docs/memory.md for the full guide.
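The "rejects derivable content" validation can be sketched roughly as follows. The specific patterns and helper name are illustrative assumptions, not llmcode's actual rules; only the 25KB limit and the rejected categories (git logs, code dumps, file path lists) come from the description above:

```python
import re

# Hypothetical sketch: reject memory entries that merely reproduce
# content derivable from the repo itself (git logs, raw code, path lists).
DERIVABLE_PATTERNS = [
    re.compile(r"^commit [0-9a-f]{7,40}", re.M),   # pasted git log
    re.compile(r"^(def |class |import )", re.M),   # raw code dump
    re.compile(r"^(\S+/){2,}\S+\.\w+$", re.M),     # bare file-path listing
]

MAX_MEMORY_BYTES = 25 * 1024  # the 25KB hard limit mentioned above

def validate_memory_entry(text: str) -> bool:
    if len(text.encode()) > MAX_MEMORY_BYTES:
        return False
    return not any(p.search(text) for p in DERIVABLE_PATTERNS)

assert validate_memory_entry("User prefers pytest over unittest.")
assert not validate_memory_entry("commit 3f2a9bc1 Fix flaky test")
```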
Coordinator with synthesis-first
user task → synthesize → should_delegate? → decompose → spawn/resume → wait → aggregate
The coordinator's first action is not decomposition — it's a synthesis check that asks the LLM "do I actually need to delegate this, and if so, what do I already know vs. what needs investigation?" This catches 30-50% of cases where naive coordinators would have spawned 3-5 unnecessary workers for trivial tasks.
Plus subagent resume — pass resume_member_ids to continue existing workers instead of spawning fresh, so multi-stage workflows keep their accumulated context.
See docs/coordinator.md for the full tutorial.
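The synthesis-first control flow above can be sketched like this. Names, the `Synthesis` shape, and the injected callables are illustrative, not llmcode's actual API; only the "ask before delegating" gate is from the description:

```python
# Hypothetical sketch of the synthesis-first flow: the coordinator's
# first step is a gate asking whether delegation is warranted at all.
from dataclasses import dataclass, field

@dataclass
class Synthesis:
    should_delegate: bool
    known: list[str] = field(default_factory=list)    # what we already know
    unknown: list[str] = field(default_factory=list)  # what needs investigation

def coordinate(task, synthesize, solve_directly, spawn_workers):
    """The three callables stand in for LLM-backed steps; this only
    shows the control flow."""
    s = synthesize(task)
    if not s.should_delegate:        # trivial task: answer directly,
        return solve_directly(task)  # no workers spawned
    return spawn_workers(task, s.unknown)  # decompose only the true unknowns

# Trivial task: the synthesis gate short-circuits delegation entirely.
result = coordinate(
    "rename a variable",
    synthesize=lambda t: Synthesis(should_delegate=False),
    solve_directly=lambda t: "done directly",
    spawn_workers=lambda t, u: "spawned",
)
assert result == "done directly"
```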
Tools
| Category | Tools |
|---|---|
| File I/O | read_file, write_file, edit_file, multi_edit (with resolve_path workspace boundary check) |
| Search | glob_search, grep_search, tool_search |
| Web | web_search (DuckDuckGo / Brave / Tavily / SearXNG backends), web_fetch |
| Execution | bash (21-point security + Docker sandbox + PTY mode), agent (sub-agents with tier-based role routing: build / plan / explore / verify / general), enter_plan_mode, exit_plan_mode |
| LSP | lsp_hover, lsp_document_symbol, lsp_workspace_symbol, lsp_go_to_definition, lsp_find_references, lsp_go_to_implementation, lsp_call_hierarchy, lsp_diagnostics (auto-detects 25+ language servers via walk-up root finder) |
| Git | git_status, git_diff, git_log, git_commit, git_push, git_stash, git_branch |
| Notebook | notebook_read, notebook_edit |
| Computer Use | screenshot, mouse_click, keyboard_type, key_press, scroll, mouse_drag |
| Task Lifecycle | task_plan, task_verify, task_close |
| Scheduling | cron_create, cron_list, cron_delete |
| IDE | ide_open, ide_diagnostics, ide_selection |
| Swarm | swarm_create, swarm_list, swarm_message, swarm_delete, coordinate |
| Skills | skill_load (LLM-driven loading on top of auto-router) |
Smart per-model tool selection: GPT models get apply_patch (unified diff format), other models get edit_file. Auto-detected from model name.
Path resolution: `resolve_path()` auto-corrects wrong absolute paths from the LLM (e.g. `llm-code` vs `llm_code` confusion), with a workspace boundary check to prevent path traversal.
Model Profile System
Declarative per-model profiles replace scattered hardcoded model adaptations. Profiles control:
- Provider capabilities — native tools, image support, reasoning mode
- Streaming behavior — implicit thinking, reasoning field names, thinking budget format (`chat_template_kwargs` vs `anthropic_native`)
- Deployment — local model detection (unlimited token upgrades), auto-discovery via `/v1/models` probe
- Routing — per-model tier-C skill router model override
- Pricing — per-1M-token input/output costs for cost tracking
Built-in profiles for Qwen3/3.5, Claude, GPT-4o, DeepSeek-R1, o3/o4-mini. User overrides via ~/.llmcode/model_profiles/*.toml. See examples/model_profiles/ for templates.
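An override file might look roughly like this. The field names below are assumptions modeled on the capability list above, not the actual schema; see `examples/model_profiles/` in the repo for real templates:

```toml
# Illustrative ~/.llmcode/model_profiles/qwen35.toml -- field names are
# guesses based on the capabilities described above, not the real schema.
[profile]
model_pattern = "qwen3.5*"

[capabilities]
native_tools = true
image_input = false
reasoning_mode = true

[streaming]
implicit_thinking = true
thinking_budget_format = "chat_template_kwargs"

[pricing]
input_per_mtok = 0.0   # self-hosted: no API cost
output_per_mtok = 0.0
```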
Anthropic Provider
Native httpx-based provider for Anthropic's Messages API:
- Prompt caching — automatic `cache_control` on system prompt, tools, and last user message
- Signed thinking — signature delta accumulation for extended thinking round-trip
- Server tool use — `server_tool_use` / `server_tool_result` blocks with signature round-trip (web search, etc.)
- Overload backoff — progressive 30s → 60s → 120s retry on HTTP 529
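The overload backoff schedule can be sketched as follows. This is an illustrative reconstruction of the 30s → 60s → 120s retry described above, with the transport and sleep injected so the schedule is testable; it is not llmcode's actual provider code:

```python
import time

# Hypothetical sketch: retry on HTTP 529 (Anthropic "overloaded") with a
# fixed progressive schedule, then give up.
BACKOFF_SCHEDULE = [30, 60, 120]

def request_with_backoff(send, sleep=time.sleep):
    """send() -> (status, body); sleep is injectable for testing."""
    for delay in [0, *BACKOFF_SCHEDULE]:
        if delay:
            sleep(delay)
        status, body = send()
        if status != 529:
            return status, body
    return status, body  # still overloaded after the last retry

# First two attempts hit 529; the third succeeds after 30s + 60s waits.
waits, attempts = [], iter([(529, ""), (529, ""), (200, "ok")])
status, body = request_with_backoff(lambda: next(attempts), sleep=waits.append)
assert (status, body) == (200, "ok")
assert waits == [30, 60]
```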
Security
- 21-point bash security — injection detection, network access control, credential paths, recursive operation warnings, etc.
- MCP instruction sanitization — strips prompt injection patterns
- Bash output secret scanning — auto-redacts AWS/GitHub/JWT keys before they enter LLM context
- Environment variable filtering — sensitive vars replaced with `[FILTERED]`
- File protection — `.env`, SSH keys, `*.pem` blocked on write
- Workspace boundary checks — file tools refuse paths outside the project tree
- Docker sandbox — optional container isolation for bash commands (Docker/Podman auto-detected, configurable image/network/memory limits)
- Plugin permissions gate — blocks plugins requesting subprocess/fs_write/env unless `--force` is passed
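The secret-scanning step can be sketched as a simple pattern pass over tool output before it reaches the model. The patterns below are illustrative examples of the AWS/GitHub/JWT shapes mentioned above, not llmcode's actual list:

```python
import re

# Hypothetical sketch: redact common credential shapes from bash output
# before it enters the LLM context.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key id
    re.compile(r"ghp_[A-Za-z0-9]{36}"),        # GitHub personal access token
    re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+"),  # JWT (three base64url parts)
]

def redact_secrets(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

assert redact_secrets("key=AKIAABCDEFGHIJKLMNOP") == "key=[REDACTED]"
assert redact_secrets("no secrets here") == "no secrets here"
```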
Terminal UI
- Native text selection — `mouse=False` preserves terminal text selection; use Shift+Up/Down and PageUp/PageDown for scrollback
- Cmd+V auto-detect — text via bracketed paste, image via clipboard fallback
- Shift+Tab cycles agents — BUILD → PLAN → SUGGEST → BUILD
- Mouse wheel scrolling — scroll up to browse history (auto-scroll pauses), scroll back down to resume
- PageUp/Down + Shift+↑/↓ — scrollback navigation
- `/yolo` — toggle auto-accept
- `/init` — generate `AGENTS.md` from repo analysis
- `/copy` — copy last response to clipboard
- `/search` — cross-session FTS5 search
- `/personas` — list specialist agents (Sisyphus refactor / Oracle deep-analysis / Atlas orchestrator / Librarian / Explore / Metis / Momus / Multimodal-Looker / WebResearcher)
- `/orchestrate <task>` — category-routed persona dispatch with retry-on-failure
- `/profile` — per-model token/cost breakdown for the current session
- `/settings` — settings panel
- `/set <key> <value>` — live config write-back (temperature, max_tokens, model)
- `/model` — switch model with profile info display (capabilities, pricing, provider)
- `/export <path>` — chunked markdown export of the conversation
- `/compact` — manually compact conversation history
- Ctrl+P — Quick Open fuzzy file finder
- Click-to-open URLs — markdown links and bare URLs in chat are clickable (cell-aware, CJK-safe)
- 180 spinner verbs — Pondering, Caramelizing, Brewing… randomized per turn
- Background task indicator — status bar shows running/pending tasks
- Vim mode — full motions, operators, text objects
Hooks (24 events)
{
"hooks": [
{"event": "post_tool_use", "tool_pattern": "write_file|edit_file", "command": "ruff format {path}"},
{"event": "session.*", "command": "echo $HOOK_EVENT >> ~/agent.log", "on_error": "ignore"}
]
}
Categories: tool, command, prompt, agent, session, http.
Builtin hooks (opt-in via `config.builtin_hooks.enabled`):
- `context_window_monitor` — warns once per session when input tokens exceed 75% of the model's context limit
- `thinking_mode` — detects "ultrathink" / "深入思考" ("think deeply") keywords in user prompts and boosts the next turn's thinking budget
- `rules_injector` — auto-injects `CLAUDE.md` / `AGENTS.md` / `.cursorrules` content when reading files inside a project that has them
- `auto_format` — format files after write/edit (existing)
Marketplace
Compatible with Claude Code's plugin ecosystem.
/skill # Browse skills
/plugin install obra/superpowers
/mcp # Browse MCP servers
Sources: Official (anthropics/claude-plugins-official), Community, npm, GitHub.
Configuration
{
"model": "qwen3.5",
"provider": {
"base_url": "http://localhost:8000/v1",
"timeout": 120
},
"permissions": {
"mode": "prompt"
},
"model_routing": {
"sub_agent": "qwen3.5-32b",
"compaction": "qwen3.5-7b",
"fallback": "qwen3.5-7b"
},
"skill_router": {
"enabled": true,
"tier_a": true,
"tier_b": true,
"tier_c": false
},
"diminishing_returns": {
"enabled": true,
"min_continuations": 3,
"min_delta_tokens": 500
},
"swarm": {
"enabled": true,
"synthesis_enabled": true,
"max_members": 5
},
"thinking": { "mode": "adaptive", "budget_tokens": 10000 },
"dream": { "enabled": true, "min_turns": 3 },
"hooks": []
}
Config locations (low → high precedence)
- `~/.llmcode/config.json` — User global
- `.llmcode/config.json` — Project
- `.llmcode/config.local.json` — Local (gitignored)
- CLI flags / env vars
Lazy / scoped MCP servers
mcpServers now supports a split schema so heavy MCP servers start only
when a persona or skill that needs them is invoked (gated by an in-TUI
approval prompt). Legacy flat configs still work — every entry is treated
as always_on.
{
"mcpServers": {
"always_on": {
"filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "."] }
},
"on_demand": {
"tavily": {
"command": "npx",
"args": ["-y", "tavily-mcp"],
"env": { "TAVILY_API_KEY": "$TAVILY_API_KEY" }
},
"browser": {
"command": "npx",
"args": ["-y", "@browsermcp/mcp"]
}
}
}
}
A persona declares which on_demand servers it needs via its `mcp_servers` tuple (see `llm_code/swarm/personas/web_researcher.py`); a skill can declare the same via an `mcp_servers:` list in its SKILL.md frontmatter. Persona-scoped servers are torn down when the persona finishes; skill-scoped servers live for the session.
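An illustrative SKILL.md frontmatter declaring an on_demand server; the `mcp_servers:` key comes from the description above, while the skill name and other fields are hypothetical:

```markdown
---
name: web-research
description: Deep web research backed by the Tavily MCP server
mcp_servers:
  - tavily
---

# Web Research Skill

Instructions the model sees once the skill is loaded...
```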
Optional features
pip install llmcode-cli[voice] # Voice input via STT
pip install llmcode-cli[computer-use] # GUI automation
pip install llmcode-cli[ide] # IDE integration
pip install llmcode-cli[telemetry] # OpenTelemetry tracing
pip install llmcode-cli[treesitter] # Tree-sitter multi-language repo map
Docs
- Memory system — 5-layer architecture, typed taxonomy, DreamTask
- Coordinator — synthesis-first orchestration, resume mechanism
- Architecture — high-level system overview
- Plugins — building plugins
- Tools — tool reference
- Configuration — all config options
Architecture
llm_code/ 29,000+ lines Python
├── api/ Provider abstraction (OpenAI-compat + Anthropic)
├── cli/ CLI entry point, TUI launcher, oneshot modes (-x/-q)
│ └── templates/ LLM-driven command templates (init.md, etc)
├── runtime/ ReAct engine, 5-layer memory, skill router,
│ compression, hooks, permissions, checkpoint,
│ dream, VCR, speculative execution, telemetry,
│ file protection, sandbox, secret scanner,
│ conversation DB, tree-sitter repo map
│ └── prompts/ Per-model system prompts (anthropic, gpt,
│ gemini, qwen, llama, deepseek, kimi, codex)
├── tools/ 30+ tools with deferred loading + security
├── task/ PLAN/DO/VERIFY/CLOSE state machine
├── hida/ Dynamic context loading (10-type classifier)
├── mcp/ MCP client (4 transports) + OAuth + health checks
├── marketplace/ Plugin system + security scanning
├── lsp/ Language Server Protocol client
├── remote/ WebSocket server/client + SSH proxy
├── vim/ Vim engine
├── voice/ STT (Whisper, Google, Anthropic backends)
├── computer_use/ GUI automation
├── cron/ Task scheduler
├── ide/ IDE bridge (WebSocket JSON-RPC)
├── swarm/ Multi-agent coordinator (synthesis-first)
└── utils/ Notebook, diff, hyperlinks, search
tests/ 3,696 tests across 270+ files
Contributing
git clone https://github.com/DJFeu/llmcode
cd llmcode
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest # 3,696 tests
ruff check llm_code/ # lint
Looking for contributors interested in:
- More provider integrations (Anthropic native, OpenAI, Google, xAI, DeepSeek)
- More built-in skills (especially for Python-specific workflows)
- IDE integrations (VS Code, JetBrains, Neovim)
- i18n / l10n
- Per-model prompt tuning for additional model families
- Documentation, tutorials, examples
- Real-world usage feedback (especially on local Qwen/Llama/DeepSeek)
Requirements
- Python 3.11+
- An LLM server (vLLM, Ollama, LM Studio, or any OpenAI-compatible cloud API)
License
MIT
File details

Details for the file `llmcode_cli-1.16.1.tar.gz`:
- Size: 455.6 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0, CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `40ce060761d1f70e9b57b8ff24fcc25ca8e32ec5fec7e7798360c39d2b69b5b2` |
| MD5 | `fcb44621ca630b646f22d50c1f442ea4` |
| BLAKE2b-256 | `b0ecc9481d4ff635bf1e8c08d7ee0f73e3aa0f3e12f43432e5d8d999f20c5e53` |
Provenance

The following attestation bundle was made for `llmcode_cli-1.16.1.tar.gz`:
- Publisher: `publish.yml` on DJFeu/llmcode
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `llmcode_cli-1.16.1.tar.gz`
- Subject digest: `40ce060761d1f70e9b57b8ff24fcc25ca8e32ec5fec7e7798360c39d2b69b5b2`
- Sigstore transparency entry: 1270517507
- Permalink: DJFeu/llmcode@96243e88b1d26c3e73af1d1d368ddcbccf19a401 (refs/tags/v1.16.1)
- Owner: https://github.com/DJFeu (public)
- Token issuer: https://token.actions.githubusercontent.com
- Runner environment: github-hosted
- Publication workflow: publish.yml@96243e88b1d26c3e73af1d1d368ddcbccf19a401
- Trigger event: release
File details

Details for the file `llmcode_cli-1.16.1-py3-none-any.whl`:
- Size: 597.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0, CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `c6a55d48811299d23093d1a6a727ba1cfe6e1c3995b072468df99ddd6d404897` |
| MD5 | `a4dc0aab69d4be4f631e6e81920fb7ca` |
| BLAKE2b-256 | `86c417b2eb7cb2c874ae5ebb8161e05dea7bbeb59d0924805c87495077202906` |
Provenance

The following attestation bundle was made for `llmcode_cli-1.16.1-py3-none-any.whl`:
- Publisher: `publish.yml` on DJFeu/llmcode
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `llmcode_cli-1.16.1-py3-none-any.whl`
- Subject digest: `c6a55d48811299d23093d1a6a727ba1cfe6e1c3995b072468df99ddd6d404897`
- Sigstore transparency entry: 1270517522
- Permalink: DJFeu/llmcode@96243e88b1d26c3e73af1d1d368ddcbccf19a401 (refs/tags/v1.16.1)
- Owner: https://github.com/DJFeu (public)
- Token issuer: https://token.actions.githubusercontent.com
- Runner environment: github-hosted
- Publication workflow: publish.yml@96243e88b1d26c3e73af1d1d368ddcbccf19a401
- Trigger event: release