Minimal code agent CLI
Project description
Klaude Code
Minimal code agent CLI.
Features
- Multi-provider: Anthropic Message API, OpenAI Responses API, OpenRouter, ChatGPT Codex OAuth etc.
- Keep reasoning item in context: Interleaved thinking support
- Model-aware tools: Claude Code tool set for Opus,
apply_patchfor GPT-5/Codex - Reminders: Cooldown-based todo tracking, instruction reinforcement and external file change reminder
- Sub-agents: General Purpose, Finder, Code Reviewer, Code Simplifier (+ fork-context variant)
- Recursive
@filementions: Circular dependency protection, relative path resolution - External file sync: Monitoring for external edits (linter, manual)
- Interrupt handling: Ctrl+C preserves partial responses and synthesizes tool cancellation results
- Output truncation: Large outputs saved to file system with snapshot links
- Agent Skills: Built-in + user + project Agent Skills (with implicit invocation by Skill tool or explicit invocation by typing
//skillor/skill) - Prompt caching: Append-only message history maximizes prefix cache hits (cached tokens cost 10% of base input)
- Context management: Auto-compaction, Rewind (rollback to checkpoint), Handoff (compress and continue in fresh context)
- Auto memory: Persistent cross-session memory per project (
~/.klaude/projects/<project>/memory/) - Web UI: Browser-based interface via
klaude webor/webslash command - Sessions: Resumable with
--continue, forkable with/fork-session - Extras: Slash commands, sub-agents, image paste, terminal notifications, auto-theming
Installation
uv tool install klaude-code
To update:
uv tool upgrade klaude-code
Or use the built-in command:
klaude upgrade
Development Install
git clone https://github.com/inspirepan/klaude-code.git
cd klaude-code
make install # init submodules, build web frontend, install as editable
Or step by step:
git submodule update --init --recursive
uv sync # install Python deps
uv run python scripts/build_web.py # build web frontend
uv tool install -e . # install CLI globally (editable)
Requires pnpm or npm for the web frontend build (pnpm preferred).
Usage
klaude [--model [<name>]] [--continue] [--resume [<id>]]
Options:
--model/-m: Choose a model.--model(no value): opens the interactive selector.--model <value>: resolves<value>to a single model; if it can't, it opens the interactive selector filtered by<value>.
--continue/-c: Resume the most recent session.--resume/-r: Resume a session.--resume(no value): select a session to resume for this project.--resume <id>: resume a session by its ID directly.
--vanilla: Minimal mode with only basic tools (Bash, Read, Edit, Write) and no system prompts.
Model selection behavior:
- Default: uses
main_modelfrom config. --model(no value): always prompts you to pick.--model <value>: tries to resolve<value>to a single model; if it can't, it prompts with a filtered list (and falls back to showing all models if there are no matches).
Debug Options:
--debug/-d: Enable debug mode with verbose logging and LLM trace.--debug-filter: Filter debug output by type (comma-separated).
Configuration
Quick Start (Zero Config)
Klaude comes with built-in provider configurations. Just set an API key environment variable and start using it:
# Pick one (or more) of these:
export ANTHROPIC_API_KEY=sk-ant-xxx # Claude models
export OPENAI_API_KEY=sk-xxx # GPT models
export OPENROUTER_API_KEY=sk-or-xxx # OpenRouter (multi-provider)
export DEEPSEEK_API_KEY=sk-xxx # DeepSeek models
export MOONSHOT_API_KEY=sk-xxx # Moonshot/Kimi models
export MINIMAX_API_KEY=xxx # MiniMax models
export GOOGLE_API_KEY=xxx # Google Gemini models (or GEMINI_API_KEY)
export EXA_API_KEY=exa-xxx # Exa Search (optional, WebSearch provider, preferred)
export BRAVE_API_KEY=BSA-xxx # Brave Search (optional, WebSearch provider, fallback)
# Then just run:
klaude
On first run, you'll be prompted to select a model. Your choice is saved as main_model.
You can also configure fallback lists for the main model and helper models:
main_model:
- gpt-5.5
- gpt-5.4
- opus
fast_model:
- haiku
- gemini-flash
- gpt-5-nano
compact_model:
- gemini-flash
- haiku
Klaude expands each entry into concrete provider candidates in provider_list order, then falls through to the next model in the list. For example, gpt-5.4 will try available providers such as gpt-5.4@openai, gpt-5.4@github-copilot, and gpt-5.4@openrouter before moving to the next model. Runtime fallback is used for non-retryable provider/model failures such as quota, billing, permission, or model-unavailable errors. fast_model is used for session-title generation; compact_model is used for compact/helper tasks.
When you switch models with /model, Klaude updates main_model without discarding the fallback chain: the selected model is moved to the front if it already exists, or inserted at the front otherwise.
Built-in Providers
| Provider | Env Variable | Models |
|---|---|---|
| anthropic | ANTHROPIC_API_KEY |
sonnet, sonnet-no-thinking, opus, haiku |
| openai | OPENAI_API_KEY |
gpt-5.4-high, gpt-5.4-xhigh, gpt-5.3-codex, gpt-5.3-codex-xhigh |
| openrouter | OPENROUTER_API_KEY |
gpt-5.3-codex, gpt-5.3-codex-xhigh, gpt-5.4-high, gpt-5.4-xhigh, kimi, haiku, sonnet, sonnet-no-thinking, opus, gemini-pro, gemini-flash, grok, minimax, glm |
GOOGLE_API_KEY or GEMINI_API_KEY |
gemini-pro, gemini-flash | |
| google-vertex | GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION |
gemini-pro, gemini-flash |
| deepseek | DEEPSEEK_API_KEY |
deepseek |
| moonshot | MOONSHOT_API_KEY |
kimi |
| minimax | MINIMAX_API_KEY |
m2.7, m2.7:highspeed |
| cerebras | CEREBRAS_API_KEY |
glm |
| claude-max | N/A (OAuth) | sonnet, sonnet-no-thinking, opus, haiku |
| codex | N/A (OAuth) | gpt-5.3-codex, gpt-5.3-codex-xhigh, gpt-5.4-high, gpt-5.4-xhigh |
| github-copilot | N/A (OAuth) | gpt-5.3-codex, gpt-5.3-codex-xhigh, gpt-5.4-high, gpt-5.4-xhigh, sonnet, sonnet-4.5, haiku, opus |
| ark-api | ARK_API_KEY |
seed-pro, seed-code |
| ark-coding-plan | ARK_API_KEY |
seed-code, kimi |
List all configured providers and models:
klaude list
Models from providers without valid credentials are shown as dimmed/unavailable.
Bedrock is supported as a custom provider rather than a built-in one. See docs/bedrock-setup.md.
Authentication
Use the auth command to configure API keys or login to subscription-based providers:
# Interactive provider selection
klaude auth login
# Configure API keys
klaude auth login anthropic # Set ANTHROPIC_API_KEY
klaude auth login openai # Set OPENAI_API_KEY
klaude auth login google # Set GOOGLE_API_KEY
klaude auth login openrouter # Set OPENROUTER_API_KEY
klaude auth login deepseek # Set DEEPSEEK_API_KEY
klaude auth login moonshot # Set MOONSHOT_API_KEY
klaude auth login minimax # Set MINIMAX_API_KEY
# OAuth login for subscription-based providers
klaude auth login codex # ChatGPT Pro subscription
API keys are stored in ~/.klaude/klaude-auth.json and used as fallback when environment variables are not set.
To logout from OAuth providers:
klaude auth logout codex
Custom Configuration
User config file: ~/.klaude/klaude-config.yaml
Open in editor:
klaude conf
Model Configuration
You can add custom models to built-in providers or define new ones. Configuration is inherited from built-in providers by matching provider_name.
# ~/.klaude/klaude-config.yaml
main_model:
- gpt-5.5
- gpt-5.4
- opus
fast_model:
- haiku
- gemini-flash
- gpt-5-nano
compact_model:
- gemini-flash
- haiku
provider_list:
# Add/Override models for built-in OpenRouter provider
- provider_name: openrouter
model_list:
- model_name: qwen-coder
model_id: qwen/qwen-2.5-coder-32b-instruct
context_limit: 131072
cost: { input: 0.3, output: 0.9 }
- model_name: sonnet # Override built-in sonnet params
model_id: anthropic/claude-3.5-sonnet
context_limit: 200000
# Add a completely new provider
- provider_name: my-azure
protocol: openai
api_key: ${AZURE_OPENAI_KEY}
base_url: https://my-instance.openai.azure.com/
is_azure: true
azure_api_version: "2024-02-15-preview"
model_list:
- model_name: gpt-4
model_id: gpt-4-deploy-name
context_limit: 128000
Key Tips:
- Merging: If
provider_namematches a built-in provider, settings likeprotocolandapi_keyare inherited. - Overriding: Use the same
model_nameas a built-in model to override its parameters. - Environment Variables: Use
${VAR_NAME}syntax for secrets. - Model Preference Lists:
main_model,fast_model, andcompact_modelaccept either a single string or a list of model selectors. When you provide a list, Klaude first tries matching providers inprovider_listorder for each selector, then moves to the next selector. - Updating Defaults:
/modelkeeps saving the selected model back tomain_model, but preserves fallback order by moving or inserting the selected model at the front of the list.
Sub-agent Model Configuration
sub_agent_models accepts registered sub-agent type names as keys. Current supported keys are:
general-purpose- Autonomous multi-step task executorgeneral-purpose-fork-context- Same as above but inherits parent conversation historyfinder- Fast codebase search and explorationcode-reviewer- Identifies bugs in proposed changescode-simplifier- Refines code for clarity and consistency
If a sub-agent type is not configured, it falls back to the main agent model. Each key also accepts a list for fallback ordering.
sub_agent_models:
general-purpose: sonnet
finder:
- haiku
- gemini-flash
code-reviewer: opus
Supported Protocols
anthropic- Anthropic Messages APIopenai- OpenAI Chat Completion APIresponses- OpenAI Responses API (for o-series, GPT-5, Codex)codex_oauth- OpenAI Codex CLI (OAuth-based, for ChatGPT Pro subscribers)github_copilot_oauth- GitHub Copilot (OAuth-based)openrouter- OpenRouter API (handlingreasoning_detailsfor interleaved thinking)google- Google Gemini APIgoogle_vertex- Google Vertex AI (uses GCP credentials)bedrock- AWS Bedrock for Claude (uses AWS credentials instead of api_key)
For a working Bedrock provider example, see docs/bedrock-setup.md.
List configured providers and models:
klaude list
Cost Tracking
View aggregated usage statistics across all sessions:
# Show all historical usage data
klaude cost
# Show usage for the last 7 days only
klaude cost --days 7
# Alias for days
klaude cost --recent 7
Slash Commands
Inside the interactive session (klaude), use these commands to streamline your workflow:
-
/...supports mixed completion for commands + skills (command names take priority on conflicts). -
//...shows skill-only completion and triggers skills explicitly. -
/copy- Copy last assistant message to clipboard. -
/compact- Clear conversation history but keep a summary in context. -
/fork-session- Fork current session from a selected point. -
/refresh-terminal- Refresh terminal display. -
/web- Switch to web UI mode. -
/new- Start a new session (clears context). -
/model- Switch the active LLM and updatemain_modelin config while preserving fallback order (the selected model is moved/inserted to the front). -
/sub-agent-model- Configure sub-agent models at runtime. -
/thinking- Change thinking/reasoning level. -
/status- Show session usage statistics (cost, tokens, model breakdown). -
/login- Login to provider or configure API key. -
/logout- Logout from provider. -
/continue- Continue current session without a new user message. -
/debug [filters]- Toggle debug mode and configure debug filters.
Input Shortcuts
| Key | Action |
|---|---|
Enter |
Submit input |
Shift+Enter |
Insert newline (terminal-dependent) |
Ctrl+J |
Insert newline |
Ctrl+L |
Open model picker overlay |
Ctrl+T |
Open thinking level picker overlay |
Ctrl+V |
Paste image from clipboard |
Left/Right |
Move cursor (wraps across lines) |
Backspace |
Delete character or selected text |
c (with selection) |
Copy selected text to clipboard |
Sub-Agents
The main agent can spawn specialized sub-agents for specific tasks:
| Sub-Agent | Purpose |
|---|---|
| General Purpose | Handle complex multi-step tasks autonomously |
| General Purpose (Fork Context) | Same as above, but inherits the parent agent's full conversation history |
| Finder | Fast codebase exploration - find files, search code, answer questions about the codebase |
| Code Reviewer | Identify real bugs in proposed changes |
| Code Simplifier | Refine recently changed code for clarity and consistency |
Web UI
Klaude includes a browser-based interface as an alternative to the terminal TUI.
# Start web UI directly
klaude web
# With options
klaude web --port 9000 --host 0.0.0.0 --no-open
You can also switch from TUI to web mid-session with the /web slash command. The web UI provides the same capabilities as the TUI: multi-session management, file browsing, tool execution, and real-time streaming.
Prompt Caching
Klaude is designed to maximize prefix cache hit rates across LLM API calls. Cached tokens are priced at ~10% of base input tokens, so high cache hit rates significantly reduce cost.
Append-only message history. The conversation history is strictly append-only. New messages, tool results, and attachments are always appended to the end of the message array, never inserted or modified in the middle. Any mutation to the head of the messages array (compressing old tool results, replacing images, reordering tool definitions) would invalidate the prefix cache and force a full re-tokenization.
Design choices that preserve prefix stability:
- Stable system prompt: The system prompt is composed of a static base prompt + stable tool strategy block + environment info, avoiding per-turn variation.
- Stable JSON serialization: Tool schemas and provider payloads use
canonicalize_json()for deterministic key ordering across calls. - Cache control markers: For Anthropic and OpenRouter (Claude models),
cache_control: {"type": "ephemeral"}is placed on the system prompt and the last message part to hint the provider's caching boundary. - Compaction preserves prefix: When context is compacted, the summary is prepended as a new first message while keeping the retained tail intact -- no existing message bytes are modified.
- Fork-context sub-agents: Sub-agents with
fork_context=Trueinherit the parent's full system prompt and tool list to maximize prefix cache sharing.
The TUI displays cache hit rate per turn in the metadata line (e.g. cache 12.5k (98%)). Rates below 90% are highlighted as a warning.
Context Management
The agent automatically manages context window limits:
- Auto-compaction: When the conversation approaches the model's context limit, older messages are summarized and replaced with a compact summary. The agent also recovers from context overflow errors by compacting and retrying.
- Rewind: The agent can roll back the conversation to a previous checkpoint (automatically inserted at key points). File system changes are preserved; only conversation history is rewound.
- Handoff: The agent can compress the current conversation into a summary and continue in a fresh context. Useful for very long sessions where context quality degrades.
Auto Memory
Klaude maintains persistent memory per project across sessions. Memory files are stored in ~/.klaude/projects/<project-key>/memory/ with a MEMORY.md index file. The agent automatically loads relevant memories at session start and can save new memories during a session.
Memory types include user preferences, feedback/corrections, project context, and external references.
Project Configuration Files
Klaude reads instruction files from your project directory to customize agent behavior:
| File | Purpose |
|---|---|
AGENTS.md |
Project-level instructions checked into version control (shared with team) |
CLAUDE.md |
Personal instructions (typically gitignored) |
These files are loaded automatically and injected into the system prompt. They can be placed at the project root or in subdirectories for scoped instructions.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file klaude_code-2.30.0.tar.gz.
File metadata
- Download URL: klaude_code-2.30.0.tar.gz
- Upload date:
- Size: 4.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.21 {"installer":{"name":"uv","version":"0.9.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2e56aca04d75129376ee3c214296ec3533c106a86f92ce913ca367622704032e
|
|
| MD5 |
67a79f9d266f9566b81fa688069942c3
|
|
| BLAKE2b-256 |
bd292954bfc9b2a52b96a216c99d3985b848b1a5e1c006a4eb6c537af8dd122f
|
File details
Details for the file klaude_code-2.30.0-py3-none-any.whl.
File metadata
- Download URL: klaude_code-2.30.0-py3-none-any.whl
- Upload date:
- Size: 4.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.21 {"installer":{"name":"uv","version":"0.9.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4a3b9cbba9f6f3fb6ae2854af4fc5c7fe5e64520b1b4a8b4fba4b022247fd075
|
|
| MD5 |
0b980ced116154728b58a772dc9cc083
|
|
| BLAKE2b-256 |
c2ead97c9f9658eb4500ce31cc1fcbab0b35ecf5aca3c8195deb95c786cbe535
|