Lean, full-stack agentic AI framework with PTRL (Prompt-Time Reinforcement Learning)
ClawAgents
A lean, full-stack agentic AI framework — ~2,500 LOC
ClawAgents is a production-ready agentic framework that gives LLMs the ability to read, write, and execute code — with built-in planning, memory, sandboxing, and a gateway server. It supports OpenAI GPT-5, Google Gemini, and Anthropic Claude out of the box, with a pluggable provider architecture for any LLM.
Built by extracting and unifying the best architectural patterns from OpenClaw (~5,800 files) and DeepAgents (~1,400 LOC core), ClawAgents delivers the same power at a fraction of the complexity.
Installation
pip install clawagents # Core (OpenAI only)
pip install clawagents[gemini] # + Google Gemini support
pip install clawagents[anthropic] # + Anthropic Claude support
pip install clawagents[all] # All providers + tiktoken
Version 6.0.0 — Latest stable release (April 2026)
30-Second Quick Start
The fastest way to get going — scaffolds a .env, a run_agent.py starter script, and an AGENTS.md memory file:
pip install clawagents
cd ~/my-project # any project directory
clawagents --init # creates .env, run_agent.py, AGENTS.md
Then edit .env with your API key and run:
python run_agent.py
That's it. The generated run_agent.py includes commented-out examples for every provider (OpenAI, Gemini, Azure, Ollama, vLLM).
Where does .env go?
ClawAgents loads .env from the directory you run the command from (your current working directory). Different projects can have different configurations.
~/my-project/
├── .env          ← ClawAgents reads this when you run from ~/my-project/
├── run_agent.py
├── AGENTS.md
└── src/
Four ways to configure (in priority order):
1. `create_claw_agent()` parameters — highest priority, overrides everything
2. Shell environment variables — `export OPENAI_API_KEY=sk-...` in `~/.zshrc` (works globally)
3. `CLAWAGENTS_ENV_FILE` — set this env var to point to an explicit `.env` file path (useful for CI/Docker/multi-project)
4. `.env` file — project-level config, loaded from `cwd/.env` or `cwd/../.env`
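The precedence above can be sketched in a few lines. `resolve_setting` is an illustrative helper, not a ClawAgents API — it just mirrors the documented order (explicit parameter, then environment variable, then built-in default). In a typical dotenv setup, `.env` values are loaded into the environment without overriding existing shell variables, which is what lets a shell `export` shadow a project `.env`:

```python
import os

def resolve_setting(param, env_name, default):
    """Illustrative sketch of the documented precedence:
    explicit parameter > environment variable > built-in default."""
    if param is not None:
        return param                      # 1. explicit create_claw_agent() argument
    value = os.environ.get(env_name)
    if value:
        return value                      # 2.-4. shell env var or loaded .env value
    return default                        # built-in default

os.environ["TEMPERATURE"] = "0.5"
print(resolve_setting(0.2, "TEMPERATURE", 0.0))   # explicit argument wins -> 0.2
print(resolve_setting(None, "TEMPERATURE", 0.0))  # environment wins -> 0.5
```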
A ready-to-use template is included in the repo:
cp .env.example .env # then fill in your API key
Or run clawagents --init to generate one interactively.
CLI One-Liner
clawagents --task "List all Python files and summarize the project"
Minimal Python Code
import asyncio
from clawagents import create_claw_agent
async def main():
agent = create_claw_agent("gpt-5-mini") # or "gemini-3-flash", "llama3.1", etc.
result = await agent.invoke("List all Python files in src/")
print(result.result)
asyncio.run(main())
Examples
See the examples/ directory for ready-to-run scripts:
| File | Provider |
|---|---|
| `01_openai.py` | OpenAI (GPT-5, GPT-4o) |
| `02_gemini.py` | Google Gemini |
| `03_azure.py` | Azure OpenAI |
| `04_local_ollama.py` | Ollama (local) |
| `05_local_vllm.py` | vLLM (local) |
| `06_bedrock.py` | AWS Bedrock (via gateway) |
| `07_with_custom_tools.py` | Custom tools |
| `08_compare_samples.py` | Multi-sample comparison |
Configuration
1. Configure your environment
Create a .env file (or run clawagents --init to generate one):
PROVIDER=gemini # or "openai"
GEMINI_API_KEY=AIza... # Your Gemini API key
GEMINI_MODEL=gemini-3-flash-preview
STREAMING=1
CONTEXT_WINDOW=1000000
MAX_TOKENS=8192
TEMPERATURE=0 # Model-specific overrides apply (see below)
# Optional: RL-inspired agent improvements
CLAW_TRAJECTORY=1 # Enable trajectory logging + scoring
CLAW_RETHINK=1 # Enable consecutive-failure detection
CLAW_LEARN=1 # Enable PTRL (lessons from past runs)
OpenAI configuration
PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-nano
STREAMING=1
CONTEXT_WINDOW=1000000
MAX_TOKENS=8192
TEMPERATURE=0 # 0 for deterministic output
CLAW_TRAJECTORY=1
CLAW_RETHINK=1
CLAW_LEARN=1
2. One-line agent
from clawagents import create_claw_agent
agent = create_claw_agent("gemini-3-flash")
result = await agent.invoke("List all Python files in src/")
print(result.result)
3. With custom instructions
agent = create_claw_agent(
"gpt-5",
instruction="You are a senior code reviewer. Be thorough and concise."
)
result = await agent.invoke("Review this codebase and suggest improvements")
4. With trajectory logging & rethink
agent = create_claw_agent(
"gpt-5-mini",
trajectory=True, # logs every turn + scores the run
rethink=True, # auto-injects "rethink" after 3 consecutive failures
)
result = await agent.invoke("Refactor the auth module and add tests")
# Run summary written to .clawagents/trajectories/runs.jsonl
5. With PTRL (Prompt-Time Reinforcement Learning)
agent = create_claw_agent(
"gpt-5-mini",
learn=True, # enables all 3 PTRL layers (implies trajectory=True)
rethink=True, # enhanced rethink uses past lessons
)
result = await agent.invoke("Build the data pipeline")
# After the run: lessons extracted and saved to .clawagents/lessons.md
# Next run: lessons injected into system prompt automatically
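The lesson-injection half of PTRL can be sketched as follows — a hedged illustration, not the library's actual code. Only the `.clawagents/lessons.md` location comes from the description above; `build_system_prompt` is a made-up helper name:

```python
from pathlib import Path

def build_system_prompt(base_instruction, lessons_path=".clawagents/lessons.md"):
    """Sketch of PTRL pre-run lesson injection: if a lessons file from
    earlier runs exists, append it to the system prompt so the next run
    starts with those lessons already in context."""
    lessons = Path(lessons_path)
    if lessons.exists():
        return base_instruction + "\n\n## Lessons from past runs\n" + lessons.read_text()
    return base_instruction

# Simulate a lessons file left behind by a previous run
Path("lessons_demo.md").write_text("- Run the test suite before claiming a fix works.\n")
print(build_system_prompt("You are a coding agent.", "lessons_demo.md"))
```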
6. Multi-Sample Comparison (GRPO-inspired)
agent = create_claw_agent("gpt-5-mini", rethink=True)
# Run the task 3 times, pick the best based on objective scoring
result = await agent.compare("Fix the bug in app.py", n_samples=3)
print(result["best_result"]) # best answer
print(result["best_score"]) # objective score
print(result["all_scores"]) # all samples with scores
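Under the hood, `compare()` is best-of-n selection. The sketch below shows the shape of the idea with a toy scoring heuristic — the heuristic and sample format are invented for illustration; only the returned dict keys follow the API above:

```python
def pick_best(samples):
    """GRPO-inspired best-of-n selection sketch: score each sample
    objectively, keep the highest scorer. The scoring heuristic here is
    a toy, not ClawAgents' actual scorer."""
    scored = []
    for i, s in enumerate(samples):
        # toy objective score: reward completed runs, penalize tool failures
        score = (100 if s["completed"] else 0) - 5 * s["tool_failures"]
        scored.append((score, i, s["result"]))
    best_score, best_index, best_result = max(scored)
    return {"best_result": best_result, "best_score": best_score,
            "best_index": best_index, "all_scores": [t[0] for t in scored]}

samples = [
    {"completed": True, "tool_failures": 2, "result": "patch A"},
    {"completed": True, "tool_failures": 0, "result": "patch B"},
    {"completed": False, "tool_failures": 1, "result": "gave up"},
]
print(pick_best(samples)["best_result"])  # -> patch B
```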
7. Azure OpenAI
agent = create_claw_agent(
"gpt-4o", # your Azure deployment name
api_key="your-azure-key",
base_url="https://myresource.openai.azure.com/",
api_version="2024-12-01-preview",
learn=True,
)
result = await agent.invoke("Analyze the codebase")
Or via .env:
PROVIDER=openai
OPENAI_API_KEY=your-azure-key
OPENAI_MODEL=gpt-4o
OPENAI_BASE_URL=https://myresource.openai.azure.com/
OPENAI_API_VERSION=2024-12-01-preview
8. AWS Bedrock (via OpenAI-compatible gateway)
Use Bedrock Access Gateway or LiteLLM proxy to expose Bedrock models as an OpenAI-compatible API:
agent = create_claw_agent(
"anthropic.claude-3-sonnet-20240229-v1:0",
base_url="http://localhost:8080/v1",
api_key="bedrock", # gateway handles AWS auth
)
Or via .env:
OPENAI_API_KEY=bedrock
OPENAI_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
OPENAI_BASE_URL=http://localhost:8080/v1
9. Local Models (Ollama / vLLM / LM Studio)
Any OpenAI-compatible local server works out of the box:
# Ollama (default port 11434)
agent = create_claw_agent("llama3.1", base_url="http://localhost:11434/v1")
# vLLM
agent = create_claw_agent("Qwen/Qwen3-8B", base_url="http://localhost:8000/v1")
# LM Studio
agent = create_claw_agent("local-model", base_url="http://localhost:1234/v1")
Or via .env:
# No API key needed for local models — just omit OPENAI_API_KEY
OPENAI_MODEL=llama3.1
OPENAI_BASE_URL=http://localhost:11434/v1
Tip: For local models that emit `<think>...</think>` tokens (Qwen3, DeepSeek), thinking content is automatically detected, stripped from output, and preserved in trajectory records (Feature H).
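Conceptually, the stripping step looks like this — an illustrative sketch, not ClawAgents' exact implementation:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Sketch of thinking-token handling for local models (Qwen3, DeepSeek):
    strip <think>...</think> spans from the visible output while keeping
    them separately, e.g. for trajectory records."""
    thoughts = THINK_RE.findall(text)          # preserved for the trajectory log
    visible = THINK_RE.sub("", text).strip()   # what the user actually sees
    return visible, thoughts

raw = "<think>The user wants a list.</think>Here are the files: a.py, b.py"
visible, thoughts = split_thinking(raw)
print(visible)   # -> Here are the files: a.py, b.py
print(thoughts)  # -> ['The user wants a list.']
```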
10. CLI
# Scaffold a project (generates .env, run_agent.py, AGENTS.md)
clawagents --init
# Check your configuration
clawagents --doctor
# Run a task directly
clawagents --task "Find all TODO comments in the codebase"
# Inspect past run trajectories
clawagents --trajectory # last run
clawagents --trajectory 5 # last 5 runs
# Start the gateway server
clawagents --serve --port 3000
# Show all options
clawagents --help
Typical First-Time Flow
pip install clawagents # 1. Install
clawagents --init # 2. Scaffold .env, run_agent.py, AGENTS.md
# edit .env with your API key # 3. Configure
clawagents --doctor # 4. Verify setup
clawagents --task "hello world" # 5. Run your first task
python run_agent.py # 6. Or use the generated script
CLI Reference
| Command | Description |
|---|---|
| `clawagents --init` | Scaffold a starter project: `.env` (config template), `run_agent.py` (starter script with 5 provider options), `AGENTS.md` (memory file). Skips existing files. |
| `clawagents --doctor` | Check configuration health: `.env` discovery, API keys, active model, LLM settings, PTRL flags, local endpoint reachability, trajectory history, `AGENTS.md` presence. |
| `clawagents --task "..."` | Run a single task. Prints a startup banner (`provider=X model=Y env=Z ptrl=...`), executes the agent, prints the result to stdout. |
| `clawagents --trajectory [N]` | Inspect the last N run summaries (default: 1). Shows run ID, model, task, duration, turns, tool calls, score, quality, failure breakdown, verified score, and judge verdict. Requires `CLAW_TRAJECTORY=1`. |
| `clawagents --serve [--port N]` | Start the HTTP gateway server (default port 3000). Endpoints: `POST /chat`, `POST /chat/stream` (SSE), `GET /queue`, `GET /health`. |
| `clawagents --sessions` | List saved sessions (requires `CLAW_FEATURE_SESSION_PERSISTENCE=1`). Shows session ID, turn count, status, and task. |
| `clawagents --resume [ID\|latest]` | Resume a saved session. Loads messages from JSONL and continues the conversation. Defaults to latest. |
| `clawagents --help` | Show all options with examples. |
Performance: ClawAgents vs Traditional Frameworks
ClawAgents v5.10 outperforms traditional multi-layer agentic frameworks through architectural simplicity. Here's how it stacks up against DeepAgents (LangGraph/LangChain-based) in head-to-head benchmarks.
Benchmark Results (February 2026)
TypeScript — 5 tasks × 2 models × 2 frameworks (20/20 ✅)
| Framework | Gemini-2.5-flash | GPT-5-mini |
|---|---|---|
| ClawAgents v5.5 | 2.3s avg · 1.4 tools | 13.6s avg · 1.4 tools |
| DeepAgents | 2.5s avg · 1.8 tools | 15.7s avg · 2.4 tools |
Per-Task Breakdown
| Task | ClawAgents (Gemini) | DeepAgents (Gemini) | ClawAgents (GPT-5) | DeepAgents (GPT-5) |
|---|---|---|---|---|
| File Listing | 3.7s, 1 tool | 1.9s, 1 tool | 8.9s, 1 tool | 8.4s, 1 tool |
| Read & Analyze | 1.6s, 1 tool | 3.6s, 3 tools | 5.4s, 1 tool | 13.0s, 2 tools |
| Write File | 2.1s, 2 tools | 2.6s, 2 tools | 5.2s, 2 tools | 7.5s, 2 tools |
| Multi-Step | 3.4s, 3 tools | 3.7s, 3 tools | 46.2s, 3 tools | 46.9s, 7 tools |
| Reasoning | 0.7s, 0 tools | 0.9s, 0 tools | 2.3s, 0 tools | 2.8s, 0 tools |
Python — 18/20 completed (DeepAgents hung on GPT-5 multi_step)
| Task | ClawAgents (Gemini) | DeepAgents (Gemini) | ClawAgents (GPT-5) | DeepAgents (GPT-5) |
|---|---|---|---|---|
| File Listing | 2.8s, 1 tool | 1.0s, 0 tools* | 9.9s, 1 tool | 3.4s, 1 tool |
| Read & Analyze | 2.0s, 1 tool | 9.8s, 4 tools | 5.5s, 1 tool | 8.4s, 3 tools |
| Write File | 2.0s, 2 tools | 1.0s, 0 tools* | 5.0s, 2 tools | 9.3s, 3 tools |
| Multi-Step | 4.1s, 3 tools | 0.9s, 0 tools* | 16.0s, 3 tools | ❌ hung >5 min |
| Reasoning | 0.7s, 0 tools | 1.0s, 0 tools | — | — |
* DeepAgents 0-tool results mean the model answered without using filesystem tools — faster but lower-quality (unverified answers). ClawAgents consistently uses tools to verify answers.
Why ClawAgents Wins
Traditional Stack (DeepAgents):        ClawAgents:
┌───────────────────────────┐          ┌───────────────────┐
│ Your Code                 │          │ Your Code         │
├───────────────────────────┤          ├───────────────────┤
│ LangGraph                 │          │ ClawAgents        │
├───────────────────────────┤          │ (direct SDK)      │
│ LangChain                 │          └─────────┬─────────┘
├───────────────────────────┤                    │
│ ChatOpenAI / ChatGemini   │                    ▼
├───────────────────────────┤          ┌───────────────────┐
│ Responses API             │          │ Responses API     │
└───────────────────────────┘          └───────────────────┘
          4 layers                           1 layer
| Advantage | Impact |
|---|---|
| Direct SDK calls (1 layer vs 4) | Lower latency, fewer failure points |
| Working directory awareness | Tools operate from CWD; DeepAgents has no CWD concept |
| Soft + hard loop detection | Catches repetitive tool calls at 3 repeats, hard-stops at 6 |
| Efficiency rules in system prompt | ~30% reduction in redundant tool calls |
| Fewer tool calls overall | 1.4 avg vs 1.8–2.4 (20–40% more efficient) |
| No OpenAI lock-in | Native Gemini + OpenAI support with FallbackProvider chain |
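The soft/hard loop-detection rule above can be sketched as a small helper. The 3-repeat and 6-repeat thresholds come from the table; the class itself is illustrative, not ClawAgents' internals:

```python
from collections import deque

class LoopDetector:
    """Sketch of soft + hard tool-loop detection: warn when the same tool
    call repeats 3 times in a row, hard-stop at 6 (thresholds per the
    README). Illustrative only."""
    SOFT, HARD = 3, 6

    def __init__(self):
        self.recent = deque(maxlen=self.HARD)

    def check(self, tool_name, args):
        call = (tool_name, tuple(sorted(args.items())))
        self.recent.append(call)
        # count the trailing streak of identical calls
        streak = 0
        for past in reversed(self.recent):
            if past != call:
                break
            streak += 1
        if streak >= self.HARD:
            return "hard-stop"
        if streak >= self.SOFT:
            return "soft-warn"
        return "ok"

detector = LoopDetector()
for _ in range(6):
    status = detector.check("grep", {"pattern": "TODO", "path": "src/"})
print(status)  # after 6 identical calls -> hard-stop
```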
Feature Matrix
| Feature | ClawAgents v5.23 | DeepAgents | OpenClaw |
|---|---|---|---|
| ReAct loop | โ | โ | โ |
| Tool loop detection | โ soft + hard | โ | โ |
| Efficiency rules (system prompt) | โ | โ | โ |
| Adaptive token estimation | โ | โ | โ |
| Model-aware context budgeting | โ | โ | โ |
| Pluggable sandbox backend | โ | โ | โ |
| In-memory VFS (testing) | โ | โ | โ |
| Sub-agent delegation | โ | โ | โ |
| Planning / TodoList | โ | โ | โ |
| Persistent memory (AGENTS.md) | โ | โ | โ |
| Human-in-the-loop | โ | โ | โ |
| Dangling tool call repair | โ | โ | โ |
| Auto-summarization + offloading | โ | โ | โ |
| Lane-based command queue | โ | โ | โ |
| Gateway HTTP server + SSE | โ | โ | โ |
| Tool access control | โ | โ | โ |
| `think` tool (structured reasoning) | โ | โ | โ |
| LangChain tool adapter | โ | N/A | โ |
| Streaming with stall detection | โ | โ | โ |
| Trajectory logging + run scoring | โ | โ | โ |
| Consecutive-failure rethink | โ | โ | โ |
| Discrete reward bands (RL-inspired) | โ | โ | โ |
| Weighted execution scoring | โ | โ | โ |
| Truncated JSON repair + retry | โ | โ | โ |
| Model-specific temperature override | โ | โ | โ |
| Gemini 3 thought_signature support | โ | โ | โ |
| Prompt-Time RL (PTRL) โ learn from past runs | โ | โ | โ |
| Deterministic verification (exit codes, tests) | โ | โ | โ |
| GRPO-inspired multi-sample comparison | โ | โ | โ |
| Task-type-aware verification | โ | โ | โ |
| RFT-ready transition export | โ | โ | โ |
| Adaptive rethink threshold | โ | โ | โ |
| LLM-as-Judge verification | โ | โ | โ |
| Thinking token preservation (`<think>`) | โ | โ | โ |
| Tool result caching (LRU) | โ | โ | โ |
| JSON Schema param validation + coercion | โ | โ | โ |
| ComposeTool (deterministic pipelines) | โ | โ | โ |
| WebSocket gateway | โ | โ | โ |
| Multi-channel messaging (Telegram, WhatsApp, Signal) | โ | โ | โ |
| Per-session message serialization | โ | โ | โ |
Architecture
Core Components (~2,500 LOC)
clawagents/
├── agent.py            # ClawAgent class — ReAct loop, hooks, compaction
├── __main__.py         # CLI entrypoint
├── config/             # Env-based configuration (incl. TEMPERATURE, CLAW_*)
├── providers/          # LLM backends (OpenAI, Gemini, Fallback)
│   └── llm.py          # max_completion_tokens, temperature override, JSON repair
├── tools/              # 14+ built-in tools
│   ├── filesystem.py   # ls, read_file, write_file, edit_file
│   ├── advanced_fs.py  # tree, diff, insert_lines
│   ├── search.py       # grep, glob
│   ├── execute.py      # Shell command execution
│   ├── planning.py     # write_todos, update_todo
│   ├── delegation.py   # Sub-agent task delegation
│   ├── think.py        # Structured reasoning (no side effects)
│   ├── web.py          # URL fetching with HTML cleanup
│   └── interactive.py  # ask_user (stdin-based)
├── sandbox/            # Pluggable backend protocol
│   ├── protocol.py     # SandboxBackend interface (15+ methods)
│   ├── local.py        # LocalBackend (pathlib + asyncio)
│   └── in_memory.py    # InMemoryBackend (VFS for testing)
├── trajectory/         # RL-inspired run analysis (v5.9+)
│   └── recorder.py     # TrajectoryRecorder, discrete scoring, quality grading
├── gateway/            # Production HTTP server
│   ├── server.py       # FastAPI + SSE streaming
│   └── queue.py        # 4-lane FIFO command queue
├── graph/              # Agent loop orchestration + failure tracking
├── memory/             # AGENTS.md discovery + compaction
├── process/            # Process management
└── logging/            # Structured logging
Built-in Tools
Every agent includes these — no setup needed:
| Tool | Description |
|---|---|
| `ls` | List directory with size + modified time |
| `read_file` | Read file with line numbers + pagination |
| `write_file` | Write/create file (auto-creates directories) |
| `edit_file` | Replace text with pattern matching |
| `grep` | Search — single file or recursive with glob filter |
| `glob` | Find files by pattern (`**/*.py`) |
| `execute` | Shell command execution |
| `tree` | Recursive directory tree with smart ignoring |
| `diff` | Unified diff between two files |
| `insert_lines` | Precise line-level insertion |
| `think` | Structured reasoning without side effects |
| `web_fetch` | URL fetching with HTML stripping (50KB cap) |
| `write_todos` | Plan tasks as a checklist |
| `update_todo` | Mark plan items complete |
| `task` | Delegate to a sub-agent with isolated context |
| `ask_user` | Interactive stdin-based user input |
| `use_skill` | Load a skill's instructions (when skills exist) |
Tool Examples
Filesystem — ls, read_file, write_file, edit_file
The agent calls tools by emitting JSON blocks. Here's what happens under the hood when you ask the agent to work with files:
# The agent autonomously emits tool calls like:
# List a directory
{"tool": "ls", "args": {"path": "src/"}}
# → Returns: drwxr-xr-x   4.0 KB  2026-02-24  components/
#            -rw-r--r--   1.2 KB  2026-02-24  main.py
# Read a file with pagination
{"tool": "read_file", "args": {"path": "src/main.py", "offset": 0, "limit": 50}}
# → Returns: 1 | import asyncio
# 2 | from clawagents import create_claw_agent
# ...
# Write a new file (parent directories auto-created)
{"tool": "write_file", "args": {"path": "src/utils/helpers.py", "content": "def greet(name):\n return f'Hello, {name}!'"}}
# → Returns: ✅ Wrote 45 bytes to src/utils/helpers.py
# Edit an existing file by pattern match
{"tool": "edit_file", "args": {
"path": "src/main.py",
"old": "print('hello')",
"new": "print('Hello, World!')"
}}
# → Returns: ✅ 1 replacement made in src/main.py
Search — grep, glob
# Recursive grep across all Python files
{"tool": "grep", "args": {"pattern": "TODO", "path": "src/", "include": "*.py"}}
# → Returns: src/agent.py:42: # TODO: add retry logic
# src/tools/web.py:15: # TODO: handle redirects
# Single-file search
{"tool": "grep", "args": {"pattern": "class.*Tool", "path": "src/tools/registry.py"}}
# → Returns: 15: class ToolResult:
# 24: class Tool(Protocol):
# Find files by pattern
{"tool": "glob", "args": {"pattern": "**/*.md", "path": "."}}
# → Returns: ./README.md (15.3 KB)
# ./docs/ARCHITECTURE.md (4.1 KB)
# ./AGENTS.md (892 B)
⚡ Shell Execution
# Run any shell command
{"tool": "execute", "args": {"command": "python -m pytest tests/ -v"}}
# → Returns full stdout/stderr with exit code
# With custom timeout (in milliseconds)
{"tool": "execute", "args": {"command": "pip install requests", "timeout": 60000}}
# Dangerous commands are auto-blocked
{"tool": "execute", "args": {"command": "rm -rf /"}}
# → Error: Blocked potentially destructive command
🧠 Think — structured reasoning
# The agent can reason without side effects
{"tool": "think", "args": {
"thought": "The user wants me to refactor the database layer. Let me plan: 1) Read the current schema, 2) Identify coupled components, 3) Extract a repository pattern, 4) Update tests."
}}
# → [Thought recorded] — no files touched, no commands run
This reduces unnecessary tool calls by giving the agent a structured space to plan.
Planning — write_todos, update_todo
# Create a structured plan
{"tool": "write_todos", "args": {
"todos": ["Read the existing codebase", "Fix the auth bug", "Add unit tests", "Update docs"]
}}
# → ## Progress: 0/4 complete
# 0. [ ] Read the existing codebase
# 1. [ ] Fix the auth bug
# 2. [ ] Add unit tests
# 3. [ ] Update docs
# Mark steps complete as you go
{"tool": "update_todo", "args": {"index": 0}}
# → ## Progress: 1/4 complete
# 0. [x] Read the existing codebase
# 1. [ ] Fix the auth bug
# ...
🤖 Sub-agent delegation
# Delegate to a fresh sub-agent with isolated context
{"tool": "task", "args": {
"description": "Analyze all Python files in src/ and create a summary of the module structure",
"max_iterations": 10
}}
# → [Sub-agent completed: 6 tool calls, 4 iterations]
# The src/ directory contains 3 modules: ...
# With named specialized sub-agents (configured at creation)
{"tool": "task", "args": {
"description": "Review this pull request for security issues",
"agent": "security-reviewer"
}}
Registering named sub-agents:
from clawagents import create_claw_agent
from clawagents.tools.subagent import SubAgentSpec
agent = create_claw_agent(
"gemini-3-flash",
subagents=[
SubAgentSpec(
name="researcher",
description="Deep research on a topic",
system_prompt="You are a thorough researcher. Always cite sources.",
max_iterations=15,
),
SubAgentSpec(
name="coder",
description="Write and test code",
system_prompt="You are a senior engineer. Write clean, tested code.",
max_iterations=10,
),
],
)
Web Fetch
# Fetch and read a web page (HTML stripped automatically)
{"tool": "web_fetch", "args": {"url": "https://docs.python.org/3/library/asyncio.html"}}
# → [200] https://docs.python.org/3/library/asyncio.html
#   asyncio — Asynchronous I/O ...
# Fetch a JSON API
{"tool": "web_fetch", "args": {"url": "https://api.github.com/repos/python/cpython", "timeout": 10}}
# → Returns raw JSON response
Custom Tools
Create your own tools by implementing the Tool protocol:
from clawagents import create_claw_agent
from clawagents.tools.registry import Tool, ToolResult
class DatabaseQueryTool:
name = "query_db"
description = "Run a read-only SQL query against the application database."
parameters = {
"sql": {"type": "string", "description": "The SQL SELECT query", "required": True},
"limit": {"type": "number", "description": "Max rows to return. Default: 100"},
}
async def execute(self, args):
sql = args.get("sql", "")
limit = int(args.get("limit", 100))
# ... your database logic here ...
rows = await run_query(sql, limit=limit)
return ToolResult(success=True, output=format_table(rows))
# Register custom tools alongside built-ins
agent = create_claw_agent("gpt-5", tools=[DatabaseQueryTool()])
You can also wrap LangChain tools directly:
from langchain_community.tools import WikipediaQueryRun
agent = create_claw_agent("gpt-5", tools=[WikipediaQueryRun()])
# LangChain tools are automatically adapted via LangChainToolAdapter
Skills System
Skills are reusable instruction sets that teach the agent domain-specific knowledge — without polluting the system prompt. They use a progressive disclosure pattern: the agent loads skill instructions on demand via the use_skill tool.
Skill Directory Structure
your-project/
├── skills/                  # Auto-discovered (or .skills/, skill/, .skill/, Skills/)
│   ├── code_review/
│   │   └── SKILL.md         # ← Skill defined as a folder + SKILL.md
│   ├── sql_expert.md        # ← Skill defined as a single .md file
│   └── deploy_checklist.md
├── AGENTS.md                # Project memory (auto-injected)
├── src/
└── ...
Writing a Skill
Every skill is a Markdown file with optional YAML frontmatter:
Example 1 — skills/code_review/SKILL.md
---
name: code_review
description: "Perform thorough code reviews following team standards"
allowed-tools: read_file grep glob think
---
# Code Review Skill
When reviewing code, follow these steps:
## 1. Structure Check
- Verify the file follows our module pattern (one class per file)
- Check imports are grouped: stdlib → third-party → local
- Ensure `__init__.py` exports are up to date
## 2. Logic Review
- Look for unhandled edge cases (empty inputs, None values)
- Verify error messages are actionable
- Check that async functions are properly awaited
## 3. Security
- No hardcoded secrets or API keys
- SQL queries use parameterized statements
- User input is sanitized before use
## 4. Output Format
Provide your review as:
- ✅ **Approved** — no issues found
- ⚠️ **Changes requested** — list specific issues with file:line references
- 🚫 **Blocked** — critical issues that must be fixed
Example 2 — skills/sql_expert.md (single-file skill)
---
name: sql_expert
description: "Write optimized SQL queries for PostgreSQL"
allowed-tools: execute read_file think
---
# SQL Expert
You are a PostgreSQL expert. When writing queries:
## Rules
1. Always use explicit `JOIN` syntax (never implicit joins in WHERE)
2. Use CTEs (`WITH` clauses) for complex multi-step queries
3. Add `EXPLAIN ANALYZE` when the user asks about performance
4. Use parameterized queries — never interpolate user values
5. Default to `LIMIT 100` unless the user specifies otherwise
## Patterns
### Pagination
Use keyset pagination for large tables:
```sql
SELECT * FROM events
WHERE id > :last_seen_id
ORDER BY id
LIMIT 50;
```

### Aggregation
Always include the raw count alongside percentages:

```sql
SELECT
  status,
  COUNT(*) AS n,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM orders
GROUP BY status
ORDER BY n DESC;
```

**Example 3 — `skills/deploy_checklist.md`**
---
name: deploy_checklist
description: "Step-by-step production deployment checklist"
---
# Deployment Checklist
Before deploying to production, complete every step:
- [ ] All tests pass: `pytest tests/ -v`
- [ ] No lint errors: `ruff check src/`
- [ ] Version bumped in `pyproject.toml`
- [ ] CHANGELOG.md updated
- [ ] Docker image builds: `docker build -t app:latest .`
- [ ] Smoke test on staging environment
- [ ] Database migrations reviewed and tested
- [ ] Rollback plan documented
How Skills Work at Runtime
# Skills are auto-discovered from ./skills/ directory
agent = create_claw_agent("gemini-3-flash")
# Or specify custom skill directories
agent = create_claw_agent("gpt-5", skills=["./my-skills", "./shared-skills"])
When skills are available, the agent gets two additional tools:
# 1. List available skills
{"tool": "list_skills", "args": {}}
# → Available skills (3):
# - **code_review**: Perform thorough code reviews following team standards
#   → Allowed tools: read_file, grep, glob, think
# - **sql_expert**: Write optimized SQL queries for PostgreSQL
#   → Allowed tools: execute, read_file, think
# - **deploy_checklist**: Step-by-step production deployment checklist
# 2. Load a specific skill's instructions
{"tool": "use_skill", "args": {"name": "sql_expert"}}
# → Returns the full skill content, injected into the agent's context
The agent decides on its own when to use a skill. If you ask it to "write a query to find all overdue orders," and a sql_expert skill exists, it will load the skill first, then write the query following those rules.
API Reference
create_claw_agent(model, instruction, ...)
All parameters are optional — zero-config usage (`create_claw_agent()`) works if you have a .env with at least one API key.
Model & Provider
| Param | Type | Default | Required? | Description |
|---|---|---|---|---|
| `model` | `str \| LLMProvider \| None` | `None` | No | Model name (e.g. `"gpt-5-mini"`, `"gemini-3-flash"`, `"llama3.1"`), a pre-built `LLMProvider` instance, or `None` to auto-detect from env |
| `api_key` | `str \| None` | `None` | No | API key. Auto-routed to OpenAI or Gemini based on model name. Falls back to `OPENAI_API_KEY` / `GEMINI_API_KEY` env vars. For local models: omit entirely (a placeholder is used automatically) |
| `base_url` | `str \| None` | `None` | No | Custom endpoint URL for OpenAI-compatible APIs. Set this for Azure OpenAI, AWS Bedrock (via gateway), Ollama, vLLM, LM Studio, or any OpenAI-compatible server. Falls back to `OPENAI_BASE_URL` env var. Omit to use api.openai.com |
| `api_version` | `str \| None` | `None` | No | API version string. Only needed for Azure OpenAI (e.g. `"2024-12-01-preview"`). Falls back to `OPENAI_API_VERSION` env var. Ignored for all other providers |
Agent Behavior
| Param | Type | Default | Required? | Description |
|---|---|---|---|---|
| `instruction` | `str \| None` | `None` | No | System prompt — what the agent should do and how it should behave |
| `tools` | `list \| None` | `None` | No | Additional tools to register. Built-in tools (filesystem, exec, grep, etc.) are always included |
| `skills` | `str \| list \| None` | auto-discover | No | Skill directories to load. Default: checks `./skills`, `./.skills`. Bundled skills (ByteRover, OpenViking) are always included when eligible |
| `memory` | `str \| list \| None` | auto-discover | No | Memory files to inject into the system prompt. Default: checks `./AGENTS.md`, `./CLAWAGENTS.md` |
| `sandbox` | `SandboxBackend` | `LocalBackend()` | No | Pluggable sandbox backend for file/shell operations. Use `InMemoryBackend` for testing |
| `streaming` | `bool` | `True` | No | Enable streaming responses |
| `use_native_tools` | `bool` | `True` | No | Use the provider's native function calling. Set `False` for text-based JSON tool calls |
| `on_event` | `callable \| None` | `None` | No | Callback for agent events (tool calls, errors, context messages, etc.) |
LLM Tuning
| Param | Type | Default | Required? | Description |
|---|---|---|---|---|
| `context_window` | `int \| None` | env `CONTEXT_WINDOW` / 1000000 | No | Token budget. When messages exceed this, older turns are compacted |
| `max_tokens` | `int \| None` | env `MAX_TOKENS` / 8192 | No | Max output tokens per LLM response. Sent as `max_completion_tokens` (OpenAI) or `max_output_tokens` (Gemini) |
| `temperature` | `float \| None` | env `TEMPERATURE` / 0.0 | No | LLM sampling temperature. Automatically overridden for reasoning models (o1/o3/o4-mini, gpt-5/gpt-5-mini/gpt-5-turbo → 1.0). Non-reasoning models (gpt-5-nano, gpt-5-micro, gpt-4o) respect the configured value |
| `max_iterations` | `int \| None` | env `MAX_ITERATIONS` / 200 | No | Max tool rounds before the agent stops and returns |
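The temperature override described above amounts to a lookup. A minimal sketch, assuming the model lists in the table are exhaustive — `effective_temperature` is illustrative, not the library's function:

```python
# Reasoning-model families the table above says force temperature=1.0
REASONING_MODELS = {"o1", "o3", "o4-mini", "gpt-5", "gpt-5-mini", "gpt-5-turbo"}

def effective_temperature(model, configured):
    """Sketch of the model-specific temperature override: reasoning
    models ignore the configured value; everything else respects it."""
    if model in REASONING_MODELS:
        return 1.0          # reasoning models are pinned to 1.0
    return configured       # e.g. gpt-5-nano, gpt-4o use the configured value

print(effective_temperature("gpt-5-mini", 0.0))  # -> 1.0
print(effective_temperature("gpt-4o", 0.0))      # -> 0.0
```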
PTRL & Trajectory
| Param | Type | Default | Required? | Description |
|---|---|---|---|---|
| `trajectory` | `bool \| None` | env `CLAW_TRAJECTORY` / `False` | No | Enable trajectory logging. Records every turn as NDJSON to `.clawagents/trajectories/` and scores each run |
| `rethink` | `bool \| None` | env `CLAW_RETHINK` / `False` | No | Enable consecutive-failure detection. Injects a "rethink" prompt with adaptive threshold after repeated tool failures |
| `learn` | `bool \| None` | env `CLAW_LEARN` / `False` | No | Enable Prompt-Time Reinforcement Learning. Includes: post-run self-analysis, pre-run lesson injection, LLM-as-Judge verification (Feature G), and thinking token preservation (Feature H). Implies `trajectory=True` |
| `preview_chars` | `int \| None` | env `CLAW_PREVIEW_CHARS` / 120 | No | Max chars for tool-output previews in trajectory logs |
| `response_chars` | `int \| None` | env `CLAW_RESPONSE_CHARS` / 500 | No | Max chars for LLM response text in trajectory records |
Priority: Explicit parameter > environment variable > default value. You never need to set both.
Hooks & Access Control
agent = create_claw_agent("gemini-3-flash", instruction="Code reviewer")
# Block dangerous tools at runtime
agent.block_tools("execute", "write_file")
# Or whitelist only safe tools
agent.allow_only_tools("read_file", "ls", "grep", "glob")
# Inject context into every LLM call
agent.inject_context("Always respond in Spanish")
# Limit tool output size
agent.truncate_output(3000)
Advanced — raw hooks:
agent.before_llm = lambda messages: messages # modify messages before LLM
agent.before_tool = lambda name, args: True # return False to block
agent.after_tool = lambda name, args, result: result # modify tool results
Instance Methods
| Method | Description |
|---|---|
| `await agent.invoke(task, max_iterations=None)` | Run the agent on a task. Returns `AgentState` with `.result`, `.status`, `.iterations`, `.tool_calls` |
| `await agent.compare(task, n_samples=3, max_iterations=None)` | Run the task N times and return the best result based on objective scoring (GRPO-inspired). Returns `{"best_result", "best_score", "best_index", "all_scores"}` |
| `agent.block_tools(*names)` | Block specific tools at runtime |
| `agent.allow_only_tools(*names)` | Whitelist-only mode — all other tools blocked |
| `agent.inject_context(text)` | Inject extra context into every LLM call |
| `agent.truncate_output(max_chars)` | Limit tool output size |
Auto-Discovery
The agent factory automatically discovers project files:
| What | Default locations checked |
|---|---|
| Memory | ./AGENTS.md, ./CLAWAGENTS.md |
| Skills | ./skills, ./.skills, ./skill, ./.skill, ./Skills. Bundled skills are auto-included based on eligibility (see below). |
Bundled Skills
ClawAgents ships with two complementary bundled skills that work together:
| Skill | Purpose | Prerequisite | Auto-enabled? |
|---|---|---|---|
| ByteRover | Write decisions, patterns, and rules to local Markdown files | Node/npx (`brv` runs via `npx byterover-cli`) | Always |
| OpenViking | Read context from repos, docs, and large knowledge bases with tiered L0/L1/L2 loading | `pip install openviking` + running `openviking-server` | Only when `ov` CLI is on PATH |
How they complement each other:
- ByteRover is a fast, serverless notebook for the agent. Use `brv curate` to persist decisions ("We chose Postgres for ACID compliance") and `brv query` to recall them. No infrastructure needed — context is stored as Markdown in `.brv/context-tree/`.
- OpenViking is a structured context database. Use `ov add-resource` to ingest entire repos or doc sites, then `ov find` for semantic search across all indexed content. Results are organized in a virtual filesystem (`viking://`) with three tiers: L0 (abstract, ~100 tokens), L1 (overview, ~2k tokens), L2 (full content) — the agent loads only what it needs, saving tokens.
Typical workflow: OpenViking retrieves context โ agent works on the task โ ByteRover curates the decisions made.
OpenViking prerequisites:
- Install: `pip install openviking --upgrade`
- Configure: create `~/.openviking/ov.conf` with embedding model and VLM settings (see the OpenViking docs)
- Start the server: `openviking-server`
- The `ov` CLI must be on your PATH; the skill auto-enables when detected
Override with explicit paths:

```python
agent = create_claw_agent(
    "gpt-5",
    memory="./docs/AGENTS.md",
    skills=["./my-skills", "./shared-skills"]
)
```
Memory & Context Management
Project Memory
Loads AGENTS.md files and injects content into every LLM call. Use for project-level context and conventions.
Auto-Compaction
When the conversation exceeds 75% of `CONTEXT_WINDOW`:
- Full history is offloaded to `.clawagents/history/compacted_*.json`
- Older messages are summarized into a `[Compacted History]` block
- The last 20 messages are kept intact
This provides unlimited conversation length with full audit trail preservation.
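The policy above can be sketched as a pure function: keep the last N messages intact and replace everything older with one summary entry. This is an illustrative approximation, not the library's actual code (which also offloads the full history to disk):

```python
# Illustrative compaction sketch: summarize old messages, keep recent ones.
RECENT_MESSAGES_TO_KEEP = 20

def compact(messages, summarize):
    """Collapse all but the most recent messages into one summary entry."""
    if len(messages) <= RECENT_MESSAGES_TO_KEEP:
        return list(messages)
    old = messages[:-RECENT_MESSAGES_TO_KEEP]
    recent = messages[-RECENT_MESSAGES_TO_KEEP:]
    summary = {"role": "user", "content": "[Compacted History] " + summarize(old)}
    return [summary] + recent

msgs = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
compacted = compact(msgs, lambda old: f"{len(old)} older messages summarized")
print(len(compacted))  # 21: one summary entry plus the last 20 messages
```

Because the summary replaces many messages with one, total context stays bounded while recent turns remain verbatim.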
Gateway Server
Launch a production-ready HTTP server with one line:
```python
from clawagents.gateway import start_gateway

start_gateway(port=3000)
```
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Synchronous agent invocation |
| `/chat/stream` | POST | SSE streaming (events: `queued`, `started`, `agent`, `done`, `error`) |
| `/queue` | GET | Queue status for all lanes |
| `/health` | GET | Health check |
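Any SSE-capable client can consume `/chat/stream`. As a sketch of the wire format (standard SSE framing; the event names follow the table above, while the `data` payloads shown are hypothetical), a minimal parser looks like:

```python
# Minimal SSE parser sketch for /chat/stream responses.
def parse_sse(text):
    """Split an SSE body into (event_name, data) pairs."""
    events = []
    for block in text.strip().split("\n\n"):
        event, data = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data)))
    return events

sample = 'event: queued\ndata: {}\n\nevent: done\ndata: {"result": "ok"}\n'
print(parse_sse(sample))
```

In practice you would stream the response body incrementally rather than parsing it whole, but the framing is the same.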
Lane-Based Concurrency
4 lanes with configurable `max_concurrent` per lane:
- `main`: primary user requests
- `cron`: scheduled tasks
- `subagent`: sub-agent delegation
- `nested`: nested sub-agent calls
Sandbox Backends
ClawAgents uses a pluggable sandbox protocol for all file and shell operations:
```python
from clawagents.sandbox import InMemoryBackend, LocalBackend

# Production: real filesystem
agent = create_claw_agent("gpt-5", sandbox=LocalBackend())

# Testing: pure in-memory VFS
mem = InMemoryBackend()
mem.seed({"src/main.py": "print('hello')", "README.md": "# My Project"})
agent = create_claw_agent("gpt-5", sandbox=mem)
snapshot = mem.snapshot()  # deterministic state capture
```
Environment Variables
All environment variables are optional. They serve as defaults when the corresponding create_claw_agent() parameter is not provided. Explicit parameters always take priority.
General
| Variable | Default | Required? | Description |
|---|---|---|---|
| `CLAWAGENTS_ENV_FILE` | (unset) | No | Explicit path to a `.env` file. Overrides default `cwd/.env` discovery. Useful for CI, Docker, or multi-project setups |
Provider & Model: set at least one API key (or `OPENAI_BASE_URL` for local models)
| Variable | Default | Required? | Description |
|---|---|---|---|
| `PROVIDER` | auto-detect | No | Hint: `"openai"` or `"gemini"`. Auto-detected from which API key is set |
| `OPENAI_API_KEY` | (none) | Yes (for OpenAI/Azure) | OpenAI or Azure API key. Not needed for local models: when `OPENAI_BASE_URL` is set, a placeholder is used automatically |
| `OPENAI_MODEL` | `gpt-5-nano` | No | Model name, Azure deployment name, or local model ID (e.g. `llama3.1`) |
| `OPENAI_BASE_URL` | (unset) | No | Custom endpoint for OpenAI-compatible APIs: Azure, Bedrock gateway, Ollama, vLLM, LM Studio. Omit to use api.openai.com |
| `OPENAI_API_VERSION` | (unset) | No | Azure only. API version string (e.g. `2024-12-01-preview`). Ignored by all other providers |
| `GEMINI_API_KEY` | (none) | Yes (for Gemini) | Google Gemini API key |
| `GEMINI_MODEL` | `gemini-3-flash-preview` | No | Gemini model name |
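As a worked example of the variables above, a minimal `.env` for the hosted OpenAI API might look like this (the key is a placeholder; the commented-out URL is Ollama's usual OpenAI-compatible endpoint):

```
# Hosted OpenAI
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-nano

# Or a local model via an OpenAI-compatible server (e.g. Ollama):
# OPENAI_BASE_URL=http://localhost:11434/v1
# OPENAI_MODEL=llama3.1
```

With `OPENAI_BASE_URL` uncommented, no real API key is required, since a placeholder is substituted automatically.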
LLM Tuning
| Variable | Default | Required? | Description |
|---|---|---|---|
| `STREAMING` | `1` | No | `1` = streaming enabled, `0` = disabled |
| `CONTEXT_WINDOW` | `1000000` | No | Token budget. Older messages are compacted when exceeded |
| `MAX_TOKENS` | `8192` | No | Max output tokens per response (`max_completion_tokens` for OpenAI, `max_output_tokens` for Gemini) |
| `TEMPERATURE` | `0.0` | No | Sampling temperature. Auto-overridden to 1.0 for reasoning models (o-series plus gpt-5/gpt-5-mini/gpt-5-turbo). Non-reasoning models (gpt-5-nano, gpt-5-micro, gpt-4o) use the configured value |
| `MAX_ITERATIONS` | `200` | No | Max tool rounds before the agent stops. Override per run: `agent.invoke(task, max_iterations=N)` |
PTRL & Trajectory Flags: all off by default, opt in with `1`/`true`/`yes`
| Variable | Default | Required? | Description |
|---|---|---|---|
| `CLAW_TRAJECTORY` | `0` | No | Enable trajectory logging. Records every turn and scores each run to `.clawagents/trajectories/` |
| `CLAW_RETHINK` | `0` | No | Enable consecutive-failure detection and adaptive rethink injection |
| `CLAW_LEARN` | `0` | No | Enable full PTRL: lesson extraction, injection, LLM-as-Judge, and thinking token preservation. Implies `CLAW_TRAJECTORY=1` |
| `CLAW_PREVIEW_CHARS` | `120` | No | Max chars for tool-output previews in trajectory logs |
| `CLAW_RESPONSE_CHARS` | `500` | No | Max chars for LLM response text in trajectory records |
Claude Code Features: mostly off by default, opt in with `1`/`true`/`yes`
| Variable | Default | Required? | Description |
|---|---|---|---|
| `CLAW_FEATURE_MICRO_COMPACT` | `1` | No | Aggressively clear old tool result contents to save context |
| `CLAW_FEATURE_FILE_SNAPSHOTS` | `1` | No | Safely copy files to `.clawagents/snapshots/` before writing |
| `CLAW_FEATURE_CACHE_TRACKING` | `0` | No | Extract and log detailed Anthropic/OpenAI prompt cache stats |
| `CLAW_FEATURE_TYPED_MEMORY` | `0` | No | Parse YAML frontmatter in AGENTS.md to classify memory types |
| `CLAW_FEATURE_WAL` | `0` | No | Persistent write-ahead logging to `.clawagents/wal.jsonl` (crash recovery) |
| `CLAW_FEATURE_PERMISSION_RULES` | `0` | No | Enforce declarative glob-based Allow/Deny execution bounds |
| `CLAW_FEATURE_BACKGROUND_MEMORY` | `0` | No | Background thread that extracts agent state/metadata implicitly |
| `CLAW_FEATURE_FORKED_AGENTS` | `0` | No | Enable the `run_forked_agent` sandboxed sub-agent API |
| `CLAW_FEATURE_COORDINATOR` | `0` | No | Enable the `run_coordinator` swarm routing orchestration mode |
v5.28.0 Features: inspired by claw-code-main (Rust reference)
| Variable | Default | Required? | Description |
|---|---|---|---|
| `CLAW_FEATURE_CACHE_BOUNDARY` | `1` | No | Split the system prompt at `__CACHE_BOUNDARY__` for Anthropic prompt caching. Static prefix cached, dynamic suffix fresh each turn |
| `CLAW_FEATURE_SESSION_PERSISTENCE` | `0` | No | Save sessions as append-only JSONL to `.clawagents/sessions/`. Enables `--sessions` and `--resume` |
| `CLAW_FEATURE_ERROR_TAXONOMY` | `1` | No | Classify LLM/tool errors into 7 discrete classes (`context_window`, `provider_auth`, `provider_rate_limit`, etc.) with recovery hints |
| `CLAW_FEATURE_EXTERNAL_HOOKS` | `0` | No | Run shell hooks before/after tool calls and LLM calls. Config via `.clawagents/hooks.json` or `CLAW_HOOK_*` env vars |
External Hook Env Vars (requires CLAW_FEATURE_EXTERNAL_HOOKS=1)
| Variable | Description |
|---|---|
| `CLAW_HOOK_PRE_TOOL_USE` | Shell command run before each tool. Receives JSON on stdin; can block or modify args |
| `CLAW_HOOK_POST_TOOL_USE` | Shell command run after each tool. Can modify results |
| `CLAW_HOOK_PRE_LLM` | Shell command run before each LLM call. Can inject extra messages |
| `CLAW_HOOK_POST_LLM` | Shell command run after each LLM response. Fire-and-forget logging |
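As a sketch of what a `CLAW_HOOK_PRE_TOOL_USE` hook might do, the policy below blocks destructive shell commands. The payload and response field names (`tool`, `args`, `allow`, `reason`) are assumptions for illustration; consult the hooks documentation for the actual JSON schema:

```python
import json

# Hypothetical pre_tool_use policy: block destructive shell commands.
def decide(payload):
    if payload.get("tool") == "shell" and "rm -rf" in str(payload.get("args", {})):
        return {"allow": False, "reason": "destructive command blocked by hook"}
    return {"allow": True}

# A real hook process would read the payload with json.load(sys.stdin)
# and reply with print(json.dumps(...)) on stdout.
print(json.dumps(decide({"tool": "shell", "args": {"cmd": "rm -rf /tmp/x"}})))
```

Because hooks are fail-open with a 10s timeout, a slow or crashing policy script degrades to "allow" rather than wedging the agent.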
Testing
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run all tests
python -m pytest tests/ -v

# Run benchmarks (requires API keys)
python -m pytest tests/ -v -m benchmark
```
Changelog
v6.0.0 - Production Hardening: 17 Improvements from 10 Reference Codebases
Major release incorporating patterns from OpenClaw, DeepAgents, NanoClaw, Claw-Code, ToolUniverse, SkyRL, CUDA-Agent, and OpenClaw-RL.
High Priority
| Feature | Description |
|---|---|
| Native Tool Call Patching (H1) | _patch_dangling_tool_calls now handles native function calling (tool_calls_meta), not just text-mode JSON. Injects synthetic cancelled responses for orphaned tool_call IDs. Prevents 400 API errors in HITL scenarios. |
| Three-Tier Provider Fallback (H2) | New FallbackProvider wraps any LLM with a primary → named fallback → global fallback chain. Quarantines providers after consecutive failures; periodic health checks restore them. Config via fallback_models param or CLAWAGENTS_FALLBACK_MODELS env var. |
| Credential Proxy (H3) | New CredentialProxy โ local HTTP proxy that injects API keys into outbound requests so sandboxed sub-agents never see raw credentials. Opt-in via CLAW_FEATURE_CREDENTIAL_PROXY=1. |
| Rich Hook Result Model (H4) | BeforeToolHook now accepts HookResult return (backward-compatible with bool). Hooks can block with reason, redirect args, inject messages. New HookResult dataclass exported from public API. |
| Fraction-Based Summarization (H5) | Soft-trim threshold now derives from per-model budget_ratio instead of hardcoded 0.60. GPT=0.60, Gemini=0.675, Claude=0.6375. Auto-adapts to any model's context window. |
| Lazy Static Tool Registry (H7) | New LazyTool class + ToolRegistry.register_lazy(). Tools are imported only on first execute() call. Fast startup with large tool sets. |
Medium Priority
| Feature | Description |
|---|---|
| Subagent State Isolation (M1) | EXCLUDED_STATE_KEYS prevents parent state (messages, todos, trajectory, lessons, session) from leaking into child sub-agents. |
| SKILL.md Constraint Documents (M4) | Skills now support forbidden-actions, workspace-layout, success-criteria, workflow-steps in YAML frontmatter. Structured constraints for sandboxed code execution. |
| Pre-Compact Transcript Archival (M5) | Before context compaction, full transcript is archived to .clawagents/transcripts/. Opt-in via CLAW_FEATURE_TRANSCRIPT_ARCHIVAL=1. |
| Atomic File Writes (M7) | Trajectory recorder and session persistence now use temp-then-rename pattern via atomic_write_text(). Prevents corruption on crash. |
| Barrier-Based Scheduling (M8) | Command queue now supports barrier entries. Destructive ops wait for active tasks to complete before executing. |
| Session Heartbeat (M9) | New SessionHeartbeat class auto-releases stale sessions after timeout. Resource management for multi-user deployments. |
| Cross-Provider Test Suite (M10) | 14 conformance tests (7 per backend) ensuring LocalBackend and InMemoryBackend both satisfy the SandboxBackend protocol. |
New files: providers/fallback.py, sandbox/credential_proxy.py, utils/atomic_write.py, session/heartbeat.py, tests/test_cross_provider.py
New feature flags: transcript_archival (off), credential_proxy (off)
New exports: HookResult, FallbackProvider, CredentialProxy, SessionHeartbeat, LazyTool, atomic_write_text, atomic_write_bytes
v5.28.0 - Error Taxonomy, Prompt Caching, Session Persistence & External Hooks
Four production-grade features ported from the claw-code-main Rust reference implementation:
| Feature | Description |
|---|---|
| Prompt Cache Boundary | Inserts __CACHE_BOUNDARY__ marker in system prompt. Anthropic provider splits into static (cached via cache_control: ephemeral) + dynamic blocks. Reduces input token costs on multi-turn sessions. ON by default. |
| Error Taxonomy & Recovery | Classifies all LLM/tool errors into 7 discrete classes (context_window, provider_auth, provider_rate_limit, provider_retry_exhausted, provider_internal, provider_transport, runtime_io). Each class has retryable, recovery_hint, and optional failover_model. Structured error events emitted via onEvent. ON by default. |
| Session Persistence | Saves agent sessions as append-only JSONL to .clawagents/sessions/. Events: system_prompt, turn_started, assistant_message, tool_result, usage, turn_completed. New CLI: --sessions (list) and --resume [ID|latest] (continue). Opt-in. |
| External Hook System | Shell commands that run before/after tool execution and LLM calls. Config via .clawagents/hooks.json or CLAW_HOOK_* env vars. Hooks receive JSON on stdin, return JSON on stdout. pre_tool_use can block or modify args. 10s timeout, fail-open. Opt-in. |
Also:
- Anthropic cache token extraction: `cache_creation_tokens` and `cache_read_tokens` are now populated from both streaming and non-streaming Anthropic responses.
- `AgentState.session_file`: new field tracks the session JSONL path when persistence is enabled.
- New public exports: `ErrorClass`, `ErrorDescriptor`, `classify_error`, `get_recovery_recipe`, `SessionWriter`, `SessionReader`, `list_sessions`, `HooksConfig`, `ExternalHookRunner`, `load_hooks_config`.
v5.27.3 - Gemini Signature Regression Coverage
- Gemini signature regression test: added targeted tests for `_serialize_gemini_parts` to ensure `thought_signature` is propagated to sibling parallel `function_call` parts.
- Parallel integration test reliability: fixed an integration test fixture validation mismatch so large-output parallel execution is validated correctly.
v5.27.2 - Gemini 3 Thought Signature Fix
- Gemini 3 propagation: propagated `thought_signature` to all parallel `function_call` parts in the response, preventing `400 INVALID_ARGUMENT` during multi-tool execution.
v5.27.1 - Timeout Bugfix
- Fixed NameError: added a `timeout_s` parameter to `ClawAgent.invoke` to prevent an exception when a global timeout is not provided.
v5.27.0 - Claude Code Architectural Patterns
Ported 10 production-grade architectural patterns from Anthropic's Claude Code directly into ClawAgents. These features are controllable via environment variables or constructor injection:
| Feature | Description |
|---|---|
| Micro-Compact Memory | Aggressively clears giant tool results to save context. |
| File History Snapshots | Safely backs up files to .clawagents/snapshots/ before writing. |
| Prompt Cache Tracking | Real-time stats on Anthropic/OpenAI prompt cache hits. |
| Typed Memory Taxonomy | Auto-parses project, user, and feedback memories via frontmatter. |
| Write-Ahead Logging (WAL) | Crash-resilient interaction logging. |
| Granular Permission Rules | Define glob-based Allow/Deny execution policies. |
| Background Memory Extraction | Periodically scans conversations and extracts metadata. |
| Orchestration | Access to run_forked_agent and run_coordinator (swarm routing). |
v5.26.0 - Bundled OpenViking Skill, Updated ByteRover Skill
| Feature | Description |
|---|---|
| OpenViking skill | Bundled skills/openviking/SKILL.md teaches the agent to use the ov CLI for tiered context retrieval (L0/L1/L2). Auto-enabled when ov is on PATH |
| ByteRover skill updated | Refreshed to match byterover-cli v1.8.0: added --headless and --folder, removed obsolete commands |
| Generic bundled skill loader | Skill loader now scans the entire bundled skills/ directory instead of hardcoding individual skills |
v5.25.0 - Gemini Streaming Fix
| Feature | Description |
|---|---|
| Fix Gemini SDK warning | Eliminated "non-text parts in the response" warning by iterating candidates[].content.parts[] instead of accessing the .text property on streaming chunks containing function calls |
| Consistent text extraction | Streaming path now uses the same parts-based extraction as the non-streaming _request_once, filtering out thought parts |
v5.24.0 - Zero-Config Channel Auto-Detection
| Feature | Description |
|---|---|
| Auto-detect channels from env vars | clawagents --serve now reads TELEGRAM_BOT_TOKEN, WHATSAPP_AUTH_DIR, and SIGNAL_ACCOUNT from .env and auto-starts the ChannelRouter. Zero code required |
| --doctor channel status | clawagents --doctor reports which messaging channels are configured |
| .env.example updated | All channel env vars documented with inline comments |
| --init scaffold | clawagents --init generates .env with channel variables pre-commented |
v5.23.0 - WebSocket Gateway, Multi-Channel Messaging (Telegram, WhatsApp, Signal)
Full multi-platform messaging support inspired by OpenClaw's channel architecture:
| Feature | Description |
|---|---|
| WebSocket gateway | FastAPI native WebSocket endpoint at /ws alongside existing HTTP. Methods: chat.send (streaming events), chat.history, chat.inject, ping. Auth via ?token= query param |
| Channel adapter interface | ChannelAdapter protocol + ChannelMessage dataclass: a standard contract for any messaging platform |
| Telegram adapter | Uses python-telegram-bot. Config: {"bot_token": "..."} |
| WhatsApp adapter | Baileys subprocess (Node.js) or WhatsApp Business API. Config: {"mode": "baileys", "auth_dir": ".whatsapp-auth"} |
| Signal adapter | Uses signal-cli subprocess with JSON-RPC. Config: {"account": "+1234567890"} |
| Channel router | ChannelRouter dispatches inbound messages to agents, routes replies back. Per-session serialization via KeyedAsyncQueue, optional debouncer, hooks |
```python
from clawagents import create_claw_agent, ChannelRouter
from clawagents.channels.telegram import TelegramAdapter
from clawagents.channels.whatsapp import WhatsAppAdapter

router = ChannelRouter(lambda: create_claw_agent("gpt-5-mini"))
router.register(TelegramAdapter())
router.register(WhatsAppAdapter())
await router.start_all({
    "telegram": {"bot_token": "123456:ABC..."},
    "whatsapp": {"mode": "baileys", "auth_dir": ".whatsapp-auth"},
})
```
v5.22.0 - Tool Result Caching, Parameter Validation & ComposeTool
3 features inspired by ToolUniverse's tool management patterns:
| Feature | Description |
|---|---|
| Tool result caching | LRU in-memory cache (ResultCacheManager) avoids redundant tool calls. Tools opt in with cacheable = True. Per-tool TTL overrides via result_cache.set_tool_ttl(). Built-in cacheable tools: read_file, grep, web_fetch. Default: 256 entries, 60s TTL |
| Parameter validation + coercion | validate_tool_args() checks required params and type-matches before execution. Lenient coercion handles common LLM quirks: "42" → 42, "true" → True, JSON strings → objects/arrays. Enabled by default on ToolRegistry |
| ComposeTool | create_compose_tool() chains multiple tools in a deterministic pipeline without an LLM in the loop. Lighter than sub-agents for predictable workflows. Steps receive previous results and a call_tool helper. Failures short-circuit with clear error messages |
v5.21.0 - Context Engine, Loop Detection & Compaction Overhaul
8 improvements inspired by the latest OpenClaw architecture:
| Feature | Description |
|---|---|
| Chunked compaction with retry | Compaction now splits old messages into ~30K-token chunks, summarizes each separately with up to 3 retries (exponential backoff), and explicitly preserves file paths, function names, error messages, and commands verbatim |
| Better loop detection | Result hashing detects "different args, same result" stalls; ping-pong detection catches A→B→A→B oscillation; a global circuit breaker hard-stops at 30 no-progress calls |
| Context pruning (soft-trim) | New _soft_trim_messages runs at 60% context usage (before the 75% compaction trigger). Trims old tool results >1000 chars, removes duplicates, and stubs stale image data |
| Skill eligibility gating | Skills can declare requires: in YAML frontmatter (os, bins, env). Ineligible skills are filtered at load time |
| Skill prompt budget | Max 20 skills / 4000 chars injected into the system prompt. Full list accessible via list_skills |
| Control token sanitization | Strips leaked model control tokens (<|assistant|>, <|endoftext|>, full-width variants) from final output |
| Head+tail truncation | Eviction fallback and content preview now use head+tail (preserving error messages at the end). Also fixes a bug where few-line, huge-character content bypassed preview truncation |
| Pluggable context engine | New ContextEngine ABC with after_turn, compact, bootstrap, cleanup lifecycle hooks. DefaultContextEngine is a no-op pass-through. Registry: register_context_engine() / resolve_context_engine() |
v5.20.4 - Gemini MALFORMED_FUNCTION_CALL Retry
| Feature | Description |
|---|---|
| Gemini malformed FC retry | When Gemini returns finish_reason=MALFORMED_FUNCTION_CALL with 0 parts (common with complex parallel tool calls), the provider now automatically retries with tool_config.mode=ANY instead of stopping the agent |
| Streaming + non-streaming | Fix applied to both streaming (_stream_with_retry) and non-streaming (_request_once) code paths |
| Recursion guard | _malformed_retry flag prevents infinite retry loops if mode=ANY also fails |
v5.20.3 - GPT-5 Temperature Corrections
| Feature | Description |
|---|---|
| GPT-5-nano temperature | Live API tests confirmed gpt-5-nano requires temperature=1 (not 0). Fixed in _FIXED_TEMPERATURE_MODELS |
v5.20.0 - Temperature & Compaction Fixes
| Feature | Description |
|---|---|
| Temperature fix | GPT-5 models no longer forced to temperature=1.0. Only o-series models (o1, o3, o4-mini) retain the fixed override. This restores deterministic behavior when TEMPERATURE=0 is set |
| Compaction overhaul | Context compaction no longer causes the agent to "forget" what it was doing. Five improvements: (1) RECENT_MESSAGES_TO_KEEP increased from 6 to 20, (2) tool call/result pairs are never split, (3) the summary prompt now includes the original task plus structured preservation instructions, (4) the compacted summary is inserted as role="user" with a [System - Compacted History] prefix instead of role="assistant", (5) the text log for summarization includes structured [TOOL CALLS] and [TOOL RESULT] markers |
| Debug cleanup | All development instrumentation removed from production code |
v5.19.0 - Anthropic Provider, Security, Architecture Overhaul
| Feature | Description |
|---|---|
| Anthropic/Claude provider | First-class support for Claude models via ANTHROPIC_API_KEY. Install with pip install clawagents[anthropic] |
| Optional Gemini | google-genai is now an optional dependency. Install with pip install clawagents[gemini] or pip install clawagents[all] |
| py.typed + __version__ | PEP 561 type stub marker and clawagents.__version__ export for downstream tools |
| Lazy config loading | No more module-level side effects: .env discovery happens on the first load_config() call |
| Lazy Path.cwd() | All module-level Path.cwd() calls replaced with lazy functions, safe for import from any directory |
| Gateway authentication | GATEWAY_API_KEY env var enables Bearer token auth on POST endpoints |
| CORS support | Gateway now supports GATEWAY_CORS_ORIGINS for cross-origin requests |
| Improved blocked patterns | Expanded dangerous command detection with regex matching |
| API key masking | clawagents --doctor now masks keys (shows ********...last4) |
| Azure detection | New OPENAI_API_TYPE=azure env var for explicit Azure OpenAI configuration |
| Global timeout | --timeout N CLI flag and CLAW_TIMEOUT env var for agent run time limits |
| --verbose / --quiet | CLI flags for controlling output verbosity |
| --prune-trajectories N | Delete trajectory files older than N days |
| Lesson export/import | export_lessons() / import_lessons() for sharing lessons between projects |
| Trajectory pruning | prune_trajectories(max_age_days) utility function |
| pydantic-settings | Now properly listed as a dependency (was missing) |
| pyproject.toml metadata | Added license, authors, classifiers, URLs, optional dependency groups |
| New tests | Tests for _repair_json, trajectory recorder, config module |
v5.18.0 - Doctor, Trajectory Inspector & Config Improvements
| Feature | Description |
|---|---|
| clawagents --doctor | New diagnostic command checks .env discovery, API keys, active model, LLM settings, PTRL flags, local endpoint reachability, trajectory history, and AGENTS.md presence |
| clawagents --trajectory [N] | Inspect the last N run summaries: score, quality, failures, judge verdict, duration. Human-readable trajectory output |
| Startup banner | Every --task and --serve now prints provider=X model=Y env=Z ptrl=... for instant visibility into the active config |
| CLAWAGENTS_ENV_FILE | New env var to explicitly point to a .env file path. Priority: CLAWAGENTS_ENV_FILE > cwd/.env > cwd/../.env. Useful for CI, Docker, multi-project |
| Publish hygiene | GitHub releases no longer include .clawagents/, .pytest_cache/, logs, or other runtime artifacts |
| Config/docs consistency tests | 6 pytest tests verify every EngineConfig field appears in .env.example and README.md |
| --port in TypeScript | Gateway server port now configurable via --port N in the TypeScript CLI |
v5.17.0 - Quick Start Scaffold & Examples
| Feature | Description |
|---|---|
| clawagents --init | New CLI command scaffolds a starter project in the current directory: generates .env (with all providers commented out), run_agent.py (ready-to-run starter script with 5 provider options), and AGENTS.md (memory template) |
| clawagents --help | Shows usage with examples and quick start instructions |
| clawagents --task | Run a single task from the command line |
| clawagents --serve | Start the HTTP gateway server from the CLI |
| Examples directory | 8 ready-to-run example scripts: OpenAI, Gemini, Azure, Ollama, vLLM, Bedrock, custom tools, and multi-sample comparison |
| README overhaul | New "30-Second Quick Start" section, examples table, clearer onboarding flow |
v5.16.0 - LLM-as-Judge & Thinking Token Preservation
| Feature | Description |
|---|---|
| G. LLM-as-Judge verification | After each run (when learn=True), a separate, focused LLM call evaluates whether the task was actually accomplished. Returns a 0-3 score with justification, more reliable than heuristic scoring. Results stored as judge_score and judge_justification on RunSummary |
| H. Thinking token preservation | Models like Qwen3 and DeepSeek that emit <think>...</think> blocks are now fully supported. Thinking content is extracted before tool-call parsing, preserved on messages and trajectory records, and stripped from visible output. Available via strip_thinking_tokens() utility |
v5.15.0 - Deterministic Verification & GRPO-Inspired Comparison
| Feature | Description |
|---|---|
| A. Deterministic rewards | Tool execution results (exit codes, test pass/fail counts) are now used as objective ground truth for scoring, replacing pure LLM self-assessment. Each turn and run summary includes deterministic_score and verified_score fields |
| B. Multi-sample comparison | New agent.compare(task, n_samples=3) method runs the same task N times and picks the best result using objective scoring, inspired by SkyRL's Group Relative Policy Optimization (GRPO) |
| C. Task-type-aware verification | Auto-detects task type (coding/file/search/refactor/general) and applies type-specific verifiers. Coding tasks use test results; file tasks check write success; refactoring checks edits + tests |
| D. Progressive context caching | System prompt token count is computed once and cached, avoiding redundant re-counting on every turn. Logged at startup for budget visibility |
| E. RFT-ready transitions | Each trajectory now exports {run_id}_rft.json with (observation, action, reward, done) tuples per step, structured for future Rejection Fine-Tuning pipelines |
| F. Adaptive rethink threshold | Rethink trigger threshold now adjusts dynamically: complex tasks (coding/refactor) get more patience (threshold=5), simple tasks (search/file) trigger sooner (threshold=3), and late in runs threshold drops to minimum (2) |
v5.14.0 - SkyRL-Inspired PTRL Improvements
| Feature | Description |
|---|---|
| Quality gate for lesson extraction | Lessons are only extracted from runs with mixed outcomes (both successes and failures). Zero-variance runs (all-success or all-failure with no contrast) are skipped, inspired by SkyRL's GRPO dynamic sampling |
| Lesson staleness decay | Each lesson block is now timestamped and model-tagged (@timestamp [model]). load_lessons(max_age_s=N) filters out stale lessons. Prevents prompt pollution from outdated advice |
| Format vs. logic failure classification | Every failed tool call is classified as "format" (bad JSON, wrong params) or "logic" (valid call, wrong approach). Rethink messages now include format-specific or strategy-specific guidance |
| Per-step reward attribution | Each TurnRecord now includes observation_context (what the agent saw before deciding), productivity_score (-1.0 to 1.0), and failure_type per tool call. RunSummary adds format_failures, logic_failures, has_mixed_outcomes, and finish_reason |
| Enhanced self-analysis prompt | Post-run LLM analysis now receives a failure type breakdown and productivity scores for targeted lesson extraction |
v5.13.0 - Prompt-Time Reinforcement Learning (PTRL)
| Feature | Description |
|---|---|
| PTRL: Post-run self-analysis | After each run, the LLM reviews its own trajectory and extracts 2-5 actionable lessons, saved to .clawagents/lessons.md |
| PTRL: Pre-run lesson injection | On subsequent runs, stored lessons are injected into the system prompt so the agent avoids past mistakes |
| PTRL: Enhanced mid-run rethink | When consecutive failures trigger a rethink, relevant past lessons are included in the rethink message |
| learn flag / CLAW_LEARN env | Opt-in via learn=True or CLAW_LEARN=1. Automatically enables trajectory logging |
| Default context_window → 1,000,000 | Increased from 128,000 to support modern large-context models |
| macOS sandbox symlink fix | LocalBackend now resolves symlinks at init (fixes /var → /private/var on macOS) |
| All 150 tests passing | Fixed 48 pre-existing test failures (sandbox path traversal, LLMMessage subscript, mock assertions) |
v5.12.1 - Streamlit / Jupyter Compatibility
| Feature | Description |
|---|---|
| Signal handler fix | add_signal_handler now catches RuntimeError in addition to NotImplementedError/OSError, fixing crashes in Streamlit, Jupyter, and other non-main-thread environments |
v5.12.0 - Gemini 3 Thought Signature Support
| Feature | Description |
|---|---|
| thought_signature preservation | Gemini 3 thinking models (e.g. gemini-3-flash-preview) require thought and thought_signature fields to be echoed back during multi-turn function calling. ClawAgents now captures the full response parts and replays them verbatim, preventing 400 errors |
| gemini_parts field | New optional field on LLMMessage and LLMResponse carries raw Gemini response parts through the conversation history. Used automatically, no user action required |
v5.11.0 - Configurable Limits
| Feature | Description |
|---|---|
| max_iterations | Now settable at construction or via MAX_ITERATIONS env (default 200, was hardcoded in caller) |
| preview_chars | Tool-output preview length configurable via CLAW_PREVIEW_CHARS env (default 120) |
| response_chars | Response text length in trajectory records via CLAW_RESPONSE_CHARS env (default 500) |
| Priority | Explicit param > env var > default for all three |
v5.10.0 - Discrete Reward Bands & Weighted Scoring
| Feature | Description |
|---|---|
| Discrete reward bands | Run scores mapped to -1 to +3 bands (inspired by CUDA-Agent PPO reward shaping) |
| Weighted execution scoring | execute, shell, and run_code weighted 2× higher than generic tools |
| Run quality grading | Each run classified as clean, noisy, or failed for trajectory filtering |
| Gameable tool exclusion | think, todolist, use_skill, etc. excluded from scoring to prevent reward hacking |
v5.9.0 - Trajectory Logging & Rethink
| Feature | Description |
|---|---|
| Trajectory logging | Structured recording of every turn, tool call, and outcome to runs.jsonl |
| Consecutive-failure rethink | After 3 consecutive meaningful failures, injects a system "rethink" prompt |
| Opt-in flags | trajectory=True / CLAW_TRAJECTORY=1 and rethink=True / CLAW_RETHINK=1 |
v5.8.0 - JSON Resilience
| Feature | Description |
|---|---|
| JSON repair | _repair_json() utility fixes JSON truncated by hitting max_completion_tokens |
| Truncated JSON retry | Detects incomplete JSON tool calls and prompts the LLM to resend |
v5.7.0 - Model-Specific Temperature
| Feature | Description |
|---|---|
| ๐ก๏ธ Fixed-temperature models | Reasoning models (o-series, gpt-5, gpt-5-mini, gpt-5-turbo) auto-override to temperature=1.0. Non-reasoning models (gpt-5-nano, gpt-5-micro, gpt-4o) respect configured temperature |
| ๐ก๏ธ Configurable temperature | TEMPERATURE env var + temperature parameter on create_claw_agent |
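The override rule can be sketched as a simple classifier. The model lists come from the table above; the prefix-based "o-series" check and the helper name are our simplifications, not the library's exact logic:

```python
def effective_temperature(model, configured):
    """Reasoning models run at a fixed temperature of 1.0; other models
    keep the configured value. Simplified sketch of the v5.7 behavior."""
    reasoning = model.startswith("o") or model in {"gpt-5", "gpt-5-mini", "gpt-5-turbo"}
    return 1.0 if reasoning else configured

print(effective_temperature("gpt-5-mini", 0.2))  # 1.0 (forced)
print(effective_temperature("gpt-4o", 0.2))      # 0.2 (respected)
```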
v5.6.0 – LLM Parameter Fixes
| Feature | Description |
|---|---|
| max_completion_tokens | OpenAI calls now use max_completion_tokens (replacing the deprecated max_tokens) |
| max_output_tokens | Gemini calls now pass max_output_tokens correctly |
| Config priority | Explicit param > .env > default; env values are no longer shadowed |
v5.5.0 – Foundation
| Feature | Description |
|---|---|
| Pluggable Sandbox | SandboxBackend protocol with LocalBackend and InMemoryBackend |
| Gateway Server | FastAPI server with SSE streaming and a 4-lane queue |
| Advanced FS Tools | tree, diff, insert_lines |
| Think Tool | Structured reasoning without side effects |
| Web Fetch | URL fetching with HTML cleanup |
| Ask User | Interactive stdin-based input |
| History Offloading | Full audit trail preserved after compaction |
| Tool Access Control | block_tools() / allow_only_tools() at runtime |
| Context Injection | inject_context() hook for every LLM call |
| Output Truncation | truncate_output() to cap tool output size |
Trajectory Logging & RL-Inspired Scoring
ClawAgents includes an optional trajectory system inspired by reinforcement learning techniques from CUDA-Agent and OpenClaw-RL. Enable it with trajectory=True or CLAW_TRAJECTORY=1.
What gets logged
Every agent run records:
- Turn-level data: tool calls, arguments, success/failure, output previews
- Weighted turn scores: execution tools (shell, code runners) weighted 2× higher than generic tools
- Run summary: total turns, tool calls, successes/failures, elapsed time
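A per-turn record along these lines might be appended to a JSONL file. The field names below are a guess based on the bullet list above and the run summary shown later in this section, not the exact schema:

```python
import json
import time

def log_turn(path, tool, args, ok, output, preview_chars=120):
    """Append one turn record to a JSONL trajectory file.

    Illustrative shape only; field names are assumptions, not the real schema.
    The 120-char preview default mirrors the documented preview_chars default.
    """
    record = {
        "ts_ms": int(time.time() * 1000),
        "tool": tool,
        "args": args,
        "success": ok,
        "output_preview": output[:preview_chars],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_turn("/tmp/turns.jsonl", "shell", {"cmd": "ls"}, True, "file_a\nfile_b")
```

One record per line keeps the log append-only and cheap to stream, which is why JSONL is a natural fit here.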
Discrete reward bands
Each run receives a score from -1 to +3:
| Score | Meaning |
|---|---|
| +3 | All tools succeeded, task completed cleanly |
| +2 | Minor hiccups but overall success |
| +1 | Partial success with some failures |
| 0 | Inconclusive โ mixed results |
| -1 | Majority of tool calls failed |
Quality grading
Runs are classified for downstream filtering:
| Quality | Criteria |
|---|---|
| clean | Score ≥ 2 and at most 2 mid-run failures |
| noisy | Score ≥ 0 but too many mid-run failures |
| failed | Score < 0 |
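The grading table transcribes almost directly into code. Only the clean threshold (≤ 2 mid-run failures) is explicit above; runs that are neither clean nor failed fall through to noisy in this sketch, and the function name is ours:

```python
def grade_run(score, mid_run_failures):
    """Classify a run per the quality table (illustrative transcription)."""
    if score < 0:
        return "failed"
    if score >= 2 and mid_run_failures <= 2:
        return "clean"
    return "noisy"

print(grade_run(3, 1))   # clean
print(grade_run(2, 5))   # noisy
print(grade_run(-1, 0))  # failed
```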
Anti-gaming protections
Tools like think, todolist, use_skill, list_skills, and update_todo are excluded from scoring, so they can't inflate success rates.
Consecutive-failure rethink
With rethink=True or CLAW_RETHINK=1, the agent monitors tool outcomes in real-time. After 3 consecutive meaningful failures, it injects a system message:
"You have had 3 consecutive tool failures. Stop and rethink your approach before continuing."
This simple mechanism prevents the agent from spiraling into repeated failed attempts.
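The mechanism amounts to a failure-streak counter. The 3-failure threshold and the injected message come from the docs above; the class and method names are ours, as is the reset-after-injection detail:

```python
RETHINK_PROMPT = ("You have had 3 consecutive tool failures. "
                  "Stop and rethink your approach before continuing.")

class RethinkMonitor:
    """Illustrative sketch of the consecutive-failure rethink mechanism."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = 0

    def record(self, succeeded):
        """Record one tool outcome; return the prompt to inject, or None."""
        if succeeded:
            self.streak = 0
            return None
        self.streak += 1
        if self.streak == self.threshold:
            self.streak = 0  # reset so the prompt isn't injected every turn
            return RETHINK_PROMPT
        return None

mon = RethinkMonitor()
outcomes = [False, False, True, False, False, False]
prompts = [mon.record(ok) for ok in outcomes]
print(prompts[-1] is not None)  # True: third consecutive failure triggers it
```

Note that a single success anywhere in the run resets the streak, so only genuinely consecutive failures trigger the prompt.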
Output
Run summaries are appended to .clawagents/trajectories/runs.jsonl:
{
"run_id": "a1b2c3d4",
"model": "gpt-5-mini",
"total_turns": 8,
"tool_calls": 12,
"successes": 10,
"failures": 2,
"run_score": 2,
"quality": "clean",
"elapsed_ms": 45230,
"turns": [...]
}
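A downstream consumer might filter runs.jsonl for clean trajectories, e.g. as input for lesson extraction. The record shape matches the sample above; the helper itself is an illustration, not part of the ClawAgents API:

```python
import json

def load_clean_runs(path):
    """Return only the run summaries graded 'clean' from a JSONL file."""
    with open(path) as f:
        return [run for line in f
                if (run := json.loads(line)).get("quality") == "clean"]

# Demo with a throwaway file in the documented record shape:
sample = [
    {"run_id": "a1b2c3d4", "run_score": 2, "quality": "clean"},
    {"run_id": "e5f6a7b8", "run_score": -1, "quality": "failed"},
]
with open("/tmp/runs.jsonl", "w") as f:
    for run in sample:
        f.write(json.dumps(run) + "\n")

print([r["run_id"] for r in load_clean_runs("/tmp/runs.jsonl")])  # ['a1b2c3d4']
```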
Roadmap
- Docker sandbox backend (protocol ready)
- Semantic browser automation (accessibility tree)
- Prompt caching (Anthropic-style)
- Persistent memory learning from trajectory data (advanced โ RFT-style rule extraction)
- Post-run self-analysis + lesson extraction ✓ (v5.13, PTRL)
- Pre-run lesson injection ✓ (v5.13, PTRL)
- Enhanced mid-run rethink with past lessons ✓ (v5.13, PTRL)
- Trajectory logging + discrete reward bands ✓ (v5.9–5.10)
- Consecutive-failure rethink injection ✓ (v5.9)
- Weighted execution scoring + quality grading ✓ (v5.10)
- JSON repair + truncated JSON retry ✓ (v5.8)
- Model-specific temperature override ✓ (v5.7)
- Configurable temperature / max_completion_tokens ✓ (v5.6)
- Pluggable sandbox backend ✓ (v5.5)
- Lane-based queue serialization ✓ (v5.5)
- Skill progressive disclosure ✓ (v5.5)
- Gateway HTTP server ✓ (v5.5)
License
MIT
Built by the ClawAgents team
Project details
Release history
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file clawagents-6.0.0.tar.gz.
File metadata
- Download URL: clawagents-6.0.0.tar.gz
- Upload date:
- Size: 214.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd826b69e11e856fa94af88c25dec6d2bc7bf692ab2cc5003c91cd88053e100e |
| MD5 | 2f1157045b78807085741b969fdd0624 |
| BLAKE2b-256 | f7b5811bb52c6e5abc84ba0e1a274d6cd3f973baecda7d752d36a4b8db9aea10 |
File details
Details for the file clawagents-6.0.0-py3-none-any.whl.
File metadata
- Download URL: clawagents-6.0.0-py3-none-any.whl
- Upload date:
- Size: 176.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c14e83311a1a22f117a03c20f37d36c61ce53a651bc78eca20f726efb43d6b7 |
| MD5 | b1a7f3804dc6c77ba44d165ec9c564d7 |
| BLAKE2b-256 | c61dabc57fd29f6f05884882c1b3d4fc72e584a490e6aa0baed4ebffdcc21a0a |