Python port of the ClawAgents framework
🦞 ClawAgents
A lean, full-stack agentic AI framework — ~2,500 LOC
ClawAgents is a production-ready agentic framework that gives LLMs the ability to read, write, and execute code — with built-in planning, memory, sandboxing, and a gateway server. It supports OpenAI GPT-5 and Google Gemini out of the box, with a pluggable provider architecture for any LLM.
Built by extracting and unifying the best architectural patterns from OpenClaw (~5,800 files) and DeepAgents (~1,400 LOC core), ClawAgents delivers the same power at a fraction of the complexity.
Installation
pip install clawagents
Version 5.10.0 — latest stable release (February 2026)
Quick Start
1. Configure your environment
Create a .env file:
PROVIDER=gemini # or "openai"
GEMINI_API_KEY=AIza... # Your Gemini API key
GEMINI_MODEL=gemini-3-flash-preview
STREAMING=1
CONTEXT_WINDOW=128000
MAX_TOKENS=8192
TEMPERATURE=0 # Model-specific overrides apply (see below)
# Optional: RL-inspired agent improvements
CLAW_TRAJECTORY=1 # Enable trajectory logging + scoring
CLAW_RETHINK=1 # Enable consecutive-failure detection
OpenAI configuration
PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-nano
STREAMING=1
CONTEXT_WINDOW=128000
MAX_TOKENS=8192
TEMPERATURE=1 # GPT-5 family requires temperature=1
CLAW_TRAJECTORY=1
CLAW_RETHINK=1
2. One-line agent
from clawagents import create_claw_agent
agent = create_claw_agent("gemini-3-flash")
result = await agent.invoke("List all Python files in src/")
print(result.result)
3. With custom instructions
agent = create_claw_agent(
"gpt-5",
instruction="You are a senior code reviewer. Be thorough and concise."
)
result = await agent.invoke("Review this codebase and suggest improvements")
4. With trajectory logging & rethink
agent = create_claw_agent(
"gpt-5-mini",
trajectory=True, # logs every turn + scores the run
rethink=True, # auto-injects "rethink" after 3 consecutive failures
)
result = await agent.invoke("Refactor the auth module and add tests")
# Run summary written to .clawagents/trajectories/runs.jsonl
5. CLI mode
python -m clawagents --task "Find all TODO comments in the codebase"
🚀 Performance: ClawAgents vs Traditional Frameworks
ClawAgents v5.10 outperforms traditional multi-layer agentic frameworks through architectural simplicity. Here's how it stacks up against DeepAgents (LangGraph/LangChain-based) in head-to-head benchmarks.
Benchmark Results (February 2026)
TypeScript — 5 tasks × 2 models × 2 frameworks (20/20 ✅)
| Framework | Gemini-2.5-flash | GPT-5-mini |
|---|---|---|
| ClawAgents v5.5 | 2.3s avg · 1.4 tools | 13.6s avg · 1.4 tools |
| DeepAgents | 2.5s avg · 1.8 tools | 15.7s avg · 2.4 tools |
Per-Task Breakdown
| Task | ClawAgents (Gemini) | DeepAgents (Gemini) | ClawAgents (GPT-5) | DeepAgents (GPT-5) |
|---|---|---|---|---|
| File Listing | 3.7s, 1 tool | 1.9s, 1 tool | 8.9s, 1 tool | 8.4s, 1 tool |
| Read & Analyze | 1.6s, 1 tool | 3.6s, 3 tools | 5.4s, 1 tool | 13.0s, 2 tools |
| Write File | 2.1s, 2 tools | 2.6s, 2 tools | 5.2s, 2 tools | 7.5s, 2 tools |
| Multi-Step | 3.4s, 3 tools | 3.7s, 3 tools | 46.2s, 3 tools | 46.9s, 7 tools |
| Reasoning | 0.7s, 0 tools | 0.9s, 0 tools | 2.3s, 0 tools | 2.8s, 0 tools |
Python — 18/20 completed (DeepAgents hung on GPT-5 multi_step)
| Task | ClawAgents (Gemini) | DeepAgents (Gemini) | ClawAgents (GPT-5) | DeepAgents (GPT-5) |
|---|---|---|---|---|
| File Listing | 2.8s, 1 tool | 1.0s, 0 tools* | 9.9s, 1 tool | 3.4s, 1 tool |
| Read & Analyze | 2.0s, 1 tool | 9.8s, 4 tools | 5.5s, 1 tool | 8.4s, 3 tools |
| Write File | 2.0s, 2 tools | 1.0s, 0 tools* | 5.0s, 2 tools | 9.3s, 3 tools |
| Multi-Step | 4.1s, 3 tools | 0.9s, 0 tools* | 16.0s, 3 tools | ❌ hung >5 min |
| Reasoning | 0.7s, 0 tools | 1.0s, 0 tools | — | — |
* DeepAgents 0-tool results mean the model answered without using filesystem tools — faster but lower-quality (unverified answers). ClawAgents consistently uses tools to verify answers.
Why ClawAgents Wins
Traditional Stack (DeepAgents):          ClawAgents:
┌──────────────────────────┐             ┌───────────────────┐
│        Your Code         │             │     Your Code     │
├──────────────────────────┤             ├───────────────────┤
│        LangGraph         │             │    ClawAgents     │
├──────────────────────────┤             │   (direct SDK)    │
│        LangChain         │             └─────────┬─────────┘
├──────────────────────────┤                       │
│  ChatOpenAI / ChatGemini │                       ▼
├──────────────────────────┤             ┌───────────────────┐
│      Responses API       │             │   Responses API   │
└──────────────────────────┘             └───────────────────┘
         4 layers                               1 layer
| Advantage | Impact |
|---|---|
| Direct SDK calls (1 layer vs 4) | Lower latency, fewer failure points |
| Working directory awareness | Tools operate from CWD; DeepAgents has no CWD concept |
| Soft + hard loop detection | Catches repetitive tool calls at 3 repeats, hard-stops at 6 |
| Efficiency rules in system prompt | ~30% reduction in redundant tool calls |
| Fewer tool calls overall | 1.4 avg vs 1.8–2.4 (20–40% more efficient) |
| No OpenAI lock-in | Native Gemini + OpenAI support with FallbackProvider chain |
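The soft/hard loop detection mentioned above (warn at 3 identical repeats, hard-stop at 6) can be sketched as follows. The `LoopDetector` class, thresholds, and signature scheme are illustrative, not ClawAgents' actual implementation:

```python
import json

class LoopDetector:
    """Counts consecutive identical tool calls (same name + same args).

    Illustrative thresholds: warn ("soft") at 3 repeats, stop ("hard") at 6.
    """

    def __init__(self, soft_limit=3, hard_limit=6):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.last_signature = None
        self.repeats = 0

    def check(self, tool_name, args):
        # A stable signature for the call: tool name + sorted JSON args.
        signature = (tool_name, json.dumps(args, sort_keys=True))
        if signature == self.last_signature:
            self.repeats += 1
        else:
            self.last_signature = signature
            self.repeats = 1
        if self.repeats >= self.hard_limit:
            return "hard"   # abort the loop
        if self.repeats >= self.soft_limit:
            return "soft"   # inject a warning into the conversation
        return "ok"

detector = LoopDetector()
for _ in range(2):
    status = detector.check("grep", {"pattern": "TODO"})
print(status)  # "ok" after 2 repeats; a 3rd identical call would return "soft"
```

A soft hit gives the model a chance to self-correct; only a sustained loop triggers the hard stop.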
Feature Matrix
| Feature | ClawAgents v5.10 | DeepAgents | OpenClaw |
|---|---|---|---|
| ReAct loop | ✅ | ✅ | ✅ |
| Tool loop detection | ✅ soft + hard | ❌ | ❌ |
| Efficiency rules (system prompt) | ✅ | ❌ | ❌ |
| Adaptive token estimation | ✅ | ❌ | ❌ |
| Model-aware context budgeting | ✅ | ❌ | ❌ |
| Pluggable sandbox backend | ✅ | ❌ | ❌ |
| In-memory VFS (testing) | ✅ | ✅ | ❌ |
| Sub-agent delegation | ✅ | ✅ | ✅ |
| Planning / TodoList | ✅ | ✅ | ❌ |
| Persistent memory (AGENTS.md) | ✅ | ❌ | ✅ |
| Human-in-the-loop | ✅ | ✅ | ❌ |
| Dangling tool call repair | ✅ | ❌ | ❌ |
| Auto-summarization + offloading | ✅ | ❌ | ✅ |
| Lane-based command queue | ✅ | ❌ | ✅ |
| Gateway HTTP server + SSE | ✅ | ❌ | ✅ |
| Tool access control | ✅ | ❌ | ✅ |
| think tool (structured reasoning) | ✅ | ❌ | ❌ |
| LangChain tool adapter | ✅ | N/A | ❌ |
| Streaming with stall detection | ✅ | ❌ | ❌ |
| Trajectory logging + run scoring | ✅ | ❌ | ❌ |
| Consecutive-failure rethink | ✅ | ❌ | ❌ |
| Discrete reward bands (RL-inspired) | ✅ | ❌ | ❌ |
| Weighted execution scoring | ✅ | ❌ | ❌ |
| Truncated JSON repair + retry | ✅ | ❌ | ❌ |
| Model-specific temperature override | ✅ | ❌ | ❌ |
Architecture
Core Components (~2,500 LOC)
clawagents/
├── agent.py            # ClawAgent class — ReAct loop, hooks, compaction
├── __main__.py         # CLI entrypoint
├── config/             # Env-based configuration (incl. TEMPERATURE, CLAW_*)
├── providers/          # LLM backends (OpenAI, Gemini, Fallback)
│   └── llm.py          # max_completion_tokens, temperature override, JSON repair
├── tools/              # 14+ built-in tools
│   ├── filesystem.py   # ls, read_file, write_file, edit_file
│   ├── advanced_fs.py  # tree, diff, insert_lines
│   ├── search.py       # grep, glob
│   ├── execute.py      # Shell command execution
│   ├── planning.py     # write_todos, update_todo
│   ├── delegation.py   # Sub-agent task delegation
│   ├── think.py        # Structured reasoning (no side effects)
│   ├── web.py          # URL fetching with HTML cleanup
│   └── interactive.py  # ask_user (stdin-based)
├── sandbox/            # Pluggable backend protocol
│   ├── protocol.py     # SandboxBackend interface (15+ methods)
│   ├── local.py        # LocalBackend (pathlib + asyncio)
│   └── in_memory.py    # InMemoryBackend (VFS for testing)
├── trajectory/         # RL-inspired run analysis (v5.9+)
│   └── recorder.py     # TrajectoryRecorder, discrete scoring, quality grading
├── gateway/            # Production HTTP server
│   ├── server.py       # FastAPI + SSE streaming
│   └── queue.py        # 4-lane FIFO command queue
├── graph/              # Agent loop orchestration + failure tracking
├── memory/             # AGENTS.md discovery + compaction
├── process/            # Process management
└── logging/            # Structured logging
Built-in Tools
Every agent includes these — no setup needed:
| Tool | Description |
|---|---|
| `ls` | List directory with size + modified time |
| `read_file` | Read file with line numbers + pagination |
| `write_file` | Write/create file (auto-creates directories) |
| `edit_file` | Replace text with pattern matching |
| `grep` | Search — single file or recursive with glob filter |
| `glob` | Find files by pattern (`**/*.py`) |
| `execute` | Shell command execution |
| `tree` | Recursive directory tree with smart ignoring |
| `diff` | Unified diff between two files |
| `insert_lines` | Precise line-level insertion |
| `think` | Structured reasoning without side effects |
| `web_fetch` | URL fetching with HTML stripping (50 KB cap) |
| `write_todos` | Plan tasks as a checklist |
| `update_todo` | Mark plan items complete |
| `task` | Delegate to a sub-agent with isolated context |
| `ask_user` | Interactive stdin-based user input |
| `use_skill` | Load a skill's instructions (when skills exist) |
Tool Examples
📁 Filesystem — ls, read_file, write_file, edit_file
The agent calls tools by emitting JSON blocks. Here's what happens under the hood when you ask the agent to work with files:
# The agent autonomously emits tool calls like:
# List a directory
{"tool": "ls", "args": {"path": "src/"}}
# → Returns: drwxr-xr-x 4.0 KB 2026-02-24 components/
# -rw-r--r-- 1.2 KB 2026-02-24 main.py
# Read a file with pagination
{"tool": "read_file", "args": {"path": "src/main.py", "offset": 0, "limit": 50}}
# → Returns: 1 | import asyncio
# 2 | from clawagents import create_claw_agent
# ...
# Write a new file (parent directories auto-created)
{"tool": "write_file", "args": {"path": "src/utils/helpers.py", "content": "def greet(name):\n return f'Hello, {name}!'"}}
# → Returns: ✅ Wrote 45 bytes to src/utils/helpers.py
# Edit an existing file by pattern match
{"tool": "edit_file", "args": {
"path": "src/main.py",
"old": "print('hello')",
"new": "print('Hello, World!')"
}}
# → Returns: ✅ 1 replacement made in src/main.py
🔍 Search — grep, glob
# Recursive grep across all Python files
{"tool": "grep", "args": {"pattern": "TODO", "path": "src/", "include": "*.py"}}
# → Returns: src/agent.py:42: # TODO: add retry logic
# src/tools/web.py:15: # TODO: handle redirects
# Single-file search
{"tool": "grep", "args": {"pattern": "class.*Tool", "path": "src/tools/registry.py"}}
# → Returns: 15: class ToolResult:
# 24: class Tool(Protocol):
# Find files by pattern
{"tool": "glob", "args": {"pattern": "**/*.md", "path": "."}}
# → Returns: ./README.md (15.3 KB)
# ./docs/ARCHITECTURE.md (4.1 KB)
# ./AGENTS.md (892 B)
⚡ Shell Execution
# Run any shell command
{"tool": "execute", "args": {"command": "python -m pytest tests/ -v"}}
# → Returns full stdout/stderr with exit code
# With custom timeout (in milliseconds)
{"tool": "execute", "args": {"command": "pip install requests", "timeout": 60000}}
# Dangerous commands are auto-blocked
{"tool": "execute", "args": {"command": "rm -rf /"}}
# → Error: Blocked potentially destructive command
🧠 Think — structured reasoning
# The agent can reason without side effects
{"tool": "think", "args": {
"thought": "The user wants me to refactor the database layer. Let me plan: 1) Read the current schema, 2) Identify coupled components, 3) Extract a repository pattern, 4) Update tests."
}}
# → [Thought recorded] — no files touched, no commands run
This reduces unnecessary tool calls by giving the agent a structured space to plan.
📋 Planning — write_todos, update_todo
# Create a structured plan
{"tool": "write_todos", "args": {
"todos": ["Read the existing codebase", "Fix the auth bug", "Add unit tests", "Update docs"]
}}
# → ## Progress: 0/4 complete
# 0. [ ] Read the existing codebase
# 1. [ ] Fix the auth bug
# 2. [ ] Add unit tests
# 3. [ ] Update docs
# Mark steps complete as you go
{"tool": "update_todo", "args": {"index": 0}}
# → ## Progress: 1/4 complete
# 0. [x] Read the existing codebase
# 1. [ ] Fix the auth bug
# ...
🤖 Sub-agent delegation
# Delegate to a fresh sub-agent with isolated context
{"tool": "task", "args": {
"description": "Analyze all Python files in src/ and create a summary of the module structure",
"max_iterations": 10
}}
# → [Sub-agent completed: 6 tool calls, 4 iterations]
# The src/ directory contains 3 modules: ...
# With named specialized sub-agents (configured at creation)
{"tool": "task", "args": {
"description": "Review this pull request for security issues",
"agent": "security-reviewer"
}}
Registering named sub-agents:
from clawagents import create_claw_agent
from clawagents.tools.subagent import SubAgentSpec
agent = create_claw_agent(
"gemini-3-flash",
subagents=[
SubAgentSpec(
name="researcher",
description="Deep research on a topic",
system_prompt="You are a thorough researcher. Always cite sources.",
max_iterations=15,
),
SubAgentSpec(
name="coder",
description="Write and test code",
system_prompt="You are a senior engineer. Write clean, tested code.",
max_iterations=10,
),
],
)
🌐 Web Fetch
# Fetch and read a web page (HTML stripped automatically)
{"tool": "web_fetch", "args": {"url": "https://docs.python.org/3/library/asyncio.html"}}
# → [200] https://docs.python.org/3/library/asyncio.html
# asyncio — Asynchronous I/O ...
# Fetch a JSON API
{"tool": "web_fetch", "args": {"url": "https://api.github.com/repos/python/cpython", "timeout": 10}}
# → Returns raw JSON response
Custom Tools
Create your own tools by implementing the Tool protocol:
from clawagents import create_claw_agent
from clawagents.tools.registry import Tool, ToolResult
class DatabaseQueryTool:
name = "query_db"
description = "Run a read-only SQL query against the application database."
parameters = {
"sql": {"type": "string", "description": "The SQL SELECT query", "required": True},
"limit": {"type": "number", "description": "Max rows to return. Default: 100"},
}
async def execute(self, args):
sql = args.get("sql", "")
limit = int(args.get("limit", 100))
# ... your database logic here ...
rows = await run_query(sql, limit=limit)
return ToolResult(success=True, output=format_table(rows))
# Register custom tools alongside built-ins
agent = create_claw_agent("gpt-5", tools=[DatabaseQueryTool()])
You can also wrap LangChain tools directly:
from langchain_community.tools import WikipediaQueryRun
agent = create_claw_agent("gpt-5", tools=[WikipediaQueryRun()])
# LangChain tools are automatically adapted via LangChainToolAdapter
Skills System
Skills are reusable instruction sets that teach the agent domain-specific knowledge — without polluting the system prompt. They use a progressive disclosure pattern: the agent loads skill instructions on demand via the use_skill tool.
Skill Directory Structure
your-project/
โโโ skills/ # Auto-discovered (or .skills/, skill/, .skill/, Skills/)
โ โโโ code_review/
โ โ โโโ SKILL.md # โ Skill defined as a folder + SKILL.md
โ โโโ sql_expert.md # โ Skill defined as a single .md file
โ โโโ deploy_checklist.md
โโโ AGENTS.md # Project memory (auto-injected)
โโโ src/
โโโ ...
Writing a Skill
Every skill is a Markdown file with optional YAML frontmatter:
Example 1 — skills/code_review/SKILL.md
---
name: code_review
description: "Perform thorough code reviews following team standards"
allowed-tools: read_file grep glob think
---
# Code Review Skill
When reviewing code, follow these steps:
## 1. Structure Check
- Verify the file follows our module pattern (one class per file)
- Check imports are grouped: stdlib → third-party → local
- Ensure `__init__.py` exports are up to date
## 2. Logic Review
- Look for unhandled edge cases (empty inputs, None values)
- Verify error messages are actionable
- Check that async functions are properly awaited
## 3. Security
- No hardcoded secrets or API keys
- SQL queries use parameterized statements
- User input is sanitized before use
## 4. Output Format
Provide your review as:
- ✅ **Approved** — no issues found
- ⚠️ **Changes requested** — list specific issues with file:line references
- 🚫 **Blocked** — critical issues that must be fixed
Example 2 — skills/sql_expert.md (single-file skill)
---
name: sql_expert
description: "Write optimized SQL queries for PostgreSQL"
allowed-tools: execute read_file think
---
# SQL Expert
You are a PostgreSQL expert. When writing queries:
## Rules
1. Always use explicit `JOIN` syntax (never implicit joins in WHERE)
2. Use CTEs (`WITH` clauses) for complex multi-step queries
3. Add `EXPLAIN ANALYZE` when the user asks about performance
4. Use parameterized queries — never interpolate user values
5. Default to `LIMIT 100` unless the user specifies otherwise
## Patterns
### Pagination
Use keyset pagination for large tables:
```sql
SELECT * FROM events
WHERE id > :last_seen_id
ORDER BY id
LIMIT 50;
```

### Aggregation
Always include the raw count alongside percentages:

```sql
SELECT
  status,
  COUNT(*) AS n,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM orders
GROUP BY status
ORDER BY n DESC;
```

**Example 3 — `skills/deploy_checklist.md`**

```markdown
---
name: deploy_checklist
description: "Step-by-step production deployment checklist"
---
# Deployment Checklist
Before deploying to production, complete every step:
- [ ] All tests pass: `pytest tests/ -v`
- [ ] No lint errors: `ruff check src/`
- [ ] Version bumped in `pyproject.toml`
- [ ] CHANGELOG.md updated
- [ ] Docker image builds: `docker build -t app:latest .`
- [ ] Smoke test on staging environment
- [ ] Database migrations reviewed and tested
- [ ] Rollback plan documented
```
How Skills Work at Runtime
# Skills are auto-discovered from ./skills/ directory
agent = create_claw_agent("gemini-3-flash")
# Or specify custom skill directories
agent = create_claw_agent("gpt-5", skills=["./my-skills", "./shared-skills"])
When skills are available, the agent gets two additional tools:
# 1. List available skills
{"tool": "list_skills", "args": {}}
# → Available skills (3):
# - **code_review**: Perform thorough code reviews following team standards
#   ↳ Allowed tools: read_file, grep, glob, think
# - **sql_expert**: Write optimized SQL queries for PostgreSQL
#   ↳ Allowed tools: execute, read_file, think
# - **deploy_checklist**: Step-by-step production deployment checklist
# 2. Load a specific skill's instructions
{"tool": "use_skill", "args": {"name": "sql_expert"}}
# → Returns the full skill content, injected into the agent's context
The agent decides on its own when to use a skill. If you ask it to "write a query to find all overdue orders," and a sql_expert skill exists, it will load the skill first, then write the query following those rules.
API Reference
create_claw_agent(model, instruction, ...)
| Param | Type | Default | Description |
|---|---|---|---|
| `model` | `str \| LLMProvider \| None` | `None` | Model name or provider instance. `None` = auto-detect from env |
| `instruction` | `str` | `None` | System instruction for the agent |
| `tools` | `list` | `None` | Additional tools (built-in tools always included) |
| `skills` | `str \| list` | auto-discover | Skill directories to load |
| `memory` | `str \| list` | auto-discover | Memory files to inject |
| `streaming` | `bool` | `True` | Enable streaming responses |
| `sandbox` | `SandboxBackend` | `LocalBackend` | Pluggable sandbox for file/shell operations |
| `context_window` | `int \| None` | from env / `128000` | Token budget for compaction |
| `max_tokens` | `int \| None` | from env / `8192` | Max output tokens per response |
| `temperature` | `float \| None` | from env / `0.0` | LLM temperature (model-specific overrides apply) |
| `trajectory` | `bool \| None` | from `CLAW_TRAJECTORY` / `False` | Enable trajectory logging + run scoring |
| `rethink` | `bool \| None` | from `CLAW_RETHINK` / `False` | Enable consecutive-failure detection |
| `on_event` | `callable` | `None` | Event callback |
Hooks & Access Control
agent = create_claw_agent("gemini-3-flash", instruction="Code reviewer")
# Block dangerous tools at runtime
agent.block_tools("execute", "write_file")
# Or whitelist only safe tools
agent.allow_only_tools("read_file", "ls", "grep", "glob")
# Inject context into every LLM call
agent.inject_context("Always respond in Spanish")
# Limit tool output size
agent.truncate_output(3000)
Advanced — raw hooks:
agent.before_llm = lambda messages: messages # modify messages before LLM
agent.before_tool = lambda name, args: True # return False to block
agent.after_tool = lambda name, args, result: result # modify tool results
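A concrete `before_tool` guard, assuming only the `(name, args) -> bool` hook signature shown above. The policy itself (confine writes to `src/`) is a made-up example:

```python
# Illustrative before_tool guard: block write_file/edit_file calls that
# target anything outside the src/ tree. Returning False blocks the call.
from pathlib import PurePosixPath

def guard_writes(name, args):
    if name in ("write_file", "edit_file"):
        path = PurePosixPath(args.get("path", ""))
        # Reject absolute paths and anything that escapes src/
        if path.is_absolute() or ".." in path.parts or path.parts[:1] != ("src",):
            return False
    return True

# agent.before_tool = guard_writes
print(guard_writes("write_file", {"path": "src/app.py"}))   # True
print(guard_writes("write_file", {"path": "/etc/passwd"}))  # False
```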
Auto-Discovery
The agent factory automatically discovers project files:
| What | Default locations checked |
|---|---|
| Memory | ./AGENTS.md, ./CLAWAGENTS.md |
| Skills | ./skills, ./.skills, ./skill, ./.skill, ./Skills |
Override with explicit paths:
agent = create_claw_agent(
"gpt-5",
memory="./docs/AGENTS.md",
skills=["./my-skills", "./shared-skills"]
)
Memory & Context Management
Project Memory
Loads AGENTS.md files and injects content into every LLM call. Use for project-level context and conventions.
Auto-Compaction
When the conversation exceeds 75% of CONTEXT_WINDOW:
- Full history offloaded to `.clawagents/history/compacted_*.json`
- Older messages summarized into `[Compacted History]`
- Last 6 messages kept intact

This allows unbounded conversation length while preserving a full audit trail.
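The compaction policy can be sketched as below. The 4-chars-per-token estimate, the placeholder summary, and the function names are illustrative; ClawAgents offloads the real history to disk and summarizes with the LLM:

```python
# Illustrative sketch of the auto-compaction policy: trigger at 75% of the
# context window, keep the last 6 messages intact, summarize the rest.
def estimate_tokens(messages):
    # Naive heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def maybe_compact(messages, context_window=128_000, threshold=0.75, keep_last=6):
    if estimate_tokens(messages) <= context_window * threshold:
        return messages  # under budget: nothing to do
    older, recent = messages[:-keep_last], messages[-keep_last:]
    # Stand-in for the LLM-generated summary of the older messages.
    summary = {"role": "system",
               "content": f"[Compacted History] {len(older)} earlier messages summarized"}
    return [summary] + recent

msgs = [{"role": "user", "content": "x" * 1000} for _ in range(600)]
print(len(maybe_compact(msgs)))  # 7: one summary message + the last 6
```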
Gateway Server
Launch a production-ready HTTP server with one line:
from clawagents.gateway import start_gateway
start_gateway(port=3000)
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Synchronous agent invocation |
| `/chat/stream` | POST | SSE streaming (events: `queued`, `started`, `agent`, `done`, `error`) |
| `/queue` | GET | Queue status for all lanes |
| `/health` | GET | Health check |
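A client consuming `/chat/stream` needs to parse Server-Sent Events. Here is a minimal parser; the event names come from the endpoint table above, but the JSON payload shapes are assumptions for illustration:

```python
# Minimal SSE parser for the /chat/stream feed. With requests or httpx you
# would POST to the endpoint and feed response lines into parse_sse().
import json

def parse_sse(lines):
    """Yield (event, payload) pairs from raw SSE lines."""
    event, data = None, []
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one SSE message
            if event:
                yield event, json.loads("\n".join(data)) if data else None
            event, data = None, []

raw = [
    "event: queued", 'data: {"position": 1}', "",
    "event: agent", 'data: {"text": "Working on it"}', "",
    "event: done", 'data: {"result": "ok"}', "",
]
for name, payload in parse_sse(raw):
    print(name, payload)
```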
Lane-Based Concurrency
4 lanes, each with a configurable `max_concurrent`:
- `main` — primary user requests
- `cron` — scheduled tasks
- `subagent` — sub-agent delegation
- `nested` — nested sub-agent calls
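The lane model can be sketched with one semaphore per lane: tasks in the same lane queue FIFO behind their lane's concurrency cap, while other lanes proceed independently. The `LaneQueue` class is illustrative, not the gateway's actual implementation:

```python
# Illustrative lane-based queue: each lane is an independent FIFO with its
# own concurrency cap.
import asyncio

class LaneQueue:
    def __init__(self, max_concurrent=None):
        lanes = ("main", "cron", "subagent", "nested")
        caps = max_concurrent or {}
        # Default cap of 1 per lane serializes that lane's tasks.
        self._semaphores = {lane: asyncio.Semaphore(caps.get(lane, 1)) for lane in lanes}

    async def run(self, lane, coro_fn):
        async with self._semaphores[lane]:
            return await coro_fn()

async def demo():
    q = LaneQueue({"main": 2})  # allow two concurrent "main" requests

    async def task(n):
        await asyncio.sleep(0)
        return n

    return await asyncio.gather(*(q.run("main", lambda n=n: task(n)) for n in range(3)))

print(asyncio.run(demo()))  # [0, 1, 2]
```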
Sandbox Backends
ClawAgents uses a pluggable sandbox protocol for all file and shell operations:
from clawagents.sandbox import InMemoryBackend, LocalBackend
# Production: real filesystem
agent = create_claw_agent("gpt-5", sandbox=LocalBackend())
# Testing: pure in-memory VFS
mem = InMemoryBackend()
mem.seed({"src/main.py": "print('hello')", "README.md": "# My Project"})
agent = create_claw_agent("gpt-5", sandbox=mem)
snapshot = mem.snapshot() # deterministic state capture
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PROVIDER` | auto-detect | `openai` or `gemini` |
| `OPENAI_API_KEY` | — | OpenAI API key |
| `OPENAI_MODEL` | `gpt-5-nano` | OpenAI model |
| `GEMINI_API_KEY` | — | Gemini API key |
| `GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini model |
| `STREAMING` | `1` | `1` = enabled, `0` = disabled |
| `CONTEXT_WINDOW` | `128000` | Token budget for compaction |
| `MAX_TOKENS` | `8192` | Max output tokens per response (sent as `max_completion_tokens` for OpenAI, `max_output_tokens` for Gemini) |
| `TEMPERATURE` | `0.0` | LLM temperature. Automatically overridden for models that require a fixed value (e.g. GPT-5 family → 1.0, o1/o3 series → 1.0) |
| `CLAW_TRAJECTORY` | `0` | `1` = enable trajectory logging. Logs every turn and scores each run to `.clawagents/trajectories/runs.jsonl` |
| `CLAW_RETHINK` | `0` | `1` = enable consecutive-failure detection. Injects a "rethink" prompt after 3 consecutive meaningful tool failures |
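ClawAgents resolves each setting as explicit parameter > environment variable > built-in default. A sketch of that precedence rule (the `resolve` helper is illustrative):

```python
# Illustrative config resolution: explicit param > environment > default.
import os

def resolve(param, env_var, default, cast=str):
    if param is not None:          # explicit argument wins
        return param
    raw = os.environ.get(env_var)  # then the environment / .env
    if raw:
        return cast(raw)
    return default                 # finally the built-in default

os.environ["MAX_TOKENS"] = "4096"
print(resolve(None, "MAX_TOKENS", 8192, int))   # 4096 (from env)
print(resolve(2048, "MAX_TOKENS", 8192, int))   # 2048 (explicit wins)
```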
Testing
# Install with dev dependencies
pip install -e ".[dev]"
# Run all tests
python -m pytest tests/ -v
# Run benchmarks (requires API keys)
python -m pytest tests/ -v -m benchmark
Changelog
v5.10.0 — Discrete Reward Bands & Weighted Scoring
| Feature | Description |
|---|---|
| 🎯 Discrete reward bands | Run scores mapped to -1 … +3 bands (inspired by CUDA-Agent PPO reward shaping) |
| ⚖️ Weighted execution scoring | `execute`, `shell`, `run_code` weighted 2× higher than generic tools |
| 🏷️ Run quality grading | Each run classified as `clean`, `noisy`, or `failed` for trajectory filtering |
| 🛡️ Gameable tool exclusion | `think`, `todolist`, `use_skill`, etc. excluded from scoring to prevent reward hacking |
v5.9.0 — Trajectory Logging & Rethink
| Feature | Description |
|---|---|
| 📊 Trajectory logging | Structured recording of every turn, tool call, and outcome to `runs.jsonl` |
| 🔁 Consecutive-failure rethink | After 3 consecutive meaningful failures, injects a system "rethink" prompt |
| 🎛️ Opt-in flags | `trajectory=True` / `CLAW_TRAJECTORY=1` and `rethink=True` / `CLAW_RETHINK=1` |
v5.8.0 — JSON Resilience
| Feature | Description |
|---|---|
| 🔧 JSON repair | `_repair_json()` utility fixes truncated JSON from hitting `max_completion_tokens` |
| 🔁 Truncated JSON retry | Detects incomplete JSON tool calls and prompts the LLM to resend |
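The idea behind truncated-JSON repair can be sketched as below: close any unterminated string and balance open brackets. This naive `repair_json` is a stand-in, not ClawAgents' `_repair_json()`:

```python
# Naive repair for JSON cut off mid-output: terminate an open string, then
# close unbalanced braces/brackets in reverse order of opening.
import json

def repair_json(text):
    in_string, escaped, stack = False, False, []
    for ch in text:
        if escaped:
            escaped = False
        elif ch == "\\" and in_string:
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string and ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif not in_string and ch in "}]":
            if stack:
                stack.pop()
    if in_string:
        text += '"'
    return text + "".join(reversed(stack))

truncated = '{"tool": "write_file", "args": {"path": "a.py", "content": "x'
print(json.loads(repair_json(truncated)))
```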
v5.7.0 — Model-Specific Temperature
| Feature | Description |
|---|---|
| 🌡️ Fixed-temperature models | GPT-5 family and o1/o3/o4 series auto-override to `temperature=1.0` |
| 🌡️ Configurable temperature | `TEMPERATURE` env var + `temperature` parameter on `create_claw_agent` |
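The override rule can be sketched as a simple prefix check; the helper name and matching scheme are illustrative:

```python
# Illustrative temperature override: some model families reject any value
# other than 1.0, so the configured temperature is replaced for them.
def effective_temperature(model, configured=0.0):
    fixed_at_one = ("gpt-5", "o1", "o3", "o4")
    if any(model == p or model.startswith(p + "-") for p in fixed_at_one):
        return 1.0
    return configured

print(effective_temperature("gpt-5-mini"))      # 1.0 (forced)
print(effective_temperature("gemini-3-flash"))  # 0.0 (configured value kept)
```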
v5.6.0 — LLM Parameter Fixes
| Feature | Description |
|---|---|
| 🔧 `max_completion_tokens` | OpenAI calls now use `max_completion_tokens` (replacing the deprecated `max_tokens`) |
| 🔧 `max_output_tokens` | Gemini calls now pass `max_output_tokens` correctly |
| ⚙️ Config priority | Explicit param > `.env` > default — no more shadowing of env values |
v5.5.0 — Foundation
| Feature | Description |
|---|---|
| 🔌 Pluggable Sandbox | `SandboxBackend` protocol with `LocalBackend` + `InMemoryBackend` |
| 🌐 Gateway Server | FastAPI server with SSE streaming and 4-lane queue |
| 🗂️ Advanced FS Tools | `tree`, `diff`, `insert_lines` |
| 🧠 Think Tool | Structured reasoning without side effects |
| 🌐 Web Fetch | URL fetching with HTML cleanup |
| 💬 Ask User | Interactive stdin-based input |
| 📜 History Offloading | Full audit trail preserved after compaction |
| 🔒 Tool Access Control | `block_tools()` / `allow_only_tools()` at runtime |
| 💉 Context Injection | `inject_context()` hook for every LLM call |
| ✂️ Output Truncation | `truncate_output()` to cap tool output size |
Trajectory Logging & RL-Inspired Scoring
ClawAgents includes an optional trajectory system inspired by reinforcement learning techniques from CUDA-Agent and OpenClaw-RL. Enable it with trajectory=True or CLAW_TRAJECTORY=1.
What gets logged
Every agent run records:
- Turn-level data: tool calls, arguments, success/failure, output previews
- Weighted turn scores: execution tools (shell, code runners) weighted 2× higher than generic tools
- Run summary: total turns, tool calls, successes/failures, elapsed time
Discrete reward bands
Each run receives a score from -1 to +3:
| Score | Meaning |
|---|---|
| +3 | All tools succeeded, task completed cleanly |
| +2 | Minor hiccups but overall success |
| +1 | Partial success with some failures |
| 0 | Inconclusive — mixed results |
| -1 | Majority of tool calls failed |
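One way to map run outcomes onto these bands is a success-ratio rule. The exact cutoffs ClawAgents uses are not documented here, so the ones below are illustrative:

```python
# Illustrative mapping from tool-call outcomes to the -1…+3 reward bands.
def run_score(successes, failures):
    total = successes + failures
    if total == 0 or successes == failures:
        return 0          # inconclusive / mixed results
    if failures == 0:
        return 3          # clean run: every tool call succeeded
    ratio = successes / total
    if ratio >= 0.75:
        return 2          # minor hiccups but overall success
    if ratio > 0.5:
        return 1          # partial success with some failures
    return -1             # majority of tool calls failed

print(run_score(10, 2))  # 2
```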
Quality grading
Runs are classified for downstream filtering:
| Quality | Criteria |
|---|---|
| `clean` | Score ≥ 2 and ≤ 2 mid-run failures |
| `noisy` | Score ≥ 0 but too many mid-run failures |
| `failed` | Score < 0 |
Anti-gaming protections
Tools like think, todolist, use_skill, list_skills, and update_todo are excluded from scoring — they can't inflate success rates.
Consecutive-failure rethink
With rethink=True or CLAW_RETHINK=1, the agent monitors tool outcomes in real time. After 3 consecutive meaningful failures, it injects a system message:
"You have had 3 consecutive tool failures. Stop and rethink your approach before continuing."
This simple mechanism prevents the agent from spiraling into repeated failed attempts.
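The mechanism amounts to a streak counter that resets on success. The injected message text is from the docs above; the `RethinkMonitor` class itself is an illustrative sketch:

```python
# Illustrative consecutive-failure monitor: emit a "rethink" nudge when the
# failure streak reaches the threshold, reset on any success.
RETHINK_MSG = ("You have had 3 consecutive tool failures. "
               "Stop and rethink your approach before continuing.")

class RethinkMonitor:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = 0

    def observe(self, success):
        """Return the rethink message when the streak hits the threshold."""
        if success:
            self.streak = 0
            return None
        self.streak += 1
        if self.streak == self.threshold:
            self.streak = 0  # reset so the nudge isn't re-injected every turn
            return RETHINK_MSG
        return None

m = RethinkMonitor()
results = [m.observe(ok) for ok in (False, False, True, False, False, False)]
print(results[-1] is not None)  # True: third consecutive failure triggers it
```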
Output
Run summaries are appended to .clawagents/trajectories/runs.jsonl:
{
"run_id": "a1b2c3d4",
"model": "gpt-5-mini",
"total_turns": 8,
"tool_calls": 12,
"successes": 10,
"failures": 2,
"run_score": 2,
"quality": "clean",
"elapsed_ms": 45230,
"turns": [...]
}
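Since each line of `runs.jsonl` is one JSON record, filtering the log is straightforward. The field names match the sample record above and the path is the documented default; the helper itself is illustrative:

```python
# Read runs.jsonl and keep only the runs graded "clean" — e.g. to curate
# high-quality trajectories for later analysis.
import json
from pathlib import Path

def clean_runs(path=".clawagents/trajectories/runs.jsonl"):
    runs = []
    p = Path(path)
    if not p.exists():
        return runs
    for line in p.read_text().splitlines():
        if not line.strip():
            continue
        run = json.loads(line)
        if run.get("quality") == "clean":
            runs.append(run)
    return runs

# for run in clean_runs():
#     print(run["run_id"], run["run_score"], f'{run["elapsed_ms"]}ms')
```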
Roadmap
- Docker sandbox backend (protocol ready)
- Semantic browser automation (accessibility tree)
- Prompt caching (Anthropic-style)
- Persistent memory learning from trajectory data
- Post-run self-analysis (skill extraction from successful runs)
- Trajectory logging + discrete reward bands ✅ (v5.9–5.10)
- Consecutive-failure rethink injection ✅ (v5.9)
- Weighted execution scoring + quality grading ✅ (v5.10)
- JSON repair + truncated JSON retry ✅ (v5.8)
- Model-specific temperature override ✅ (v5.7)
- Configurable temperature / max_completion_tokens ✅ (v5.6)
- Pluggable sandbox backend ✅ (v5.5)
- Lane-based queue serialization ✅ (v5.5)
- Skill progressive disclosure ✅ (v5.5)
- Gateway HTTP server ✅ (v5.5)
License
MIT
Built with 🦞 by the ClawAgents team