
Python port of the ClawAgents framework


🦞 ClawAgents

A lean, full-stack agentic AI framework — ~2,500 LOC



ClawAgents is a production-ready agentic framework that gives LLMs the ability to read, write, and execute code — with built-in planning, memory, sandboxing, and a gateway server. It supports OpenAI GPT-5 and Google Gemini out of the box, with a pluggable provider architecture for any LLM.

Built by extracting and unifying the best architectural patterns from OpenClaw (~5,800 files) and DeepAgents (~1,400 LOC core), ClawAgents delivers the same power at a fraction of the complexity.

Installation

pip install clawagents

Version 5.11.1 — Latest stable release (February 2026)


Quick Start

1. Configure your environment

Create a .env file:

PROVIDER=gemini                    # or "openai"
GEMINI_API_KEY=AIza...             # Your Gemini API key
GEMINI_MODEL=gemini-3-flash-preview
STREAMING=1
CONTEXT_WINDOW=128000
MAX_TOKENS=8192
TEMPERATURE=0                      # Model-specific overrides apply (see below)

# Optional: RL-inspired agent improvements
CLAW_TRAJECTORY=1                  # Enable trajectory logging + scoring
CLAW_RETHINK=1                     # Enable consecutive-failure detection

# OpenAI configuration
PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-nano
STREAMING=1
CONTEXT_WINDOW=128000
MAX_TOKENS=8192
TEMPERATURE=1                      # GPT-5 family requires temperature=1
CLAW_TRAJECTORY=1
CLAW_RETHINK=1
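
Once the .env file is loaded (for example via python-dotenv), these variables reduce to a small typed settings dict. A minimal sketch of that mapping; the `load_settings` helper here is illustrative and not part of clawagents:

```python
import os

# Illustrative sketch only: clawagents reads these variables internally;
# the load_settings helper below is NOT part of the library.
def load_settings(env=None):
    env = os.environ if env is None else env
    return {
        "provider": env.get("PROVIDER", "openai"),
        "streaming": env.get("STREAMING", "1") == "1",
        "context_window": int(env.get("CONTEXT_WINDOW", "128000")),
        "max_tokens": int(env.get("MAX_TOKENS", "8192")),
        "temperature": float(env.get("TEMPERATURE", "0")),
        "trajectory": env.get("CLAW_TRAJECTORY", "0") == "1",
        "rethink": env.get("CLAW_RETHINK", "0") == "1",
    }

settings = load_settings({"PROVIDER": "gemini", "CLAW_RETHINK": "1"})
```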

2. One-line agent

from clawagents import create_claw_agent

agent = create_claw_agent("gemini-3-flash")
result = await agent.invoke("List all Python files in src/")
print(result.result)

3. With custom instructions

agent = create_claw_agent(
    "gpt-5",
    instruction="You are a senior code reviewer. Be thorough and concise."
)
result = await agent.invoke("Review this codebase and suggest improvements")

4. With trajectory logging & rethink

agent = create_claw_agent(
    "gpt-5-mini",
    trajectory=True,   # logs every turn + scores the run
    rethink=True,       # auto-injects "rethink" after 3 consecutive failures
)
result = await agent.invoke("Refactor the auth module and add tests")
# Run summary written to .clawagents/trajectories/runs.jsonl

5. CLI mode

python -m clawagents --task "Find all TODO comments in the codebase"

๐Ÿ† Performance: ClawAgents vs Traditional Frameworks

ClawAgents v5.10 outperforms traditional multi-layer agentic frameworks through architectural simplicity. Here's how it stacks up against DeepAgents (LangGraph/LangChain-based) in head-to-head benchmarks.

Benchmark Results (February 2026)

TypeScript — 5 tasks × 2 models × 2 frameworks (20/20 ✅)

Framework Gemini-2.5-flash GPT-5-mini
ClawAgents v5.5 2.3s avg · 1.4 tools 13.6s avg · 1.4 tools
DeepAgents 2.5s avg · 1.8 tools 15.7s avg · 2.4 tools

Per-Task Breakdown

Task ClawAgents (Gemini) DeepAgents (Gemini) ClawAgents (GPT-5) DeepAgents (GPT-5)
File Listing 3.7s, 1 tool 1.9s, 1 tool 8.9s, 1 tool 8.4s, 1 tool
Read & Analyze 1.6s, 1 tool 3.6s, 3 tools 5.4s, 1 tool 13.0s, 2 tools
Write File 2.1s, 2 tools 2.6s, 2 tools 5.2s, 2 tools 7.5s, 2 tools
Multi-Step 3.4s, 3 tools 3.7s, 3 tools 46.2s, 3 tools 46.9s, 7 tools
Reasoning 0.7s, 0 tools 0.9s, 0 tools 2.3s, 0 tools 2.8s, 0 tools

Python — 18/20 completed (DeepAgents hung on GPT-5 multi_step)

Task ClawAgents (Gemini) DeepAgents (Gemini) ClawAgents (GPT-5) DeepAgents (GPT-5)
File Listing 2.8s, 1 tool 1.0s, 0 tools* 9.9s, 1 tool 3.4s, 1 tool
Read & Analyze 2.0s, 1 tool 9.8s, 4 tools 5.5s, 1 tool 8.4s, 3 tools
Write File 2.0s, 2 tools 1.0s, 0 tools* 5.0s, 2 tools 9.3s, 3 tools
Multi-Step 4.1s, 3 tools 0.9s, 0 tools* 16.0s, 3 tools ❌ hung >5min
Reasoning 0.7s, 0 tools 1.0s, 0 tools — —

* DeepAgents 0-tool results mean the model answered without using filesystem tools — faster but lower-quality (unverified answers). ClawAgents consistently uses tools to verify answers.

Why ClawAgents Wins

Traditional Stack (DeepAgents):           ClawAgents:
┌─────────────────────────┐               ┌──────────────────┐
│  Your Code              │               │  Your Code       │
├─────────────────────────┤               ├──────────────────┤
│  LangGraph              │               │  ClawAgents      │
├─────────────────────────┤               │  (direct SDK)    │
│  LangChain              │               └────────┬─────────┘
├─────────────────────────┤                        │
│  ChatOpenAI / ChatGemini│                        ▼
├─────────────────────────┤               ┌──────────────────┐
│  Responses API          │               │  Responses API   │
└─────────────────────────┘               └──────────────────┘
        4 layers                                1 layer
Advantage Impact
Direct SDK calls (1 layer vs 4) Lower latency, fewer failure points
Working directory awareness Tools operate from CWD; DeepAgents has no CWD concept
Soft + hard loop detection Catches repetitive tool calls at 3 repeats, hard-stops at 6
Efficiency rules in system prompt ~30% reduction in redundant tool calls
Fewer tool calls overall 1.4 avg vs 1.8–2.4 (20–40% more efficient)
No OpenAI lock-in Native Gemini + OpenAI support with FallbackProvider chain
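
The soft + hard loop detection row above can be sketched as a counter over consecutive identical tool calls. This is a hypothetical illustration of the 3-repeat warn / 6-repeat stop policy, not the library's actual implementation:

```python
# Hypothetical sketch of the soft/hard loop policy (warn at 3 identical
# consecutive tool calls, hard-stop at 6); not the actual implementation.
SOFT_LIMIT, HARD_LIMIT = 3, 6

class LoopDetector:
    def __init__(self):
        self._last = None   # fingerprint of the previous tool call
        self._count = 0     # length of the current repeat streak

    def check(self, tool_name, args):
        call = (tool_name, tuple(sorted(args.items())))
        self._count = self._count + 1 if call == self._last else 1
        self._last = call
        if self._count >= HARD_LIMIT:
            return "hard-stop"
        if self._count >= SOFT_LIMIT:
            return "soft-warn"
        return "ok"

det = LoopDetector()
results = [det.check("grep", {"pattern": "TODO"}) for _ in range(6)]
# results: ok, ok, soft-warn, soft-warn, soft-warn, hard-stop
```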

Feature Matrix

Feature ClawAgents v5.10 DeepAgents OpenClaw
ReAct loop ✅ ✅ ✅
Tool loop detection ✅ soft + hard ❌ ✅
Efficiency rules (system prompt) ✅ ❌ ❌
Adaptive token estimation ✅ ❌ ❌
Model-aware context budgeting ✅ ❌ ❌
Pluggable sandbox backend ✅ ✅ ✅
In-memory VFS (testing) ✅ ❌ ❌
Sub-agent delegation ✅ ✅ ✅
Planning / TodoList ✅ ✅ ❌
Persistent memory (AGENTS.md) ✅ ✅ ✅
Human-in-the-loop ✅ ✅ ✅
Dangling tool call repair ✅ ✅ ❌
Auto-summarization + offloading ✅ ✅ ✅
Lane-based command queue ✅ ❌ ✅
Gateway HTTP server + SSE ✅ ❌ ✅
Tool access control ✅ ❌ ❌
think tool (structured reasoning) ✅ ❌ ❌
LangChain tool adapter ✅ N/A ❌
Streaming with stall detection ✅ ❌ ✅
Trajectory logging + run scoring ✅ ❌ ❌
Consecutive-failure rethink ✅ ❌ ❌
Discrete reward bands (RL-inspired) ✅ ❌ ❌
Weighted execution scoring ✅ ❌ ❌
Truncated JSON repair + retry ✅ ❌ ❌
Model-specific temperature override ✅ ❌ ❌

Architecture

Core Components (~2,500 LOC)

clawagents/
├── agent.py             # ClawAgent class — ReAct loop, hooks, compaction
├── __main__.py          # CLI entrypoint
├── config/              # Env-based configuration (incl. TEMPERATURE, CLAW_*)
├── providers/           # LLM backends (OpenAI, Gemini, Fallback)
│   └── llm.py           # max_completion_tokens, temperature override, JSON repair
├── tools/               # 14+ built-in tools
│   ├── filesystem.py    # ls, read_file, write_file, edit_file
│   ├── advanced_fs.py   # tree, diff, insert_lines
│   ├── search.py        # grep, glob
│   ├── execute.py       # Shell command execution
│   ├── planning.py      # write_todos, update_todo
│   ├── delegation.py    # Sub-agent task delegation
│   ├── think.py         # Structured reasoning (no side effects)
│   ├── web.py           # URL fetching with HTML cleanup
│   └── interactive.py   # ask_user (stdin-based)
├── sandbox/             # Pluggable backend protocol
│   ├── protocol.py      # SandboxBackend interface (15+ methods)
│   ├── local.py         # LocalBackend (pathlib + asyncio)
│   └── in_memory.py     # InMemoryBackend (VFS for testing)
├── trajectory/          # RL-inspired run analysis (v5.9+)
│   └── recorder.py      # TrajectoryRecorder, discrete scoring, quality grading
├── gateway/             # Production HTTP server
│   ├── server.py        # FastAPI + SSE streaming
│   └── queue.py         # 4-lane FIFO command queue
├── graph/               # Agent loop orchestration + failure tracking
├── memory/              # AGENTS.md discovery + compaction
├── process/             # Process management
└── logging/             # Structured logging

Built-in Tools

Every agent includes these — no setup needed:

Tool Description
ls List directory with size + modified time
read_file Read file with line numbers + pagination
write_file Write/create file (auto-creates directories)
edit_file Replace text with pattern matching
grep Search — single file or recursive with glob filter
glob Find files by pattern (**/*.py)
execute Shell command execution
tree Recursive directory tree with smart ignoring
diff Unified diff between two files
insert_lines Precise line-level insertion
think Structured reasoning without side effects
web_fetch URL fetching with HTML stripping (50KB cap)
write_todos Plan tasks as a checklist
update_todo Mark plan items complete
task Delegate to a sub-agent with isolated context
ask_user Interactive stdin-based user input
use_skill Load a skill's instructions (when skills exist)

Tool Examples

📂 Filesystem — ls, read_file, write_file, edit_file

The agent calls tools by emitting JSON blocks. Here's what happens under the hood when you ask the agent to work with files:

# The agent autonomously emits tool calls like:

# List a directory
{"tool": "ls", "args": {"path": "src/"}}
# → Returns:  drwxr-xr-x  4.0 KB  2026-02-24  components/
#             -rw-r--r--  1.2 KB  2026-02-24  main.py

# Read a file with pagination
{"tool": "read_file", "args": {"path": "src/main.py", "offset": 0, "limit": 50}}
# → Returns:  1 | import asyncio
#             2 | from clawagents import create_claw_agent
#             ...

# Write a new file (parent directories auto-created)
{"tool": "write_file", "args": {"path": "src/utils/helpers.py", "content": "def greet(name):\n    return f'Hello, {name}!'"}}
# → Returns:  ✅ Wrote 45 bytes to src/utils/helpers.py

# Edit an existing file by pattern match
{"tool": "edit_file", "args": {
    "path": "src/main.py",
    "old": "print('hello')",
    "new": "print('Hello, World!')"
}}
# → Returns:  ✅ 1 replacement made in src/main.py
🔍 Search — grep, glob
# Recursive grep across all Python files
{"tool": "grep", "args": {"pattern": "TODO", "path": "src/", "include": "*.py"}}
# → Returns:  src/agent.py:42:  # TODO: add retry logic
#             src/tools/web.py:15:  # TODO: handle redirects

# Single-file search
{"tool": "grep", "args": {"pattern": "class.*Tool", "path": "src/tools/registry.py"}}
# → Returns:  15: class ToolResult:
#             24: class Tool(Protocol):

# Find files by pattern
{"tool": "glob", "args": {"pattern": "**/*.md", "path": "."}}
# → Returns:  ./README.md (15.3 KB)
#             ./docs/ARCHITECTURE.md (4.1 KB)
#             ./AGENTS.md (892 B)
⚡ Shell Execution
# Run any shell command
{"tool": "execute", "args": {"command": "python -m pytest tests/ -v"}}
# → Returns full stdout/stderr with exit code

# With custom timeout (in milliseconds)
{"tool": "execute", "args": {"command": "pip install requests", "timeout": 60000}}

# Dangerous commands are auto-blocked
{"tool": "execute", "args": {"command": "rm -rf /"}}
# → Error: Blocked potentially destructive command
🧠 Think — structured reasoning
# The agent can reason without side effects
{"tool": "think", "args": {
    "thought": "The user wants me to refactor the database layer. Let me plan: 1) Read the current schema, 2) Identify coupled components, 3) Extract a repository pattern, 4) Update tests."
}}
# → [Thought recorded] — no files touched, no commands run

This reduces unnecessary tool calls by giving the agent a structured space to plan.

📋 Planning — write_todos, update_todo
# Create a structured plan
{"tool": "write_todos", "args": {
    "todos": ["Read the existing codebase", "Fix the auth bug", "Add unit tests", "Update docs"]
}}
# → ## Progress: 0/4 complete
#   0. [ ] Read the existing codebase
#   1. [ ] Fix the auth bug
#   2. [ ] Add unit tests
#   3. [ ] Update docs

# Mark steps complete as you go
{"tool": "update_todo", "args": {"index": 0}}
# → ## Progress: 1/4 complete
#   0. [x] Read the existing codebase
#   1. [ ] Fix the auth bug
#   ...
🤖 Sub-agent delegation
# Delegate to a fresh sub-agent with isolated context
{"tool": "task", "args": {
    "description": "Analyze all Python files in src/ and create a summary of the module structure",
    "max_iterations": 10
}}
# → [Sub-agent completed: 6 tool calls, 4 iterations]
#   The src/ directory contains 3 modules: ...

# With named specialized sub-agents (configured at creation)
{"tool": "task", "args": {
    "description": "Review this pull request for security issues",
    "agent": "security-reviewer"
}}

Registering named sub-agents:

from clawagents import create_claw_agent
from clawagents.tools.subagent import SubAgentSpec

agent = create_claw_agent(
    "gemini-3-flash",
    subagents=[
        SubAgentSpec(
            name="researcher",
            description="Deep research on a topic",
            system_prompt="You are a thorough researcher. Always cite sources.",
            max_iterations=15,
        ),
        SubAgentSpec(
            name="coder",
            description="Write and test code",
            system_prompt="You are a senior engineer. Write clean, tested code.",
            max_iterations=10,
        ),
    ],
)
๐ŸŒ Web Fetch
# Fetch and read a web page (HTML stripped automatically)
{"tool": "web_fetch", "args": {"url": "https://docs.python.org/3/library/asyncio.html"}}
# โ†’ [200] https://docs.python.org/3/library/asyncio.html
#   asyncio โ€” Asynchronous I/O ...

# Fetch a JSON API
{"tool": "web_fetch", "args": {"url": "https://api.github.com/repos/python/cpython", "timeout": 10}}
# โ†’ Returns raw JSON response

Custom Tools

Create your own tools by implementing the Tool protocol:

from clawagents import create_claw_agent
from clawagents.tools.registry import Tool, ToolResult

class DatabaseQueryTool:
    name = "query_db"
    description = "Run a read-only SQL query against the application database."
    parameters = {
        "sql": {"type": "string", "description": "The SQL SELECT query", "required": True},
        "limit": {"type": "number", "description": "Max rows to return. Default: 100"},
    }

    async def execute(self, args):
        sql = args.get("sql", "")
        limit = int(args.get("limit", 100))
        # ... your database logic here ...
        rows = await run_query(sql, limit=limit)
        return ToolResult(success=True, output=format_table(rows))

# Register custom tools alongside built-ins
agent = create_claw_agent("gpt-5", tools=[DatabaseQueryTool()])

You can also wrap LangChain tools directly:

from langchain_community.tools import WikipediaQueryRun

agent = create_claw_agent("gpt-5", tools=[WikipediaQueryRun()])
# LangChain tools are automatically adapted via LangChainToolAdapter

Skills System

Skills are reusable instruction sets that teach the agent domain-specific knowledge — without polluting the system prompt. They use a progressive disclosure pattern: the agent loads skill instructions on demand via the use_skill tool.

Skill Directory Structure

your-project/
├── skills/                   # Auto-discovered (or .skills/, skill/, .skill/, Skills/)
│   ├── code_review/
│   │   └── SKILL.md          # ← Skill defined as a folder + SKILL.md
│   ├── sql_expert.md         # ← Skill defined as a single .md file
│   └── deploy_checklist.md
├── AGENTS.md                 # Project memory (auto-injected)
└── src/
    └── ...

Writing a Skill

Every skill is a Markdown file with optional YAML frontmatter:

Example 1 — skills/code_review/SKILL.md

---
name: code_review
description: "Perform thorough code reviews following team standards"
allowed-tools: read_file grep glob think
---

# Code Review Skill

When reviewing code, follow these steps:

## 1. Structure Check
- Verify the file follows our module pattern (one class per file)
- Check imports are grouped: stdlib → third-party → local
- Ensure `__init__.py` exports are up to date

## 2. Logic Review
- Look for unhandled edge cases (empty inputs, None values)
- Verify error messages are actionable
- Check that async functions are properly awaited

## 3. Security
- No hardcoded secrets or API keys
- SQL queries use parameterized statements
- User input is sanitized before use

## 4. Output Format
Provide your review as:
- ✅ **Approved** — no issues found
- ⚠️ **Changes requested** — list specific issues with file:line references
- 🚫 **Blocked** — critical issues that must be fixed

Example 2 — skills/sql_expert.md (single-file skill)

---
name: sql_expert
description: "Write optimized SQL queries for PostgreSQL"
allowed-tools: execute read_file think
---

# SQL Expert

You are a PostgreSQL expert. When writing queries:

## Rules
1. Always use explicit `JOIN` syntax (never implicit joins in WHERE)
2. Use CTEs (`WITH` clauses) for complex multi-step queries
3. Add `EXPLAIN ANALYZE` when the user asks about performance
4. Use parameterized queries — never interpolate user values
5. Default to `LIMIT 100` unless the user specifies otherwise

## Patterns

### Pagination
Use keyset pagination for large tables:
SELECT * FROM events
WHERE id > :last_seen_id
ORDER BY id
LIMIT 50;

### Aggregation

Always include the raw count alongside percentages:

SELECT
    status,
    COUNT(*) AS n,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM orders
GROUP BY status
ORDER BY n DESC;

Example 3 — skills/deploy_checklist.md

---
name: deploy_checklist
description: "Step-by-step production deployment checklist"
---

# Deployment Checklist

Before deploying to production, complete every step:

- [ ] All tests pass: `pytest tests/ -v`
- [ ] No lint errors: `ruff check src/`
- [ ] Version bumped in `pyproject.toml`
- [ ] CHANGELOG.md updated
- [ ] Docker image builds: `docker build -t app:latest .`
- [ ] Smoke test on staging environment
- [ ] Database migrations reviewed and tested
- [ ] Rollback plan documented

How Skills Work at Runtime

# Skills are auto-discovered from ./skills/ directory
agent = create_claw_agent("gemini-3-flash")

# Or specify custom skill directories
agent = create_claw_agent("gpt-5", skills=["./my-skills", "./shared-skills"])

When skills are available, the agent gets two additional tools:

# 1. List available skills
{"tool": "list_skills", "args": {}}
# → Available skills (3):
#   - **code_review**: Perform thorough code reviews following team standards
#     → Allowed tools: read_file, grep, glob, think
#   - **sql_expert**: Write optimized SQL queries for PostgreSQL
#     → Allowed tools: execute, read_file, think
#   - **deploy_checklist**: Step-by-step production deployment checklist

# 2. Load a specific skill's instructions
{"tool": "use_skill", "args": {"name": "sql_expert"}}
# → Returns the full skill content, injected into the agent's context

The agent decides on its own when to use a skill. If you ask it to "write a query to find all overdue orders" and a sql_expert skill exists, it will load the skill first, then write the query following those rules.


API Reference

create_claw_agent(model, instruction, ...)

Param Type Default Description
model str | LLMProvider | None None Model name or provider instance. None = auto-detect from env
instruction str None System instruction for the agent
tools list None Additional tools (built-in tools always included)
skills str | list auto-discover Skill directories to load
memory str | list auto-discover Memory files to inject
streaming bool True Enable streaming responses
sandbox SandboxBackend LocalBackend Pluggable sandbox for file/shell operations
context_window int | None from env / 128000 Token budget for compaction
max_tokens int | None from env / 8192 Max output tokens per response
temperature float | None from env / 0.0 LLM temperature (model-specific overrides apply)
trajectory bool | None from CLAW_TRAJECTORY / False Enable trajectory logging + run scoring
rethink bool | None from CLAW_RETHINK / False Enable consecutive-failure detection
max_iterations int | None from MAX_ITERATIONS / 200 Max tool rounds before the agent stops
preview_chars int | None from CLAW_PREVIEW_CHARS / 120 Max chars for tool-output previews in trajectory logs
response_chars int | None from CLAW_RESPONSE_CHARS / 500 Max chars for LLM response text in trajectory records
on_event callable None Event callback

Hooks & Access Control

agent = create_claw_agent("gemini-3-flash", instruction="Code reviewer")

# Block dangerous tools at runtime
agent.block_tools("execute", "write_file")

# Or whitelist only safe tools
agent.allow_only_tools("read_file", "ls", "grep", "glob")

# Inject context into every LLM call
agent.inject_context("Always respond in Spanish")

# Limit tool output size
agent.truncate_output(3000)

Advanced — raw hooks:

agent.before_llm = lambda messages: messages        # modify messages before LLM
agent.before_tool = lambda name, args: True          # return False to block
agent.after_tool = lambda name, args, result: result # modify tool results
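
For example, the documented before_tool slot can host a guard that vetoes destructive shell commands. The guard_tool function and its blocklist below are illustrative assumptions, not ClawAgents' built-in command filter:

```python
# Illustrative guard for the documented before_tool hook. The blocklist and
# matching logic are assumptions, not the library's built-in filter.
BLOCKED_SUBSTRINGS = ("rm -rf", "mkfs", "> /dev/")

def guard_tool(name, args):
    if name == "execute":
        command = args.get("command", "")
        if any(bad in command for bad in BLOCKED_SUBSTRINGS):
            return False  # returning False blocks the tool call
    return True  # every other tool call proceeds normally

# agent.before_tool = guard_tool
```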

Auto-Discovery

The agent factory automatically discovers project files:

What Default locations checked
Memory ./AGENTS.md, ./CLAWAGENTS.md
Skills ./skills, ./.skills, ./skill, ./.skill, ./Skills

Override with explicit paths:

agent = create_claw_agent(
    "gpt-5",
    memory="./docs/AGENTS.md",
    skills=["./my-skills", "./shared-skills"]
)

Memory & Context Management

Project Memory

Loads AGENTS.md files and injects content into every LLM call. Use for project-level context and conventions.

Auto-Compaction

When the conversation exceeds 75% of CONTEXT_WINDOW:

  1. Full history offloaded to .clawagents/history/compacted_*.json
  2. Older messages summarized into [Compacted History]
  3. Last 6 messages kept intact

This allows effectively unlimited conversation length while preserving a full audit trail.
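
As a rough sketch of the policy above (hypothetical code, assuming the 75% threshold and 6-message tail described; not the real compaction module):

```python
# Hypothetical sketch of the compaction policy: trigger at 75% of the
# context window, summarize everything except the last 6 messages.
KEEP_LAST = 6

def needs_compaction(token_count, context_window=128_000, threshold=0.75):
    return token_count > context_window * threshold

def compact(messages, summarize):
    head, tail = messages[:-KEEP_LAST], messages[-KEEP_LAST:]
    summary = {"role": "system", "content": f"[Compacted History] {summarize(head)}"}
    return [summary] + tail

msgs = [{"role": "user", "content": f"msg {i}"} for i in range(20)]
out = compact(msgs, summarize=lambda h: f"{len(h)} earlier messages summarized")
# out: 1 summary message + the last 6 originals
```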


Gateway Server

Launch a production-ready HTTP server with one line:

from clawagents.gateway import start_gateway

start_gateway(port=3000)

Endpoints

Endpoint Method Description
/chat POST Synchronous agent invocation
/chat/stream POST SSE streaming (events: queued, started, agent, done, error)
/queue GET Queue status for all lanes
/health GET Health check
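
Calling the gateway from Python might look like the sketch below. The JSON field names ("task", "lane") are assumptions for illustration; the actual request schema is not documented here:

```python
import json
from urllib import request

# Hypothetical client for POST /chat. The payload field names ("task",
# "lane") are assumptions; check the running server's schema.
def build_chat_request(task, lane="main", base_url="http://localhost:3000"):
    body = json.dumps({"task": task, "lane": lane}).encode()
    return request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("List all TODO comments")
# request.urlopen(req) would send it to a running gateway
```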

Lane-Based Concurrency

4 lanes with configurable max_concurrent per lane:

  • main โ€” primary user requests
  • cron โ€” scheduled tasks
  • subagent โ€” sub-agent delegation
  • nested โ€” nested sub-agent calls

Sandbox Backends

ClawAgents uses a pluggable sandbox protocol for all file and shell operations:

from clawagents.sandbox import InMemoryBackend, LocalBackend

# Production: real filesystem
agent = create_claw_agent("gpt-5", sandbox=LocalBackend())

# Testing: pure in-memory VFS
mem = InMemoryBackend()
mem.seed({"src/main.py": "print('hello')", "README.md": "# My Project"})
agent = create_claw_agent("gpt-5", sandbox=mem)
snapshot = mem.snapshot()  # deterministic state capture

Environment Variables

Variable Default Description
PROVIDER auto-detect openai or gemini
OPENAI_API_KEY — OpenAI API key
OPENAI_MODEL gpt-5-nano OpenAI model
GEMINI_API_KEY — Gemini API key
GEMINI_MODEL gemini-3-flash-preview Gemini model
STREAMING 1 1 = enabled, 0 = disabled
CONTEXT_WINDOW 128000 Token budget for compaction
MAX_TOKENS 8192 Max output tokens per response (sent as max_completion_tokens for OpenAI, max_output_tokens for Gemini)
TEMPERATURE 0.0 LLM temperature. Automatically overridden for models that require a fixed value (e.g. GPT-5 family → 1.0, o1/o3 series → 1.0)
CLAW_TRAJECTORY 0 1 = enable trajectory logging. Logs every turn and scores each run to .clawagents/trajectories/runs.jsonl
CLAW_RETHINK 0 1 = enable consecutive-failure detection. Injects a "rethink" prompt after 3 consecutive meaningful tool failures
MAX_ITERATIONS 200 Max tool rounds before the agent stops. Override per-run via agent.invoke(task, max_iterations=N)
CLAW_PREVIEW_CHARS 120 Max chars for tool-output previews in trajectory logs and events
CLAW_RESPONSE_CHARS 500 Max chars for LLM response text stored in trajectory records

Testing

# Install with dev dependencies
pip install -e ".[dev]"

# Run all tests
python -m pytest tests/ -v

# Run benchmarks (requires API keys)
python -m pytest tests/ -v -m benchmark

Changelog

v5.11.0 — Configurable Limits

Feature Description
🔢 max_iterations Now settable at construction or via MAX_ITERATIONS env (default 200, was hardcoded in caller)
📏 preview_chars Tool-output preview length configurable via CLAW_PREVIEW_CHARS env (default 120)
📄 response_chars Response text length in trajectory records via CLAW_RESPONSE_CHARS env (default 500)
⚙️ Priority Explicit param > env var > default for all three

v5.10.0 — Discrete Reward Bands & Weighted Scoring

Feature Description
🎯 Discrete reward bands Run scores mapped to -1 … +3 bands (inspired by CUDA-Agent PPO reward shaping)
⚖️ Weighted execution scoring execute, shell, run_code weighted 2× higher than generic tools
🏷️ Run quality grading Each run classified as clean, noisy, or failed for trajectory filtering
🛡️ Gameable tool exclusion think, todolist, use_skill, etc. excluded from scoring to prevent reward hacking

v5.9.0 — Trajectory Logging & Rethink

Feature Description
📊 Trajectory logging Structured recording of every turn, tool call, and outcome to runs.jsonl
🔄 Consecutive-failure rethink After 3 consecutive meaningful failures, injects a system "rethink" prompt
🎛️ Opt-in flags trajectory=True / CLAW_TRAJECTORY=1 and rethink=True / CLAW_RETHINK=1

v5.8.0 — JSON Resilience

Feature Description
🔧 JSON repair _repair_json() utility fixes truncated JSON from hitting max_completion_tokens
🔁 Truncated JSON retry Detects incomplete JSON tool calls and prompts the LLM to resend

v5.7.0 — Model-Specific Temperature

Feature Description
🌡️ Fixed-temperature models GPT-5 family and o1/o3/o4 series auto-override to temperature=1.0
🌡️ Configurable temperature TEMPERATURE env var + temperature parameter on create_claw_agent

v5.6.0 — LLM Parameter Fixes

Feature Description
🔑 max_completion_tokens OpenAI calls now use max_completion_tokens (replacing deprecated max_tokens)
🔑 max_output_tokens Gemini calls now pass max_output_tokens correctly
⚙️ Config priority Explicit param > .env > default — no more shadowing of env values

v5.5.0 — Foundation

Feature Description
🔌 Pluggable Sandbox SandboxBackend protocol with LocalBackend + InMemoryBackend
🌐 Gateway Server FastAPI server with SSE streaming and 4-lane queue
🗂️ Advanced FS Tools tree, diff, insert_lines
🧠 Think Tool Structured reasoning without side effects
🌐 Web Fetch URL fetching with HTML cleanup
💬 Ask User Interactive stdin-based input
📜 History Offloading Full audit trail preserved after compaction
🔒 Tool Access Control block_tools() / allow_only_tools() at runtime
💉 Context Injection inject_context() hook for every LLM call
✂️ Output Truncation truncate_output() to cap tool output size

Trajectory Logging & RL-Inspired Scoring

ClawAgents includes an optional trajectory system inspired by reinforcement learning techniques from CUDA-Agent and OpenClaw-RL. Enable it with trajectory=True or CLAW_TRAJECTORY=1.

What gets logged

Every agent run records:

  • Turn-level data: tool calls, arguments, success/failure, output previews
  • Weighted turn scores: execution tools (shell, code runners) weighted 2ร— higher than generic tools
  • Run summary: total turns, tool calls, successes/failures, elapsed time

Discrete reward bands

Each run receives a score from -1 to +3:

Score Meaning
+3 All tools succeeded, task completed cleanly
+2 Minor hiccups but overall success
+1 Partial success with some failures
0 Inconclusive โ€” mixed results
-1 Majority of tool calls failed
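
A hypothetical mapping from a run's success ratio to these bands might look like this. The exact thresholds are illustrative guesses, not the library's scoring code:

```python
# Hypothetical band mapping; the thresholds below are illustrative guesses,
# not clawagents' actual scoring logic.
def reward_band(successes, failures):
    total = successes + failures
    if total == 0:
        return 0  # inconclusive: nothing scoreable happened
    ratio = successes / total
    if ratio == 1.0:
        return 3   # all tools succeeded
    if ratio >= 0.75:
        return 2   # minor hiccups but overall success
    if ratio >= 0.5:
        return 1   # partial success with some failures
    if ratio >= 0.25:
        return 0   # mixed results
    return -1      # majority of tool calls failed
```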

Quality grading

Runs are classified for downstream filtering:

Quality Criteria
clean Score ≥ 2 and ≤ 2 mid-run failures
noisy Score ≥ 0 but too many mid-run failures
failed Score < 0

Anti-gaming protections

Tools like think, todolist, use_skill, list_skills, and update_todo are excluded from scoring — they can't inflate success rates.

Consecutive-failure rethink

With rethink=True or CLAW_RETHINK=1, the agent monitors tool outcomes in real-time. After 3 consecutive meaningful failures, it injects a system message:

"You have had 3 consecutive tool failures. Stop and rethink your approach before continuing."

This simple mechanism prevents the agent from spiraling into repeated failed attempts.
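
The mechanism can be sketched as a simple streak counter (illustrative code, not the actual clawagents implementation):

```python
RETHINK_THRESHOLD = 3
RETHINK_PROMPT = ("You have had 3 consecutive tool failures. "
                  "Stop and rethink your approach before continuing.")

# Sketch of the consecutive-failure monitor described above; the class and
# method names are hypothetical.
class RethinkMonitor:
    def __init__(self):
        self.streak = 0  # consecutive meaningful failures so far

    def record(self, success):
        """Return the rethink prompt exactly when the streak hits the threshold."""
        if success:
            self.streak = 0
            return None
        self.streak += 1
        if self.streak == RETHINK_THRESHOLD:
            return RETHINK_PROMPT
        return None

mon = RethinkMonitor()
# Two failures, a success (streak resets), then three failures in a row
events = [mon.record(ok) for ok in (False, False, True, False, False, False)]
```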

Output

Run summaries are appended to .clawagents/trajectories/runs.jsonl:

{
  "run_id": "a1b2c3d4",
  "model": "gpt-5-mini",
  "total_turns": 8,
  "tool_calls": 12,
  "successes": 10,
  "failures": 2,
  "run_score": 2,
  "quality": "clean",
  "elapsed_ms": 45230,
  "turns": [...]
}
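
Since runs.jsonl is newline-delimited JSON, run summaries can be read back with the standard json module, for example to keep only clean runs for later analysis:

```python
import io
import json

# Sample lines mirroring the record format shown above (stand-in for
# opening .clawagents/trajectories/runs.jsonl)
sample = io.StringIO(
    '{"run_id": "a1b2c3d4", "run_score": 2, "quality": "clean"}\n'
    '{"run_id": "e5f6", "run_score": -1, "quality": "failed"}\n'
)

# Keep only runs graded "clean" for downstream filtering
clean_runs = [
    rec for line in sample
    if (rec := json.loads(line))["quality"] == "clean"
]
```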

Roadmap

  • Docker sandbox backend (protocol ready)
  • Semantic browser automation (accessibility tree)
  • Prompt caching (Anthropic-style)
  • Persistent memory learning from trajectory data
  • Post-run self-analysis (skill extraction from successful runs)
  • Trajectory logging + discrete reward bands โœ… (v5.9โ€“5.10)
  • Consecutive-failure rethink injection โœ… (v5.9)
  • Weighted execution scoring + quality grading โœ… (v5.10)
  • JSON repair + truncated JSON retry โœ… (v5.8)
  • Model-specific temperature override โœ… (v5.7)
  • Configurable temperature / max_completion_tokens โœ… (v5.6)
  • Pluggable sandbox backend โœ… (v5.5)
  • Lane-based queue serialization โœ… (v5.5)
  • Skill progressive disclosure โœ… (v5.5)
  • Gateway HTTP server โœ… (v5.5)

License

MIT


Built with 🦞 by the ClawAgents team



Download files

Download the file for your platform.

Source Distribution

clawagents-5.11.1.tar.gz (99.7 kB)

Uploaded Source

Built Distribution


clawagents-5.11.1-py3-none-any.whl (73.7 kB)

Uploaded Python 3

File details

Details for the file clawagents-5.11.1.tar.gz.

File metadata

  • Download URL: clawagents-5.11.1.tar.gz
  • Upload date:
  • Size: 99.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for clawagents-5.11.1.tar.gz
Algorithm Hash digest
SHA256 26ba7f73cea52d7a4aca96f25ed2fd3515e1f2c13b757b0dc156ef0238a62c80
MD5 4cdf96b4bdf616286d33fea70d9c8e52
BLAKE2b-256 ef053955f202cca8ea4cd6d42fd1a8037bab40513cfd92c0319766de6bb05642


File details

Details for the file clawagents-5.11.1-py3-none-any.whl.

File metadata

  • Download URL: clawagents-5.11.1-py3-none-any.whl
  • Upload date:
  • Size: 73.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for clawagents-5.11.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c254eb16d9f6c914d5d4e00ade217cbf2933f8e92e01df29fc613ec125a260d0
MD5 eb241fcf0e0697685464344c525bbe07
BLAKE2b-256 0284c589f6ea4cdc2796f4ec00ff2e79baec00c897f1f3ed27c37afce792d8ac

