
Python port of the ClawAgents framework


🦞 ClawAgents

A lean, full-stack agentic AI framework — ~2,500 LOC



ClawAgents is a production-ready agentic framework that gives LLMs the ability to read, write, and execute code — with built-in planning, memory, sandboxing, and a gateway server. It supports OpenAI GPT-5 and Google Gemini out of the box, with a pluggable provider architecture for any LLM.

Built by extracting and unifying the best architectural patterns from OpenClaw (~5,800 files) and DeepAgents (~1,400 LOC core), ClawAgents delivers the same power at a fraction of the complexity.

Installation

pip install clawagents

Version 5.11.1 — Latest stable release (February 2026)


Quick Start

1. Configure your environment

Create a .env file:

PROVIDER=gemini                    # or "openai"
GEMINI_API_KEY=AIza...             # Your Gemini API key
GEMINI_MODEL=gemini-3-flash-preview
STREAMING=1
CONTEXT_WINDOW=128000
MAX_TOKENS=8192
TEMPERATURE=0                      # Model-specific overrides apply (see below)

# Optional: RL-inspired agent improvements
CLAW_TRAJECTORY=1                  # Enable trajectory logging + scoring
CLAW_RETHINK=1                     # Enable consecutive-failure detection

# OpenAI configuration
PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-nano
STREAMING=1
CONTEXT_WINDOW=128000
MAX_TOKENS=8192
TEMPERATURE=1                      # GPT-5 family requires temperature=1
CLAW_TRAJECTORY=1
CLAW_RETHINK=1
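
Once the .env file is loaded (for example via python-dotenv), these variables reduce to a small typed settings dict. A minimal sketch of that mapping; the `load_settings` helper here is illustrative and not part of clawagents:

```python
import os

# Illustrative sketch only: clawagents reads these variables internally;
# the load_settings helper below is NOT part of the library.
def load_settings(env=None):
    env = os.environ if env is None else env
    return {
        "provider": env.get("PROVIDER", "openai"),
        "streaming": env.get("STREAMING", "1") == "1",
        "context_window": int(env.get("CONTEXT_WINDOW", "128000")),
        "max_tokens": int(env.get("MAX_TOKENS", "8192")),
        "temperature": float(env.get("TEMPERATURE", "0")),
        "trajectory": env.get("CLAW_TRAJECTORY", "0") == "1",
        "rethink": env.get("CLAW_RETHINK", "0") == "1",
    }

settings = load_settings({"PROVIDER": "gemini", "CLAW_RETHINK": "1"})
```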

2. One-line agent

from clawagents import create_claw_agent

agent = create_claw_agent("gemini-3-flash")
result = await agent.invoke("List all Python files in src/")
print(result.result)

3. With custom instructions

agent = create_claw_agent(
    "gpt-5",
    instruction="You are a senior code reviewer. Be thorough and concise."
)
result = await agent.invoke("Review this codebase and suggest improvements")

4. With trajectory logging & rethink

agent = create_claw_agent(
    "gpt-5-mini",
    trajectory=True,   # logs every turn + scores the run
    rethink=True,       # auto-injects "rethink" after 3 consecutive failures
)
result = await agent.invoke("Refactor the auth module and add tests")
# Run summary written to .clawagents/trajectories/runs.jsonl

5. CLI mode

python -m clawagents --task "Find all TODO comments in the codebase"

๐Ÿ† Performance: ClawAgents vs Traditional Frameworks

ClawAgents v5.10 outperforms traditional multi-layer agentic frameworks through architectural simplicity. Here's how it stacks up against DeepAgents (LangGraph/LangChain-based) in head-to-head benchmarks.

Benchmark Results (February 2026)

TypeScript — 5 tasks × 2 models × 2 frameworks (20/20 ✅)

Framework Gemini-2.5-flash GPT-5-mini
ClawAgents v5.5 2.3s avg · 1.4 tools 13.6s avg · 1.4 tools
DeepAgents 2.5s avg · 1.8 tools 15.7s avg · 2.4 tools

Per-Task Breakdown

Task ClawAgents (Gemini) DeepAgents (Gemini) ClawAgents (GPT-5) DeepAgents (GPT-5)
File Listing 3.7s, 1 tool 1.9s, 1 tool 8.9s, 1 tool 8.4s, 1 tool
Read & Analyze 1.6s, 1 tool 3.6s, 3 tools 5.4s, 1 tool 13.0s, 2 tools
Write File 2.1s, 2 tools 2.6s, 2 tools 5.2s, 2 tools 7.5s, 2 tools
Multi-Step 3.4s, 3 tools 3.7s, 3 tools 46.2s, 3 tools 46.9s, 7 tools
Reasoning 0.7s, 0 tools 0.9s, 0 tools 2.3s, 0 tools 2.8s, 0 tools

Python — 18/20 completed (DeepAgents hung on GPT-5 multi_step)

Task ClawAgents (Gemini) DeepAgents (Gemini) ClawAgents (GPT-5) DeepAgents (GPT-5)
File Listing 2.8s, 1 tool 1.0s, 0 tools* 9.9s, 1 tool 3.4s, 1 tool
Read & Analyze 2.0s, 1 tool 9.8s, 4 tools 5.5s, 1 tool 8.4s, 3 tools
Write File 2.0s, 2 tools 1.0s, 0 tools* 5.0s, 2 tools 9.3s, 3 tools
Multi-Step 4.1s, 3 tools 0.9s, 0 tools* 16.0s, 3 tools ❌ hung >5min
Reasoning 0.7s, 0 tools 1.0s, 0 tools — —

* DeepAgents 0-tool results mean the model answered without using filesystem tools — faster but lower-quality (unverified answers). ClawAgents consistently uses tools to verify answers.

Why ClawAgents Wins

Traditional Stack (DeepAgents):           ClawAgents:
┌─────────────────────────┐               ┌──────────────────┐
│  Your Code              │               │  Your Code       │
├─────────────────────────┤               ├──────────────────┤
│  LangGraph              │               │  ClawAgents      │
├─────────────────────────┤               │  (direct SDK)    │
│  LangChain              │               └────────┬─────────┘
├─────────────────────────┤                        │
│  ChatOpenAI / ChatGemini│                        ▼
├─────────────────────────┤               ┌──────────────────┐
│  Responses API          │               │  Responses API   │
└─────────────────────────┘               └──────────────────┘
        4 layers                                1 layer
Advantage Impact
Direct SDK calls (1 layer vs 4) Lower latency, fewer failure points
Working directory awareness Tools operate from CWD; DeepAgents has no CWD concept
Soft + hard loop detection Catches repetitive tool calls at 3 repeats, hard-stops at 6
Efficiency rules in system prompt ~30% reduction in redundant tool calls
Fewer tool calls overall 1.4 avg vs 1.8–2.4 (20–40% more efficient)
No OpenAI lock-in Native Gemini + OpenAI support with FallbackProvider chain
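
The soft + hard loop detection row above can be sketched as a counter over consecutive identical tool calls. This is a hypothetical illustration of the 3-repeat warn / 6-repeat stop policy, not the library's actual implementation:

```python
# Hypothetical sketch of the soft/hard loop policy (warn at 3 identical
# consecutive tool calls, hard-stop at 6); not the actual implementation.
SOFT_LIMIT, HARD_LIMIT = 3, 6

class LoopDetector:
    def __init__(self):
        self._last = None   # fingerprint of the previous tool call
        self._count = 0     # length of the current repeat streak

    def check(self, tool_name, args):
        call = (tool_name, tuple(sorted(args.items())))
        self._count = self._count + 1 if call == self._last else 1
        self._last = call
        if self._count >= HARD_LIMIT:
            return "hard-stop"
        if self._count >= SOFT_LIMIT:
            return "soft-warn"
        return "ok"

det = LoopDetector()
results = [det.check("grep", {"pattern": "TODO"}) for _ in range(6)]
# results: ok, ok, soft-warn, soft-warn, soft-warn, hard-stop
```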

Feature Matrix

Feature ClawAgents v5.10 DeepAgents OpenClaw
ReAct loop ✅ ✅ ✅
Tool loop detection ✅ soft + hard ❌ ✅
Efficiency rules (system prompt) ✅ ❌ ❌
Adaptive token estimation ✅ ❌ ❌
Model-aware context budgeting ✅ ❌ ❌
Pluggable sandbox backend ✅ ✅ ✅
In-memory VFS (testing) ✅ ❌ ❌
Sub-agent delegation ✅ ✅ ✅
Planning / TodoList ✅ ✅ ❌
Persistent memory (AGENTS.md) ✅ ✅ ✅
Human-in-the-loop ✅ ✅ ✅
Dangling tool call repair ✅ ✅ ❌
Auto-summarization + offloading ✅ ✅ ✅
Lane-based command queue ✅ ❌ ✅
Gateway HTTP server + SSE ✅ ❌ ✅
Tool access control ✅ ❌ ❌
think tool (structured reasoning) ✅ ❌ ❌
LangChain tool adapter ✅ N/A ❌
Streaming with stall detection ✅ ❌ ✅
Trajectory logging + run scoring ✅ ❌ ❌
Consecutive-failure rethink ✅ ❌ ❌
Discrete reward bands (RL-inspired) ✅ ❌ ❌
Weighted execution scoring ✅ ❌ ❌
Truncated JSON repair + retry ✅ ❌ ❌
Model-specific temperature override ✅ ❌ ❌

Architecture

Core Components (~2,500 LOC)

clawagents/
├── agent.py             # ClawAgent class — ReAct loop, hooks, compaction
├── __main__.py          # CLI entrypoint
├── config/              # Env-based configuration (incl. TEMPERATURE, CLAW_*)
├── providers/           # LLM backends (OpenAI, Gemini, Fallback)
│   └── llm.py           # max_completion_tokens, temperature override, JSON repair
├── tools/               # 14+ built-in tools
│   ├── filesystem.py    # ls, read_file, write_file, edit_file
│   ├── advanced_fs.py   # tree, diff, insert_lines
│   ├── search.py        # grep, glob
│   ├── execute.py       # Shell command execution
│   ├── planning.py      # write_todos, update_todo
│   ├── delegation.py    # Sub-agent task delegation
│   ├── think.py         # Structured reasoning (no side effects)
│   ├── web.py           # URL fetching with HTML cleanup
│   └── interactive.py   # ask_user (stdin-based)
├── sandbox/             # Pluggable backend protocol
│   ├── protocol.py      # SandboxBackend interface (15+ methods)
│   ├── local.py         # LocalBackend (pathlib + asyncio)
│   └── in_memory.py     # InMemoryBackend (VFS for testing)
├── trajectory/          # RL-inspired run analysis (v5.9+)
│   └── recorder.py      # TrajectoryRecorder, discrete scoring, quality grading
├── gateway/             # Production HTTP server
│   ├── server.py        # FastAPI + SSE streaming
│   └── queue.py         # 4-lane FIFO command queue
├── graph/               # Agent loop orchestration + failure tracking
├── memory/              # AGENTS.md discovery + compaction
├── process/             # Process management
└── logging/             # Structured logging

Built-in Tools

Every agent includes these — no setup needed:

Tool Description
ls List directory with size + modified time
read_file Read file with line numbers + pagination
write_file Write/create file (auto-creates directories)
edit_file Replace text with pattern matching
grep Search — single file or recursive with glob filter
glob Find files by pattern (**/*.py)
execute Shell command execution
tree Recursive directory tree with smart ignoring
diff Unified diff between two files
insert_lines Precise line-level insertion
think Structured reasoning without side effects
web_fetch URL fetching with HTML stripping (50KB cap)
write_todos Plan tasks as a checklist
update_todo Mark plan items complete
task Delegate to a sub-agent with isolated context
ask_user Interactive stdin-based user input
use_skill Load a skill's instructions (when skills exist)

Tool Examples

📂 Filesystem — ls, read_file, write_file, edit_file

The agent calls tools by emitting JSON blocks. Here's what happens under the hood when you ask the agent to work with files:

# The agent autonomously emits tool calls like:

# List a directory
{"tool": "ls", "args": {"path": "src/"}}
# → Returns:  drwxr-xr-x  4.0 KB  2026-02-24  components/
#             -rw-r--r--  1.2 KB  2026-02-24  main.py

# Read a file with pagination
{"tool": "read_file", "args": {"path": "src/main.py", "offset": 0, "limit": 50}}
# → Returns:  1 | import asyncio
#             2 | from clawagents import create_claw_agent
#             ...

# Write a new file (parent directories auto-created)
{"tool": "write_file", "args": {"path": "src/utils/helpers.py", "content": "def greet(name):\n    return f'Hello, {name}!'"}}
# → Returns:  ✅ Wrote 45 bytes to src/utils/helpers.py

# Edit an existing file by pattern match
{"tool": "edit_file", "args": {
    "path": "src/main.py",
    "old": "print('hello')",
    "new": "print('Hello, World!')"
}}
# → Returns:  ✅ 1 replacement made in src/main.py
🔍 Search — grep, glob
# Recursive grep across all Python files
{"tool": "grep", "args": {"pattern": "TODO", "path": "src/", "include": "*.py"}}
# → Returns:  src/agent.py:42:  # TODO: add retry logic
#             src/tools/web.py:15:  # TODO: handle redirects

# Single-file search
{"tool": "grep", "args": {"pattern": "class.*Tool", "path": "src/tools/registry.py"}}
# → Returns:  15: class ToolResult:
#             24: class Tool(Protocol):

# Find files by pattern
{"tool": "glob", "args": {"pattern": "**/*.md", "path": "."}}
# → Returns:  ./README.md (15.3 KB)
#             ./docs/ARCHITECTURE.md (4.1 KB)
#             ./AGENTS.md (892 B)
⚡ Shell Execution
# Run any shell command
{"tool": "execute", "args": {"command": "python -m pytest tests/ -v"}}
# → Returns full stdout/stderr with exit code

# With custom timeout (in milliseconds)
{"tool": "execute", "args": {"command": "pip install requests", "timeout": 60000}}

# Dangerous commands are auto-blocked
{"tool": "execute", "args": {"command": "rm -rf /"}}
# → Error: Blocked potentially destructive command
🧠 Think — structured reasoning
# The agent can reason without side effects
{"tool": "think", "args": {
    "thought": "The user wants me to refactor the database layer. Let me plan: 1) Read the current schema, 2) Identify coupled components, 3) Extract a repository pattern, 4) Update tests."
}}
# → [Thought recorded] — no files touched, no commands run

This reduces unnecessary tool calls by giving the agent a structured space to plan.

📋 Planning — write_todos, update_todo
# Create a structured plan
{"tool": "write_todos", "args": {
    "todos": ["Read the existing codebase", "Fix the auth bug", "Add unit tests", "Update docs"]
}}
# → ## Progress: 0/4 complete
#   0. [ ] Read the existing codebase
#   1. [ ] Fix the auth bug
#   2. [ ] Add unit tests
#   3. [ ] Update docs

# Mark steps complete as you go
{"tool": "update_todo", "args": {"index": 0}}
# → ## Progress: 1/4 complete
#   0. [x] Read the existing codebase
#   1. [ ] Fix the auth bug
#   ...
🤖 Sub-agent delegation
# Delegate to a fresh sub-agent with isolated context
{"tool": "task", "args": {
    "description": "Analyze all Python files in src/ and create a summary of the module structure",
    "max_iterations": 10
}}
# → [Sub-agent completed: 6 tool calls, 4 iterations]
#   The src/ directory contains 3 modules: ...

# With named specialized sub-agents (configured at creation)
{"tool": "task", "args": {
    "description": "Review this pull request for security issues",
    "agent": "security-reviewer"
}}

Registering named sub-agents:

from clawagents import create_claw_agent
from clawagents.tools.subagent import SubAgentSpec

agent = create_claw_agent(
    "gemini-3-flash",
    subagents=[
        SubAgentSpec(
            name="researcher",
            description="Deep research on a topic",
            system_prompt="You are a thorough researcher. Always cite sources.",
            max_iterations=15,
        ),
        SubAgentSpec(
            name="coder",
            description="Write and test code",
            system_prompt="You are a senior engineer. Write clean, tested code.",
            max_iterations=10,
        ),
    ],
)
๐ŸŒ Web Fetch
# Fetch and read a web page (HTML stripped automatically)
{"tool": "web_fetch", "args": {"url": "https://docs.python.org/3/library/asyncio.html"}}
# โ†’ [200] https://docs.python.org/3/library/asyncio.html
#   asyncio โ€” Asynchronous I/O ...

# Fetch a JSON API
{"tool": "web_fetch", "args": {"url": "https://api.github.com/repos/python/cpython", "timeout": 10}}
# โ†’ Returns raw JSON response

Custom Tools

Create your own tools by implementing the Tool protocol:

from clawagents import create_claw_agent
from clawagents.tools.registry import Tool, ToolResult

class DatabaseQueryTool:
    name = "query_db"
    description = "Run a read-only SQL query against the application database."
    parameters = {
        "sql": {"type": "string", "description": "The SQL SELECT query", "required": True},
        "limit": {"type": "number", "description": "Max rows to return. Default: 100"},
    }

    async def execute(self, args):
        sql = args.get("sql", "")
        limit = int(args.get("limit", 100))
        # ... your database logic here ...
        rows = await run_query(sql, limit=limit)
        return ToolResult(success=True, output=format_table(rows))

# Register custom tools alongside built-ins
agent = create_claw_agent("gpt-5", tools=[DatabaseQueryTool()])

You can also wrap LangChain tools directly:

from langchain_community.tools import WikipediaQueryRun

agent = create_claw_agent("gpt-5", tools=[WikipediaQueryRun()])
# LangChain tools are automatically adapted via LangChainToolAdapter

Skills System

Skills are reusable instruction sets that teach the agent domain-specific knowledge — without polluting the system prompt. They use a progressive disclosure pattern: the agent loads skill instructions on demand via the use_skill tool.

Skill Directory Structure

your-project/
├── skills/                   # Auto-discovered (or .skills/, skill/, .skill/, Skills/)
│   ├── code_review/
│   │   └── SKILL.md          # ← Skill defined as a folder + SKILL.md
│   ├── sql_expert.md         # ← Skill defined as a single .md file
│   └── deploy_checklist.md
├── AGENTS.md                 # Project memory (auto-injected)
└── src/
    └── ...

Writing a Skill

Every skill is a Markdown file with optional YAML frontmatter:

Example 1 — skills/code_review/SKILL.md

---
name: code_review
description: "Perform thorough code reviews following team standards"
allowed-tools: read_file grep glob think
---

# Code Review Skill

When reviewing code, follow these steps:

## 1. Structure Check
- Verify the file follows our module pattern (one class per file)
- Check imports are grouped: stdlib → third-party → local
- Ensure `__init__.py` exports are up to date

## 2. Logic Review
- Look for unhandled edge cases (empty inputs, None values)
- Verify error messages are actionable
- Check that async functions are properly awaited

## 3. Security
- No hardcoded secrets or API keys
- SQL queries use parameterized statements
- User input is sanitized before use

## 4. Output Format
Provide your review as:
- ✅ **Approved** — no issues found
- ⚠️ **Changes requested** — list specific issues with file:line references
- 🚫 **Blocked** — critical issues that must be fixed

Example 2 — skills/sql_expert.md (single-file skill)

---
name: sql_expert
description: "Write optimized SQL queries for PostgreSQL"
allowed-tools: execute read_file think
---

# SQL Expert

You are a PostgreSQL expert. When writing queries:

## Rules
1. Always use explicit `JOIN` syntax (never implicit joins in WHERE)
2. Use CTEs (`WITH` clauses) for complex multi-step queries
3. Add `EXPLAIN ANALYZE` when the user asks about performance
4. Use parameterized queries — never interpolate user values
5. Default to `LIMIT 100` unless the user specifies otherwise

## Patterns

### Pagination
Use keyset pagination for large tables:
SELECT * FROM events
WHERE id > :last_seen_id
ORDER BY id
LIMIT 50;

### Aggregation

Always include the raw count alongside percentages:

SELECT
    status,
    COUNT(*) AS n,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM orders
GROUP BY status
ORDER BY n DESC;

Example 3 — skills/deploy_checklist.md

---
name: deploy_checklist
description: "Step-by-step production deployment checklist"
---

# Deployment Checklist

Before deploying to production, complete every step:

- [ ] All tests pass: `pytest tests/ -v`
- [ ] No lint errors: `ruff check src/`
- [ ] Version bumped in `pyproject.toml`
- [ ] CHANGELOG.md updated
- [ ] Docker image builds: `docker build -t app:latest .`
- [ ] Smoke test on staging environment
- [ ] Database migrations reviewed and tested
- [ ] Rollback plan documented

How Skills Work at Runtime

# Skills are auto-discovered from ./skills/ directory
agent = create_claw_agent("gemini-3-flash")

# Or specify custom skill directories
agent = create_claw_agent("gpt-5", skills=["./my-skills", "./shared-skills"])

When skills are available, the agent gets two additional tools:

# 1. List available skills
{"tool": "list_skills", "args": {}}
# → Available skills (3):
#   - **code_review**: Perform thorough code reviews following team standards
#     → Allowed tools: read_file, grep, glob, think
#   - **sql_expert**: Write optimized SQL queries for PostgreSQL
#     → Allowed tools: execute, read_file, think
#   - **deploy_checklist**: Step-by-step production deployment checklist

# 2. Load a specific skill's instructions
{"tool": "use_skill", "args": {"name": "sql_expert"}}
# → Returns the full skill content, injected into the agent's context

The agent decides on its own when to use a skill. If you ask it to "write a query to find all overdue orders" and a sql_expert skill exists, it will load the skill first, then write the query following those rules.


API Reference

create_claw_agent(model, instruction, ...)

Param Type Default Description
model str | LLMProvider | None None Model name or provider instance. None = auto-detect from env
instruction str None System instruction for the agent
tools list None Additional tools (built-in tools always included)
skills str | list auto-discover Skill directories to load
memory str | list auto-discover Memory files to inject
streaming bool True Enable streaming responses
sandbox SandboxBackend LocalBackend Pluggable sandbox for file/shell operations
context_window int | None from env / 128000 Token budget for compaction
max_tokens int | None from env / 8192 Max output tokens per response
temperature float | None from env / 0.0 LLM temperature (model-specific overrides apply)
trajectory bool | None from CLAW_TRAJECTORY / False Enable trajectory logging + run scoring
rethink bool | None from CLAW_RETHINK / False Enable consecutive-failure detection
max_iterations int | None from MAX_ITERATIONS / 200 Max tool rounds before the agent stops
preview_chars int | None from CLAW_PREVIEW_CHARS / 120 Max chars for tool-output previews in trajectory logs
response_chars int | None from CLAW_RESPONSE_CHARS / 500 Max chars for LLM response text in trajectory records
on_event callable None Event callback

Hooks & Access Control

agent = create_claw_agent("gemini-3-flash", instruction="Code reviewer")

# Block dangerous tools at runtime
agent.block_tools("execute", "write_file")

# Or whitelist only safe tools
agent.allow_only_tools("read_file", "ls", "grep", "glob")

# Inject context into every LLM call
agent.inject_context("Always respond in Spanish")

# Limit tool output size
agent.truncate_output(3000)

Advanced — raw hooks:

agent.before_llm = lambda messages: messages        # modify messages before LLM
agent.before_tool = lambda name, args: True          # return False to block
agent.after_tool = lambda name, args, result: result # modify tool results
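
For example, the documented before_tool slot can host a guard that vetoes destructive shell commands. The guard_tool function and its blocklist below are illustrative assumptions, not ClawAgents' built-in command filter:

```python
# Illustrative guard for the documented before_tool hook. The blocklist and
# matching logic are assumptions, not the library's built-in filter.
BLOCKED_SUBSTRINGS = ("rm -rf", "mkfs", "> /dev/")

def guard_tool(name, args):
    if name == "execute":
        command = args.get("command", "")
        if any(bad in command for bad in BLOCKED_SUBSTRINGS):
            return False  # returning False blocks the tool call
    return True  # every other tool call proceeds normally

# agent.before_tool = guard_tool
```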

Auto-Discovery

The agent factory automatically discovers project files:

What Default locations checked
Memory ./AGENTS.md, ./CLAWAGENTS.md
Skills ./skills, ./.skills, ./skill, ./.skill, ./Skills

Override with explicit paths:

agent = create_claw_agent(
    "gpt-5",
    memory="./docs/AGENTS.md",
    skills=["./my-skills", "./shared-skills"]
)

Memory & Context Management

Project Memory

Loads AGENTS.md files and injects content into every LLM call. Use for project-level context and conventions.

Auto-Compaction

When the conversation exceeds 75% of CONTEXT_WINDOW:

  1. Full history offloaded to .clawagents/history/compacted_*.json
  2. Older messages summarized into [Compacted History]
  3. Last 6 messages kept intact

This allows effectively unlimited conversation length while preserving a full audit trail.
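
As a rough sketch of the policy above (hypothetical code, assuming the 75% threshold and 6-message tail described; not the real compaction module):

```python
# Hypothetical sketch of the compaction policy: trigger at 75% of the
# context window, summarize everything except the last 6 messages.
KEEP_LAST = 6

def needs_compaction(token_count, context_window=128_000, threshold=0.75):
    return token_count > context_window * threshold

def compact(messages, summarize):
    head, tail = messages[:-KEEP_LAST], messages[-KEEP_LAST:]
    summary = {"role": "system", "content": f"[Compacted History] {summarize(head)}"}
    return [summary] + tail

msgs = [{"role": "user", "content": f"msg {i}"} for i in range(20)]
out = compact(msgs, summarize=lambda h: f"{len(h)} earlier messages summarized")
# out: 1 summary message + the last 6 originals
```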


Gateway Server

Launch a production-ready HTTP server with one line:

from clawagents.gateway import start_gateway

start_gateway(port=3000)

Endpoints

Endpoint Method Description
/chat POST Synchronous agent invocation
/chat/stream POST SSE streaming (events: queued, started, agent, done, error)
/queue GET Queue status for all lanes
/health GET Health check
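
Calling the gateway from Python might look like the sketch below. The JSON field names ("task", "lane") are assumptions for illustration; the actual request schema is not documented here:

```python
import json
from urllib import request

# Hypothetical client for POST /chat. The payload field names ("task",
# "lane") are assumptions; check the running server's schema.
def build_chat_request(task, lane="main", base_url="http://localhost:3000"):
    body = json.dumps({"task": task, "lane": lane}).encode()
    return request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("List all TODO comments")
# request.urlopen(req) would send it to a running gateway
```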

Lane-Based Concurrency

4 lanes with configurable max_concurrent per lane:

  • main โ€” primary user requests
  • cron โ€” scheduled tasks
  • subagent โ€” sub-agent delegation
  • nested โ€” nested sub-agent calls

Sandbox Backends

ClawAgents uses a pluggable sandbox protocol for all file and shell operations:

from clawagents.sandbox import InMemoryBackend, LocalBackend

# Production: real filesystem
agent = create_claw_agent("gpt-5", sandbox=LocalBackend())

# Testing: pure in-memory VFS
mem = InMemoryBackend()
mem.seed({"src/main.py": "print('hello')", "README.md": "# My Project"})
agent = create_claw_agent("gpt-5", sandbox=mem)
snapshot = mem.snapshot()  # deterministic state capture

Environment Variables

Variable Default Description
PROVIDER auto-detect openai or gemini
OPENAI_API_KEY — OpenAI API key
OPENAI_MODEL gpt-5-nano OpenAI model
GEMINI_API_KEY — Gemini API key
GEMINI_MODEL gemini-3-flash-preview Gemini model
STREAMING 1 1 = enabled, 0 = disabled
CONTEXT_WINDOW 128000 Token budget for compaction
MAX_TOKENS 8192 Max output tokens per response (sent as max_completion_tokens for OpenAI, max_output_tokens for Gemini)
TEMPERATURE 0.0 LLM temperature. Automatically overridden for models that require a fixed value (e.g. GPT-5 family → 1.0, o1/o3 series → 1.0)
CLAW_TRAJECTORY 0 1 = enable trajectory logging. Logs every turn and scores each run to .clawagents/trajectories/runs.jsonl
CLAW_RETHINK 0 1 = enable consecutive-failure detection. Injects a "rethink" prompt after 3 consecutive meaningful tool failures
MAX_ITERATIONS 200 Max tool rounds before the agent stops. Override per-run via agent.invoke(task, max_iterations=N)
CLAW_PREVIEW_CHARS 120 Max chars for tool-output previews in trajectory logs and events
CLAW_RESPONSE_CHARS 500 Max chars for LLM response text stored in trajectory records

Testing

# Install with dev dependencies
pip install -e ".[dev]"

# Run all tests
python -m pytest tests/ -v

# Run benchmarks (requires API keys)
python -m pytest tests/ -v -m benchmark

Changelog

v5.11.0 — Configurable Limits

Feature Description
🔢 max_iterations Now settable at construction or via MAX_ITERATIONS env (default 200, was hardcoded in caller)
📏 preview_chars Tool-output preview length configurable via CLAW_PREVIEW_CHARS env (default 120)
📄 response_chars Response text length in trajectory records via CLAW_RESPONSE_CHARS env (default 500)
⚙️ Priority Explicit param > env var > default for all three

v5.10.0 — Discrete Reward Bands & Weighted Scoring

Feature Description
🎯 Discrete reward bands Run scores mapped to -1 … +3 bands (inspired by CUDA-Agent PPO reward shaping)
⚖️ Weighted execution scoring execute, shell, run_code weighted 2× higher than generic tools
🏷️ Run quality grading Each run classified as clean, noisy, or failed for trajectory filtering
🛡️ Gameable tool exclusion think, todolist, use_skill, etc. excluded from scoring to prevent reward hacking

v5.9.0 — Trajectory Logging & Rethink

Feature Description
📊 Trajectory logging Structured recording of every turn, tool call, and outcome to runs.jsonl
🔄 Consecutive-failure rethink After 3 consecutive meaningful failures, injects a system "rethink" prompt
🎛️ Opt-in flags trajectory=True / CLAW_TRAJECTORY=1 and rethink=True / CLAW_RETHINK=1

v5.8.0 — JSON Resilience

Feature Description
🔧 JSON repair _repair_json() utility fixes truncated JSON from hitting max_completion_tokens
🔁 Truncated JSON retry Detects incomplete JSON tool calls and prompts the LLM to resend

v5.7.0 — Model-Specific Temperature

Feature Description
🌡️ Fixed-temperature models GPT-5 family and o1/o3/o4 series auto-override to temperature=1.0
🌡️ Configurable temperature TEMPERATURE env var + temperature parameter on create_claw_agent

v5.6.0 — LLM Parameter Fixes

Feature Description
🔑 max_completion_tokens OpenAI calls now use max_completion_tokens (replacing deprecated max_tokens)
🔑 max_output_tokens Gemini calls now pass max_output_tokens correctly
⚙️ Config priority Explicit param > .env > default — no more shadowing of env values

v5.5.0 — Foundation

Feature Description
🔌 Pluggable Sandbox SandboxBackend protocol with LocalBackend + InMemoryBackend
🌐 Gateway Server FastAPI server with SSE streaming and 4-lane queue
🗂️ Advanced FS Tools tree, diff, insert_lines
🧠 Think Tool Structured reasoning without side effects
🌐 Web Fetch URL fetching with HTML cleanup
💬 Ask User Interactive stdin-based input
📜 History Offloading Full audit trail preserved after compaction
🔒 Tool Access Control block_tools() / allow_only_tools() at runtime
💉 Context Injection inject_context() hook for every LLM call
✂️ Output Truncation truncate_output() to cap tool output size

Trajectory Logging & RL-Inspired Scoring

ClawAgents includes an optional trajectory system inspired by reinforcement learning techniques from CUDA-Agent and OpenClaw-RL. Enable it with trajectory=True or CLAW_TRAJECTORY=1.

What gets logged

Every agent run records:

  • Turn-level data: tool calls, arguments, success/failure, output previews
  • Weighted turn scores: execution tools (shell, code runners) weighted 2ร— higher than generic tools
  • Run summary: total turns, tool calls, successes/failures, elapsed time

Discrete reward bands

Each run receives a score from -1 to +3:

Score Meaning
+3 All tools succeeded, task completed cleanly
+2 Minor hiccups but overall success
+1 Partial success with some failures
0 Inconclusive โ€” mixed results
-1 Majority of tool calls failed
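
A hypothetical mapping from a run's success ratio to these bands might look like this. The exact thresholds are illustrative guesses, not the library's scoring code:

```python
# Hypothetical band mapping; the thresholds below are illustrative guesses,
# not clawagents' actual scoring logic.
def reward_band(successes, failures):
    total = successes + failures
    if total == 0:
        return 0  # inconclusive: nothing scoreable happened
    ratio = successes / total
    if ratio == 1.0:
        return 3   # all tools succeeded
    if ratio >= 0.75:
        return 2   # minor hiccups but overall success
    if ratio >= 0.5:
        return 1   # partial success with some failures
    if ratio >= 0.25:
        return 0   # mixed results
    return -1      # majority of tool calls failed
```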

Quality grading

Runs are classified for downstream filtering:

Quality Criteria
clean Score ≥ 2 and ≤ 2 mid-run failures
noisy Score ≥ 0 but too many mid-run failures
failed Score < 0

Anti-gaming protections

Tools like think, todolist, use_skill, list_skills, and update_todo are excluded from scoring — they can't inflate success rates.

Consecutive-failure rethink

With rethink=True or CLAW_RETHINK=1, the agent monitors tool outcomes in real-time. After 3 consecutive meaningful failures, it injects a system message:

"You have had 3 consecutive tool failures. Stop and rethink your approach before continuing."

This simple mechanism prevents the agent from spiraling into repeated failed attempts.
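
The mechanism can be sketched as a simple streak counter (illustrative code, not the actual clawagents implementation):

```python
RETHINK_THRESHOLD = 3
RETHINK_PROMPT = ("You have had 3 consecutive tool failures. "
                  "Stop and rethink your approach before continuing.")

# Sketch of the consecutive-failure monitor described above; the class and
# method names are hypothetical.
class RethinkMonitor:
    def __init__(self):
        self.streak = 0  # consecutive meaningful failures so far

    def record(self, success):
        """Return the rethink prompt exactly when the streak hits the threshold."""
        if success:
            self.streak = 0
            return None
        self.streak += 1
        if self.streak == RETHINK_THRESHOLD:
            return RETHINK_PROMPT
        return None

mon = RethinkMonitor()
# Two failures, a success (streak resets), then three failures in a row
events = [mon.record(ok) for ok in (False, False, True, False, False, False)]
```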

Output

Run summaries are appended to .clawagents/trajectories/runs.jsonl:

{
  "run_id": "a1b2c3d4",
  "model": "gpt-5-mini",
  "total_turns": 8,
  "tool_calls": 12,
  "successes": 10,
  "failures": 2,
  "run_score": 2,
  "quality": "clean",
  "elapsed_ms": 45230,
  "turns": [...]
}
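
Since runs.jsonl is newline-delimited JSON, run summaries can be read back with the standard json module, for example to keep only clean runs for later analysis:

```python
import io
import json

# Sample lines mirroring the record format shown above (stand-in for
# opening .clawagents/trajectories/runs.jsonl)
sample = io.StringIO(
    '{"run_id": "a1b2c3d4", "run_score": 2, "quality": "clean"}\n'
    '{"run_id": "e5f6", "run_score": -1, "quality": "failed"}\n'
)

# Keep only runs graded "clean" for downstream filtering
clean_runs = [
    rec for line in sample
    if (rec := json.loads(line))["quality"] == "clean"
]
```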

Roadmap

  • Docker sandbox backend (protocol ready)
  • Semantic browser automation (accessibility tree)
  • Prompt caching (Anthropic-style)
  • Persistent memory learning from trajectory data
  • Post-run self-analysis (skill extraction from successful runs)
  • Trajectory logging + discrete reward bands โœ… (v5.9โ€“5.10)
  • Consecutive-failure rethink injection โœ… (v5.9)
  • Weighted execution scoring + quality grading โœ… (v5.10)
  • JSON repair + truncated JSON retry โœ… (v5.8)
  • Model-specific temperature override โœ… (v5.7)
  • Configurable temperature / max_completion_tokens โœ… (v5.6)
  • Pluggable sandbox backend โœ… (v5.5)
  • Lane-based queue serialization โœ… (v5.5)
  • Skill progressive disclosure โœ… (v5.5)
  • Gateway HTTP server โœ… (v5.5)

License

MIT


Built with 🦞 by the ClawAgents team



Download files

Download the file for your platform.

Source Distribution

clawagents-5.11.1.tar.gz (99.7 kB)

Uploaded Source

Built Distribution


clawagents-5.11.1-py3-none-any.whl (73.7 kB)

Uploaded Python 3

File details

Details for the file clawagents-5.11.1.tar.gz.

File metadata

  • Download URL: clawagents-5.11.1.tar.gz
  • Upload date:
  • Size: 99.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for clawagents-5.11.1.tar.gz
Algorithm Hash digest
SHA256 26ba7f73cea52d7a4aca96f25ed2fd3515e1f2c13b757b0dc156ef0238a62c80
MD5 4cdf96b4bdf616286d33fea70d9c8e52
BLAKE2b-256 ef053955f202cca8ea4cd6d42fd1a8037bab40513cfd92c0319766de6bb05642


File details

Details for the file clawagents-5.11.1-py3-none-any.whl.

File metadata

  • Download URL: clawagents-5.11.1-py3-none-any.whl
  • Upload date:
  • Size: 73.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for clawagents-5.11.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c254eb16d9f6c914d5d4e00ade217cbf2933f8e92e01df29fc613ec125a260d0
MD5 eb241fcf0e0697685464344c525bbe07
BLAKE2b-256 0284c589f6ea4cdc2796f4ec00ff2e79baec00c897f1f3ed27c37afce792d8ac

