Terminal AI agent with built-in execution tracing and observability
BlueClaw treats AI agents like debuggable programs, not black boxes.
Built on Strands Agents SDK
What is BlueClaw?
BlueClaw is a terminal-based AI agent with built-in execution tracing, enabling developers to inspect, replay, and debug agent behavior step by step.
Most AI agents are black boxes: when something goes wrong, you can't tell whether the cause was the model's reasoning, a tool input, a tool output, or a bad retry. BlueClaw records every tool call with timing, inputs, and outputs, then gives you CLI tools to understand what happened.
blueclaw> research the MCP ecosystem, focus on Python SDKs
● web_search({"query": "MCP Model Context Protocol Python SDK"})
✓ 1.2s
● http_request({"url": "https://modelcontextprotocol.io/..."})
✓ 0.8s
Done · 2 steps · 1840 tokens · $0.0073 · 4.1s
Quickstart
# Install
pip install -e .
# Initialize workspace
blueclaw init
# Set your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
# Start an interactive session
blueclaw
# Or run a single prompt
blueclaw run "summarize the latest Python 3.13 release notes"
Tracing & Observability
Every agent run is recorded as a structured JSON trace with per-step timing, tool inputs, outputs, and errors. Eight CLI commands let you inspect runs after the fact — no dashboards, no external services, no setup.
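As a rough illustration of what such a trace could contain, the sketch below builds one record and round-trips it through JSON. The field names (`run_id`, `goal`, `steps`, `start_offset_ms`, `duration_ms`, `error`) are assumptions made for this example, not BlueClaw's actual trace schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Step:
    # Hypothetical per-step fields; the real trace schema may differ.
    tool: str
    input: dict
    output: str
    start_offset_ms: int
    duration_ms: int
    error: Optional[str] = None


@dataclass
class Trace:
    # Hypothetical run-level fields.
    run_id: str
    goal: str
    model: str
    steps: list = field(default_factory=list)


trace = Trace(
    run_id="20260315-054426",
    goal="search for Python 3.13 new features",
    model="claude-sonnet-4-6",
    steps=[
        Step("web_search", {"query": "Python 3.13 new features"},
             "Found 10 results...", start_offset_ms=0, duration_ms=1),
        Step("http_request",
             {"url": "https://docs.python.org/3.13/whatsnew/3.13.html"},
             "(response body)", start_offset_ms=250, duration_ms=366),
    ],
)

# asdict() recurses into the nested Step dataclasses, so the result is plain JSON.
serialized = json.dumps(asdict(trace), indent=2)
loaded = json.loads(serialized)
print(loaded["steps"][0]["tool"])
```

Keeping one self-describing JSON document per run is what makes after-the-fact inspection possible without any external service.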
See what happened: trace graph
Quick view of the tool call sequence for any run.
$ blueclaw trace graph 20260315-054426
search for Python 3.13 new features
├── web_search (1ms) ✓ query: Python 3.13 new features
├── web_search (1ms) ✓ query: Python 3.13 new features list 2024
└── http_request (366ms) ✓ url: https://docs.python.org/3.13/whatsnew/3.13.html
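A tree like the one above can be produced from a list of step records with a few lines of Python. This is a minimal sketch, not BlueClaw's renderer: the dictionary keys (`tool`, `duration_ms`, `summary`, `error`) and the `✗` mark for failed steps are assumptions.

```python
def render_tree(goal, steps):
    """Return a tree view: the goal on top, one branch per tool call."""
    lines = [goal]
    for i, step in enumerate(steps):
        # The last branch uses a corner; all earlier branches use a tee.
        branch = "└──" if i == len(steps) - 1 else "├──"
        mark = "✓" if step.get("error") is None else "✗"
        lines.append(
            f"{branch} {step['tool']} ({step['duration_ms']}ms) {mark} {step['summary']}"
        )
    return "\n".join(lines)


steps = [
    {"tool": "web_search", "duration_ms": 1,
     "summary": "query: Python 3.13 new features"},
    {"tool": "web_search", "duration_ms": 1,
     "summary": "query: Python 3.13 new features list 2024"},
    {"tool": "http_request", "duration_ms": 366,
     "summary": "url: https://docs.python.org/3.13/whatsnew/3.13.html"},
]
tree = render_tree("search for Python 3.13 new features", steps)
print(tree)
```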
Find the bottleneck: trace timeline
See where time actually goes — tool execution vs. model reasoning overhead.
$ blueclaw trace timeline 20260315-054426
Goal: search for Python 3.13 new features
Model: claude-sonnet-4-6 · 3 steps · 1840 tokens · $0.0073
# Tool Start Duration Cumulative Bar
1 web_search +0ms 1ms 1ms █
2 web_search +120ms 1ms 2ms █
3 http_request +250ms 366ms 368ms ████████████████████████████████████████
Tool time: 368ms · Wall time: 4100ms · Overhead: 3732ms (91%)
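The summary line follows from simple arithmetic: overhead is wall time minus the summed tool durations. The sketch below reproduces the numbers from the example; the function names and the bar width of 40 characters are assumptions for illustration.

```python
def overhead(tool_durations_ms, wall_ms):
    """Return (tool time, overhead, overhead as a rounded percent of wall time)."""
    tool_ms = sum(tool_durations_ms)
    overhead_ms = wall_ms - tool_ms
    pct = round(100 * overhead_ms / wall_ms)
    return tool_ms, overhead_ms, pct


def bar(duration_ms, longest_ms, width=40):
    """Scale a duration to a bar, with at least one block so short steps stay visible."""
    return "█" * max(1, round(width * duration_ms / longest_ms))


durations = [1, 1, 366]
tool_ms, overhead_ms, pct = overhead(durations, wall_ms=4100)
print(f"Tool time: {tool_ms}ms · Wall time: 4100ms · Overhead: {overhead_ms}ms ({pct}%)")
for d in durations:
    print(f"{d:>4}ms {bar(d, max(durations))}")
```

Here the overhead of 3732ms covers everything that is not tool execution, which in practice is mostly model latency.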
Understand why: trace explain
Feed a recorded trace to an LLM for post-hoc explanation. Useful when the agent took an unexpected path.
$ blueclaw trace explain 20260315-054426
The agent searched for Python 3.13 features, found the results too generic,
refined its query to include "list 2024", then fetched the official changelog
from docs.python.org. The two-step search pattern suggests the first results
didn't contain enough detail...
Post-hoc explanation · not the agent's actual reasoning
Compare two runs: trace diff
Did your prompt change make things better or worse?
$ blueclaw trace diff 20260315-054426 20260315-071830
Run A: 20260315-054426 Run B: 20260315-071830
Goal A: search for Python 3.13 new features
Goal B: search for Python 3.13 new features
Steps: 3 → 2 (-1)
Tokens: 1840 → 1200 (-640)
Cost: $0.0073 → $0.0048
Time: 368ms → 420ms (+52ms)
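Each row of the diff is a before value, an after value, and a signed delta. A minimal sketch of that formatting, with a hypothetical helper name:

```python
def diff_metric(a, b, unit=""):
    """Format 'a → b (signed delta)' for two integer metrics."""
    delta = b - a
    return f"{a}{unit} → {b}{unit} ({delta:+d}{unit})"


print("Steps: ", diff_metric(3, 2))
print("Tokens:", diff_metric(1840, 1200))
print("Time:  ", diff_metric(368, 420, unit="ms"))
```

Note that in the example above Run B uses fewer steps and tokens but spends 52ms longer in tools, so "better" depends on which metric matters to you.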
Debug step by step: trace replay
Interactive step-through — see inputs and outputs for each tool call.
$ blueclaw trace replay 20260315-054426
Step 1: web_search (1ms) ✓
input query: Python 3.13 new features
output: Found 10 results...
[Enter] next · [q] quit >
Track performance over time: trace stats
Aggregate metrics across all your runs. Answer "how is my agent performing?" at a glance.
$ blueclaw trace stats --since 7
Trace Stats · 23 runs · last 7 days
Overview
Total runs: 23
Total steps: 87
Avg steps/run: 3.8
Avg tokens/run: 2,450
Avg cost/run: $0.0082
Total cost: $0.19
Timing
Avg duration: 5.1s
Median duration: 4.2s
p95 duration: 12.3s
Avg tool time: 2.1s (41% of wall)
Top Tools (by frequency)
shell_command 34 calls (39%)
web_search 28 calls (32%)
http_request 18 calls (21%)
file_read 7 calls (8%)
Failed Steps (3 across 2 runs · 3.4% step failure rate)
timeout 2 (67%)
network 1 (33%)
Filter by model to compare providers:
$ blueclaw trace stats --model ollama/llama3
$ blueclaw trace stats --model claude-sonnet-4-6 --since 30
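The aggregates in this report are standard summary statistics. The sketch below shows one way to compute them; it is not BlueClaw's implementation. The duration sample is made up for illustration, the nearest-rank percentile is only one of several common definitions, and the tool counts are taken from the example output above.

```python
import math
import statistics
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def tool_shares(tool_calls):
    """Return (tool, calls, rounded percent of all calls), most frequent first."""
    counts = Counter(tool_calls)
    total = sum(counts.values())
    return [(tool, n, round(100 * n / total)) for tool, n in counts.most_common()]


durations_s = [2.0, 3.5, 4.2, 4.8, 12.3]  # illustrative sample, not real data
median_s = statistics.median(durations_s)
p95_s = percentile(durations_s, 95)

calls = (["shell_command"] * 34 + ["web_search"] * 28
         + ["http_request"] * 18 + ["file_read"] * 7)
shares = tool_shares(calls)

failed_steps, total_steps = 3, 87
failure_rate = round(100 * failed_steps / total_steps, 1)

print(median_s, p95_s, shares, failure_rate)
```

The median and p95 are worth reporting alongside the average because a few slow runs can pull the mean well above what a typical run looks like.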
All trace commands
| Command | Use case |
|---|---|
| `trace list` | Find a run ID to inspect |
| `trace show <id>` | Detailed step table with timing |
| `trace graph <id>` | Quick tree view of tool sequence |
| `trace timeline <id>` | Find bottlenecks: where does time go? |
| `trace explain <id>` | LLM explains what happened and why |
| `trace diff <id1> <id2>` | Compare two runs (A/B test prompts) |
| `trace replay <id>` | Step-through debugger for tool calls |
| `trace stats` | Aggregate performance across all runs |
Features
- Execution tracing: structured JSON traces with full observability tooling (see above)
- Model-agnostic: swap between Claude, Ollama, OpenAI, and Gemini with one flag
- Web search: DuckDuckGo search via `ddgs`, returns the top 5 results with titles, URLs, and snippets
- Persistent memory: `CONTEXT.md` updates in the background after each turn (instant exit), `history.jsonl` logs every run
- Interactive + scripted modes: `blueclaw` for chat, `blueclaw run "..."` for one-shot
- Shell execution: sandboxed `shell_command` tool with deny-list, 30s timeout, and interactive approval
- Workspace sandbox: path validation + destructive command deny-list
- Approval hooks: interactive confirmation for shell commands and new web domains
- Crash recovery: per-turn checkpoints in `.blueclaw/last_turn.md`
- Output truncation: 12k char limit prevents context blowout
- MCP support: bundled `pdf-mcp` server, custom stdio/SSE servers via config
- Skill system: progressive loading, index in prompt, full content on demand
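To make the workspace sandbox idea concrete, here is a minimal sketch of path validation plus a command deny-list. It is not BlueClaw's implementation: the function names and the deny-list entries are hypothetical, and checking only the first token of a command is a deliberately simple heuristic.

```python
import shlex
from pathlib import Path

# Hypothetical deny-list; the real list of blocked commands is not documented here.
DENYLIST = {"rm", "mkfs", "dd", "shutdown"}


def is_inside_workspace(workspace, candidate):
    """True if `candidate` resolves to the workspace root or a path beneath it."""
    root = Path(workspace).resolve()
    # Joining with an absolute path discards the root, so "/etc/passwd" is rejected.
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


def is_denied(command):
    """True if the command's executable name is on the deny-list."""
    tokens = shlex.split(command)
    return bool(tokens) and Path(tokens[0]).name in DENYLIST


print(is_inside_workspace("/tmp/ws", "notes/a.md"))
print(is_inside_workspace("/tmp/ws", "../etc/passwd"))
print(is_denied("rm -rf /"), is_denied("git status"))
```

A first-token check does not catch destructive commands hidden behind pipes, `sh -c`, or shell substitution, which is one reason to pair a deny-list with interactive approval.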
Model Support
# Anthropic (default)
blueclaw
# Ollama (local, no data leaves your machine)
blueclaw --model ollama/llama3
# OpenAI
blueclaw --model openai/gpt-4.1-mini
# Gemini via LiteLLM
blueclaw --model litellm/gemini/gemini-2.0-flash
Set API keys in .env:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
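The `--model` values above follow a `provider/model` shape. One plausible way to split such a string is on the first slash only, so that a LiteLLM route like `gemini/gemini-2.0-flash` stays intact. This is a sketch under that assumption; `parse_model` and the fallback to `anthropic` are hypothetical, not BlueClaw's actual parsing logic.

```python
def parse_model(spec, default_provider="anthropic"):
    """Split 'provider/model_id' on the first slash; bare IDs use the default provider."""
    provider, sep, model_id = spec.partition("/")
    if not sep:
        return default_provider, spec
    return provider, model_id


for spec in ("ollama/llama3", "openai/gpt-4.1-mini",
             "litellm/gemini/gemini-2.0-flash", "claude-sonnet-4-6"):
    print(spec, "->", parse_model(spec))
```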
Commands
| Command | Description |
|---|---|
| `blueclaw` | Start interactive session |
| `blueclaw run "..."` | Execute a single prompt and exit |
| `blueclaw init` | Initialize workspace directory |
| `blueclaw history` | View past run history |
| `blueclaw trace list` | List recent execution traces |
| `blueclaw trace show <run_id>` | Show detailed trace for a run |
| `blueclaw trace explain <run_id>` | LLM-powered explanation of a recorded trace |
| `blueclaw trace graph <run_id>` | Tree view of tool call sequence |
| `blueclaw trace diff <id1> <id2>` | Compare two traces side by side |
| `blueclaw trace replay <run_id>` | Step through a trace interactively |
| `blueclaw trace timeline <run_id>` | Waterfall timeline with timing and overhead |
| `blueclaw trace stats` | Aggregate metrics across all traces |
| `blueclaw --version` | Print version |
| `blueclaw --model provider/model` | Override model for this session |
Configuration
blueclaw.yaml in your project root:
model:
provider: anthropic
model_id: claude-sonnet-4-6
workspace:
path: ~/blueclaw/workspace/
tools:
- web
- shell # sandboxed shell execution (enables gh, git, etc.)
- pdf
- mcp:https://localhost:8080/sse # custom MCP server
allowlist_domains:
- github.com
- docs.python.org
Architecture
Terminal input → cli.py → session.py → Strands Agent → Tools → workspace.py (sandbox) → observer.py (trace) → Response
| Module | Purpose | Lines |
|---|---|---|
| `cli.py` | Typer entrypoints, welcome banner, trace tooling | ~714 |
| `session.py` | Config, model factory, agent, chat loop, background context updater | ~537 |
| `workspace.py` | Sandbox enforcement, context/history/trace I/O | ~201 |
| `observer.py` | Structured tool tracing + output truncation | ~151 |
| `models.py` | Pydantic models, trace schema, cost calculation, error classification | ~124 |
| `tools/` | Web, shell, MCP wiring (factory pattern) | ~155 |
| `approval.py` | Shell command + domain allowlist hooks | ~51 |
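The observer's two jobs, recording each tool call and capping output size, can be sketched as a plain decorator. This is only an illustration of the idea, not how `observer.py` hooks into Strands: the `traced` decorator, the record keys, the truncation marker, and reading "12k" as 12,000 characters are all assumptions.

```python
import functools
import time

MAX_OUTPUT_CHARS = 12_000  # assumption: "12k" read as 12,000 characters


def truncate_output(text, limit=MAX_OUTPUT_CHARS):
    """Cap tool output so one large result cannot flood the model's context."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"


def traced(tool_name, steps):
    """Decorator that appends one record per call to the `steps` list."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            record = {"tool": tool_name, "input": kwargs, "output": None, "error": None}
            started = time.perf_counter()
            try:
                output = truncate_output(str(func(**kwargs)))
                record["output"] = output
                return output
            except Exception as exc:
                # Record the failure, then let the caller decide how to handle it.
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                # Runs on success and failure, so every call gets timing and a record.
                record["duration_ms"] = round((time.perf_counter() - started) * 1000)
                steps.append(record)
        return wrapper
    return decorator


steps = []


@traced("echo", steps)
def echo(text):
    return text * 2


result = echo(text="x" * 7000)  # 14,000 characters before truncation
print(steps[0]["tool"], steps[0]["duration_ms"], len(result))
```

Writing the record in `finally` means failed calls are traced too, which matters because failures are often the steps you most want to inspect.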
Workspace Structure
~/blueclaw/workspace/
├── CONTEXT.md # Persistent agent knowledge (human-editable)
└── .blueclaw/
├── history.jsonl # Append-only run log
├── last_turn.md # Crash recovery checkpoint
└── traces/ # Structured execution traces
└── 20260315-101201.json # One JSON file per run
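Because the history log is append-only JSONL and trace files are named by run ID, both are easy to process with the standard library. The sketch below assumes run IDs encode a `YYYYMMDD-HHMMSS` timestamp (inferred from the examples, not confirmed) and uses made-up record fields; it writes to a temporary directory rather than a real workspace.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def run_id_to_datetime(run_id):
    """Parse a run ID, assuming the YYYYMMDD-HHMMSS layout seen in the examples."""
    return datetime.strptime(run_id, "%Y%m%d-%H%M%S")


def append_history(path, record):
    """Append one JSON object per line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_history(path):
    """Read every non-empty line back as a JSON object."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "history.jsonl"
    # Hypothetical record fields, for illustration only.
    append_history(history, {"run_id": "20260315-054426", "steps": 3})
    append_history(history, {"run_id": "20260315-071830", "steps": 2})
    records = read_history(history)

started = run_id_to_datetime("20260315-101201")
print([r["run_id"] for r in records], started.isoformat())
```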
Development
# Install in dev mode
pip install -e ".[dev]"
# Run tests
pytest
# Lint
flake8 blueclaw/ tests/
black --check blueclaw/ tests/
License
MIT