Terminal AI agent with built-in execution tracing and observability

BlueClaw

BlueClaw treats AI agents like debuggable programs, not black boxes.
Built on Strands Agents SDK

What is BlueClaw?

BlueClaw is a terminal-based AI agent with built-in execution tracing, enabling developers to inspect, replay, and debug agent behavior step by step.

Most AI agents are black boxes: when something goes wrong, you can't tell whether the cause was the model's reasoning, the tool input, the tool output, or a bad retry. BlueClaw records every tool call with timing, inputs, and outputs, then gives you CLI tools to understand what happened.

blueclaw> research the MCP ecosystem, focus on Python SDKs
● web_search({"query": "MCP Model Context Protocol Python SDK"})
  ✓ 1.2s
● http_request({"url": "https://modelcontextprotocol.io/..."})
  ✓ 0.8s
Done · 2 steps · 1840 tokens · $0.0073 · 4.1s

Quickstart

# Install (editable, from a clone of the repository)
pip install -e .

# Initialize workspace
blueclaw init

# Set your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env

# Start an interactive session
blueclaw

# Or run a single prompt
blueclaw run "summarize the latest Python 3.13 release notes"

Tracing & Observability

Every agent run is recorded as a structured JSON trace with per-step timing, tool inputs, outputs, and errors. Eight CLI commands let you inspect runs after the fact — no dashboards, no external services, no setup.
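
As a rough illustration of what a structured JSON trace could look like to a script, the sketch below parses a trace-like record and summarizes it. The field names (`run_id`, `goal`, `steps`, `tool`, `input`, `duration_ms`, `status`) are assumptions made for this example, not BlueClaw's documented schema; inspect a real file under `.blueclaw/traces/` for the actual layout.

```python
import json

# Hypothetical trace record. Field names are illustrative assumptions,
# not BlueClaw's actual trace schema.
SAMPLE_TRACE = """
{
  "run_id": "20260315-054426",
  "goal": "search for Python 3.13 new features",
  "steps": [
    {"tool": "web_search",
     "input": {"query": "Python 3.13 new features"},
     "duration_ms": 1, "status": "ok"},
    {"tool": "http_request",
     "input": {"url": "https://docs.python.org/3.13/whatsnew/3.13.html"},
     "duration_ms": 366, "status": "ok"}
  ]
}
"""


def summarize(trace: dict) -> str:
    """Return a one-line summary: run ID, step count, total tool time."""
    steps = trace["steps"]
    total_ms = sum(step["duration_ms"] for step in steps)
    return f"{trace['run_id']}: {len(steps)} steps, {total_ms}ms in tools"


trace = json.loads(SAMPLE_TRACE)
summary = summarize(trace)
print(summary)
```

Because each run is a plain JSON file, this kind of ad hoc analysis needs nothing beyond the standard library.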

See what happened: trace graph

Quick view of the tool call sequence for any run.

$ blueclaw trace graph 20260315-054426

search for Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features list 2024
└── http_request (366ms) ✓  url: https://docs.python.org/3.13/whatsnew/3.13.html
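
The tree view above is a flat list of steps drawn with box-drawing characters. A minimal sketch of that rendering follows, assuming each step is a dict with hypothetical keys `tool`, `ms`, `ok`, and `detail`; the `✗` mark for failed steps is also an assumption, since the sample shows only successful calls.

```python
def render_tree(goal: str, steps: list[dict]) -> str:
    """Render steps as a one-level tree under the goal line."""
    lines = [goal]
    for index, step in enumerate(steps):
        # The last step gets the corner connector, the others a tee.
        branch = "└──" if index == len(steps) - 1 else "├──"
        mark = "✓" if step["ok"] else "✗"  # failure mark is assumed
        lines.append(
            f"{branch} {step['tool']} ({step['ms']}ms) {mark}  {step['detail']}"
        )
    return "\n".join(lines)


steps = [
    {"tool": "web_search", "ms": 1, "ok": True,
     "detail": "query: Python 3.13 new features"},
    {"tool": "web_search", "ms": 1, "ok": True,
     "detail": "query: Python 3.13 new features list 2024"},
    {"tool": "http_request", "ms": 366, "ok": True,
     "detail": "url: https://docs.python.org/3.13/whatsnew/3.13.html"},
]
tree = render_tree("search for Python 3.13 new features", steps)
print(tree)
```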

Find the bottleneck: trace timeline

See where time actually goes — tool execution vs. model reasoning overhead.

$ blueclaw trace timeline 20260315-054426

Goal: search for Python 3.13 new features
Model: claude-sonnet-4-6 · 3 steps · 1840 tokens · $0.0073

 #    Tool             Start     Duration  Cumulative  Bar
 1    web_search         +0ms       1ms         1ms    █
 2    web_search       +120ms       1ms         2ms    █
 3    http_request     +250ms     366ms       368ms    ████████████████████████████████████████

Tool time: 368ms · Wall time: 4100ms · Overhead: 3732ms (91%)
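
The summary line follows from simple arithmetic: overhead is wall time minus the summed tool durations. The sketch below reproduces the numbers from the sample output; the function name and the whole-percent rounding are assumptions for illustration, not BlueClaw's internals.

```python
def overhead(tool_ms: list[int], wall_ms: int) -> tuple[int, int, int]:
    """Return (tool time, overhead, overhead as a whole percent of wall time)."""
    tool_total = sum(tool_ms)
    overhead_ms = wall_ms - tool_total
    percent = round(100 * overhead_ms / wall_ms)
    return tool_total, overhead_ms, percent


# Step durations and wall time taken from the sample timeline above.
tool_total, overhead_ms, percent = overhead([1, 1, 366], wall_ms=4100)
print(f"Tool time: {tool_total}ms · Overhead: {overhead_ms}ms ({percent}%)")
```

A high overhead share, as in this run, points at time spent outside tool execution rather than at a slow tool.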

Understand why: trace explain

Feed a recorded trace to an LLM for post-hoc explanation. Useful when the agent took an unexpected path.

$ blueclaw trace explain 20260315-054426

The agent searched for Python 3.13 features, found the results too generic,
refined its query to include "list 2024", then fetched the official changelog
from docs.python.org. The two-step search pattern suggests the first results
didn't contain enough detail...

Post-hoc explanation · not the agent's actual reasoning

Compare two runs: trace diff

Did your prompt change make things better or worse?

$ blueclaw trace diff 20260315-054426 20260315-071830

Run A: 20260315-054426  Run B: 20260315-071830
Goal A: search for Python 3.13 new features
Goal B: search for Python 3.13 new features

Steps:  3 → 2 (-1)
Tokens: 1840 → 1200 (-640)
Cost:   $0.0073 → $0.0048
Time:   368ms → 420ms (+52ms)
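
Each row of the diff is a before/after pair with a signed delta. A small sketch of that formatting for the integer metrics is shown below. `RunSummary` and its fields are hypothetical names for this example; the `Time` row is modeled as tool time because 368ms matches the tool time reported by the timeline for the same run.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Integer metrics for one run (a hypothetical shape for this example)."""
    steps: int
    tokens: int
    tool_ms: int


def diff_line(label: str, a: int, b: int, unit: str = "") -> str:
    """Format 'label: a → b (signed delta)' for one integer metric."""
    delta = b - a
    return f"{label}: {a}{unit} → {b}{unit} ({delta:+d}{unit})"


def diff_runs(a: RunSummary, b: RunSummary) -> list[str]:
    """Build one diff row per metric, comparing run A to run B."""
    return [
        diff_line("Steps", a.steps, b.steps),
        diff_line("Tokens", a.tokens, b.tokens),
        diff_line("Time", a.tool_ms, b.tool_ms, "ms"),
    ]


run_a = RunSummary(steps=3, tokens=1840, tool_ms=368)
run_b = RunSummary(steps=2, tokens=1200, tool_ms=420)
report = diff_runs(run_a, run_b)
print("\n".join(report))
```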

Debug step by step: trace replay

Interactive step-through — see inputs and outputs for each tool call.

$ blueclaw trace replay 20260315-054426

Step 1: web_search (1ms) ✓
  input query: Python 3.13 new features
  output: Found 10 results...
[Enter] next · [q] quit >

Track performance over time: trace stats

Aggregate metrics across all your runs. Answer "how is my agent performing?" at a glance.

$ blueclaw trace stats --since 7

Trace Stats · 23 runs · last 7 days

Overview
  Total runs:     23
  Total steps:    87
  Avg steps/run:  3.8
  Avg tokens/run: 2,450
  Avg cost/run:   $0.0082
  Total cost:     $0.19

Timing
  Avg duration:    5.1s
  Median duration: 4.2s
  p95 duration:    12.3s
  Avg tool time:   2.1s (41% of wall)

Top Tools (by frequency)
  shell_command        34 calls (39%)
  web_search           28 calls (32%)
  http_request         18 calls (21%)
  file_read             7 calls (8%)

Failed Steps (3 across 2 runs · 3.4% step failure rate)
  timeout              2 (67%)
  network              1 (33%)

Filter by model to compare providers:

$ blueclaw trace stats --model ollama/llama3
$ blueclaw trace stats --model claude-sonnet-4-6 --since 30
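
The timing aggregates above (average, median, p95) and the step failure rate can be computed with the standard library. This is a sketch under assumptions: the README does not say which percentile method BlueClaw uses, so the nearest-rank method is swapped in here, and the duration list is made-up sample data.

```python
import math
import statistics


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (an assumed method, not confirmed)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


# Made-up run durations in seconds, for illustration only.
durations_s = [2.0, 3.5, 4.2, 4.8, 12.3]

avg = round(statistics.mean(durations_s), 2)
median = statistics.median(durations_s)
tail = p95(durations_s)

# Step failure rate from the sample report: 3 failed steps out of 87.
failure_rate = round(100 * 3 / 87, 1)

print(f"avg={avg}s median={median}s p95={tail}s failures={failure_rate}%")
```

The gap between median and p95 is usually more informative than the average, since a single slow run skews the mean.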

All trace commands

Command                  Use case
trace list               Find a run ID to inspect
trace show <id>          Detailed step table with timing
trace graph <id>         Quick tree view of tool sequence
trace timeline <id>      Find bottlenecks: where does time go?
trace explain <id>       LLM explains what happened and why
trace diff <id1> <id2>   Compare two runs (A/B test prompts)
trace replay <id>        Step-through debugger for tool calls
trace stats              Aggregate performance across all runs

Features

  • Execution tracing — structured JSON traces with full observability tooling (see above)
  • Model-agnostic — swap between Claude, Ollama, OpenAI, Gemini with one flag
  • Web search — DuckDuckGo search via ddgs, returns top 5 results with titles, URLs, and snippets
  • Persistent memory — CONTEXT.md updates in the background after each turn (instant exit), history.jsonl logs every run
  • Interactive + scripted modes — blueclaw for chat, blueclaw run "..." for one-shot
  • Shell execution — sandboxed shell_command tool with deny-list, 30s timeout, and interactive approval
  • Workspace sandbox — path validation + destructive command deny-list
  • Approval hooks — interactive confirmation for shell commands and new web domains
  • Crash recovery — per-turn checkpoints in .blueclaw/last_turn.md
  • Output truncation — 12k char limit prevents context blowout
  • MCP support — bundled pdf-mcp server, custom stdio/SSE servers via config
  • Skill system — progressive loading, index in prompt, full content on demand
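
Two of the safeguards in the list above, workspace path validation and output truncation, can be sketched in a few lines. The function names and the truncation marker are hypothetical and only illustrate the idea; only the 12k character limit comes from the feature list.

```python
from pathlib import Path

MAX_OUTPUT_CHARS = 12_000  # limit quoted in the feature list


def truncate_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cut tool output at `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return text[:limit] + f"\n[truncated {dropped} chars]"  # marker is assumed


def is_inside_workspace(candidate: str, workspace: str) -> bool:
    """Return True if `candidate` resolves to a path inside `workspace`."""
    root = Path(workspace).resolve()
    # Resolving collapses ".." segments, so traversal attempts are caught.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


print(is_inside_workspace("notes/todo.md", "/tmp/ws"))
print(is_inside_workspace("../secrets.txt", "/tmp/ws"))
print(len(truncate_output("x" * 12_500)))
```

Resolving the path before comparing matters: a plain string-prefix check would accept `../` traversal or a sibling directory that shares the workspace name as a prefix.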

Model Support

# Anthropic (default)
blueclaw

# Ollama (local, no data leaves your machine)
blueclaw --model ollama/llama3

# OpenAI
blueclaw --model openai/gpt-4.1-mini

# Gemini via LiteLLM
blueclaw --model litellm/gemini/gemini-2.0-flash

Set API keys in .env:

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Commands

Command                            Description
blueclaw                           Start interactive session
blueclaw run "..."                 Execute a single prompt and exit
blueclaw init                      Initialize workspace directory
blueclaw history                   View past run history
blueclaw trace list                List recent execution traces
blueclaw trace show <run_id>       Show detailed trace for a run
blueclaw trace explain <run_id>    LLM-powered explanation of a recorded trace
blueclaw trace graph <run_id>      Tree view of tool call sequence
blueclaw trace diff <id1> <id2>    Compare two traces side by side
blueclaw trace replay <run_id>     Step through a trace interactively
blueclaw trace timeline <run_id>   Waterfall timeline with timing and overhead
blueclaw trace stats               Aggregate metrics across all traces
blueclaw --version                 Print version
blueclaw --model provider/model    Override model for this session

Configuration

blueclaw.yaml in your project root:

model:
  provider: anthropic
  model_id: claude-sonnet-4-6

workspace:
  path: ~/blueclaw/workspace/

tools:
  - web
  - shell                              # sandboxed shell execution (enables gh, git, etc.)
  - pdf
  - mcp:https://localhost:8080/sse     # custom MCP server

allowlist_domains:
  - github.com
  - docs.python.org

Architecture

Terminal input → cli.py → session.py → Strands Agent → Tools → workspace.py (sandbox) → observer.py (trace) → Response

Module         Purpose                                                                 Lines
cli.py         Typer entrypoints, welcome banner, trace tooling                        ~714
session.py     Config, model factory, agent, chat loop, background context updater     ~537
workspace.py   Sandbox enforcement, context/history/trace I/O                          ~201
observer.py    Structured tool tracing + output truncation                             ~151
models.py      Pydantic models, trace schema, cost calculation, error classification   ~124
tools/         Web, shell, MCP wiring (factory pattern)                                ~155
approval.py    Shell command + domain allowlist hooks                                  ~51

Workspace Structure

~/blueclaw/workspace/
├── CONTEXT.md                    # Persistent agent knowledge (human-editable)
└── .blueclaw/
    ├── history.jsonl             # Append-only run log
    ├── last_turn.md              # Crash recovery checkpoint
    └── traces/                   # Structured execution traces
        └── 20260315-101201.json  # One JSON file per run
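
The trace file names in the examples look like `YYYYMMDD-HHMMSS` timestamps. Assuming that format holds (it is inferred from the sample run IDs, not stated in the README), run IDs can be parsed and sorted with the standard library. Both helper names below are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def run_id_to_datetime(run_id: str) -> datetime:
    """Parse a run ID such as '20260315-101201' (assumed timestamp format)."""
    return datetime.strptime(run_id, "%Y%m%d-%H%M%S")


def newest_first(trace_files: list[str]) -> list[str]:
    """Sort trace file names by the timestamp in their stem, newest first."""
    return sorted(
        trace_files,
        key=lambda name: run_id_to_datetime(Path(name).stem),
        reverse=True,
    )


files = ["20260315-054426.json", "20260315-101201.json", "20260315-071830.json"]
ordered = newest_first(files)
parsed = run_id_to_datetime("20260315-101201")
print(ordered[0], parsed.isoformat())
```

In practice the list would come from something like `Path(workspace, ".blueclaw", "traces").glob("*.json")`, with the workspace path taken from your configuration.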

Development

# Install in dev mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
flake8 blueclaw/ tests/
black --check blueclaw/ tests/

License

MIT
