
Terminal AI agent with built-in execution tracing and observability


BlueClaw

Observable AI agent runtime with structured execution tracing.
Inspect, replay, diff, and debug agent behavior from the terminal.




Why BlueClaw

Most AI agents are black boxes. When something goes wrong, you don't know whether the problem was the model reasoning, a tool input, a tool output, a retry loop, or a bad prompt.

BlueClaw records structured execution traces for every run and provides CLI tools to analyze them. Agents become observable systems, not mystery machines.

blueclaw> research the MCP ecosystem, focus on Python SDKs
● web_search({"query": "MCP Model Context Protocol Python SDK"})
  ✓ 1.2s
● http_request({"url": "https://modelcontextprotocol.io/..."})
  ✓ 0.8s
Done · 2 steps · 1840 tokens · $0.0073 · 4.1s

Every tool call is recorded with timing, inputs, outputs, and errors, so you can debug the run afterward.
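A minimal sketch of how per-call recording like this can work, assuming a decorator that wraps each tool. The `traced` decorator, the `TRACE` list, and the step field names are hypothetical and are not BlueClaw's observer.py API (the README notes that tool execution itself is handled by Strands).

```python
import functools
import json
import time
from typing import Any, Callable

# Hypothetical in-memory trace; BlueClaw's real trace schema lives in models.py and may differ.
TRACE: list = []


def traced(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Record timing, inputs, output, and error for every call to `tool`."""

    @functools.wraps(tool)
    def wrapper(**kwargs: Any) -> Any:
        # Tools are assumed to take keyword arguments, matching the JSON inputs in a trace.
        step: dict = {"tool": tool.__name__, "input": kwargs}
        start = time.perf_counter()
        try:
            result = tool(**kwargs)
            step["output"] = result
            step["ok"] = True
            return result
        except Exception as exc:
            step["error"] = f"{type(exc).__name__}: {exc}"
            step["ok"] = False
            raise
        finally:
            # Runs on success and failure, so every call lands in the trace.
            step["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
            TRACE.append(step)

    return wrapper


@traced
def web_search(query: str) -> str:
    # Stand-in tool; a real one would call a search backend.
    return f"Found 10 results for {query!r}"


web_search(query="Python 3.13 new features")
print(json.dumps(TRACE, indent=2))
```

Recording in `finally` keeps failed calls in the trace too, which is what makes error classification possible later.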

Quickstart

pip install blueclaw
blueclaw init
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
blueclaw

Tracing & Observability

Every agent run produces a structured JSON trace. Nine CLI commands let you inspect runs after the fact — no dashboards, no external services, no setup.

See what happened: trace graph

$ blueclaw trace graph 20260315-054426

search for Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features list 2024
└── http_request (366ms) ✓  url: https://docs.python.org/3.13/whatsnew/3.13.html
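The graph above is a flat list of steps drawn under the goal. A sketch of that rendering, assuming a hypothetical `Step` record whose field names are illustrative rather than BlueClaw's trace schema:

```python
from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical step record; not BlueClaw's actual schema.
    tool: str
    duration_ms: int
    ok: bool
    summary: str


def render_graph(goal: str, steps: list) -> str:
    """Render tool steps as a one-level tree under the goal line."""
    lines = [goal]
    for i, step in enumerate(steps):
        branch = "└──" if i == len(steps) - 1 else "├──"  # last child gets the corner
        mark = "✓" if step.ok else "✗"
        lines.append(f"{branch} {step.tool} ({step.duration_ms}ms) {mark}  {step.summary}")
    return "\n".join(lines)


steps = [
    Step("web_search", 1, True, "query: Python 3.13 new features"),
    Step("web_search", 1, True, "query: Python 3.13 new features list 2024"),
    Step("http_request", 366, True, "url: https://docs.python.org/3.13/whatsnew/3.13.html"),
]
print(render_graph("search for Python 3.13 new features", steps))
```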

Find the bottleneck: trace timeline

$ blueclaw trace timeline 20260315-054426

Goal: search for Python 3.13 new features
Model: claude-sonnet-4-6 · 3 steps · 1840 tokens · $0.0073

 #  Tool          Start    Duration  Cumulative  Bar
 1  web_search      +0ms      1ms         1ms    █
 2  web_search    +120ms      1ms         2ms    █
 3  http_request  +250ms    366ms       368ms    ██████████████████████

Tool time: 368ms · Wall time: 4100ms · Overhead: 91%
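The timeline figures are internally consistent: Cumulative is a running total of tool durations (1 + 1 + 366 = 368 ms), and 91% overhead matches the share of the 4,100 ms wall time spent outside tool calls. A sketch that reproduces them, assuming that definition of overhead; `timeline_summary` is a hypothetical helper.

```python
from itertools import accumulate


def timeline_summary(durations_ms: list, wall_ms: int) -> dict:
    """Cumulative tool time per step and the share of wall time spent outside tools."""
    if wall_ms <= 0:
        raise ValueError("wall_ms must be positive")
    cumulative = list(accumulate(durations_ms))  # running total of tool time only
    tool_ms = cumulative[-1] if cumulative else 0
    # Assumed definition: overhead = wall time not spent inside tool calls
    # (model calls, streaming, bookkeeping).
    overhead_pct = round(100 * (wall_ms - tool_ms) / wall_ms)
    return {"cumulative": cumulative, "tool_ms": tool_ms, "overhead_pct": overhead_pct}


print(timeline_summary([1, 1, 366], wall_ms=4100))
# {'cumulative': [1, 2, 368], 'tool_ms': 368, 'overhead_pct': 91}
```

Note that Cumulative ignores gaps between steps (step 3 starts at +250ms), which is why it differs from the Start column.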

Understand why: trace explain

Feed a recorded trace to an LLM for post-hoc explanation.

$ blueclaw trace explain 20260315-054426

The agent searched for Python 3.13 features, found the results too generic,
refined its query to include "list 2024", then fetched the official changelog
from docs.python.org. The two-step search pattern suggests the first results
didn't contain enough detail...

Post-hoc explanation · not the agent's actual reasoning

Compare two runs: trace diff

$ blueclaw trace diff 20260315-054426 20260315-071830

Run A: 20260315-054426  Run B: 20260315-071830
Goal A: search for Python 3.13 new features
Goal B: search for Python 3.13 new features

Steps:  3 → 2 (-1)
Tokens: 1840 → 1200 (-640)
Cost:   $0.0073 → $0.0048
Time:   368ms → 420ms (+52ms)
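The diff is a per-metric subtraction of run A from run B. A minimal sketch using the example values; `diff_runs` and the metric keys are hypothetical, and results are rounded so float noise does not leak into cost deltas.

```python
def diff_runs(a: dict, b: dict) -> dict:
    """Change per shared metric from run A to run B (B minus A)."""
    return {key: round(b[key] - a[key], 4) for key in a if key in b}


run_a = {"steps": 3, "tokens": 1840, "cost_usd": 0.0073, "time_ms": 368}
run_b = {"steps": 2, "tokens": 1200, "cost_usd": 0.0048, "time_ms": 420}
print(diff_runs(run_a, run_b))
# {'steps': -1, 'tokens': -640, 'cost_usd': -0.0025, 'time_ms': 52}
```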

Debug step by step: trace replay

$ blueclaw trace replay 20260315-054426

Step 1: web_search (1ms) ✓
  input query: Python 3.13 new features
  output: Found 10 results...
[Enter] next · [q] quit >
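A step-through replay amounts to printing one recorded step at a time and waiting for a key. In this sketch the step dictionaries and the `replay` function are hypothetical; `read_key` defaults to the built-in `input` and can be swapped out so the loop runs non-interactively.

```python
from typing import Callable, Iterable


def replay(steps: Iterable, read_key: Callable[[str], str] = input) -> int:
    """Show recorded steps one at a time; stop early on 'q'. Returns how many were shown."""
    shown = 0
    for number, step in enumerate(steps, start=1):
        mark = "✓" if step["ok"] else "✗"
        print(f"Step {number}: {step['tool']} ({step['duration_ms']}ms) {mark}")
        for name, value in step["input"].items():
            print(f"  input {name}: {value}")
        print(f"  output: {step['output']}")
        shown += 1
        if read_key("[Enter] next · [q] quit > ").strip().lower() == "q":
            break
    return shown


# Hypothetical step record shaped like the example above.
steps = [
    {
        "tool": "web_search",
        "duration_ms": 1,
        "ok": True,
        "input": {"query": "Python 3.13 new features"},
        "output": "Found 10 results...",
    }
]
# Non-interactive demo: an empty response behaves like pressing Enter.
replay(steps, read_key=lambda prompt: "")
```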

Track performance: trace stats

$ blueclaw trace stats --since 7

Trace Stats · 23 runs · last 7 days

Overview
  Total runs:     23
  Total steps:    87
  Avg steps/run:  3.8
  Avg tokens/run: 2,450
  Avg cost/run:   $0.0082
  Total cost:     $0.19

Timing
  Avg duration:    5.1s
  Median duration: 4.2s
  p95 duration:    12.3s
  Avg tool time:   2.1s (41% of wall)

Top Tools (by frequency)
  shell_command        34 calls (39%)
  web_search           28 calls (32%)
  http_request         18 calls (21%)
  file_read             7 calls (8%)

Failed Steps (3 across 2 runs · 3.4% step failure rate)
  timeout              2 (67%)
  network              1 (33%)
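Most figures in the stats report are simple aggregates. A sketch of three of them with the standard library, using the tool counts from the example report; the function names are hypothetical, and p95 uses the nearest-rank method, which is only one of several percentile definitions BlueClaw might use.

```python
import math
from collections import Counter
from statistics import median


def p95(values: list) -> float:
    """Nearest-rank 95th percentile; interpolating definitions give slightly different values."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank, at least 1 for non-empty input
    return ordered[rank - 1]


def tool_shares(tool_names: list) -> list:
    """(tool, calls, percent of all calls), most frequent first."""
    counts = Counter(tool_names)
    total = sum(counts.values())
    return [(name, n, round(100 * n / total)) for name, n in counts.most_common()]


def step_failure_rate(failed: int, total: int) -> float:
    """Failed steps as a percentage of all steps, to one decimal place."""
    return round(100 * failed / total, 1)


calls = ["shell_command"] * 34 + ["web_search"] * 28 + ["http_request"] * 18 + ["file_read"] * 7
print(tool_shares(calls))
# [('shell_command', 34, 39), ('web_search', 28, 32), ('http_request', 18, 21), ('file_read', 7, 8)]
print(step_failure_rate(3, 87))  # 3.4
print(median([3.1, 4.2, 9.9]), p95([3.1, 4.2, 9.9]))  # 4.2 9.9
```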

All trace commands

Command                  Use case
trace list               Find a run ID to inspect
trace show <id>          Detailed step table with timing
trace graph <id>         Quick tree view of the tool sequence
trace timeline <id>      Find bottlenecks: where does time go?
trace explain <id>       LLM explains what happened and why
trace diff <id1> <id2>   Compare two runs (A/B test prompts)
trace replay <id>        Step-through debugger for tool calls
trace stats              Aggregate performance across all runs
trace purge              Delete old traces (default: 30 days)
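Purging by age needs a timestamp per run. The run IDs in the examples look like `YYYYMMDD-HHMMSS` start times; assuming that format, a retention filter can work directly on the IDs. `expired_run_ids` is hypothetical, and treating `0` as "keep forever" mirrors the `trace_retention_days` comment in the configuration.

```python
from datetime import datetime, timedelta


def expired_run_ids(run_ids: list, retention_days: int, now: datetime) -> list:
    """Run IDs older than the retention window; 0 days means keep everything."""
    if retention_days == 0:
        return []
    cutoff = now - timedelta(days=retention_days)
    # Assumes IDs encode the run's start time, e.g. 20260315-054426.
    return [rid for rid in run_ids if datetime.strptime(rid, "%Y%m%d-%H%M%S") < cutoff]


ids = ["20260315-054426", "20260315-071830", "20260101-120000"]
print(expired_run_ids(ids, retention_days=30, now=datetime(2026, 3, 20)))
# ['20260101-120000']
```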

What Makes BlueClaw Different

Most agent tools focus on capability. BlueClaw focuses on observability.

Feature                       BlueClaw                      Typical agent frameworks
Structured execution traces   Every run, automatic          None or manual logging
Trace replay                  Step-through debugger         Not available
Trace diff                    A/B test prompt changes       Not available
Trace explain                 LLM post-hoc analysis         Not available
Aggregate stats               Cost, timing, failure rates   Not available
CLI-first debugging           No dashboards required        Dashboard or nothing

Future versions will use trace history to generate runtime hints, allowing agents to avoid repeating past failures.

Features

  • Execution tracing: structured JSON traces with full observability tooling
  • Model-agnostic: Claude, Ollama, OpenAI, and Gemini, switchable with one flag
  • Web search: DuckDuckGo via ddgs, top 5 results with snippets
  • Persistent memory: CONTEXT.md updates in the background; history.jsonl logs every run
  • Interactive + scripted: blueclaw for chat, blueclaw run "..." for one-shot
  • Shell execution: sandboxed with a deny-list, 30s timeout, and interactive approval
  • Workspace sandbox: path validation + destructive-command deny-list
  • Crash recovery: per-turn checkpoints
  • Output truncation: 12k-character limit prevents context blowout
  • MCP support: bundled pdf-mcp, custom stdio/SSE servers via config
  • Trace lessons: past failures automatically inform future runs
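The shell-execution and output-truncation features combine into a small guard around command execution. A minimal sketch, assuming a simple prefix deny-list: the `DENIED_PREFIXES` entries and all function names are hypothetical (BlueClaw's actual deny-list is not documented here), interactive approval is omitted, and a prefix check is easy to bypass (for example via `sudo` or `sh -c`), so treat it as illustration only. The 30 s timeout and the 12k limit (read here as 12,000 characters) come from the feature list.

```python
import shlex
import subprocess

MAX_OUTPUT_CHARS = 12_000  # the feature list's "12k" limit, assumed to mean 12,000
SHELL_TIMEOUT_S = 30
# Hypothetical deny-list entries for illustration.
DENIED_PREFIXES = ("rm -rf", "mkfs", "dd if=", "shutdown")


def truncate(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cap tool output so one call cannot flood the model's context."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"


def is_denied(command: str) -> bool:
    normalized = " ".join(command.split())  # collapse repeated whitespace
    return normalized.startswith(DENIED_PREFIXES)


def run_shell(command: str) -> str:
    if is_denied(command):
        return "error: command blocked by deny-list"
    try:
        # No shell=True: arguments are split, so pipes and redirects are not interpreted.
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=SHELL_TIMEOUT_S
        )
    except subprocess.TimeoutExpired:
        return f"error: timed out after {SHELL_TIMEOUT_S}s"
    return truncate(proc.stdout + proc.stderr)
```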

Model Support

blueclaw                                    # Anthropic (default)
blueclaw --model ollama/llama3              # Ollama (local)
blueclaw --model openai/gpt-4.1-mini       # OpenAI
blueclaw --model litellm/gemini/gemini-2.0-flash  # Gemini via LiteLLM
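The `--model` values follow a `provider/model` pattern. One way to parse them, assuming the first segment names the provider and the rest is passed through unchanged (which keeps LiteLLM's own `gemini/...` prefix intact); `parse_model_flag` is hypothetical, and BlueClaw's real model factory in session.py may behave differently. The default pair is taken from the example blueclaw.yaml.

```python
from typing import Optional

# Default from the example configuration (provider: anthropic, model_id: claude-sonnet-4-6).
DEFAULT_MODEL = ("anthropic", "claude-sonnet-4-6")


def parse_model_flag(flag: Optional[str]) -> tuple:
    """Split 'provider/model' on the first slash; no flag means the default model."""
    if not flag:
        return DEFAULT_MODEL
    provider, sep, model_id = flag.partition("/")
    if not sep or not model_id:
        raise ValueError(f"expected provider/model, got {flag!r}")
    return provider, model_id


print(parse_model_flag("litellm/gemini/gemini-2.0-flash"))
# ('litellm', 'gemini/gemini-2.0-flash')
```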

Set API keys in .env:

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Configuration

blueclaw.yaml in your project root:

model:
  provider: anthropic
  model_id: claude-sonnet-4-6

workspace:
  path: ~/blueclaw/workspace/
  trace_retention_days: 30             # auto-purge old traces; 0 = keep forever

tools:
  - web
  - shell
  - pdf
  - mcp:https://localhost:8080/sse     # custom MCP server

allowlist_domains:
  - github.com
  - docs.python.org
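The `allowlist_domains` setting implies a host check before HTTP requests. A sketch, assuming subdomains of an allowlisted domain are also allowed (an assumption, not documented behavior); what BlueClaw does with other hosts is left open, and `domain_allowed` is hypothetical.

```python
from urllib.parse import urlparse

ALLOWLIST = ["github.com", "docs.python.org"]  # values from the example config


def domain_allowed(url: str, allowlist: list) -> bool:
    """True if the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Match on a dot boundary so "evilgithub.com" does not pass as "github.com".
    return any(host == d or host.endswith("." + d) for d in allowlist)


print(domain_allowed("https://docs.python.org/3.13/whatsnew/3.13.html", ALLOWLIST))  # True
```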

Architecture


Module         Purpose
cli.py         Typer entrypoints, welcome banner, trace tooling
session.py     Config, model factory, agent, chat loop, background context updater
workspace.py   Sandbox enforcement, context/history/trace I/O
observer.py    Structured tool tracing + output truncation
models.py      Pydantic models, trace schema, cost calculation, error classification
tools/         Web, shell, MCP wiring (factory pattern)
approval.py    Shell command + domain allowlist hooks

Built on the Strands Agents SDK, which handles the agent loop, tool execution, streaming, and model switching.

Roadmap

BlueClaw evolves in layers:

  1. Observable agent runtime: structured tracing, model-agnostic, CLI tools (done)
  2. Trace analytics: aggregate stats, timeline, failure classification (done)
  3. Agent regression testing: define expected behavior in YAML, CI for agents (next)
  4. Production observability: SQLite backend, trace query, OpenTelemetry export
  5. API gateway: blueclaw serve with a webhook endpoint
  6. Multi-channel runtime: Slack, Discord, Telegram adapters

Development

pip install -e ".[dev]"
pytest
flake8 blueclaw/ tests/
black --check blueclaw/ tests/

License

MIT



Download files


Source Distribution

blueclaw-1.2.5.tar.gz (53.9 kB)


Built Distribution


blueclaw-1.2.5-py3-none-any.whl (31.9 kB)


File details

Details for the file blueclaw-1.2.5.tar.gz.

File metadata

  • Download URL: blueclaw-1.2.5.tar.gz
  • Size: 53.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for blueclaw-1.2.5.tar.gz
Algorithm Hash digest
SHA256 0a6b6f3b614d6754566a54149794bc51f33bbaa1270635cc5bb5d75b876c45d3
MD5 141b13aa654e43542d65af2ffd8b11cc
BLAKE2b-256 52218c9798bcbce8410b6a90cb0c36d915c052283be71b111d051b8a014f3348


File details

Details for the file blueclaw-1.2.5-py3-none-any.whl.

File metadata

  • Download URL: blueclaw-1.2.5-py3-none-any.whl
  • Size: 31.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for blueclaw-1.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 56785a83ef2bf0c50721cfdea8c23eb212576d50fd2febda0a171c00c2eb2e74
MD5 e9b8c9dbeeabf5a1e8e4e0fda5e9fac6
BLAKE2b-256 6c9f8d7ca637e6bb83dd51e9e5b1832079f62425b5110f4deadc061aedaa2f49

