Skip to main content

Terminal AI agent with built-in execution tracing and observability

Project description

BlueClaw

Understand, debug, and control AI agent behavior.
Structured tracing, context management, and reproducible runs — all from the terminal.

Quickstart · Features · Models · Configuration · Roadmap · Contributing · License

PyPI Version License Python Version GitHub Issues CI Downloads Awesome Strands Agents


  • Structured traces — every run writes a structured JSON trace, queryable from the terminal with no external service
  • Regression testing — define expected behavior in YAML; run as CI with TAP or JUnit output and Wilson CI scoring
  • Context management — observation masking keeps token cost low across long sessions without losing quality
  • Trace replay & diff — step through any recorded run interactively, or compare steps, tokens, and cost between two runs
  • HTTP API + stateful conversationsblueclaw serve exposes the agent over HTTP with bearer auth, SSE streaming, a concurrency cap, per-conversation_id history persisted via FileSessionManager, plus POST /upload for attaching files (PDF, text, images, csv, json, zip) to a conversation
  • File attachments with native vision — drop @<path> (or just paste a bare/quoted absolute path) into any CLI prompt; PNG/JPEG/GIF/WEBP attachments reach vision-capable models as Strands image blocks, while PDFs and text reuse the shell/pdf-mcp tools. Works the same way over HTTP via POST /upload + file_ids
  • Built-in playgroundGET /playground ships a single-page chat UI with blueclaw serve for manual stateful + streaming testing, including paperclip + drag-drop file attachments

Quickstart

pip install blueclaw
blueclaw init
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
blueclaw

Install the extra for the model provider you want:

pip install "blueclaw[anthropic]"  # Claude (default)
pip install "blueclaw[ollama]"     # local models via Ollama
pip install "blueclaw[openai]"     # OpenAI
pip install "blueclaw[gemini]"     # Google Gemini (via LiteLLM)

Attach a file in one shot — @<path> or a bare absolute/quoted path both work:

blueclaw run "@~/Downloads/screenshot.png what is this?"
blueclaw run "'/Users/me/notes.pdf' summarize this"

Features

Tracing & Observability — docs/tracing.md

Every run produces a structured JSON trace. Ten CLI commands let you inspect, compare, and replay runs without a hosted dashboard.

$ blueclaw trace graph 20260315-054426

search for Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features list 2024
└── http_request (366ms) ✓  url: https://docs.python.org/3.13/whatsnew/3.13.html

trace list · trace show · trace graph · trace timeline · trace diff · trace explain · trace replay · trace stats · trace ui · trace purge

Regression Testing — docs/testing.md

Define expected behavior in YAML, run as a CI test suite with TAP or JUnit output. Multi-run Wilson CI scoring handles non-determinism.

blueclaw test spec.yaml
blueclaw test spec.yaml --format junit -o results.xml

11 deterministic assertions: tools called, output content, file existence, cost, step count, duration, tool order.

Context Management

Tool outputs from older turns are automatically masked to keep token cost low across long sessions without losing model reasoning quality. A hybrid summarization mode is available for very long conversations.

HTTP API — docs/api.md

Expose the agent over HTTP for programmatic access or tool integration.

blueclaw serve                          # http://127.0.0.1:8420
curl -X POST http://127.0.0.1:8420/message \
  -d '{"message": "what is in the workspace?"}' | jq .

# Stream tokens as they're generated:
curl -N -X POST http://127.0.0.1:8420/message/stream \
  -d '{"message": "what is in the workspace?"}'

# Attach a file, then reference its file_id in /message:
FID=$(curl -s -X POST http://127.0.0.1:8420/upload \
  -F "file=@photo.jpg" -F "conversation_id=c-1" | jq -r .file_id)
curl -X POST http://127.0.0.1:8420/message \
  -d "{\"message\":\"describe this\",\"conversation_id\":\"c-1\",\"file_ids\":[\"$FID\"]}"

Bearer token auth (BLUECLAW_API_KEY), 1 MB body cap on JSON, 25 MB on /upload, 300 s timeout, CORS for localhost. A shared asyncio.Semaphore (default 4, configurable via --max-concurrent) caps simultaneous agent runs. Every API request writes a trace visible in blueclaw trace ui.

Model Support — docs/models.md

blueclaw                                    # Anthropic (default)
blueclaw --model ollama/llama3.1:8b         # Ollama (local)
blueclaw --model openai/gpt-4.1-mini       # OpenAI
blueclaw --model litellm/gemini/gemini-2.0-flash  # Gemini via LiteLLM

Set API keys in .env:

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Configuration

blueclaw.yaml in your project root:

model:
  provider: anthropic
  model_id: claude-sonnet-4-6

workspace:
  path: ~/blueclaw/workspace/
  trace_retention_days: 30

tools:
  - web
  - shell
  - pdf
  - mcp:http://localhost:8080/sse        # SSE MCP server (use mcp:<command> for stdio)

allowlist_domains:
  - github.com
  - docs.python.org

Architecture

BlueClaw Architecture

Module Purpose
cli.py Typer entrypoints, welcome banner, trace tooling
session.py Config, model factory, agent, chat loop, background context updater
server.py HTTP API gateway (blueclaw serve) — /message, /message/stream, /playground, /health, /api/traces; bearer auth, CORS, per-conversation locks
workspace.py Sandbox enforcement, context/history/trace I/O
observer.py Structured tool tracing + output truncation
context.py Observation masking and hybrid summarization for context management
lessons.py Extracts behavioral hints from past traces and injects into system prompt
models.py Pydantic models, trace schema, cost calculation, error classification
testing.py Test spec loading, runner, assertions, formatters, stub replay
tools/ Web, shell, MCP wiring (factory pattern)
approval.py Shell command + domain allowlist hooks

Built on Strands Agents SDK.

Roadmap

See docs/roadmap.md for the full roadmap with milestone details.

Contributing

pip install -e ".[dev]"
pip install pre-commit && pre-commit install   # mirrors CI lint locally
pytest
flake8 blueclaw/ tests/
black --check blueclaw/ tests/

Bug reports and pull requests are welcome. See docs/contributing.md for the full guide.

Links

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

blueclaw-2.3.0.tar.gz (292.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

blueclaw-2.3.0-py3-none-any.whl (248.2 kB view details)

Uploaded Python 3

File details

Details for the file blueclaw-2.3.0.tar.gz.

File metadata

  • Download URL: blueclaw-2.3.0.tar.gz
  • Upload date:
  • Size: 292.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for blueclaw-2.3.0.tar.gz
Algorithm Hash digest
SHA256 d35ea6f35224daffe2609e373b2fb1ea068692529ed84bb70d3c31b5b449daac
MD5 5441a0630b7dc459ba52b499a3e7acda
BLAKE2b-256 362384c9d444f37b6fb15ba1fbd760197db327e3cbcb433a3fedeb4e994815d1

See more details on using hashes here.

File details

Details for the file blueclaw-2.3.0-py3-none-any.whl.

File metadata

  • Download URL: blueclaw-2.3.0-py3-none-any.whl
  • Upload date:
  • Size: 248.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for blueclaw-2.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 076a77e62b8960e2d0569fe993a46a01e74138387f90711300b0c07978d28bde
MD5 1622bc73d4d7530b58e2db749b1edd89
BLAKE2b-256 42a440b0af8763e2dda14095696a1482defa0f95371cc7c9b4ef731e75beb218

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page