blueclaw

Terminal AI agent with built-in execution tracing and observability

These details have not been verified by PyPI

Project links

Project description

BlueClaw

Understand, debug, and control AI agent behavior.
Structured tracing, context management, and reproducible runs — all from the terminal.

Quickstart · Features · Models · Configuration · Roadmap · Contributing · License

Structured traces — every run writes a structured JSON trace, queryable from the terminal with no external service
Regression testing — define expected behavior in YAML; run as CI with TAP or JUnit output and Wilson CI scoring
Context management — observation masking keeps token cost low across long sessions without losing quality
Trace replay & diff — step through any recorded run interactively, or compare steps, tokens, and cost between two runs
HTTP API + stateful conversations — blueclaw serve exposes the agent over HTTP with bearer auth, SSE streaming, a concurrency cap, per-conversation_id history persisted via FileSessionManager, plus POST /upload for attaching files (PDF, text, images, csv, json, zip) to a conversation
Talk to blueclaw from your phone — blueclaw telegram exposes the agent over Telegram with per-chat workspaces (each chat gets its own CONTEXT.md and history.jsonl), allowlist-enforced authorization, and Strands-backed conversation continuity. Long-polling by default (no public URL needed), webhook mode opt-in. Install with pip install -e ".[telegram]". See docs/bridges/telegram.md
File attachments with native vision — drop @<path> (or just paste a bare/quoted absolute path) into any CLI prompt; PNG/JPEG/GIF/WEBP attachments reach vision-capable models as Strands image blocks, while PDFs and text reuse the shell/pdf-mcp tools. Works the same way over HTTP via POST /upload + file_ids
Built-in playground — GET /playground ships a single-page chat UI with blueclaw serve for manual stateful + streaming testing, including paperclip + drag-drop file attachments
Skills — package agent behavior as SKILL.md directories (AgentSkills.io standard). Install from a local path, a git URL (with optional #subdir), or a direct HTTPS URL pointing at raw SKILL.md. Project skills under <project>/.blueclaw/skills/ shadow user-global skills under ~/blueclaw/skills/
Agent identity via SOUL.md — drop a SOUL.md into your workspace to define the agent's persona, values, and communication style. Separate from CONTEXT.md (factual memory): one is who the agent is, the other is what the agent knows. blueclaw init writes a default template; edits are picked up live on the next turn

Quickstart

pip install blueclaw
blueclaw init
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
blueclaw

Install the extra for the model provider you want:

pip install "blueclaw[anthropic]"  # Claude (default)
pip install "blueclaw[ollama]"     # local models via Ollama
pip install "blueclaw[openai]"     # OpenAI
pip install "blueclaw[gemini]"     # Google Gemini (via LiteLLM)

Attach a file in one shot — @<path> or a bare absolute/quoted path both work:

blueclaw run "@~/Downloads/screenshot.png what is this?"
blueclaw run "'/Users/me/notes.pdf' summarize this"

Features

Tracing & Observability — docs/tracing.md

Every run produces a structured JSON trace. Ten CLI commands let you inspect, compare, and replay runs without a hosted dashboard.

$ blueclaw trace graph 20260315-054426

search for Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features
├── web_search (1ms) ✓  query: Python 3.13 new features list 2024
└── http_request (366ms) ✓  url: https://docs.python.org/3.13/whatsnew/3.13.html

trace list · trace show · trace graph · trace timeline · trace diff · trace explain · trace replay · trace stats · trace ui · trace purge

All ten readers also accept --chat <id> (target one Telegram chat) and, where union makes sense, --all-chats (default + every chat). See the Telegram bridge section.

Regression Testing — docs/testing.md

Define expected behavior in YAML, run as a CI test suite with TAP or JUnit output. Multi-run Wilson CI scoring handles non-determinism.

blueclaw test spec.yaml
blueclaw test spec.yaml --format junit -o results.xml

11 deterministic assertions: tools called, output content, file existence, cost, step count, duration, tool order.

Context Management

Tool outputs from older turns are automatically masked to keep token cost low across long sessions without losing model reasoning quality. A hybrid summarization mode is available for very long conversations.

HTTP API — docs/api.md

Expose the agent over HTTP for programmatic access or tool integration.

blueclaw serve                          # http://127.0.0.1:8420
curl -X POST http://127.0.0.1:8420/message \
  -d '{"message": "what is in the workspace?"}' | jq .

# Stream tokens as they're generated:
curl -N -X POST http://127.0.0.1:8420/message/stream \
  -d '{"message": "what is in the workspace?"}'

# Attach a file, then reference its file_id in /message:
FID=$(curl -s -X POST http://127.0.0.1:8420/upload \
  -F "file=@photo.jpg" -F "conversation_id=c-1" | jq -r .file_id)
curl -X POST http://127.0.0.1:8420/message \
  -d "{\"message\":\"describe this\",\"conversation_id\":\"c-1\",\"file_ids\":[\"$FID\"]}"

Bearer token auth (BLUECLAW_API_KEY), 1 MB body cap on JSON, 25 MB on /upload, 300 s timeout, CORS for localhost. A shared asyncio.Semaphore (default 4, configurable via --max-concurrent) caps simultaneous agent runs. Every API request writes a trace visible in blueclaw trace ui.

Telegram Bridge — docs/bridges/telegram.md

Talk to blueclaw from your phone. Allowlist-enforced; each chat gets its own workspace under ~/blueclaw/chats/<chat_id>/ with its own CONTEXT.md and history.jsonl. Long-polling by default (no public URL needed); webhook mode opt-in for production.

pip install -e ".[telegram]"
export TELEGRAM_BOT_TOKEN=123456:abc...
blueclaw telegram                       # starts long-polling
blueclaw telegram --echo --allow 12345  # smoke test, no model calls
blueclaw telegram --webhook https://your.host/telegram

Commands: /whoami (returns your IDs, works even unauthorized — for onboarding), /start, /reset (clears history, keeps CONTEXT.md), /forget (wipes both).

Inspect per-chat history from the host: blueclaw history --chat <id> or blueclaw history --all-chats (aggregates default workspace + every chat, labeled by source). Every blueclaw trace * reader also accepts --chat <id> and, where union makes sense (list, stats, purge, ui), --all-chats. blueclaw trace ui --all-chats opens the dashboard with a workspace dropdown and a Source column.

Skills — docs/skills.md

Skills are directories containing a SKILL.md (YAML frontmatter + markdown body) that the agent loads on demand. Built on the Strands AgentSkills plugin and the AgentSkills.io standard, so skills are portable between blueclaw and any other compliant runtime.

blueclaw skill install ./my-skill                          # local directory
blueclaw skill install https://github.com/u/repo.git       # git URL
blueclaw skill install https://github.com/u/repo.git#sub   # monorepo subdir
blueclaw skill install https://example.com/raw/SKILL.md    # single-file URL
blueclaw skill list
blueclaw skill show my-skill
blueclaw skill uninstall my-skill --yes

User-global skills live under ~/blueclaw/skills/; per-project skills live under <project>/.blueclaw/skills/ and take precedence on name collision. Install confirms before copying and refuses non-interactive runs without --yes.

Docker Sandbox — docs/sandbox.md

Opt-in container isolation for the entire agent process. When sandbox.mode: docker is set in blueclaw.yaml, blueclaw transparently re-execs into a short-lived container with the workspace bind-mounted, read-only root FS, no-new-privileges, all capabilities dropped, and configurable CPU / memory / pid caps. TTY and signals pass through; the container is invisible unless it fails to start.

blueclaw sandbox build      # build the runtime image (once per release / dev SHA)
blueclaw sandbox doctor     # diagnose docker + image state

sandbox:
  mode: docker            # "inprocess" (default) | "docker"
  network: bridge         # "bridge" | "none" | "proxy" (reserved for v3)
  cpu: 1.0
  memory_mb: 1024
  on_unavailable: error   # fail-loud by default; "fallback" degrades to in-process

Secrets and host env vars flow in through a layered composition: built-in allowlist → ~/blueclaw/.env → <project>/.env.docker → extra_env in YAML. Dotenv files are added to .gitignore by blueclaw init.

Model Support — docs/models.md

blueclaw                                    # Anthropic (default)
blueclaw --model ollama/llama3.1:8b         # Ollama (local)
blueclaw --model openai/gpt-4.1-mini       # OpenAI
blueclaw --model litellm/gemini/gemini-2.0-flash  # Gemini via LiteLLM

Set API keys in .env:

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

Configuration

blueclaw.yaml in your project root:

model:
  provider: anthropic
  model_id: claude-sonnet-4-6

workspace:
  path: ~/blueclaw/workspace/
  trace_retention_days: 30

tools:
  - web
  - shell
  - pdf
  - mcp:http://localhost:8080/sse        # SSE MCP server (use mcp:<command> for stdio)

allowlist_domains:
  - github.com
  - docs.python.org

Architecture

BlueClaw Architecture

Module	Purpose
`cli.py`	Typer entrypoints, welcome banner, trace tooling
`session.py`	Config, model factory, agent, chat loop, background context updater
`server.py`	HTTP API gateway (`blueclaw serve`) — `/message`, `/message/stream`, `/playground`, `/health`, `/api/traces`; bearer auth, CORS, per-conversation locks
`bridges/`	Messenger bridges. `core.py` holds platform-agnostic `Allowlist`, `ChatContext`, `BridgeRouter` (mirrors `server.py`'s create_agent + FileSessionManager pattern). `telegram.py` is the python-telegram-bot adapter (long-polling default, webhook opt-in). Each chat gets its own workspace under `~/blueclaw/chats/<chat_id>/`
`workspace.py`	Sandbox enforcement, context/history/trace I/O; multi-workspace resolver for trace + history readers
`observer.py`	Structured tool tracing + output truncation
`context.py`	Observation masking and hybrid summarization for context management
`skills.py`	Skill discovery: project + global scope resolution for the Strands `AgentSkills` plugin
`lessons.py`	Extracts behavioral hints from past traces and injects into system prompt
`models.py`	Pydantic models, trace schema, cost calculation, error classification
`launcher.py`	Docker sandbox decision: subcommand routing, env composition, argv assembly, `execvp` into `docker run`
`dotenv.py`	Minimal KEY=VALUE parser for `~/blueclaw/.env` and `<project>/.env.docker`
`testing.py`	Test spec loading, runner, assertions, formatters, stub replay
`tools/`	Web, shell, MCP wiring (factory pattern)
`approval.py`	Shell command + domain allowlist hooks

Built on Strands Agents SDK.

Roadmap

See docs/roadmap.md for the full roadmap with milestone details.

Contributing

pip install -e ".[dev]"
pip install pre-commit && pre-commit install   # mirrors CI lint locally
pytest
flake8 blueclaw/ tests/
black --check blueclaw/ tests/

Bug reports and pull requests are welcome. See docs/contributing.md for the full guide.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

3.0.0

May 19, 2026

This version

2.5.0

May 16, 2026

2.4.0

May 16, 2026

2.3.0

May 10, 2026

2.2.0

May 10, 2026

2.1.0

May 6, 2026

2.0.0

Mar 22, 2026

1.5.0

Mar 21, 2026

1.4.1

Mar 20, 2026

1.3.0

Mar 19, 2026

1.2.5

Mar 16, 2026

1.2.4

Mar 15, 2026

1.2.3

Mar 15, 2026

1.2.0

Mar 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

blueclaw-2.5.0.tar.gz (334.2 kB view details)

Uploaded May 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

blueclaw-2.5.0-py3-none-any.whl (275.0 kB view details)

Uploaded May 16, 2026 Python 3

File details

Details for the file blueclaw-2.5.0.tar.gz.

File metadata

Download URL: blueclaw-2.5.0.tar.gz
Upload date: May 16, 2026
Size: 334.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for blueclaw-2.5.0.tar.gz
Algorithm	Hash digest
SHA256	`3c5a1c4fc8d890d1a39ee4a69cf5c804da33cc43ae62de9aba28a0846d48eecd`
MD5	`86586a833ab7b41d9238d71ddf95c545`
BLAKE2b-256	`7f9d953c83d58859b2aa4e4f69fa81757878b57d5fcf707265152ed440463b3d`

See more details on using hashes here.

File details

Details for the file blueclaw-2.5.0-py3-none-any.whl.

File metadata

Download URL: blueclaw-2.5.0-py3-none-any.whl
Upload date: May 16, 2026
Size: 275.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for blueclaw-2.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5dd7624178f9831e7add68ca0ced213d78bc3bfdf58ebad3969a70faaafb6b11`
MD5	`cd03fb09348a71798ad7d8e0e4e40809`
BLAKE2b-256	`fac31aa29eebc59fc1a8847f92ae125be5b8c7daffedcf74f42c1009c14a4596`

See more details on using hashes here.

blueclaw 2.5.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Quickstart

Features

Tracing & Observability — docs/tracing.md

Regression Testing — docs/testing.md

Context Management

HTTP API — docs/api.md

Telegram Bridge — docs/bridges/telegram.md

Skills — docs/skills.md

Docker Sandbox — docs/sandbox.md

Model Support — docs/models.md

Configuration

Architecture

Roadmap

Contributing

Links

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes