Stress-test agents. Capture production. Replay incidents on demand.

Maintainers

yonke-735

Tool Pouch

Tool Pouch is the reliability layer for AI agents. It catches silent failures pre-deploy with pouch scan, captures every production request with pouch.wrap_anthropic, and replays any captured trace under chaos so you can answer "would this incident reproduce?" in one command — before you ship a fix.

pip install tool-pouch

import tool_pouch as pouch
from anthropic import Anthropic

# Wrap once. Every messages.create from here on is captured.
client = pouch.wrap_anthropic(Anthropic())

# Pre-deploy
pouch init && pouch scan --quick

# In production, after the wrap()
pouch traces --since 1h --failed       # what's blowing up?
pouch traces --request-id req-abc      # one specific request
pouch replay <trace_id> --repeat 100   # would it reproduce?

Install it with pip install tool-pouch, import it with import tool_pouch as pouch, and run the CLI as pouch (the long form tool-pouch also works).

What Tool Pouch is for

Three problems, one toolkit:

Layer	Command	What it answers
Pre-deploy	`pouch scan`	"What does my agent do when its tools break?"
Production	`pouch.wrap_anthropic`	"What did my agent actually receive and emit?"
Incident response	`pouch replay`	"Would this 3am incident reproduce?"

You can adopt any one independently. They share the same data model (local SQLite by default; pluggable destinations for production), so captured traces become testable scenarios with no extra plumbing.

Install

pip install tool-pouch

For OpenAI or Ollama support:

pip install "tool-pouch[openai]"   # OpenAI or any OpenAI-compatible endpoint
pip install "tool-pouch[ollama]"   # Local Ollama (quotes keep shells like zsh from expanding the brackets)

LLM provider

Tool Pouch uses an LLM to classify failures (hallucinated vs handled, silent_wrong, etc.) and suggest fixes. One API key is enough — pouch init autodetects which one you have, and pouch scan mirrors the agent provider to the judge by default.

export OPENAI_API_KEY=...        # → provider = openai, judge = openai
# or
export ANTHROPIC_API_KEY=...     # → provider = anthropic, judge = anthropic

Override the judge for a single run:

pouch scan --judge ollama        # local, fully offline
pouch run my_agent.py --judge openai

If the judge can't reach the LLM (no network, model down), Tool Pouch still runs — crashes, timeouts, and loops are detected without it. Only the nuanced "did this hallucinate?" classification needs the judge.

Supported: Anthropic, OpenAI, Ollama (local, fully offline).
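
The autodetection and judge mirroring described above can be pictured roughly as follows. This is a minimal sketch, not Tool Pouch's code: `detect_provider` and `resolve_judge` are hypothetical names, and the precedence when both keys are set (OpenAI first here) is an assumption the README does not state.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from environment variable to provider name.
# The check order is an assumption; the README does not say which key wins.
_KEY_TO_PROVIDER = (
    ("OPENAI_API_KEY", "openai"),
    ("ANTHROPIC_API_KEY", "anthropic"),
)


def detect_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose API key is set, else None."""
    env = os.environ if env is None else env
    for var, provider in _KEY_TO_PROVIDER:
        if env.get(var):
            return provider
    return None


def resolve_judge(provider: Optional[str], override: Optional[str] = None) -> Optional[str]:
    """Mirror the agent provider unless an explicit judge override is given."""
    return override or provider
```

Under these assumptions, `resolve_judge("openai", "ollama")` plays the role of `--judge ollama`: the agent stays on OpenAI while the judge runs locally.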

Pick your path

Four integration paths. Each takes roughly five to ten minutes. Pick whichever matches your existing setup.

Use this if...	Path	Jump to
Tools are plain `.py` functions you control	A. Decorator	`@pouch.tool` + `pouch scan`
You already use Anthropic / OpenAI tool calling	B. Adapter	`test_anthropic` / `test_openai`
LangGraph, MCP, or your own loop	C. Custom orchestration	`agent_fn` + `pouch run`
You want production capture + replay	D. wrap()	`wrap_anthropic` / `wrap_openai`

For what success looks like in any of them, see "What the output looks like".

Path A — Decorator (the simplest)

~5 min. Use this when tools are plain Python functions in your own files.

pouch init             # autodetects tools/, provider, model
pouch scan --quick     # ~15s; runs the highest-signal scenarios first

Tag the functions you want tested:

# tools/web.py
import requests
from tool_pouch import tool

@tool
def search(q: str) -> dict:
    """Search the web for q."""
    return search_api(q)  # search_api: placeholder for your own search client

@tool
def fetch(url: str) -> dict:
    """Fetch the URL and return content."""
    return requests.get(url).json()

pouch init finds your tools folder, picks the right provider based on your API key, and writes .tool_pouch.toml. The judge defaults to the same provider as the agent — one API key is enough.

--quick mode runs one input across the four highest-signal failure scenarios, designed for fix → re-run → verify cycles. Drop the flag for the full battery (12 scenarios × N inputs).

Path B — Anthropic / OpenAI adapter

~5 min. Use this when you already have a working agent on Anthropic or OpenAI tool calling.

Schemas are derived from each function's signature and docstring — no separate spec file, no rewrite.

import tool_pouch as pouch
from anthropic import Anthropic

def search(q: str) -> dict:
    """Search the web for q."""
    return {"results": [...]}

pouch.test_anthropic(
    client=Anthropic(),
    model="claude-opus-4-7",
    tools=[search],
    test_inputs=["best pizza in NYC"],
)

OpenAI is identical:

from openai import OpenAI

pouch.test_openai(
    client=OpenAI(),
    model="gpt-4o",
    tools=[search],
    test_inputs=["best pizza in NYC"],
)

The adapter drives the model loop, dispatches tool calls through Tool Pouch's failure-injection proxy, and returns a list of run_ids — same coverage as Path A, none of the boilerplate.
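
To make "schemas are derived from each function's signature and docstring" concrete, here is one way such a derivation could work using only the standard library. It is a sketch under assumptions, not the contents of `_introspect.py`: `tool_schema` is a hypothetical name, the type mapping is simplified, and the output layout follows the `name` / `description` / `input_schema` shape used by Anthropic tool definitions.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", dict: "object", list: "array"}


def tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-Schema-style tool description from a plain function."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }


def search(q: str, limit: int = 5) -> dict:
    """Search the web for q."""
    return {"results": []}


schema = tool_schema(search)
```

Parameters without a default end up in `required`, so `q` is mandatory and the illustrative `limit` parameter is optional.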

Path C — Custom orchestration

~10 min. Use this when you're not on OpenAI / Anthropic directly — LangGraph, Pydantic-AI, MCP, or your own loop.

Define four exports in a Python file:

# my_agent.py

async def agent_fn(user_input, tool_caller):
    # Use tool_caller(name, args) to call your tools.
    result = await tool_caller("search", {"q": user_input})
    return {"output": "...", "tool_calls": [...]}

def real_tool_fn(name, args):
    if name == "search":
        return search_api(args["q"])
    ...

tools = ["search", "fetch"]            # tools to inject failures into
test_inputs = ["best pizza in NYC"]    # what to ask your agent

Run it:

pouch run my_agent.py
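
The contract between `agent_fn` and `tool_caller` is easier to see with a toy harness. The sketch below is not how `pouch run` is implemented; `make_tool_caller` is a hypothetical helper that injects one failure per call, the scenario names are borrowed from the `--scenarios` example later in this README, and the shape of `tool_calls` is an assumption because the original elides it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


def real_tool_fn(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the real tool dispatcher."""
    if name == "search":
        return {"results": [f"result for {args['q']}"]}
    raise KeyError(name)


def make_tool_caller(scenario: Optional[str]) -> Callable[[str, Dict[str, Any]], Awaitable[Any]]:
    """Return a tool_caller that injects one failure scenario, or none."""
    async def tool_caller(name: str, args: Dict[str, Any]) -> Any:
        if scenario == "timeout":
            raise asyncio.TimeoutError(f"{name} timed out (injected)")
        if scenario == "malformed_json":
            return '{"results": [unterminated'  # injected garbage payload
        return real_tool_fn(name, args)
    return tool_caller


async def agent_fn(user_input: str, tool_caller) -> Dict[str, Any]:
    """Toy agent that degrades gracefully when its tool breaks."""
    try:
        result = await tool_caller("search", {"q": user_input})
    except asyncio.TimeoutError:
        return {"output": "search unavailable", "tool_calls": ["search"]}
    if not isinstance(result, dict):
        return {"output": "search returned unreadable data", "tool_calls": ["search"]}
    return {"output": result["results"][0], "tool_calls": ["search"]}


healthy = asyncio.run(agent_fn("pizza", make_tool_caller(None)))
timed_out = asyncio.run(agent_fn("pizza", make_tool_caller("timeout")))
garbled = asyncio.run(agent_fn("pizza", make_tool_caller("malformed_json")))
```

An agent that lacked the `except` branch or the `isinstance` check would crash or pass garbage downstream, which is the kind of behavior the stress test is meant to surface.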

Path D — Production wrap + replay

~5 min. Use this when you want every production request captured and any of them replayable on demand.

One line wraps your client:

import tool_pouch as pouch
from anthropic import Anthropic

client = pouch.wrap_anthropic(Anthropic(), agent_name="support_bot")
# That's it. Use client.messages.create exactly as before.

OpenAI is identical:

client = pouch.wrap_openai(OpenAI(), agent_name="support_bot")

Async clients work too (AsyncAnthropic, AsyncOpenAI). Streaming is fully supported — chunks pass through unchanged, and the trace is committed when the stream exhausts.
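
The streaming behavior described above (chunks pass through unchanged, trace committed on exhaustion) matches a common generator pattern. The following is an illustrative sketch only; `capture_stream` and the `commit` callback are hypothetical names, and how Tool Pouch handles a stream that the caller abandons early is not covered by the README.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def capture_stream(chunks: Iterable[T], commit: Callable[[List[T]], None]) -> Iterator[T]:
    """Yield each chunk unchanged; call commit once the stream is exhausted."""
    seen: List[T] = []
    for chunk in chunks:
        seen.append(chunk)
        yield chunk
    # Reached only after the underlying stream has produced its last chunk.
    commit(seen)


committed: List[List[str]] = []
stream = capture_stream(iter(["Hel", "lo", "!"]), committed.append)
received = list(stream)
```

In this sketch, nothing is committed if the consumer stops iterating before the end, so a production wrapper would need its own policy for partial streams.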

Querying captured traces

pouch traces                            # everything captured
pouch traces --since 1h --failed        # last hour, failures only
pouch traces --request-id req-abc       # by your request_id
pouch trace <trace_id>                  # full detail of one capture

request_id flows through to traces — pass a string or a callable that extracts it from the request kwargs:

client = pouch.wrap_anthropic(
    Anthropic(),
    request_id=lambda **kw: kw.get("metadata", {}).get("user_id", "anon"),
)
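
Accepting "a string or a callable" usually comes down to a small dispatch at capture time. A minimal sketch, assuming the callable receives the request keyword arguments as in the lambda above; `resolve_request_id` is a hypothetical helper, not part of the Tool Pouch API.

```python
from typing import Any, Callable, Optional, Union

RequestId = Union[str, Callable[..., str], None]


def resolve_request_id(request_id: RequestId, **request_kwargs: Any) -> Optional[str]:
    """Return a fixed string as-is, or call the extractor with the request kwargs."""
    if callable(request_id):
        return request_id(**request_kwargs)
    return request_id


# Same extractor as in the README example.
extract_user = lambda **kw: kw.get("metadata", {}).get("user_id", "anon")
```

With this shape, a static ID tags every call from one worker, while the callable form derives a per-request ID from data already present in the request.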

Replaying

# Walk through what actually happened — no API calls.
pouch replay <trace_id> --frozen

# Re-call your model; stub tools with captured outputs.
pouch replay <trace_id> --frozen-tools

# Default: chaos. Real model, real tools, injected scenarios.
pouch replay <trace_id>

# 100 chaos replays → "would this incident reproduce?"
pouch replay <trace_id> --repeat 100

For chaos / frozen-tools modes, Tool Pouch needs your agent_fn and (for chaos) your real_tool_fn. Set agent in .tool_pouch.toml or pass --agent-file my_agent.py (same shape as Path C).

--repeat N aggregates verdicts as percentages per (tool, scenario) cell — useful for surfacing flaky failure rates.
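
The per-cell aggregation can be illustrated with a few lines of counting. This is a sketch of the idea rather than the logic in `replay.py`; the `aggregate` function and the row layout `(tool, scenario, verdict)` are assumptions, and the sample rows are made up for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str]  # (tool, scenario)


def aggregate(verdicts: Iterable[Tuple[str, str, str]]) -> Dict[Cell, Dict[str, float]]:
    """Turn (tool, scenario, verdict) rows into per-cell percentage breakdowns."""
    counts: Dict[Cell, Counter] = defaultdict(Counter)
    for tool, scenario, verdict in verdicts:
        counts[(tool, scenario)][verdict] += 1
    return {
        cell: {v: 100.0 * n / sum(c.values()) for v, n in c.items()}
        for cell, c in counts.items()
    }


# Illustrative data: four replays of one cell, two of another.
rows = (
    [("search", "timeout", "handled")] * 3
    + [("search", "timeout", "crashed")] * 1
    + [("fetch", "malformed_json", "hallucinated")] * 2
)
rates = aggregate(rows)
```

A cell that reads 75% handled and 25% crashed is the flaky kind of failure that a single replay could easily miss.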

PII redaction

The default redactor scrubs emails, phones, SSNs, credit cards, IPs, and common API keys at capture time:

client = pouch.wrap_anthropic(Anthropic())   # built-in redaction enabled

Extend the regex pack:

client = pouch.wrap_anthropic(
    Anthropic(),
    redact=pouch.redact.builtin(extra_patterns=[
        r"acct_\d{6}",
        r"customer_token=[A-Za-z0-9]+",
    ]),
)

Disable redaction explicitly (if you're handling PII upstream):

client = pouch.wrap_anthropic(Anthropic(), redact=None)
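
For intuition about what an extensible regex pack does, here is a small standalone sketch. It does not reproduce `pouch.redact.builtin`: `make_redactor`, the two base patterns, and the `[REDACTED]` marker are all assumptions chosen for illustration, and real PII detection needs considerably more care than these expressions.

```python
import re
from typing import Callable, Iterable, List

# Deliberately simple illustrative patterns, not a complete PII pack.
_BASE_PATTERNS = [
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",  # email address
    r"\b\d{3}-\d{2}-\d{4}\b",                            # US SSN layout
]


def make_redactor(extra_patterns: Iterable[str] = ()) -> Callable[[str], str]:
    """Compile base plus extra patterns into one text-scrubbing function."""
    compiled: List[re.Pattern] = [re.compile(p) for p in (*_BASE_PATTERNS, *extra_patterns)]

    def redact(text: str) -> str:
        for pattern in compiled:
            text = pattern.sub("[REDACTED]", text)
        return text

    return redact


redact = make_redactor(extra_patterns=[r"acct_\d{6}", r"customer_token=[A-Za-z0-9]+"])
clean = redact("mail bob@example.com about acct_123456, ssn 123-45-6789")
```

Without the extra patterns, the account number in the sample string would pass through untouched, which is why domain-specific identifiers need to be added explicitly.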

Destinations

Three destinations ship in OSS. Combine them — capture once, pipe anywhere:

client = pouch.wrap_anthropic(
    Anthropic(),
    destinations=[
        pouch.LocalStore(),                       # SQLite, dev/staging
        pouch.JSONLogger(),                       # NDJSON to stderr
        pouch.HTTPSink(url="https://your.api/traces"),
    ],
)

Destination	Use it for
`LocalStore`	Dev / staging. SQLite at `~/.tool_pouch/tool_pouch.db`.
`JSONLogger`	Production. Pipe stderr into Datadog, Honeycomb, Loki, CloudWatch.
`HTTPSink`	In-house observability backends. Batched POST.

A future CloudStore will become a fourth destination after Tool Pouch Cloud ships. The wrap API stays unchanged.

Disabling capture

Set TOOL_POUCH_DISABLE_WRAP=1 and every wrap_anthropic / wrap_openai call becomes a no-op passthrough. Useful in CI and unit tests.
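
A no-op passthrough of this kind is typically a single early return. The sketch below assumes that only the exact value `1` disables capture; `wrap` and `CapturingProxy` are hypothetical stand-ins, not Tool Pouch's classes.

```python
import os
from typing import Any, Mapping, Optional


class CapturingProxy:
    """Hypothetical stand-in for a capturing wrapper around a client."""

    def __init__(self, client: Any) -> None:
        self.client = client


def wrap(client: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Return the client untouched when capture is disabled, else a proxy."""
    env = os.environ if env is None else env
    if env.get("TOOL_POUCH_DISABLE_WRAP") == "1":
        return client  # no-op passthrough
    return CapturingProxy(client)


raw_client = object()
```

Because the disabled path returns the original object, unit tests that compare or mock the client see exactly what they would without the wrapper.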

What the wrap costs you

Sub-millisecond p99 enqueue overhead on the request thread. Serialization, redaction, truncation, and destination IO all run on a background writer thread. Multi-process safe (pre-fork models like gunicorn / uvicorn workers). Per-trace size limits prevent runaway payloads. Fail-open at every destination — a misbehaving sink logs to stderr and never propagates.
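
The combination of a cheap enqueue on the request thread, a background writer, and fail-open destinations can be sketched with the standard library. This is an illustration of the pattern, not `wrap/writer.py`: `BackgroundWriter` and its methods are hypothetical, and the sketch omits the fork safety, size limits, and batching that the README mentions.

```python
import queue
import sys
import threading
from typing import Any, Callable, Dict, List

Trace = Dict[str, Any]
Destination = Callable[[Trace], None]


class BackgroundWriter:
    """Enqueue traces on the caller's thread; deliver them on a worker thread."""

    def __init__(self, destinations: List[Destination]) -> None:
        self._destinations = destinations
        self._queue: "queue.Queue[Trace]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, trace: Trace) -> None:
        # The only work done on the request thread: a non-blocking enqueue.
        self._queue.put_nowait(trace)

    def _run(self) -> None:
        while True:
            trace = self._queue.get()
            for dest in self._destinations:
                try:
                    dest(trace)
                except Exception as exc:  # fail-open: log and keep going
                    print(f"destination failed: {exc!r}", file=sys.stderr)
            self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued trace has been handed to all destinations."""
        self._queue.join()


def broken_sink(trace: Trace) -> None:
    raise ConnectionError("sink unreachable")


stored: List[Trace] = []
writer = BackgroundWriter([broken_sink, stored.append])
writer.submit({"trace_id": "t1"})
writer.submit({"trace_id": "t2"})
writer.flush()
```

The failing sink only produces stderr output; the second destination still receives both traces and the caller never sees an exception.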

What the output looks like

============================================================
Agent Test Report (run abc12345)
============================================================
Total scenarios: 24
Failures: 14 (58%)

Breakdown:
  ❌ crashed: 8
  ❌ hallucinated: 4
  ❌ looped: 2
  ✓ handled: 10

For full trace of any failure: pouch show abc12345 --filter <type>

Exit code is 0 when all scenarios pass and 1 when any fail — works in CI out of the box.
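
The report arithmetic and the CI exit code follow directly from the verdict counts. A minimal sketch, assuming that `crashed`, `hallucinated`, and `looped` are the failing verdicts as in the sample report; `summarize` is a hypothetical helper, not part of `report.py`.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed failing verdicts, taken from the sample report above.
FAILURE_VERDICTS = {"crashed", "hallucinated", "looped"}


def summarize(verdicts: Iterable[str]) -> Tuple[int, int, int]:
    """Return (total, failures, exit_code) for a list of scenario verdicts."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    failures = sum(n for verdict, n in counts.items() if verdict in FAILURE_VERDICTS)
    return total, failures, (1 if failures else 0)


# Same counts as the sample report: 8 + 4 + 2 failures, 10 handled.
sample = ["crashed"] * 8 + ["hallucinated"] * 4 + ["looped"] * 2 + ["handled"] * 10
total, failures, exit_code = summarize(sample)
failure_pct = round(100 * failures / total)
```

With the sample counts this gives 14 failures out of 24 scenarios (58%) and a non-zero exit code, so a CI job running the scan would fail.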

Drilling in & re-running

pouch show abc12345 --filter hallucinated      # full trace of one type
pouch scan --scenarios timeout,malformed_json  # re-run a slice
pouch run my_agent.py --tools search           # one tool only
pouch runs --failed                            # history, failures only

Project config (`.tool_pouch.toml`)

[tool-pouch]
# For `pouch scan`
tools = "./my_app/tools/"
provider = "openai"
model = "gpt-4o"
test_inputs = ["best pizza in NYC"]   # optional — autogenerated otherwise

# For `pouch run` and `pouch replay`
agent = "./my_agent.py"

# Common
parallel = 8
# scenarios = ["timeout", "malformed_json"]    # optional filter

Fix bugs in your AI editor

After any run, get a markdown prompt designed for Cursor, Claude Code, Cline, Windsurf, or Aider:

pouch fix-prompt | pbcopy             # latest run → clipboard

The format groups failures by source (control flow, prompt, integration) so your AI editor proposes clustered fixes instead of one-line patches.

Architecture

User-facing surface:

tool.py — @pouch.tool decorator + module-level registry
discover.py — walks a path, returns every @pouch.tool callable
init.py — pouch init, autodetects tools/provider/model
autogen.py — generates test prompts from tool docstrings
adapters/ — drop-in helpers for OpenAI and Anthropic tool calling
_introspect.py — Python callables → JSON tool schemas
fix_prompt.py — renders a past run as markdown for AI coding tools

Wrap / replay:

wrap/proxy.py — wrap_anthropic / wrap_openai client interception
wrap/writer.py — background writer thread, fail-open, fork-safe
wrap/destinations.py — LocalStore, JSONLogger, HTTPSink
wrap/limits.py — per-trace + per-tool-result size truncation
redact.py — PII redaction pack (extensible)
replay.py — build_replay_inputs(trace, mode=...) + verdict aggregation
nudges.py — one-time CLI nudges (cloud upgrade hooks)

Engine:

proxy.py — wraps tool calls during stress testing
runner.py — runs (tool × scenario) in parallel; judge fan-out
scenarios/static.py — built-in failures
judges/llm_judge.py — classifies completed runs
config.py — judge provider resolution
store.py — versioned SQLite (WAL mode, multi-process safe)
migrations/ — versioned schema migrations
report.py — summary + detailed trace view

Status & roadmap

0.1 ships pre-deploy stress testing, production capture, and replay. Tool Pouch Cloud is the next layer: push captured traces from any environment, search by request_id, replay across your team, retain for compliance. Until it ships, the OSS path is already production-ready via JSONLogger and HTTPSink.

Get notified at launch: toolpouch.dev.

License

Apache License 2.0. See LICENSE for the full text and NOTICE for required attribution.

Release history

0.1.1 (this version): May 17, 2026
0.0.1: May 16, 2026

Download files

Source Distribution

tool_pouch-0.1.1.tar.gz (692.9 kB)

Uploaded May 17, 2026 Source

Built Distribution

tool_pouch-0.1.1-py3-none-any.whl (74.8 kB)

Uploaded May 17, 2026 Python 3

File details

Details for the file tool_pouch-0.1.1.tar.gz.

File metadata

Download URL: tool_pouch-0.1.1.tar.gz
Upload date: May 17, 2026
Size: 692.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tool_pouch-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b0d0b7371f2331b68ddb2c81372e63417bb6d2edd2c8bb18953dfc8dabe4cd1e`
MD5	`afe32d8655b52c0532997e7afbedf897`
BLAKE2b-256	`11851a365f8ba54e81c8aeb8c17b807a04a859865f6817d5bc5fd68cd4150d9f`
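
To check a downloaded file against the published digest, you can hash it locally with `hashlib`. A minimal sketch: `sha256_of` is a hypothetical helper, the constant is the SHA256 value from the table above, and the demonstration hashes a throwaway file because the sdist itself is not downloaded here.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Published digest for tool_pouch-0.1.1.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "b0d0b7371f2331b68ddb2c81372e63417bb6d2edd2c8bb18953dfc8dabe4cd1e"

# Demonstration on a throwaway file, since the sdist is not downloaded here.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_digest = sha256_of(tmp.name)
os.remove(tmp.name)
```

For the real archive, compare `sha256_of("tool_pouch-0.1.1.tar.gz")` with `EXPECTED_SDIST_SHA256` and treat any mismatch as a failed download.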

Provenance

The following attestation bundles were made for tool_pouch-0.1.1.tar.gz:

Publisher: release.yml on Tool-pouch/tool-pouch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tool_pouch-0.1.1.tar.gz
- Subject digest: b0d0b7371f2331b68ddb2c81372e63417bb6d2edd2c8bb18953dfc8dabe4cd1e
- Sigstore transparency entry: 1564447041
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: Tool-pouch/tool-pouch@24dc7ddad930d05ed28637308dab9ef58ae2cff2
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Tool-pouch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@24dc7ddad930d05ed28637308dab9ef58ae2cff2
- Trigger Event: push

File details

Details for the file tool_pouch-0.1.1-py3-none-any.whl.

File metadata

Download URL: tool_pouch-0.1.1-py3-none-any.whl
Upload date: May 17, 2026
Size: 74.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tool_pouch-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c1a607176ec8d01aaa24abba0de9633f435d17ef413862cfeb9b9fd687ae1526`
MD5	`5e0351788dbb0b843c2f6c250b9f6364`
BLAKE2b-256	`292ac89f3358fa9671e1973e15fc809a868e1b8e63074ab7209a35bf335ce379`

Provenance

The following attestation bundles were made for tool_pouch-0.1.1-py3-none-any.whl:

Publisher: release.yml on Tool-pouch/tool-pouch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tool_pouch-0.1.1-py3-none-any.whl
- Subject digest: c1a607176ec8d01aaa24abba0de9633f435d17ef413862cfeb9b9fd687ae1526
- Sigstore transparency entry: 1564447055
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: Tool-pouch/tool-pouch@24dc7ddad930d05ed28637308dab9ef58ae2cff2
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Tool-pouch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@24dc7ddad930d05ed28637308dab9ef58ae2cff2
- Trigger Event: push
