# 🛡️ ToolGuard

**Reliability testing for AI agent tool chains.** Catch cascading failures before production. Make agent tool calling as predictable as unit tests made software reliable.
## 🔧 What ToolGuard Actually Solves

Developers hold back from deploying AI agents because agents are unstable in production: they crash.

There are two layers to an AI agent:

- **Layer 1: Intelligence** (evals, reasoning, accurate answers)
- **Layer 2: Execution** (tool calls, chaining, JSON payloads, APIs)

ToolGuard does not test Layer 1. We do not care whether your AI is "smart" or makes good decisions; that is what eval frameworks are for.

ToolGuard systematically stress-tests Layer 2. It solves the problem of agents crashing at 3 AM because the LLM hallucinated a JSON key, passed a string instead of an int, or an external API timed out.

> "We don't make AI smarter. We make AI systems not break."
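To make the Layer 2 failure class concrete, here is a stdlib-only sketch (not ToolGuard's implementation, which builds on Pydantic) of checking a hallucinated payload against a tool's type hints; `book_flight` is a hypothetical tool:

```python
import inspect

def validate_payload(tool, payload: dict) -> list[str]:
    """Return a list of type errors for `payload` against `tool`'s annotations."""
    errors = []
    sig = inspect.signature(tool)
    for name, param in sig.parameters.items():
        if name not in payload:
            errors.append(f"Field '{name}': missing")
        elif param.annotation is not inspect.Parameter.empty and not isinstance(
            payload[name], param.annotation
        ):
            errors.append(
                f"Field '{name}': expected {param.annotation.__name__}, "
                f"got {type(payload[name]).__name__}"
            )
    return errors

def book_flight(passenger: str, seats: int) -> dict:  # hypothetical tool
    return {"confirmed": True}

# The LLM hallucinated a string where an int was required:
print(validate_payload(book_flight, {"passenger": "Alice", "seats": "two"}))
```

This is exactly the class of defect that surfaces as a crash in production but is mechanically detectable before deploy.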
## The Solution

Test your agent's tools against edge cases before you deploy them. ToolGuard acts like unit tests for AI execution.

```python
from toolguard import create_tool, test_chain, score_chain

@create_tool(schema="auto")
def parse_csv(raw_csv: str) -> dict:
    lines = raw_csv.strip().split("\n")
    headers = lines[0].split(",")
    records = [dict(zip(headers, line.split(","))) for line in lines[1:]]
    return {"headers": headers, "records": records, "row_count": len(records)}

@create_tool(schema="auto")
def compute_statistics(headers: list, records: list, row_count: int) -> dict:
    # Real computation: mean, median, std dev
    ...

@create_tool(schema="auto")
def generate_report(total_rows: int, stats: dict) -> dict:
    # Real report generation
    ...

# One line. Full visibility.
report = test_chain(
    [parse_csv, compute_statistics, generate_report],
    base_input={"raw_csv": "name,age,salary\nAlice,30,75000\nBob,35,92000"},
    test_cases=["happy_path", "null_handling", "malformed_data"],
)

score = score_chain(report)
print(score.summary())
```
Real output (not mocked):

```text
╔════════════════════════════════════════════════════════════════════╗
║ Reliability Score: parse_csv → compute_statistics → generate_report
╠════════════════════════════════════════════════════════════════════╣
║ Score:      50.0%
║ Risk Level: 🔴 HIGH
║ Deploy:     🚫 BLOCK
║ Confidence: 45.1%
╠════════════════════════════════════════════════════════════════════╣
║ ⚠️ Top Risk: Schema validation failures
╠════════════════════════════════════════════════════════════════════╣
║ Failure Distribution:
║   schema_violation ████████████████████ 4 (67%)
║   type_mismatch    ██████████           2 (33%)
╠════════════════════════════════════════════════════════════════════╣
║ ⚠️ Bottleneck Tools:
║   • parse_csv (50% success)
╚════════════════════════════════════════════════════════════════════╝

💡 Suggestion:
  Agent hallucinated payload. Schema mismatch:
  - Field 'age': Input should be a valid integer (Got: 'thirty' | Type: str)
  - Field 'salary': Field required (Got: <unknown> | Type: None)
```
---
## Quick Start
```bash
pip install toolguard
```

```python
from toolguard import create_tool, test_chain

@create_tool(schema="auto")
def my_tool(query: str) -> dict:
    return {"result": query.upper()}

report = test_chain(
    [my_tool],
    base_input={"query": "hello"},
    test_cases=["happy_path", "null_handling", "malformed_data"],
    assert_reliability=0.80,
)
```

Or scaffold a full project:

```bash
toolguard init --name my_agent
```

Time to value: < 3 minutes.
## Features

### 🔒 Schema Validation

Automatic Pydantic input/output validation generated from type hints. No manual schemas needed.

```python
@create_tool(schema="auto")
def fetch_price(ticker: str) -> dict:
    ...
```
### 🔗 Chain Testing

Test multi-tool chains against 8 edge-case categories: null handling, type mismatches, missing fields, malformed data, large payloads, and more.

```python
report = test_chain(
    [fetch_price, calculate_position, generate_alert],
    base_input={"ticker": "AAPL"},
    test_cases=["happy_path", "null_handling", "type_mismatch"],
)
```
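As an illustration of what an edge-case category such as `null_handling` means in practice (a sketch of the idea, not ToolGuard's internals), the chain can be re-run with each input field nulled out in turn:

```python
def null_handling_cases(base_input: dict):
    """Yield one mutated copy of base_input per field, with that field set to None."""
    for key in base_input:
        mutated = dict(base_input)
        mutated[key] = None
        yield mutated

# Two fields in, two mutated test inputs out:
print(list(null_handling_cases({"ticker": "AAPL", "qty": 5})))
```

Each mutated input exercises a different failure path through the chain, which is how a single `test_cases` entry fans out into many runs.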
### ⚡ Async Support

Works transparently with both `def` and `async def` tools. No special flags needed.

```python
import httpx

@create_tool(schema="auto")
async def fetch_from_api(url: str) -> dict:
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        return resp.json()

# Same API: ToolGuard handles the async automatically
report = test_chain([fetch_from_api, process_data], assert_reliability=0.95)
```
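One common way to serve `def` and `async def` tools behind a single API (a sketch of the general pattern, not ToolGuard's actual dispatcher) is to detect awaitables at call time:

```python
import asyncio
import inspect

async def call_tool(tool, **kwargs):
    """Invoke a tool; await the result only if it is a coroutine."""
    result = tool(**kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

def sync_tool(x: int) -> int:
    return x * 2

async def async_tool(x: int) -> int:
    return x + 1

print(asyncio.run(call_tool(sync_tool, x=3)))   # 6
print(asyncio.run(call_tool(async_tool, x=3)))  # 4
```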
### 📊 Reliability Scoring

Quantified trust with risk levels and deployment gates.

```python
import sys

score = score_chain(report)
if score.deploy_recommendation.value == "BLOCK":
    sys.exit(1)  # CI/CD gate
```
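A deployment gate is just a mapping from a numeric score to a decision. A minimal sketch of such a mapping (the threshold values below are illustrative assumptions, not ToolGuard's actual cutoffs):

```python
def deploy_recommendation(score: float) -> str:
    """Map a reliability score in [0, 1] to a deploy decision (illustrative thresholds)."""
    if score >= 0.90:
        return "DEPLOY"
    if score >= 0.70:
        return "REVIEW"
    return "BLOCK"

print(deploy_recommendation(0.50))  # BLOCK
```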
### 🔁 Retry & Circuit Breaker

Production-grade resilience patterns built in.

```python
from toolguard import with_retry, RetryPolicy, CircuitBreaker, with_circuit_breaker

@with_retry(RetryPolicy(max_retries=3, backoff_base=0.5))
def call_api(data: dict) -> dict: ...

breaker = CircuitBreaker(failure_threshold=5, reset_timeout=60)

@with_circuit_breaker(breaker)
def call_flaky_service(data: dict) -> dict: ...
```
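The underlying retry pattern is simple enough to sketch in the stdlib (illustrative only, not ToolGuard's `with_retry` implementation): re-invoke a flaky callable with exponential backoff until it succeeds or the retry budget runs out.

```python
import time

def retry(fn, max_retries: int = 3, backoff_base: float = 0.0):
    """Call fn(); on exception, retry up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # budget exhausted: surface the last error
            time.sleep(backoff_base * (2 ** attempt))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

print(retry(flaky))  # succeeds on the third attempt
```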
### 🖥️ CLI

```bash
toolguard test --chain my_chain.yaml                     # Run chain tests
toolguard test --chain my_chain.yaml --html report.html  # HTML report
toolguard check --tools my_tools.py                      # Check compatibility
toolguard observe --tools my_tools.py                    # View tool stats
toolguard init --name my_project                         # Scaffold project
```
### 🔌 Native Framework Integrations

If you are already using LangChain or CrewAI, you do not need to rewrite your tools to use ToolGuard. Native adapters convert your existing framework tools into GuardedTools so you can stress-test them immediately.

```python
# 🦜🔗 LangChain
from toolguard.integrations.langchain import guard_langchain_tool
from my_app import my_langchain_tool

guarded_tool = guard_langchain_tool(my_langchain_tool)
report = test_chain([guarded_tool], ...)

# ⚙️ CrewAI
from toolguard.integrations.crewai import guard_crewai_tool
from my_app import my_crew_tool

guarded_tool = guard_crewai_tool(my_crew_tool)
report = test_chain([guarded_tool], ...)

# 🤖 OpenAI Function Calling
from toolguard.integrations.openai_func import to_openai_function
from my_app import my_python_tool

# Instantly export any ToolGuard tool to the strict OpenAI JSON schema format
openai_schema = to_openai_function(my_python_tool)
```
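The idea behind `to_openai_function` can be sketched with the stdlib alone: map a Python signature to the OpenAI function-calling schema shape. This is a simplified assumption about the approach (a real implementation also handles defaults, docstrings, and nested models); `fetch_price` is the example tool from earlier:

```python
import inspect

# Rough mapping from Python annotations to JSON Schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number",
              bool: "boolean", dict: "object", list: "array"}

def sketch_openai_schema(fn) -> dict:
    """Build an OpenAI-style function schema from a Python signature (simplified)."""
    sig = inspect.signature(fn)
    props = {
        name: {"type": PY_TO_JSON.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "parameters": {
            "type": "object",
            "properties": props,
            "required": list(props),
        },
    }

def fetch_price(ticker: str) -> dict:
    ...

print(sketch_openai_schema(fetch_price))
```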
### 📡 Observability

OpenTelemetry tracing out of the box: works with Jaeger, Zipkin, Datadog, and more.

```python
from toolguard.core.tracer import init_tracing, trace_tool

init_tracing(service_name="my-agent")

@trace_tool
def my_tool(data: dict) -> dict: ...
```
## Architecture

```text
toolguard/
├── core/
│   ├── validator.py      # @create_tool decorator + GuardedTool (sync + async)
│   ├── chain.py          # Chain testing engine (8 test types, async-aware)
│   ├── schema.py         # Auto Pydantic model generation
│   ├── scoring.py        # Reliability scoring + deploy gates
│   ├── report.py         # Failure analysis + suggestions
│   ├── errors.py         # Exception hierarchy + correlation IDs
│   ├── retry.py          # RetryPolicy + CircuitBreaker
│   ├── tracer.py         # OpenTelemetry integration
│   └── compatibility.py  # Schema conflict detection
├── cli/
│   └── commands/         # init, test, check, observe
├── reporters/
│   ├── console.py        # Rich terminal output
│   └── html.py           # Standalone HTML reports
├── integrations/
│   ├── langchain.py      # LangChain adapter
│   ├── crewai.py         # CrewAI adapter
│   └── openai_func.py    # OpenAI function calling
├── tests/                # 43 tests (sync + async + storage)
└── examples/
    ├── weather_chain/          # Working 3-tool example
    ├── demo_failing_chain/     # Intentionally buggy (aha moment)
    └── real_world_validation/  # Real CSV pipeline validation
```
## Why ToolGuard?

| | Without ToolGuard | With ToolGuard |
|---|---|---|
| Failure detection | Stack trace at 3 AM | Caught before deploy |
| Root cause | "TypeError in line 47" | "Tool A returned null for 'price'" |
| Fix guidance | None | "Add default value OR validate response" |
| Confidence | "It works on my machine" | "92% reliability, LOW risk" |
| CI/CD | Manual testing | `toolguard test` in your pipeline |
## Tech Stack

| Component | Technology | Why |
|---|---|---|
| Core Language | Python 3.11 - 3.13 | Agent ecosystem standard |
| Schema Validation | Pydantic v2 | 3.5× faster than JSON Schema |
| Async | Native asyncio | Enterprise-grade concurrency |
| Testing | pytest (43 tests) | CI/CD native |
| Observability | OpenTelemetry | Vendor-neutral |
| CLI | Click + Rich | Beautiful terminal UX |
| Distribution | PyPI | `pip install toolguard` |
## License

MIT: use it, fork it, ship it.

Built to make AI agents actually work in production.
---

## File details: py_toolguard-0.1.0.tar.gz

- Download URL: py_toolguard-0.1.0.tar.gz
- Size: 69.4 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `c67f11a5dfff90b1aeb67a025773dac0f286e5055bc8659bc82231abfa9dcea8` |
| MD5 | `d1f18c2943514a1bec035c6f1e682a20` |
| BLAKE2b-256 | `514db45baad5f2dce9017373ea246ee2f8339fb07b4562386bf8e015d02ed25c` |
### Provenance (py_toolguard-0.1.0.tar.gz)

The following attestation bundle was made for py_toolguard-0.1.0.tar.gz:

- Publisher: publish.yml on Harshit-J004/toolguard
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_toolguard-0.1.0.tar.gz
- Subject digest: c67f11a5dfff90b1aeb67a025773dac0f286e5055bc8659bc82231abfa9dcea8
- Sigstore transparency entry: 1117834544
- Permalink: Harshit-J004/toolguard@af8a1147c49323cd2473b4ba9c76965ac9afc5b7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Harshit-J004
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@af8a1147c49323cd2473b4ba9c76965ac9afc5b7
- Trigger Event: workflow_dispatch
## File details: py_toolguard-0.1.0-py3-none-any.whl

- Download URL: py_toolguard-0.1.0-py3-none-any.whl
- Size: 54.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `732d7c091eea1bfe4ac4ac734a5ad554773344d9daa865eb8635b8df9ff56059` |
| MD5 | `a0c4c10baed3d9fe91777ebdff41e664` |
| BLAKE2b-256 | `62f4f3bac5fdfca1fe70939aeb549bde89e50e2f961af8579c2d90a7ea7e730c` |
### Provenance (py_toolguard-0.1.0-py3-none-any.whl)

The following attestation bundle was made for py_toolguard-0.1.0-py3-none-any.whl:

- Publisher: publish.yml on Harshit-J004/toolguard
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_toolguard-0.1.0-py3-none-any.whl
- Subject digest: 732d7c091eea1bfe4ac4ac734a5ad554773344d9daa865eb8635b8df9ff56059
- Sigstore transparency entry: 1117834564
- Permalink: Harshit-J004/toolguard@af8a1147c49323cd2473b4ba9c76965ac9afc5b7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Harshit-J004
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@af8a1147c49323cd2473b4ba9c76965ac9afc5b7
- Trigger Event: workflow_dispatch