Aura Guard
Exactly-once + circuit breaker for agent tool calls (loops, retries, duplicate side effects).
Stop AI agents from looping on tools and accidentally doing the same action twice (double refunds, duplicate emails, endless retries).
Aura Guard is reliability middleware for tool-using agents. It sits between your agent and its tools (search, refund, get_order, send_email, etc.) and focuses on three controls: idempotency, circuit breaking, and loop detection.
Before a tool runs, Aura Guard answers:
- ✅ ALLOW → run the tool
- ♻️ CACHE → reuse the last result (don’t call the tool again)
- ⛔ BLOCK → stop a risky / repetitive call
- ✍️ REWRITE → tell the model “stop looping, do this instead”
- 🧑‍💼 ESCALATE / FINALIZE → stop the run safely
Core goals
- Prevent duplicate side-effects (refund twice, email twice, cancel twice)
- Contain retry storms (429 / timeouts / 5xx)
- Detect loops early and stop runaways
- Provide deterministic, inspectable decisions
✅ Python 3.10+
✅ Dependency‑free core (optional LangChain adapter)
✅ Framework‑agnostic (works with your custom loop)
Table of contents
- 30-second demo (no API key)
- How it works (1-minute explanation)
- Install
- What problem does this solve?
- Why not just max_steps / retries / idempotency keys?
- 2-minute integration (copy/paste)
- Live A/B (real model) — optional
- Configuration (the knobs that matter)
- LangChain (optional)
- Telemetry & persistence (optional)
- Non-goals & limitations
- Security & privacy
- Shadow mode
- Async support
- Quick integration examples
- Docs
- Contributing
- License
30-second demo (no API key)
This is the fastest way to “feel” what Aura Guard does.
Option A (recommended for first-time users): install directly from GitHub
pip install git+https://github.com/auraguardhq/aura-guard.git
aura-guard demo
You should see output like:
================================================================
Aura Guard — Triage Simulation Demo
================================================================
Assumed tool-call cost: $0.04 per call
Variant        Calls  SideFX  Blocks  Cache   Cost  Terminated
──────────────────────────────────────────────────────────────
no_guard          11       3       0      0  $0.44  -
call_limit(5)      5       3       0      0  $0.20  call_limit
aura_guard         4       1       0      2  $0.16  escalate
Cost saved vs no_guard: $0.28 (64%)
Side-effects prevented: 2
Rewrites issued: 6
Option B (for contributors/devs): run from a clone
git clone https://github.com/auraguardhq/aura-guard.git
cd aura-guard
pip install -e .
aura-guard demo
Optional: run the full synthetic benchmark suite
aura-guard bench --all
This prints a report showing cost deltas across multiple failure scenarios (looping, retries, side-effects, etc.).
Note: the benchmark uses estimated USD costs based on the configuration. The most important signal is the relative difference under the same config.
How it works (1-minute explanation)
Aura Guard keeps run-scoped state and makes deterministic decisions from a handful of signals (a minimal sketch follows the list):
- repeated/near-repeated calls (loop detection)
- repeated tool errors (circuit breaker behavior)
- side-effect replay risk (idempotency protection)
- output stall signals and run cost
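To make the loop-detection signal concrete, here is a minimal, illustrative sketch of signature-based repeat counting. This is not Aura Guard's actual implementation; the threshold and canonicalization below are assumptions:

```python
import hashlib
import json
from collections import Counter

# Illustrative only — not Aura Guard's internals.
seen: Counter[str] = Counter()

def call_signature(tool: str, args: dict) -> str:
    # Canonicalize so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same.
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

sig = call_signature("search_kb", {"query": "order status"})
seen[sig] += 1
if seen[sig] > 3:  # same (tool, args) repeated within one run
    print("loop suspected: CACHE, BLOCK, or REWRITE instead of re-running")
```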
Install
Option A (recommended): install directly from GitHub
pip install git+https://github.com/auraguardhq/aura-guard.git
Option B (for contributors/devs): install from a cloned repo
git clone https://github.com/auraguardhq/aura-guard.git
cd aura-guard
pip install -e .
Optional: LangChain adapter
pip install langchain-core
What problem does this solve?
Agents that can call tools are powerful — but they fail in very predictable ways:
- They repeat the same tool call (or almost the same call) over and over.
- They “try different keywords” and spiral.
- They hit an error (429 / timeout) and retry forever.
- They see a tool response like “pending” and do the side-effect twice.
- They produce “sorry, still checking…” text and stall.
Why not just max_steps / retries / idempotency keys?
- max_steps is blunt: it does not distinguish productive steps from runaway loops.
- Backoff/retry libraries don't prevent duplicate side effects: they retry transport failures, not business-level replay risk.
- Idempotency keys help side effects but not loops/cost/quarantine: they protect write operations, but they do not stop search spirals, repeated failed reads, or stalled agent outputs.
2-minute integration (copy/paste)
Aura Guard does not call your LLM and does not execute tools.
You keep your agent loop. You just add 3 hook calls:
- check_tool(...) before you execute a tool
- record_result(...) after the tool finishes (success or error)
- check_output(...) after the model produces text (optional but recommended)
Minimal example
from aura_guard import AgentGuard, PolicyAction

guard = AgentGuard(
    max_calls_per_tool=3,                 # stop “search forever”
    side_effect_tools={"refund", "cancel"},
    max_cost_per_run=1.00,                # optional budget (USD)
    tool_costs={"search_kb": 0.03},       # optional; improves cost reporting
)

def run_tool(tool_name: str, args: dict):
    decision = guard.check_tool(tool_name, args=args, ticket_id="ticket-123")

    if decision.action == PolicyAction.ALLOW:
        try:
            result = execute_tool(tool_name, args)  # <-- your tool function
            guard.record_result(ok=True, payload=result)
            return result
        except Exception as e:
            # classify errors however you want ("429", "timeout", "5xx", ...)
            guard.record_result(ok=False, error_code=type(e).__name__)
            raise

    if decision.action == PolicyAction.CACHE:
        # Aura Guard tells you “reuse the previous result”
        return decision.cached_result.payload if decision.cached_result else None

    if decision.action == PolicyAction.REWRITE:
        # Inject decision.injected_system into your next prompt and re-run the model.
        raise RuntimeError(f"Rewrite requested: {decision.reason}")

    # BLOCK / ESCALATE / FINALIZE
    raise RuntimeError(f"Stopped: {decision.action.value} — {decision.reason}")
Recommended: record real token usage (more accurate costs)
After each LLM call, report usage:
guard.record_tokens(
    input_tokens=resp.usage.input_tokens,
    output_tokens=resp.usage.output_tokens,
)
Live A/B (real model) — optional
If you want “real model behavior” (not just the synthetic benchmark), run the live A/B harness.
Anthropic example
pip install anthropic
export ANTHROPIC_API_KEY=...
python examples/live_test.py --ab --runs 5 --json-out ab.json
This produces a JSON report (recommended: commit it under reports/).
Tip: Prefer reproducible commands + JSON artifacts over screenshots.
See docs/RESULTS.md and reports/README.md.
Example A/B snapshot (table)
Example numbers (5 runs per scenario):
| Scenario | No Guard (avg) | Aura Guard (avg) | Saved (avg) |
|---|---|---|---|
| A: Jitter Loop (reformulation trap) | $0.2778 | $0.1447 | $0.1331 |
| B: Double Refund (ambiguous response trap) | $0.1396 | $0.1275 | $0.0120 |
| C: Error Retry Spiral | $0.1345 | $0.0952 | $0.0393 |
| D: Smart Reformulation (cap enforcement) | $0.8093 | $0.1464 | $0.6629 |
| E: Flagship — Guard + Good Answer | $0.3497 | $0.1420 | $0.2077 |
Configuration (the knobs that matter)
Most teams start here (a combined example follows the list):
- Mark side-effect tools, e.g. {"refund", "cancel", "send_email"}
- Cap expensive tools, e.g. max_calls_per_tool=3 for search/retrieval
- Set a max budget per run, e.g. max_cost_per_run=1.00
- Tell Aura Guard your tool costs, so reports are meaningful
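Putting the four knobs together (these are the same constructor arguments used in the minimal example above):

```python
from aura_guard import AgentGuard

guard = AgentGuard(
    side_effect_tools={"refund", "cancel", "send_email"},  # exactly-once writes
    max_calls_per_tool=3,                 # cap search/retrieval spirals
    max_cost_per_run=1.00,                # per-run budget (USD)
    tool_costs={"search_kb": 0.03},       # makes cost reporting meaningful
)
```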
For advanced options, see AuraGuardConfig in src/aura_guard/config.py.
Example: per-tool policies (deny / human approval)
from aura_guard import AgentGuard, AuraGuardConfig, ToolPolicy, ToolAccess

guard = AgentGuard(
    config=AuraGuardConfig(
        tool_policies={
            "delete_account": ToolPolicy(access=ToolAccess.DENY, deny_reason="Too risky"),
            "large_refund": ToolPolicy(access=ToolAccess.HUMAN_APPROVAL, risk="high"),
            "search_kb": ToolPolicy(max_calls=5),
        },
    ),
)
LangChain (optional)
Aura Guard includes a small LangChain callback adapter.
from aura_guard.adapters.langchain_adapter import AuraCallbackHandler
handler = AuraCallbackHandler(max_cost_per_run=1.00)
# pass handler in your callbacks=[handler]
Install requirement:
pip install langchain-core
Telemetry & persistence (optional)
Telemetry
Aura Guard can emit structured events (counts + signatures, not raw args/payloads).
See src/aura_guard/telemetry.py.
Persist state (optional)
You can serialize guard state to JSON and store it in Redis / Postgres / etc.
from aura_guard.serialization import state_to_json, state_from_json
json_str = state_to_json(state)
state = state_from_json(json_str)
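A sketch of a Redis round trip, assuming redis-py. How you obtain the state object from your guard instance is an assumption here; check your version's API:

```python
import redis

from aura_guard.serialization import state_from_json, state_to_json

r = redis.Redis()

# Save after a run (guard.state is assumed — adapt to your guard's API):
r.set("aura:ticket-123", state_to_json(guard.state))

# Restore before the next run on the same ticket:
raw = r.get("aura:ticket-123")
if raw is not None:
    state = state_from_json(raw.decode())
```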
Non-goals & limitations
Aura Guard is not:
- a content moderation system
- a factuality/accuracy verifier
- a prompt library that “makes agents smarter”
- an observability product (it can emit telemetry, but that’s not the focus)
It is:
- a deterministic enforcement layer for tool loops, retries, side-effects, and budgets
Security & privacy
- Guard state is designed to store signatures (HMAC hashes), not raw tool args or payloads.
- If you persist state or emit telemetry in production, set a unique secret_key.
- Don't turn on raw-text persistence unless you understand the privacy impact.
Privacy by design
Aura Guard’s state management uses HMAC-SHA256 signatures exclusively. Raw PII — arguments, result payloads, ticket IDs — is never persisted to disk or emitted in telemetry. Only keyed hashes are stored.
This means:
- Guard state can be safely written to Redis, Postgres, or log aggregators without leaking customer data.
- Telemetry events contain tool names, reason codes, and cost counters — never raw input or output.
- If your application handles EU personal data, Aura Guard is GDPR-friendly by design: no personal data in the guard’s own persistence layer.
Note: Your tool executors still handle raw data — Aura Guard’s privacy guarantee covers only the guard’s own state and telemetry, not your application’s tool implementations.
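For intuition, this is roughly what keyed hashing buys you. Illustrative only; the canonicalization and key handling below are assumptions, not Aura Guard's code:

```python
import hashlib
import hmac
import json

secret_key = b"per-deployment-secret"  # set uniquely per deployment
args = {"order_id": "ORD-123", "email": "user@example.com"}

# Only this digest is stored/emitted — equal args produce equal digests,
# so replays are detectable without persisting the raw PII.
digest = hmac.new(
    secret_key,
    json.dumps(args, sort_keys=True).encode(),
    hashlib.sha256,
).hexdigest()
```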
Shadow mode (evaluate before enforcing)
Run Aura Guard in shadow mode to see what it would block without actually blocking anything. Use this to measure false-positive rates before turning on enforcement in production.
guard = AgentGuard(
    max_cost_per_run=0.50,
    shadow_mode=True,  # log decisions, don't enforce
)
# Your agent loop runs normally — all tools execute.
# After the run, check what the guard would have done:
print(guard.stats["shadow_would_deny"]) # number of would-have-been denials
When you’re confident in the false-positive rate, remove shadow_mode=True to activate enforcement.
Async support
For async agent loops (FastAPI, LangGraph, etc.), use AsyncAgentGuard:
from aura_guard import AsyncAgentGuard, PolicyAction

guard = AsyncAgentGuard(max_cost_per_run=0.50)

decision = await guard.check_tool("search_kb", args={"query": "test"})
if decision.action == PolicyAction.ALLOW:
    result = await execute_tool(...)
    await guard.record_result(ok=True, payload=result)

stall = await guard.check_output(assistant_text)
The async wrapper calls the same deterministic engine (no I/O, sub-millisecond) — safe to run directly on the event loop.
Quick integration examples
Anthropic (Claude)
import anthropic
from aura_guard import AgentGuard, PolicyAction

client = anthropic.Anthropic()
guard = AgentGuard(max_cost_per_run=1.00, side_effect_tools={"refund", "send_email"})

# In your agent loop, after the model returns tool_use blocks:
for block in response.content:
    if block.type == "tool_use":
        decision = guard.check_tool(block.name, args=block.input)
        if decision.action == PolicyAction.ALLOW:
            result = execute_tool(block.name, block.input)
            guard.record_result(ok=True, payload=result)
        elif decision.action == PolicyAction.CACHE:
            result = decision.cached_result.payload  # reuse previous result
        else:
            # BLOCK / REWRITE / ESCALATE — handle accordingly
            break

# After each assistant text response:
guard.check_output(assistant_text)

# Track real token spend:
guard.record_tokens(
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
)
OpenAI
from aura_guard import AgentGuard, PolicyAction
from aura_guard.adapters.openai_adapter import (
    extract_tool_calls_from_chat_completion,
    inject_system_message,
)

guard = AgentGuard(max_cost_per_run=1.00)

# After each OpenAI response:
tool_calls = extract_tool_calls_from_chat_completion(response)
for call in tool_calls:
    decision = guard.check_tool(call.name, args=call.args)
    if decision.action == PolicyAction.ALLOW:
        result = execute_tool(call.name, call.args)
        guard.record_result(ok=True, payload=result)
    elif decision.action == PolicyAction.REWRITE:
        messages = inject_system_message(messages, decision.injected_system)
        # Re-call the model with updated messages
LangChain
from aura_guard.adapters.langchain_adapter import AuraCallbackHandler
handler = AuraCallbackHandler(
    max_cost_per_run=1.00,
    side_effect_tools={"refund", "send_email"},
)
# Pass as a callback — Aura Guard intercepts tool calls automatically:
agent = initialize_agent(tools=tools, llm=llm, callbacks=[handler])
agent.run("Process refund for order ORD-123")
# After the run:
print(handler.summary)
# {"cost_spent_usd": 0.12, "cost_saved_usd": 0.40, "blocks": 3, ...}
Docs
- docs/ARCHITECTURE.md — how the engine is structured
- docs/EVALUATION_PLAN.md — how to evaluate credibly
- docs/RESULTS.md — how to publish results (recommended format)
Contributing
See CONTRIBUTING.md.
License
Apache-2.0