Aura Guard
Exactly-once + circuit breaker for agent tool calls (loops, retries, duplicate side effects).
Stop AI agents from looping on tools and accidentally doing the same action twice (double refunds, duplicate emails, endless retries).
Aura Guard is reliability middleware for tool-using agents. It sits between your agent and its tools (search, refund, get_order, send_email, etc.) and focuses on three controls: idempotency, circuit breaking, and loop detection.
Before a tool runs, Aura Guard answers:
- ✅ ALLOW → run the tool
- ♻️ CACHE → reuse the last result (don’t call the tool again)
- ⛔ BLOCK → stop a risky / repetitive call
- ✍️ REWRITE → tell the model “stop looping, do this instead”
- 🧑‍💼 ESCALATE / FINALIZE → stop the run safely
Core goals
- Prevent duplicate side-effects (refund twice, email twice, cancel twice)
- Contain retry storms (429 / timeouts / 5xx)
- Detect loops early and stop runaways
- Provide deterministic, inspectable decisions
✅ Python 3.10+
✅ Dependency‑free core (optional LangChain adapter)
✅ Framework‑agnostic (works with your custom loop)
Table of contents
- 30-second demo (no API key)
- How it works (1-minute explanation)
- Install
- What problem does this solve?
- Why not just max_steps / retries / idempotency keys?
- 2-minute integration (copy/paste)
- Live A/B (real model) — optional
- Configuration (the knobs that matter)
- LangChain (optional)
- Telemetry & persistence (optional)
- Non-goals & limitations
- Security & privacy
- Shadow mode
- Async support
- Quick integration examples
- Docs
- Contributing
- License
30-second demo (no API key)
This is the fastest way to “feel” what Aura Guard does.
Option A (recommended for first-time users): install directly from GitHub
pip install git+https://github.com/auraguardhq/aura-guard.git
aura-guard demo
You should see output like:
================================================================
Aura Guard — Triage Simulation Demo
================================================================
Assumed tool-call cost: $0.04 per call
Variant        Calls  SideFX  Blocks  Cache   Cost  Terminated
──────────────────────────────────────────────────────────────
no_guard          11       3       0      0  $0.44  -
call_limit(5)      5       3       0      0  $0.20  call_limit
aura_guard         4       1       0      2  $0.16  escalate
Cost saved vs no_guard: $0.28 (64%)
Side-effects prevented: 2
Rewrites issued: 6
Option B (for contributors/devs): run from a clone
git clone https://github.com/auraguardhq/aura-guard.git
cd aura-guard
pip install -e .
aura-guard demo
Optional: run the full synthetic benchmark suite
aura-guard bench --all
This prints a report showing cost deltas across multiple failure scenarios (looping, retries, side-effects, etc.).
Note: the benchmark uses estimated USD costs based on the configuration. The most important signal is the relative difference under the same config.
How it works (1-minute explanation)
Aura Guard keeps run-scoped state and makes deterministic decisions from a handful of signals (a minimal sketch follows the list):
- repeated/near-repeated calls (loop detection)
- repeated tool errors (circuit breaker behavior)
- side-effect replay risk (idempotency protection)
- output stall signals and run cost
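To make the loop-detection signal concrete, here is a minimal, illustrative sketch of signature-based repeat counting. This is not Aura Guard's actual implementation; the threshold and canonicalization below are assumptions:

```python
import hashlib
import json
from collections import Counter

# Illustrative only — not Aura Guard's internals.
seen: Counter[str] = Counter()

def call_signature(tool: str, args: dict) -> str:
    # Canonicalize so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same.
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

sig = call_signature("search_kb", {"query": "order status"})
seen[sig] += 1
if seen[sig] > 3:  # same (tool, args) repeated within one run
    print("loop suspected: CACHE, BLOCK, or REWRITE instead of re-running")
```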
Install
Option A (recommended): install directly from GitHub
pip install git+https://github.com/auraguardhq/aura-guard.git
Option B (for contributors/devs): install from a cloned repo
git clone https://github.com/auraguardhq/aura-guard.git
cd aura-guard
pip install -e .
Optional: LangChain adapter
pip install langchain-core
What problem does this solve?
Agents that can call tools are powerful — but they fail in very predictable ways:
- They repeat the same tool call (or almost the same call) over and over.
- They “try different keywords” and spiral.
- They hit an error (429 / timeout) and retry forever.
- They see a tool response like “pending” and do the side-effect twice.
- They produce “sorry, still checking…” text and stall.
Why not just max_steps / retries / idempotency keys?
- max_steps is blunt: it does not distinguish productive steps from runaway loops.
- Backoff/retry libraries don't prevent duplicate side effects: they retry transport failures, not business-level replay risk.
- Idempotency keys help side effects but not loops/cost/quarantine: they protect write operations, but they do not stop search spirals, repeated failed reads, or stalled agent outputs.
2-minute integration (copy/paste)
Aura Guard does not call your LLM and does not execute tools.
You keep your agent loop. You just add 3 hook calls:
- check_tool(...) before you execute a tool
- record_result(...) after the tool finishes (success or error)
- check_output(...) after the model produces text (optional but recommended)
Minimal example
from aura_guard import AgentGuard, PolicyAction

guard = AgentGuard(
    max_calls_per_tool=3,                 # stop “search forever”
    side_effect_tools={"refund", "cancel"},
    max_cost_per_run=1.00,                # optional budget (USD)
    tool_costs={"search_kb": 0.03},       # optional; improves cost reporting
)

def run_tool(tool_name: str, args: dict):
    decision = guard.check_tool(tool_name, args=args, ticket_id="ticket-123")

    if decision.action == PolicyAction.ALLOW:
        try:
            result = execute_tool(tool_name, args)  # <-- your tool function
            guard.record_result(ok=True, payload=result)
            return result
        except Exception as e:
            # classify errors however you want ("429", "timeout", "5xx", ...)
            guard.record_result(ok=False, error_code=type(e).__name__)
            raise

    if decision.action == PolicyAction.CACHE:
        # Aura Guard tells you “reuse the previous result”
        return decision.cached_result.payload if decision.cached_result else None

    if decision.action == PolicyAction.REWRITE:
        # Inject decision.injected_system into your next prompt and re-run the model.
        raise RuntimeError(f"Rewrite requested: {decision.reason}")

    # BLOCK / ESCALATE / FINALIZE
    raise RuntimeError(f"Stopped: {decision.action.value} — {decision.reason}")
Recommended: record real token usage (more accurate costs)
After each LLM call, report usage:
guard.record_tokens(
    input_tokens=resp.usage.input_tokens,
    output_tokens=resp.usage.output_tokens,
)
Live A/B (real model) — optional
If you want “real model behavior” (not just the synthetic benchmark), run the live A/B harness.
Anthropic example
pip install anthropic
export ANTHROPIC_API_KEY=...
python examples/live_test.py --ab --runs 5 --json-out ab.json
This produces a JSON report (recommended: commit it under reports/).
Tip: Prefer reproducible commands + JSON artifacts over screenshots.
See docs/RESULTS.md and reports/README.md.
Example A/B snapshot (table)
Example numbers (5 runs per scenario):
| Scenario | No Guard (avg) | Aura Guard (avg) | Saved (avg) |
|---|---|---|---|
| A: Jitter Loop (reformulation trap) | $0.2778 | $0.1447 | $0.1331 |
| B: Double Refund (ambiguous response trap) | $0.1396 | $0.1275 | $0.0120 |
| C: Error Retry Spiral | $0.1345 | $0.0952 | $0.0393 |
| D: Smart Reformulation (cap enforcement) | $0.8093 | $0.1464 | $0.6629 |
| E: Flagship — Guard + Good Answer | $0.3497 | $0.1420 | $0.2077 |
Configuration (the knobs that matter)
Most teams start here (a combined example follows the list):
- Mark side-effect tools, e.g. {"refund", "cancel", "send_email"}
- Cap expensive tools, e.g. max_calls_per_tool=3 for search/retrieval
- Set a max budget per run, e.g. max_cost_per_run=1.00
- Tell Aura Guard your tool costs, so reports are meaningful
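Putting the four knobs together (these are the same constructor arguments used in the minimal example above):

```python
from aura_guard import AgentGuard

guard = AgentGuard(
    side_effect_tools={"refund", "cancel", "send_email"},  # exactly-once writes
    max_calls_per_tool=3,                 # cap search/retrieval spirals
    max_cost_per_run=1.00,                # per-run budget (USD)
    tool_costs={"search_kb": 0.03},       # makes cost reporting meaningful
)
```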
For advanced options, see AuraGuardConfig in src/aura_guard/config.py.
Example: per-tool policies (deny / human approval)
from aura_guard import AgentGuard, AuraGuardConfig, ToolPolicy, ToolAccess

guard = AgentGuard(
    config=AuraGuardConfig(
        tool_policies={
            "delete_account": ToolPolicy(access=ToolAccess.DENY, deny_reason="Too risky"),
            "large_refund": ToolPolicy(access=ToolAccess.HUMAN_APPROVAL, risk="high"),
            "search_kb": ToolPolicy(max_calls=5),
        },
    ),
)
LangChain (optional)
Aura Guard includes a small LangChain callback adapter.
from aura_guard.adapters.langchain_adapter import AuraCallbackHandler
handler = AuraCallbackHandler(max_cost_per_run=1.00)
# pass handler in your callbacks=[handler]
Install requirement:
pip install langchain-core
Telemetry & persistence (optional)
Telemetry
Aura Guard can emit structured events (counts + signatures, not raw args/payloads).
See src/aura_guard/telemetry.py.
Persist state (optional)
You can serialize guard state to JSON and store it in Redis / Postgres / etc.
from aura_guard.serialization import state_to_json, state_from_json
json_str = state_to_json(state)
state = state_from_json(json_str)
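A sketch of a Redis round trip, assuming redis-py. How you obtain the state object from your guard instance is an assumption here; check your version's API:

```python
import redis

from aura_guard.serialization import state_from_json, state_to_json

r = redis.Redis()

# Save after a run (guard.state is assumed — adapt to your guard's API):
r.set("aura:ticket-123", state_to_json(guard.state))

# Restore before the next run on the same ticket:
raw = r.get("aura:ticket-123")
if raw is not None:
    state = state_from_json(raw.decode())
```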
Non-goals & limitations
Aura Guard is not:
- a content moderation system
- a factuality/accuracy verifier
- a prompt library that “makes agents smarter”
- an observability product (it can emit telemetry, but that’s not the focus)
It is:
- a deterministic enforcement layer for tool loops, retries, side-effects, and budgets
Security & privacy
- Guard state is designed to store signatures (HMAC hashes), not raw tool args or payloads.
- If you persist state or emit telemetry in production, set a unique secret_key.
- Don't turn on raw-text persistence unless you understand the privacy impact.
Privacy by design
Aura Guard’s state management uses HMAC-SHA256 signatures exclusively. Raw PII — arguments, result payloads, ticket IDs — is never persisted to disk or emitted in telemetry. Only keyed hashes are stored.
This means:
- Guard state can be safely written to Redis, Postgres, or log aggregators without leaking customer data.
- Telemetry events contain tool names, reason codes, and cost counters — never raw input or output.
- If your application handles EU personal data, Aura Guard is GDPR-friendly by design: no personal data in the guard’s own persistence layer.
Note: Your tool executors still handle raw data — Aura Guard’s privacy guarantee covers only the guard’s own state and telemetry, not your application’s tool implementations.
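For intuition, this is roughly what keyed hashing buys you. Illustrative only; the canonicalization and key handling below are assumptions, not Aura Guard's code:

```python
import hashlib
import hmac
import json

secret_key = b"per-deployment-secret"  # set uniquely per deployment
args = {"order_id": "ORD-123", "email": "user@example.com"}

# Only this digest is stored/emitted — equal args produce equal digests,
# so replays are detectable without persisting the raw PII.
digest = hmac.new(
    secret_key,
    json.dumps(args, sort_keys=True).encode(),
    hashlib.sha256,
).hexdigest()
```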
Shadow mode (evaluate before enforcing)
Run Aura Guard in shadow mode to see what it would block without actually blocking anything. Use this to measure false-positive rates before turning on enforcement in production.
guard = AgentGuard(
    max_cost_per_run=0.50,
    shadow_mode=True,  # log decisions, don't enforce
)
# Your agent loop runs normally — all tools execute.
# After the run, check what the guard would have done:
print(guard.stats["shadow_would_deny"]) # number of would-have-been denials
When you’re confident in the false-positive rate, remove shadow_mode=True to activate enforcement.
Async support
For async agent loops (FastAPI, LangGraph, etc.), use AsyncAgentGuard:
from aura_guard import AsyncAgentGuard, PolicyAction

guard = AsyncAgentGuard(max_cost_per_run=0.50)

decision = await guard.check_tool("search_kb", args={"query": "test"})
if decision.action == PolicyAction.ALLOW:
    result = await execute_tool(...)
    await guard.record_result(ok=True, payload=result)

stall = await guard.check_output(assistant_text)
The async wrapper calls the same deterministic engine (no I/O, sub-millisecond) — safe to run directly on the event loop.
Quick integration examples
Anthropic (Claude)
import anthropic
from aura_guard import AgentGuard, PolicyAction

client = anthropic.Anthropic()
guard = AgentGuard(max_cost_per_run=1.00, side_effect_tools={"refund", "send_email"})

# In your agent loop, after the model returns tool_use blocks:
for block in response.content:
    if block.type == "tool_use":
        decision = guard.check_tool(block.name, args=block.input)
        if decision.action == PolicyAction.ALLOW:
            result = execute_tool(block.name, block.input)
            guard.record_result(ok=True, payload=result)
        elif decision.action == PolicyAction.CACHE:
            result = decision.cached_result.payload  # reuse previous result
        else:
            # BLOCK / REWRITE / ESCALATE — handle accordingly
            break

# After each assistant text response:
guard.check_output(assistant_text)

# Track real token spend:
guard.record_tokens(
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
)
OpenAI
from aura_guard import AgentGuard, PolicyAction
from aura_guard.adapters.openai_adapter import (
    extract_tool_calls_from_chat_completion,
    inject_system_message,
)

guard = AgentGuard(max_cost_per_run=1.00)

# After each OpenAI response:
tool_calls = extract_tool_calls_from_chat_completion(response)
for call in tool_calls:
    decision = guard.check_tool(call.name, args=call.args)
    if decision.action == PolicyAction.ALLOW:
        result = execute_tool(call.name, call.args)
        guard.record_result(ok=True, payload=result)
    elif decision.action == PolicyAction.REWRITE:
        messages = inject_system_message(messages, decision.injected_system)
        # Re-call the model with updated messages
LangChain
from aura_guard.adapters.langchain_adapter import AuraCallbackHandler
handler = AuraCallbackHandler(
    max_cost_per_run=1.00,
    side_effect_tools={"refund", "send_email"},
)
# Pass as a callback — Aura Guard intercepts tool calls automatically:
agent = initialize_agent(tools=tools, llm=llm, callbacks=[handler])
agent.run("Process refund for order ORD-123")
# After the run:
print(handler.summary)
# {"cost_spent_usd": 0.12, "cost_saved_usd": 0.40, "blocks": 3, ...}
Docs
- docs/ARCHITECTURE.md — how the engine is structured
- docs/EVALUATION_PLAN.md — how to evaluate credibly
- docs/RESULTS.md — how to publish results (recommended format)
Contributing
See CONTRIBUTING.md.
License
Apache-2.0