FailGuard

Detect silent failures, drift, and stuck states in AI agents.

The Problem

AI agents fail silently. They don't crash; they slowly degrade:

Latency drift: response times creep up until calls time out
Stuck states: the same output is repeated endlessly
Cycles: A→B→A→B patterns that never make progress

Traditional error handling doesn't catch these, because no exception is ever raised. Your agent looks "fine" while it burns tokens and fails users.
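To make latency drift concrete, the sketch below compares each call's latency with a baseline and reports the ratio, which is the kind of value a threshold like `max_latency_drift` is checked against. It assumes the baseline is the median of the first few calls; FailGuard's actual baseline rule is not documented here, and `LatencyDriftTracker` is a hypothetical name.

```python
from __future__ import annotations

from statistics import median


class LatencyDriftTracker:
    """Toy latency-drift detector; an illustration, not FailGuard's implementation.

    Assumption: the baseline is the median of the first `warmup` latencies.
    """

    def __init__(self, max_drift: float = 3.0, warmup: int = 3) -> None:
        self.max_drift = max_drift
        self.warmup = warmup
        self.samples: list[float] = []
        self.baseline: float | None = None

    def record(self, latency_ms: float) -> float | None:
        """Store a latency; return current/baseline once the baseline is set."""
        if self.baseline is None:
            self.samples.append(latency_ms)
            if len(self.samples) == self.warmup:
                self.baseline = median(self.samples)
            return None
        return latency_ms / self.baseline

    def is_drifting(self, ratio: float | None) -> bool:
        return ratio is not None and ratio > self.max_drift


tracker = LatencyDriftTracker(max_drift=2.0)
for ms in [100, 110, 105, 150, 240]:
    ratio = tracker.record(ms)
    if tracker.is_drifting(ratio):
        print(f"latency drift: {ratio:.1f}x baseline")  # latency drift: 2.3x baseline
```

A median baseline is used here because a single slow warm-up call would otherwise skew every later ratio.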

The Solution

from failguard import failguard

# `llm` stands in for your own LLM client
@failguard(max_latency_drift=2.0, max_identical_outputs=3)
def agent_step(query: str) -> str:
    return llm.complete(query)

# Raises FailGuardError if:
# - Latency exceeds 2x the baseline
# - The same output is repeated 3 or more times
# - A cycle pattern is detected (A→B→A→B); detect_cycles defaults to True

Installation

pip install failguard

Features

Zero dependencies - Uses only the Python standard library
Latency drift detection - Catches gradual slowdowns
Stuck detection - Identifies repeated identical outputs
Cycle detection - Finds A→B→A→B patterns (complements LoopGuard)
Thread-safe - Safe for concurrent use
Flexible API - Decorator or inline Monitor class
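Stuck detection pairs a repeat count (`max_identical_outputs`) with a time window (`stuck_window`). A minimal sketch of that idea follows; `StuckDetector` is hypothetical, and the counting rule (repeats must be consecutive, and repeats older than the window stop counting) is an assumption, so the package's own semantics may differ.

```python
from __future__ import annotations

import time
from collections import deque


class StuckDetector:
    """Toy stuck detector: flags N identical consecutive outputs inside a time window.

    Assumptions: repeats must be consecutive, and repeats older than `window_s`
    seconds no longer count.
    """

    def __init__(self, max_identical: int = 5, window_s: float = 60.0) -> None:
        self.max_identical = max_identical
        self.window_s = window_s
        # (timestamp, output) pairs for the current run of identical outputs
        self.run: deque[tuple[float, object]] = deque()

    def check(self, output: object, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.run and self.run[-1][1] != output:
            self.run.clear()  # a different output ends the current run
        self.run.append((now, output))
        # Forget repeats that fell out of the time window
        while now - self.run[0][0] > self.window_s:
            self.run.popleft()
        return len(self.run) >= self.max_identical


detector = StuckDetector(max_identical=3, window_s=60)
print([detector.check("same", now=t) for t in (0, 1, 2)])  # [False, False, True]
```

The optional `now` argument exists only to make the example deterministic; in normal use the detector reads `time.monotonic()`.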

Usage

Decorator API

from failguard import failguard, FailGuardError

@failguard(
    max_latency_drift=3.0,      # Alert if latency > 3x baseline
    max_identical_outputs=5,    # Alert after 5 identical outputs
    stuck_window=60,            # Count identical outputs within a 60-second window
    detect_cycles=True,         # Detect A→B→A patterns
)
def agent_step(query: str) -> str:
    return llm.complete(query)

try:
    result = agent_step("What is 2+2?")
except FailGuardError as e:
    print(f"Failure detected: {e.failure_type}")
    print(f"Metrics: {e.metrics}")

Custom Failure Handler

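import logging

from failguard import failguard

logger = logging.getLogger(__name__)

def my_handler(status):
    # `status` is a FailureStatus describing what went wrong
    logger.warning(f"Agent failing: {status.failure_types}")
    return "fallback response"

@failguard(max_identical_outputs=3, on_failure=my_handler)
def agent_step(query: str) -> str:
    return llm.complete(query)

# On failure, returns "fallback response" instead of raising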
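The example above returns the handler's value instead of raising. A toy decorator shows one plausible way to make that choice; `toy_guard`, `StuckError`, the stuck-only check, and the rule that `on_failure` takes precedence over `raise_on_failure` are all assumptions, not FailGuard's source.

```python
from __future__ import annotations

import functools
from typing import Any, Callable, Optional


class StuckError(Exception):
    """Hypothetical error raised when no handler is configured."""


def toy_guard(
    max_identical_outputs: int = 3,
    on_failure: Optional[Callable[[dict], Any]] = None,
    raise_on_failure: bool = True,
) -> Callable:
    def decorator(func: Callable) -> Callable:
        state = {"last": None, "count": 0}

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)
            # Extend the run of identical outputs, or start a new one
            if state["count"] and result == state["last"]:
                state["count"] += 1
            else:
                state["last"], state["count"] = result, 1
            if state["count"] < max_identical_outputs:
                return result
            # A plain dict stands in for FailureStatus in this sketch
            status = {"failure_types": ["stuck"], "identical_count": state["count"]}
            if on_failure is not None:
                return on_failure(status)  # the handler's value replaces the result
            if raise_on_failure:
                raise StuckError(f"{state['count']} identical outputs")
            return result

        return wrapper

    return decorator


@toy_guard(max_identical_outputs=3, on_failure=lambda status: "fallback response")
def fake_agent(query: str) -> str:
    return "I don't know"  # always the same answer, so the guard trips


print([fake_agent("q") for _ in range(3)])
# ["I don't know", "I don't know", 'fallback response']
```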

Inline Monitor

import time

from failguard import Monitor

monitor = Monitor(max_identical_outputs=3)

# `workflow` and `agent` stand in for your own step list and agent
for step in workflow:
    start = time.perf_counter()
    result = agent.run(step)
    latency_ms = (time.perf_counter() - start) * 1000
    status = monitor.check(result, step_name=step, latency_ms=latency_ms)

    if status.is_stuck:
        print(f"Agent stuck: {status.identical_count} repeats")
        break
    if status.has_cycle:
        print(f"Cycle detected: {status.cycle_pattern}")
        break
    if status.has_latency_drift:
        print(f"Slowdown: {status.latency_drift_ratio:.1f}x baseline")

With LoopGuard (Full Reliability Suite)

from loopguard import loopguard
from failguard import failguard

@loopguard(max_repeats=5)        # Catch A→A→A (repeated calls with the same args)
@failguard(detect_cycles=True)   # Catch A→B→A→B (cycles across outputs)
def agent_action(query):
    return llm.complete(query)

API Reference

`@failguard(**options)`

Decorator for detecting failures.

| Option | Default | Description |
| --- | --- | --- |
| `max_latency_drift` | 3.0 | Alert if latency > N × baseline |
| `max_identical_outputs` | 5 | Alert after N identical outputs |
| `stuck_window` | 60.0 | Time window (seconds) for stuck detection |
| `detect_cycles` | True | Detect repeating patterns |
| `cycle_min_length` | 2 | Minimum cycle length |
| `cycle_max_length` | 5 | Maximum cycle length |
| `on_failure` | None | Callback: `(FailureStatus) -> Any` |
| `raise_on_failure` | True | Raise FailGuardError on failure |
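The `cycle_min_length` and `cycle_max_length` options suggest a search over candidate cycle lengths. A simple way to run that search, assumed here for illustration, is to check whether the most recent outputs end with some pattern of length L repeated twice, for each L in the allowed range. `find_cycle` is a hypothetical helper, not part of FailGuard's API.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def find_cycle(
    history: Sequence[Hashable], min_len: int = 2, max_len: int = 5
) -> list[Hashable] | None:
    """Return the repeating pattern if the history ends with it twice in a row.

    For each candidate length L, compare the last L items with the L items
    just before them. Patterns made of a single repeated value (A, A) are
    skipped, since those are "stuck" rather than cyclic.
    """
    for length in range(min_len, max_len + 1):
        if len(history) < 2 * length:
            break  # longer candidates need even more history
        recent = list(history[-length:])
        previous = list(history[-2 * length:-length])
        if recent == previous and len(set(recent)) > 1:
            return recent
    return None


print(find_cycle(["A", "B", "A", "B"]))                 # ['A', 'B']
print(find_cycle(["x", "A", "B", "C", "A", "B", "C"]))  # ['A', 'B', 'C']
print(find_cycle(["A", "B", "C", "D"]))                 # None
```

Skipping single-value patterns keeps this check from overlapping with stuck detection, mirroring how the README positions cycle detection as a complement to repeat-based guards.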

Attached methods:

`func.reset()` - Clear all state
`func.get_status()` - Get current status
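Methods like `func.reset()` and `func.get_status()` can be attached because Python functions accept arbitrary attributes. The sketch below shows that mechanism on a hypothetical `count_calls` decorator; FailGuard's wrapper presumably tracks far more state (and, per the feature list, does so thread-safely, which this sketch does not attempt).

```python
import functools
from typing import Any, Callable


def count_calls(func: Callable) -> Callable:
    """Hypothetical decorator that keeps per-function state and exposes it."""
    state = {"calls": 0}

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        state["calls"] += 1
        return func(*args, **kwargs)

    def reset() -> None:
        state["calls"] = 0  # clear all tracked state

    def get_status() -> dict:
        return dict(state)  # return a copy so callers can't mutate the state

    wrapper.reset = reset
    wrapper.get_status = get_status
    return wrapper


@count_calls
def agent_step(query: str) -> str:
    return query.upper()


agent_step("hi")
agent_step("there")
print(agent_step.get_status())  # {'calls': 2}
agent_step.reset()
print(agent_step.get_status())  # {'calls': 0}
```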

`Monitor(**options)`

Inline monitor that accepts the same options as the decorator.

monitor = Monitor()
status = monitor.check(value, step_name="step1", latency_ms=150)
monitor.reset()

`FailureStatus`

Status object returned by `Monitor.check()` and passed to the `on_failure` callback.

| Field | Type | Description |
| --- | --- | --- |
| `has_failure` | bool | Any failure detected |
| `failure_types` | list | List of FailureType values |
| `has_latency_drift` | bool | Latency exceeded threshold |
| `latency_drift_ratio` | float | Current/baseline ratio |
| `is_stuck` | bool | Identical outputs exceeded threshold |
| `identical_count` | int | Number of identical outputs |
| `has_cycle` | bool | Cycle pattern detected |
| `cycle_pattern` | list | The repeating pattern |
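The fields in the table map naturally onto a dataclass. This sketch assumes `has_failure` is simply whether `failure_types` is non-empty, and uses plain strings in place of `FailureType` values; `FailureStatusSketch` is a hypothetical stand-in, and the real class may compute its fields differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailureStatusSketch:
    """Illustrative stand-in for FailureStatus (field names from the table)."""

    failure_types: list[str] = field(default_factory=list)
    has_latency_drift: bool = False
    latency_drift_ratio: float = 0.0  # assumed default before any baseline exists
    is_stuck: bool = False
    identical_count: int = 0
    has_cycle: bool = False
    cycle_pattern: list[object] = field(default_factory=list)

    @property
    def has_failure(self) -> bool:
        # Assumption: any recorded failure type means the check failed
        return bool(self.failure_types)


status = FailureStatusSketch(failure_types=["cycle"], has_cycle=True, cycle_pattern=["A", "B"])
print(status.has_failure, status.cycle_pattern)  # True ['A', 'B']
```

`field(default_factory=list)` gives each status its own lists instead of sharing one mutable default.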

`FailGuardError`

Exception raised on failure.

from failguard import FailGuardError

try:
    agent_step("What is 2+2?")
except FailGuardError as e:
    e.failure_type   # "stuck", "cycle", or "latency_drift"
    e.message        # Human-readable description
    e.metrics        # Dict with relevant metrics
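An exception that carries `failure_type`, `message`, and `metrics` as attributes can be written as a small `Exception` subclass. `FailGuardErrorSketch` is a hypothetical stand-in showing that pattern; the attribute names come from the snippet above, and everything else (constructor signature, empty-dict default for `metrics`) is assumed.

```python
from __future__ import annotations

from typing import Any


class FailGuardErrorSketch(Exception):
    """Hypothetical exception mirroring the attributes documented for FailGuardError."""

    def __init__(self, failure_type: str, message: str, metrics: dict[str, Any] | None = None) -> None:
        super().__init__(message)  # str(exc) returns the human-readable message
        self.failure_type = failure_type
        self.message = message
        self.metrics = metrics or {}


try:
    raise FailGuardErrorSketch("stuck", "5 identical outputs in 60s", {"identical_count": 5})
except FailGuardErrorSketch as e:
    print(e.failure_type, e.metrics["identical_count"], str(e))
# stuck 5 5 identical outputs in 60s
```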

`FailureType`

Constants for failure types:

FailureType.LATENCY_DRIFT
FailureType.STUCK
FailureType.CYCLE
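The `failure_type` strings shown for `FailGuardError` ("stuck", "cycle", "latency_drift") pair naturally with these constants. One way to model that, assuming the constants are string-valued, is a `str`-based `Enum`, so members compare equal to their plain string values. `FailureTypeSketch` is hypothetical.

```python
from enum import Enum


class FailureTypeSketch(str, Enum):
    """Hypothetical model of FailureType; values taken from FailGuardError.failure_type."""

    LATENCY_DRIFT = "latency_drift"
    STUCK = "stuck"
    CYCLE = "cycle"


print(FailureTypeSketch.STUCK == "stuck")  # True
print(FailureTypeSketch("cycle") is FailureTypeSketch.CYCLE)  # True
```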

Part of the Guard Suite

FailGuard is part of a reliability suite for AI agents:

LoopGuard - Prevent infinite loops
EvalGuard - Validate outputs
FailGuard - Detect silent failures (this package)

License

MIT

Project details

Release history

This version

0.1.0

Feb 3, 2026

Download files

Source Distribution

failguard-0.1.0.tar.gz (8.0 kB)

Uploaded Feb 3, 2026 Source

Built Distribution

failguard-0.1.0-py3-none-any.whl (7.4 kB)

Uploaded Feb 3, 2026 Python 3

File details

Details for the file failguard-0.1.0.tar.gz.

File metadata

Download URL: failguard-0.1.0.tar.gz
Upload date: Feb 3, 2026
Size: 8.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for failguard-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `7466ee8877f83d543c64207e418ac720ef4e3519bbb576be6a0f4ec6613c5033` |
| MD5 | `12d1cd892403b5b1350cfeedd7ed72ba` |
| BLAKE2b-256 | `4656cbb3e3acbb29fc4a2e1f4d17750041449654e97824d2db940667aa56c331` |

File details

Details for the file failguard-0.1.0-py3-none-any.whl.

File metadata

Download URL: failguard-0.1.0-py3-none-any.whl
Upload date: Feb 3, 2026
Size: 7.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for failguard-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `1116b8c5a442e9c140e0abedc48895fc58aba06f9fe9cfae6e97f734c85c836e` |
| MD5 | `9885bcd1401c0b59c11975df436714db` |
| BLAKE2b-256 | `fbf2088ea784c150d406f357984d0d3637263e4bf1c7d714d7cb18218d85b155` |