Agent Vitals

Standalone agent health monitor — detect loops, stuck states, thrash, and runaway costs in any AI agent workflow.

Agent Vitals watches your LLM agent's vital signs in real time. Feed it four numbers per step and it tells you when your agent is looping, stuck, thrashing, or burning tokens for nothing.

Install

pip install agent-vitals
# Optional framework integrations
pip install "agent-vitals[langchain,langgraph]"
# Optional observability export (OTLP)
pip install "agent-vitals[otlp]"
# Development and CI tooling (tests, coverage, lint/type checks)
pip install "agent-vitals[dev]"

Quick Start

from agent_vitals import AgentVitals

monitor = AgentVitals(mission_id="my-task")

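# max_steps, prompt, call_llm, extract_findings, compute_coverage, and
# error_tracker are placeholders for your own agent code.
for step in range(max_steps):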
    result = call_llm(prompt)
    findings = extract_findings(result)

    snapshot = monitor.step(
        findings_count=len(findings),
        coverage_score=compute_coverage(findings),
        total_tokens=result.usage.total_tokens,
        error_count=error_tracker.count,
    )

    if snapshot.any_failure:
        print(f"Health issue at step {snapshot.loop_index}: "
              f"{snapshot.stuck_trigger or snapshot.loop_trigger}")
        break

Features

  • 4-field minimum: Only findings_count, coverage_score, total_tokens, and error_count are required
  • Zero-config defaults: AgentVitals() works out of the box with tuned thresholds
  • Framework-agnostic: No dependency on LangChain, LangGraph, or any agent framework
  • Built-in adapters: LangChain, LangGraph, CrewAI, AutoGen/AG2, DSPy, and Haystack signal extraction
  • Immutable snapshots: Every step() returns a VitalsSnapshot with signals, metrics, and detection results
  • JSONL export: Auto-log every snapshot to structured JSONL files
  • OTLP export: Send metrics to Datadog, Grafana Cloud, or any OTLP backend
  • Backtest harness: Offline evaluation of recorded trajectories with precision, recall, and F1 metrics
  • Context manager: with AgentVitals(...) as monitor: for clean resource management

Detection Modes

| Detector | What it catches | Signal |
| --- | --- | --- |
| Loop | Agent repeating actions without progress | Findings plateau over N steps |
| Stuck | Coverage stagnation despite continued work | Low DM + low CV on coverage |
| Thrash | Excessive errors indicating instability | Error count above threshold |
| Runaway Cost | Token burn with no output | Token spike with flat findings |
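
To make the "findings plateau over N steps" signal concrete, here is a minimal standalone sketch of a plateau check. It is not the library's implementation: the class name PlateauDetector and its logic are assumptions for illustration, and it assumes findings_count is a cumulative counter.

```python
from collections import deque


class PlateauDetector:
    """Hypothetical sketch: flag a loop when findings have not grown for n steps."""

    def __init__(self, n=5):
        self.n = n
        # Keep the current value plus the n values before it.
        self._history = deque(maxlen=n + 1)

    def step(self, findings_count):
        self._history.append(findings_count)
        if len(self._history) <= self.n:
            return False  # not enough history to judge yet
        # Plateau: the newest count is no higher than the count n steps ago.
        return self._history[-1] <= self._history[0]


detector = PlateauDetector(n=3)
flags = [detector.step(count) for count in [1, 2, 2, 2, 2]]
```

With n=3, only the last step is flagged, because that is the first point at which three consecutive steps have passed without a new finding.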

API Overview

Manual Integration (Recommended)

from agent_vitals import AgentVitals

monitor = AgentVitals(mission_id="research-task")
snapshot = monitor.step(
    findings_count=5,
    coverage_score=0.6,
    total_tokens=12000,
    error_count=0,
)

print(snapshot.health_state)     # "healthy" | "warning" | "critical"
print(snapshot.any_failure)      # True if loop or stuck detected
print(snapshot.stuck_trigger)    # e.g. "coverage_stagnation", "burn_rate_anomaly"

Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import TelemetryAdapter

monitor = AgentVitals(mission_id="my-task", adapter=TelemetryAdapter())
snapshot = monitor.step_from_state({
    "cumulative_outputs": 5,
    "coverage_score": 0.6,
    "cumulative_tokens": 12000,
    "cumulative_errors": 0,
})

LangChain Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import LangChainAdapter

monitor = AgentVitals(mission_id="lc-agent", adapter=LangChainAdapter())
snapshot = monitor.step_from_state({
    "cumulative_outputs": 7,
    "coverage_score": 0.72,
    "llm_output": {"token_usage": {"prompt_tokens": 1200, "completion_tokens": 600, "total_tokens": 1800}},
    "cumulative_errors": 1,
    "intermediate_steps": [("search", "..."), ("summarize", "...")],
})

LangGraph Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import LangGraphAdapter

monitor = AgentVitals(mission_id="lg-agent", adapter=LangGraphAdapter())
snapshot = monitor.step_from_state({
    "findings": ["f1", "f2"],
    "sources_found": [{"url": "https://example.com/a"}],
    "mission_objectives": ["o1", "o2", "o3"],
    "covered_objectives": ["o1", "o2"],
    "total_tokens": 4200,
    "errors": [],
})

CrewAI Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import CrewAIAdapter

monitor = AgentVitals(mission_id="crewai-agent", adapter=CrewAIAdapter())
snapshot = monitor.step_from_state({
    "crew": {
        "usage_metrics": {"prompt_tokens": 300, "completion_tokens": 120, "total_tokens": 420},
        "tasks": [{"status": "completed"}, {"status": "failed"}, {"status": "completed"}],
    },
    "task_outputs": [{"result": "finding-a"}, {"result": "finding-b"}],
})

AutoGen / AG2 Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import AutoGenAdapter

monitor = AgentVitals(mission_id="autogen-agent", adapter=AutoGenAdapter())
snapshot = monitor.step_from_state({
    "usage_summary": {
        "agent_a": {"prompt_tokens": 90, "completion_tokens": 40, "total_tokens": 130},
        "agent_b": {"prompt_tokens": 70, "completion_tokens": 35, "total_tokens": 105},
    },
    "chat_messages": [{"role": "user"}, {"role": "assistant"}, {"role": "assistant"}],
    "total_turns": 6,
})

DSPy Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import DSPyAdapter

monitor = AgentVitals(mission_id="dspy-program", adapter=DSPyAdapter())
snapshot = monitor.step_from_state({
    "lm_usage": {
        "openai/gpt-4o-mini": {
            "prompt_tokens": 1200,
            "completion_tokens": 400,
            "total_tokens": 1600,
        },
    },
    "predictions": [{"answer": "Summary A"}, {"answer": "Analysis B"}],
    "modules_completed": 2,
    "modules_total": 3,
    "errors": [],
})

The DSPy adapter extracts tokens from lm_usage (preferred) or lm.history (fallback), findings from predictions or history outputs, and coverage from module completion state. No dspy dependency required.
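
As a rough illustration of the extraction described above, the sketch below sums tokens across an lm_usage mapping and derives coverage from module completion. The helper names are hypothetical, and the real adapter's fallbacks and edge-case handling may differ.

```python
def total_tokens_from_lm_usage(lm_usage):
    """Hypothetical helper: sum token usage across every model in lm_usage."""
    total = 0
    for usage in lm_usage.values():
        if "total_tokens" in usage:
            total += int(usage["total_tokens"])
        else:
            # Assumption: fall back to prompt + completion when no total is given.
            total += int(usage.get("prompt_tokens", 0))
            total += int(usage.get("completion_tokens", 0))
    return total


def coverage_from_modules(completed, total):
    """Hypothetical helper: fraction of modules completed, clamped to [0, 1]."""
    if total <= 0:
        return 0.0
    return max(0.0, min(1.0, completed / total))


lm_usage = {
    "openai/gpt-4o-mini": {
        "prompt_tokens": 1200,
        "completion_tokens": 400,
        "total_tokens": 1600,
    },
}
tokens = total_tokens_from_lm_usage(lm_usage)
coverage = coverage_from_modules(2, 3)
```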

Haystack Adapter Integration

from agent_vitals import AgentVitals
from agent_vitals.adapters import HaystackAdapter

monitor = AgentVitals(mission_id="haystack-agent", adapter=HaystackAdapter())
snapshot = monitor.step_from_state({
    "messages": [
        {"role": "user", "content": "Research quantum computing"},
        {
            "role": "assistant",
            "content": "Quantum error correction advances...",
            "_meta": {"usage": {"prompt_tokens": 200, "completion_tokens": 80, "total_tokens": 280}},
        },
    ],
    "state": {"coverage_score": 0.6},
    "sources": [
        {"url": "https://arxiv.org/paper1"},
        {"url": "https://nature.com/article1"},
    ],
})

The Haystack adapter handles both Agent state (messages with _meta.usage) and Pipeline state (component_outputs with replies). Extracts source URLs for domain counting. No haystack-ai dependency required.
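
Domain counting from source URLs can be pictured with the standard library alone. This is a sketch under the assumption that a "domain" means the lowercased hostname of each URL; count_domains is a hypothetical helper, not part of the adapter's API.

```python
from urllib.parse import urlparse


def count_domains(sources):
    """Hypothetical helper: count distinct hostnames among source URLs."""
    domains = set()
    for source in sources:
        host = urlparse(source.get("url", "")).hostname
        if host:
            domains.add(host.lower())
    return len(domains)


sources = [
    {"url": "https://arxiv.org/paper1"},
    {"url": "https://nature.com/article1"},
    {"url": "https://arxiv.org/paper2"},
]
domain_count = count_domains(sources)
```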

LangChain Callback Integration

from agent_vitals.callbacks import LangChainVitalsCallback

callback = LangChainVitalsCallback(
    mission_id="lc-callback",
    on_failure="log",            # "log" | "raise" | "callback"
    export_jsonl_dir="./vitals_logs",
)

# Pass callback into your LangChain runnable/agent callback list.

LangGraph Node Integration

from agent_vitals.callbacks import LangGraphVitalsNode

vitals_node = LangGraphVitalsNode(on_failure="force_finalize")

# Add `vitals_node` to your StateGraph as a normal callable node.
# Returned update includes:
#   - agent_vitals: snapshot payload
#   - force_finalize: True (when failure detected and mode is force_finalize)
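
One way to act on the force_finalize flag is a small routing function used as a conditional edge after the vitals node. The sketch below is plain Python; the node names "finalize" and "continue" are hypothetical and should be replaced with the names in your own graph.

```python
def route_after_vitals(state):
    """Hypothetical router: choose the next node from the vitals update."""
    # force_finalize is set by the vitals node when a failure is detected
    # and on_failure="force_finalize".
    if state.get("force_finalize"):
        return "finalize"
    return "continue"


next_when_failing = route_after_vitals({"agent_vitals": {}, "force_finalize": True})
next_when_healthy = route_after_vitals({"agent_vitals": {}})
```

A function like this would typically be passed to StateGraph.add_conditional_edges with the vitals node as the source.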

Pre-built Signals

from agent_vitals import AgentVitals, RawSignals

monitor = AgentVitals(mission_id="my-task")
signals = RawSignals(findings_count=5, coverage_score=0.6, total_tokens=12000, error_count=0)
snapshot = monitor.step_from_signals(signals)

Export

Log every snapshot to JSONL for offline analysis or observability pipelines.

from agent_vitals import AgentVitals, JSONLExporter

exporter = JSONLExporter(
    directory="./vitals_logs",
    layout="per_run",       # or "append"
    max_bytes=10_000_000,   # rotation threshold (append mode)
)

with AgentVitals(mission_id="my-task", exporters=[exporter]) as monitor:
    for step in range(max_steps):
        monitor.step(findings_count=..., coverage_score=..., total_tokens=..., error_count=...)
# Exporter is automatically flushed and closed on exit

Layouts:

  • per_run: {directory}/{mission_id}/{run_id}.jsonl — one file per run
  • append: {directory}/{mission_id}.jsonl — all runs in one file, with rotation
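
The two layouts map to file paths as follows. This sketch only mirrors the path templates listed above; snapshot_path is a hypothetical helper, and the naming of rotated files in append mode is not covered because it is not documented here.

```python
from pathlib import Path


def snapshot_path(directory, mission_id, run_id, layout="per_run"):
    """Hypothetical helper: resolve the JSONL path for a given layout."""
    base = Path(directory)
    if layout == "per_run":
        return base / mission_id / f"{run_id}.jsonl"
    if layout == "append":
        return base / f"{mission_id}.jsonl"
    raise ValueError(f"unknown layout: {layout!r}")


per_run = snapshot_path("./vitals_logs", "my-task", "run-42", layout="per_run")
append = snapshot_path("./vitals_logs", "my-task", "run-42", layout="append")
```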

OTLP Export (Datadog / Grafana / OTLP-compatible)

from agent_vitals import AgentVitals, OTLPExporter

otlp = OTLPExporter(
    endpoint="http://localhost:4318/v1/metrics",
    service_name="deepsearch-agent",
    mission_id="DRM.0.5",
    run_id="run-2026-02-09",
    workflow_type="research",
    export_interval_ms=5000,
)

with AgentVitals(mission_id="DRM.0.5", exporters=[otlp]) as monitor:
    monitor.step(findings_count=1, coverage_score=0.2, total_tokens=300, error_count=0)

Datadog example (delta temporality enabled):

from agent_vitals import OTLPExporter

datadog = OTLPExporter(
    endpoint="https://otlp.datadoghq.com/v1/metrics",
    headers={"DD-API-KEY": "<datadog_api_key>"},
    service_name="agent-vitals",
    mission_id="DRM.0.5",
    run_id="run-42",
    workflow_type="research",
    delta_temporality=True,
)

Grafana Cloud example:

from agent_vitals import OTLPExporter

grafana = OTLPExporter(
    endpoint="https://otlp-gateway-<region>.grafana.net/otlp/v1/metrics",
    headers={"Authorization": "Basic <base64(instance_id:api_key)>"},
    service_name="agent-vitals",
    mission_id="DRM.0.5",
    run_id="run-42",
    workflow_type="research",
)
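
The Authorization placeholder in the Grafana Cloud example stands for the base64 encoding of instance_id:api_key. A minimal sketch of building that header with the standard library follows; the function name and the credential values are made up for illustration.

```python
import base64


def grafana_basic_auth(instance_id, api_key):
    """Hypothetical helper: build a Basic auth header from Grafana credentials."""
    raw = f"{instance_id}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Dummy credentials; load real ones from a secret store or environment variable.
headers = grafana_basic_auth("123456", "example-api-key")
```

The resulting dict can be passed as the headers argument shown in the example above.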

Configuration

from agent_vitals import AgentVitals, VitalsConfig

# From a VitalsConfig passed to the constructor
monitor = AgentVitals(config=VitalsConfig(
    loop_consecutive_count=6,
    stuck_dm_threshold=0.15,
))

# From YAML file
monitor = AgentVitals.from_yaml("thresholds.yaml")

# From environment variables (VITALS_* prefix)
monitor = AgentVitals()  # auto-reads VITALS_LOOP_CONSECUTIVE_COUNT, etc.
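
To show how VITALS_* variables could line up with config field names, here is a standalone sketch that strips the prefix, lowercases the remainder, and coerces numeric strings. The function read_vitals_env and its coercion rules are assumptions; the library's own parsing may behave differently.

```python
import os


def read_vitals_env(environ=None, prefix="VITALS_"):
    """Hypothetical helper: collect VITALS_* variables as config overrides."""
    env = os.environ if environ is None else environ
    overrides = {}
    for key, raw in env.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):].lower()
        # Try int first, then float, and otherwise keep the raw string.
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw
        overrides[name] = value
    return overrides


overrides = read_vitals_env({
    "VITALS_LOOP_CONSECUTIVE_COUNT": "6",
    "VITALS_STUCK_DM_THRESHOLD": "0.15",
    "HOME": "/home/user",
})
```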

Key Thresholds

| Parameter | Default | Description |
| --- | --- | --- |
| loop_consecutive_count | 5 | Steps of flat findings before loop detection |
| stuck_dm_threshold | 0.15 | DM below this → coverage stagnation |
| stuck_cv_threshold | 0.5 | CV below this → low variation |
| burn_rate_multiplier | 2.0 | Token spike ratio for burn rate anomaly |
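
The sketch below illustrates how two of these thresholds might be applied. It assumes CV means the coefficient of variation (standard deviation divided by mean) and that a burn rate anomaly compares the latest per-step token increase with the average of earlier increases. Both interpretations, and the function names, are assumptions rather than the library's actual formulas.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Assumed meaning of CV: population standard deviation over the mean."""
    m = mean(values)
    return 0.0 if m == 0 else pstdev(values) / abs(m)


def token_spike(token_totals, multiplier=2.0):
    """Assumed check: latest token increase exceeds multiplier x earlier average."""
    deltas = [b - a for a, b in zip(token_totals, token_totals[1:])]
    if len(deltas) < 2:
        return False  # need a baseline and a latest step
    baseline = mean(deltas[:-1])
    return baseline > 0 and deltas[-1] > multiplier * baseline


low_variation = coefficient_of_variation([0.6, 0.6, 0.6]) < 0.5
spike = token_spike([1000, 2000, 3000, 8000], multiplier=2.0)
steady = token_spike([1000, 2000, 3000, 4000], multiplier=2.0)
```

A runaway cost check would combine a token spike like this with flat findings, as described in the Detection Modes table.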

Backtest

Evaluate detection accuracy against labeled trajectory corpora.

from agent_vitals.backtest import load_dataset, load_labels, run_backtest

dataset = load_dataset("path/to/traces/")
labels = load_labels("path/to/labels.json")
report = run_backtest(dataset, labels)

print(f"vitals.any: P={report.composite_any.precision:.3f} "
      f"R={report.composite_any.recall:.3f} "
      f"F1={report.composite_any.f1:.3f}")

for name, detector in report.detectors.items():
    print(f"  {name}: P={detector.precision:.3f} R={detector.recall:.3f}")

CI Coverage Gate

CI enforces coverage with pytest-cov:

  • Command: pytest --cov=agent_vitals --cov-report=xml --cov-fail-under=85
  • Baseline measured on 2026-02-09: 85% total coverage
  • Coverage XML artifact is uploaded in GitHub Actions (coverage.xml)

Session Summary

monitor = AgentVitals(mission_id="my-task")
# ... run steps ...
summary = monitor.summary()
# {"mission_id": "my-task", "total_steps": 8, "health_state": "healthy",
#  "any_loop_detected": False, "any_stuck_detected": False, ...}

monitor.reset()  # Clear history for next run (also flushes exporters)

Detection Precision

Agent Vitals v1.3.0 has been validated against a combined corpus of 54 traces. The corpus spans DeepSearch trajectories (LangGraph/Ollama) and cross-agent trajectories (LangChain and raw OpenAI, running GPT-4o-mini, DeepSeek-chat, and local OSS models).

| Detector | Precision | Recall | F1 |
| --- | --- | --- | --- |
| vitals.any | 1.000 | 1.000 | 1.000 |
| loop | 0.750 | 1.000 | 0.857 |
| stuck | 1.000 | 0.667 | 0.800 |
| thrash | 1.000 | 1.000 | 1.000 |

The composite vitals.any signal, which is used for enforcement decisions, achieves perfect precision and recall on this corpus across all tested frameworks and models. Per-detector metrics are informational: in the two edge cases where loop and stuck signals overlap, the system still identifies the failure correctly.
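
The F1 column follows from precision and recall through the standard harmonic-mean formula. This short check reproduces the reported values; it is independent of the backtest harness, and f1_score is a helper defined only for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


loop_f1 = round(f1_score(0.750, 1.000), 3)
stuck_f1 = round(f1_score(1.000, 0.667), 3)
thrash_f1 = round(f1_score(1.000, 1.000), 3)
```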

See docs/vitals/av23-backtest-report.md for the full backtest report.

License

MIT — see LICENSE.
