Governance-first observability for AI agents
Governance-first observability for AI agents -- real-time metrics, anomaly detection, kill switches, compliance export.
Note: Part of the theaios ecosystem. Install with pip install theaios-agent-monitor.
What It Does
Record every agent event. Compute real-time metrics over rolling windows. Detect anomalies via z-score baselines. Kill misbehaving agents instantly. Export compliance reports. All in-process, no external services, ~0.1ms per event.
This is not LangSmith, Langfuse, or Arize. Those are tracing platforms that collect data. This library collects data and acts on it -- anomaly detection triggers alerts, kill switches stop agents, compliance export generates audit reports. Governance, not just observation.
- Event collection -- record agent actions, guardrail triggers, denials, approvals, costs, errors
- Real-time metrics -- event count, denial rate, cost/minute, average latency over configurable rolling windows
- Statistical baselines -- Welford's online algorithm for running mean and stddev, z-score anomaly detection
- Kill switches -- instant agent/session/global kill, automatic kill policies on metric thresholds, persistence across restarts
- Anomaly detection -- z-score rules with configurable thresholds, cooldown periods, wildcard agent matching
- Compliance export -- SOC 2, GDPR, JSON reports with filtering by agent, time range, event type
- Alert channels -- console, JSONL file, webhook (Slack, PagerDuty, OpsGenie)
- OpenTelemetry -- optional export to any OTel-compatible backend
- Guardrails adapter -- auto-record every theaios-guardrails decision
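The statistical-baselines and anomaly-detection items above can be illustrated with a small standalone sketch. This is not the library's implementation: the class name `RunningBaseline`, the use of the sample (n - 1) standard deviation, and the z-score threshold of 3.0 are all assumptions chosen for illustration. It only shows how Welford's online algorithm maintains a running mean and stddev in constant memory, and how a z-score flags a value far from that baseline.

```python
import math


class RunningBaseline:
    """Running mean and standard deviation via Welford's online algorithm.

    Hypothetical helper for illustration; not part of theaios-agent-monitor.
    """

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one observation into the baseline in O(1) time and memory."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean times the deviation from the new mean.
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        """Sample standard deviation (assumed); 0.0 with fewer than two points."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def z_score(self, value: float) -> float:
        """Distance of `value` from the mean, in standard deviations."""
        sd = self.stddev
        return 0.0 if sd == 0.0 else (value - self.mean) / sd


baseline = RunningBaseline()
for cost in [0.006, 0.007, 0.008, 0.007, 0.006, 0.008]:
    baseline.update(cost)

spike = 0.05
# The threshold of 3.0 is an example value, not a library default.
is_anomaly = abs(baseline.z_score(spike)) > 3.0
print(is_anomaly)
```

Because the baseline adapts to each agent's own history, a cost that is normal for one agent can still register as anomalous for another, which a single fixed threshold cannot express.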
Quick Start
pip install theaios-agent-monitor
1. Write a config:
# monitor.yaml
version: "1.0"
metadata:
  name: my-monitor
  description: Production agent monitoring
metrics:
  default_window_seconds: 300
kill_switch:
  enabled: true
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical
alerts:
  channels:
    - type: console
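To make the kill policy in the config concrete, here is a minimal sketch of how a `metric` / `operator` / `threshold` triple might be evaluated against a metrics snapshot. The `KillPolicy` dataclass, the `triggered` function, and the set of supported operators are hypothetical and written only for this example; the library's own evaluation logic may differ.

```python
import operator as _op
from dataclasses import dataclass

# Comparison operators a policy might accept (assumed set, for illustration).
_OPERATORS = {
    ">": _op.gt,
    ">=": _op.ge,
    "<": _op.lt,
    "<=": _op.le,
}


@dataclass(frozen=True)
class KillPolicy:
    """Mirrors the policy keys from monitor.yaml (hypothetical type)."""

    name: str
    metric: str
    operator: str
    threshold: float
    action: str


def triggered(policy: KillPolicy, metrics: dict[str, float]) -> bool:
    """Return True when the policy's metric crosses its threshold."""
    value = metrics.get(policy.metric)
    if value is None:
        return False  # metric not available yet, so do not act
    return _OPERATORS[policy.operator](value, policy.threshold)


policy = KillPolicy(
    name="auto-kill-on-high-cost",
    metric="cost_per_minute",
    operator=">",
    threshold=5.0,
    action="kill_agent",
)

print(triggered(policy, {"cost_per_minute": 6.2}))  # True
print(triggered(policy, {"cost_per_minute": 1.3}))  # False
```

Keeping the comparison declarative, as a string looked up in a table, is what lets the policy live in YAML that ops teams can review without reading code.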
2. Use it:
import time

from theaios.agent_monitor import Monitor, load_config, AgentEvent

monitor = Monitor(load_config("monitor.yaml"))

# Record events
monitor.record(AgentEvent(
    timestamp=time.time(),
    event_type="action",
    agent="sales-agent",
    cost_usd=0.007,
    latency_ms=350.0,
    data={"model": "gpt-4"},
))

# View metrics
snap = monitor.get_metrics("sales-agent")
print(f"Events: {snap.event_count}")
print(f"Cost/min: ${snap.cost_per_minute:.4f}")
print(f"Denial rate: {snap.denial_rate:.1%}")

# Kill an agent
monitor.kill_agent("sales-agent", reason="Cost spike detected")
Events tell the monitor what's happening. Each event has an event_type, an agent, a timestamp, and optional fields for cost, latency, session, and arbitrary data:
import time

# Continues from the example above: `monitor` and `AgentEvent` are already defined.

# Action event with cost and latency
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="action", agent="my-agent",
    cost_usd=0.007, latency_ms=350.0,
    data={"model": "gpt-4"},
))

# Denial (guardrail blocked the request; feeds the denial_rate metric)
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="denial", agent="my-agent",
    data={"rule": "block-injection", "severity": "critical"},
))

# Guardrail trigger (non-denial, e.g., redact or log)
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="guardrail_trigger", agent="my-agent",
    data={"rule": "redact-pii", "outcome": "redact"},
))

# Error
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="error", agent="my-agent",
    data={"error_type": "TimeoutError", "message": "LLM call timed out"},
))
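In practice the action and error events above usually wrap a single call, so it can be convenient to measure latency and choose the event type in one place. The sketch below is one possible pattern, not something the library provides: `timed_event` is a hypothetical helper, and it emits plain dicts through an `emit` callback so that it runs without the library installed. The dict keys mirror the `AgentEvent` fields used in the examples above.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def timed_event(
    emit: Callable[[dict[str, Any]], None], agent: str
) -> Iterator[dict[str, Any]]:
    """Time a block and emit one 'action' or 'error' event dict for it.

    Hypothetical helper; `emit` receives the event fields as a dict.
    """
    data: dict[str, Any] = {}
    start = time.perf_counter()
    try:
        yield data  # the caller may attach extra fields, e.g. data["model"]
    except Exception as exc:
        emit({
            "timestamp": time.time(),
            "event_type": "error",
            "agent": agent,
            "data": {**data, "error_type": type(exc).__name__, "message": str(exc)},
        })
        raise  # never swallow the original failure
    else:
        emit({
            "timestamp": time.time(),
            "event_type": "action",
            "agent": agent,
            "latency_ms": (time.perf_counter() - start) * 1000.0,
            "data": data,
        })


events: list[dict[str, Any]] = []

# Successful call: one "action" event with measured latency.
with timed_event(events.append, "my-agent") as data:
    data["model"] = "gpt-4"

# Failing call: one "error" event, and the exception still propagates.
try:
    with timed_event(events.append, "my-agent"):
        raise TimeoutError("LLM call timed out")
except TimeoutError:
    pass

print([e["event_type"] for e in events])  # ['action', 'error']
```

To feed a real monitor, `emit` could be something like `lambda fields: monitor.record(AgentEvent(**fields))`, assuming `AgentEvent` accepts exactly the keyword arguments shown in the examples above.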
3. CLI:
agent-monitor -c monitor.yaml validate
agent-monitor -c monitor.yaml inspect
agent-monitor -c monitor.yaml status --agent sales-agent
agent-monitor -c monitor.yaml events --agent sales-agent --type action
agent-monitor -c monitor.yaml kill sales-agent --reason "Cost spike"
agent-monitor -c monitor.yaml revive sales-agent
agent-monitor -c monitor.yaml export --format soc2
Why This Library?
Every agentic system needs monitoring. The options today:
| Approach | Problem |
|---|---|
| LangSmith / Langfuse | Tracing only -- no kill switches, no anomaly detection, no compliance export |
| Arize / Weights & Biases | ML-focused, not agent-governance-focused, expensive at scale |
| Datadog / Grafana | Generic APM -- doesn't understand denial rates, guardrail decisions, or agent costs |
| Build your own | Months of engineering, no standard format, no z-score baselines |
theaios-agent-monitor is governance-first (kill switches and compliance, not just dashboards), statistical (z-score anomaly detection, not threshold-only), instant (in-process, ~0.1ms/event, no external services), and declarative (YAML configs that ops teams can read).
Source Types
| Source | What it provides |
|---|---|
| Events | Append-only log of every agent action |
| Metrics | Rolling window: event_count, denial_rate, cost_per_minute, avg_latency_ms |
| Baselines | Welford's algorithm: running mean, stddev, z-score |
| Anomalies | Z-score threshold alerts with cooldown |
| Kill switches | Agent/session/global kill with auto-policies |
| Compliance | SOC 2, GDPR, JSON export |
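The Metrics row above describes values computed over a rolling window. As a rough illustration of that idea, the sketch below keeps events in a deque, evicts those older than the window, and derives `event_count`, `denial_rate`, and `cost_per_minute` from what remains. The `Event` and `RollingWindow` classes are hypothetical stand-ins, events are assumed to arrive in timestamp order, and dividing total cost by the full window length is an assumption about how cost per minute is defined.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """Minimal stand-in for an agent event (hypothetical type)."""

    timestamp: float
    event_type: str
    cost_usd: float = 0.0


class RollingWindow:
    """Metrics over the last `window_seconds` of events (illustrative only)."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self._events: deque[Event] = deque()

    def add(self, event: Event) -> None:
        # Assumes events are added in non-decreasing timestamp order.
        self._events.append(event)

    def _evict(self, now: float) -> None:
        """Drop events that have aged out of the window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0].timestamp < cutoff:
            self._events.popleft()

    def snapshot(self, now: float) -> dict[str, float]:
        self._evict(now)
        count = len(self._events)
        denials = sum(1 for e in self._events if e.event_type == "denial")
        total_cost = sum(e.cost_usd for e in self._events)
        return {
            "event_count": count,
            "denial_rate": denials / count if count else 0.0,
            # Assumed definition: window cost spread over the window length.
            "cost_per_minute": total_cost / (self.window_seconds / 60.0),
        }


window = RollingWindow(window_seconds=300)
window.add(Event(timestamp=0.0, event_type="action", cost_usd=0.5))  # ages out
window.add(Event(timestamp=400.0, event_type="action", cost_usd=1.0))
window.add(Event(timestamp=450.0, event_type="denial"))
window.add(Event(timestamp=500.0, event_type="action", cost_usd=1.5))

snap = window.snapshot(now=600.0)
print(snap)
```

With a 300-second window evaluated at `now=600.0`, the first event falls before the cutoff and is dropped, so the snapshot reflects only the three most recent events.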
Documentation
Full documentation at cohorte-ai.github.io/agent-monitor -- including the config syntax reference, events, metrics & baselines, kill switches, compliance, and integration guide.
Part of the theaios Ecosystem
theaios-agent-monitor is one of the theaios platform components. It works standalone or alongside:
- theaios-guardrails -- declarative guardrails for AI agent governance
- theaios-context-router -- intelligent context routing for AI agents
License
Apache 2.0 -- see LICENSE.
File details
Details for the file theaios_agent_monitor-0.1.1.tar.gz.
File metadata
- Download URL: theaios_agent_monitor-0.1.1.tar.gz
- Upload date:
- Size: 88.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 290aad4f61ebf3dc1083041e493c3aaf1596f2d3ccfc588538fa3762b4ebb38f |
| MD5 | 89706adbc8b0ae4e15a8748bb6be580d |
| BLAKE2b-256 | 200b1adf36fb6f861e57b9b5af341c73754876e82f8f1722b0f6078660c9b9fb |
File details
Details for the file theaios_agent_monitor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: theaios_agent_monitor-0.1.1-py3-none-any.whl
- Upload date:
- Size: 42.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7e7c2ba2ac578761bf808b8bd06d3203659580687fe0492bbbe5210c20fcf4a |
| MD5 | 484624fb39fc4c9ca5efd0e8c064e299 |
| BLAKE2b-256 | d497852ae1a16ae1a337f6f355b1d38cd5fdc27748346a6d78ad66ad4f01228a |