
Governance-first observability for AI agents


Governance-first observability for AI agents -- real-time metrics, anomaly detection, kill switches, compliance export.


Note: Part of the theaios ecosystem. Install with pip install theaios-agent-monitor.

What It Does

Record every agent event. Compute real-time metrics over rolling windows. Detect anomalies via z-score baselines. Kill misbehaving agents instantly. Export compliance reports. All in-process, no external services, ~0.1ms per event.

This is not LangSmith, Langfuse, or Arize. Those are tracing platforms that collect data. This library collects data and acts on it -- anomaly detection triggers alerts, kill switches stop agents, compliance export generates audit reports. Governance, not just observation.

  • Event collection -- record agent actions, guardrail triggers, denials, approvals, costs, errors
  • Real-time metrics -- event count, denial rate, cost/minute, average latency over configurable rolling windows
  • Statistical baselines -- Welford's online algorithm for running mean and stddev, z-score anomaly detection
  • Kill switches -- instant agent/session/global kill, automatic kill policies on metric thresholds, persistence across restarts
  • Anomaly detection -- z-score rules with configurable thresholds, cooldown periods, wildcard agent matching
  • Compliance export -- SOC 2, GDPR, JSON reports with filtering by agent, time range, event type
  • Alert channels -- console, JSONL file, webhook (Slack, PagerDuty, OpsGenie)
  • OpenTelemetry -- optional export to any OTel-compatible backend
  • Guardrails adapter -- auto-record every theaios-guardrails decision
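
The statistical baseline bullet names Welford's online algorithm and z-scores. As an illustration of the technique only (this is not the library's code, and the class and method names here are hypothetical), a running mean and standard deviation can be maintained one observation at a time like this:

```python
import math
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Running mean/stddev via Welford's online algorithm (illustrative only)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's update: adjust the mean, then accumulate the squared deviation.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; undefined with fewer than two observations.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def z_score(self, x: float) -> float:
        # How many standard deviations x sits from the running mean.
        sd = self.stddev
        if sd == 0.0:
            return 0.0
        return (x - self.mean) / sd


baseline = RunningBaseline()
for latency_ms in [340.0, 350.0, 360.0, 345.0, 355.0]:
    baseline.update(latency_ms)

print(f"mean={baseline.mean:.1f} stddev={baseline.stddev:.2f}")
print(f"z-score of a 900 ms call: {baseline.z_score(900.0):.1f}")
```

Because only three numbers are stored per metric, the baseline never needs the full event history, which is what makes this kind of check cheap enough to run in-process on every event.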

Quick Start

pip install theaios-agent-monitor

1. Write a config:

# monitor.yaml
version: "1.0"
metadata:
  name: my-monitor
  description: Production agent monitoring

metrics:
  default_window_seconds: 300

kill_switch:
  enabled: true
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical

alerts:
  channels:
    - type: console
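
The kill_switch policy above reads as "when cost_per_minute exceeds 5.0, kill the agent". A minimal sketch of how such a threshold rule could be evaluated follows. The KillPolicy dataclass and the triggered function are hypothetical names for illustration, and the operator table beyond ">" is an assumption, since the config only shows ">":

```python
import operator as ops
from dataclasses import dataclass

# Comparison operators a policy might name. Only ">" appears in the
# config above; the other entries are assumptions.
OPERATORS = {
    ">": ops.gt,
    ">=": ops.ge,
    "<": ops.lt,
    "<=": ops.le,
}


@dataclass(frozen=True)
class KillPolicy:
    """Hypothetical in-memory form of one entry under kill_switch.policies."""
    name: str
    metric: str
    operator: str
    threshold: float
    action: str
    severity: str


def triggered(policy: KillPolicy, metrics: dict) -> bool:
    """Return True when the policy's metric crosses its threshold."""
    value = metrics.get(policy.metric)
    if value is None:
        return False  # metric not reported in this window
    compare = OPERATORS[policy.operator]
    return compare(value, policy.threshold)


policy = KillPolicy(
    name="auto-kill-on-high-cost",
    metric="cost_per_minute",
    operator=">",
    threshold=5.0,
    action="kill_agent",
    severity="critical",
)

print(triggered(policy, {"cost_per_minute": 7.25}))
print(triggered(policy, {"cost_per_minute": 1.10}))
```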

2. Use it:

import time
from theaios.agent_monitor import Monitor, load_config, AgentEvent

monitor = Monitor(load_config("monitor.yaml"))

# Record events
monitor.record(AgentEvent(
    timestamp=time.time(),
    event_type="action",
    agent="sales-agent",
    cost_usd=0.007,
    latency_ms=350.0,
    data={"model": "gpt-4"},
))

# View metrics
snap = monitor.get_metrics("sales-agent")
print(f"Events: {snap.event_count}")
print(f"Cost/min: ${snap.cost_per_minute:.4f}")
print(f"Denial rate: {snap.denial_rate:.1%}")

# Kill an agent
monitor.kill_agent("sales-agent", reason="Cost spike detected")

Events tell the monitor what's happening. Each event has an event_type, an agent, a timestamp, and optional fields for cost, latency, session, and arbitrary data:

import time

# Action event with cost and latency
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="action", agent="my-agent",
    cost_usd=0.007, latency_ms=350.0,
    data={"model": "gpt-4"},
))

# Denial (guardrail blocked the request -- feeds denial_rate metric)
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="denial", agent="my-agent",
    data={"rule": "block-injection", "severity": "critical"},
))

# Guardrail trigger (non-denial, e.g., redact or log)
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="guardrail_trigger", agent="my-agent",
    data={"rule": "redact-pii", "outcome": "redact"},
))

# Error
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="error", agent="my-agent",
    data={"error_type": "TimeoutError", "message": "LLM call timed out"},
))
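
To make the link between events and metrics concrete, here is a sketch of how a rolling window could turn events like these into event_count, denial_rate, cost_per_minute, and avg_latency_ms. It is not the library's implementation: the Event and RollingWindow classes are stand-ins, and the formulas (denials divided by all events in the window, total cost divided by window length in minutes) are assumptions about how the metrics are defined.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Stand-in for AgentEvent with only the fields used here."""
    timestamp: float
    event_type: str
    cost_usd: Optional[float] = None
    latency_ms: Optional[float] = None


class RollingWindow:
    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self.events: deque = deque()

    def add(self, event: Event) -> None:
        self.events.append(event)

    def snapshot(self, now: float) -> dict:
        # Drop events older than the window (assumes events arrive in time order).
        cutoff = now - self.window_seconds
        while self.events and self.events[0].timestamp < cutoff:
            self.events.popleft()

        total = len(self.events)
        denials = sum(1 for e in self.events if e.event_type == "denial")
        cost = sum(e.cost_usd or 0.0 for e in self.events)
        latencies = [e.latency_ms for e in self.events if e.latency_ms is not None]
        return {
            "event_count": total,
            "denial_rate": denials / total if total else 0.0,
            "cost_per_minute": cost / (self.window_seconds / 60.0),
            "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0.0,
        }


window = RollingWindow(window_seconds=300.0)
# This first event is older than the window at snapshot time and is dropped.
window.add(Event(timestamp=0.0, event_type="action", cost_usd=0.5, latency_ms=100.0))
window.add(Event(timestamp=400.0, event_type="action", cost_usd=0.007, latency_ms=350.0))
window.add(Event(timestamp=410.0, event_type="denial"))

snap = window.snapshot(now=420.0)
print(snap)
```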

3. CLI:

agent-monitor -c monitor.yaml validate
agent-monitor -c monitor.yaml inspect
agent-monitor -c monitor.yaml status --agent sales-agent
agent-monitor -c monitor.yaml events --agent sales-agent --type action
agent-monitor -c monitor.yaml kill sales-agent --reason "Cost spike"
agent-monitor -c monitor.yaml revive sales-agent
agent-monitor -c monitor.yaml export --format soc2

Why This Library?

Every agentic system needs monitoring. The options today:

| Approach | Problem |
|---|---|
| LangSmith / Langfuse | Tracing only -- no kill switches, no anomaly detection, no compliance export |
| Arize / Weights & Biases | ML-focused, not agent-governance-focused, expensive at scale |
| Datadog / Grafana | Generic APM -- doesn't understand denial rates, guardrail decisions, or agent costs |
| Build your own | Months of engineering, no standard format, no z-score baselines |

theaios-agent-monitor is governance-first (kill switches and compliance, not just dashboards), statistical (z-score anomaly detection, not threshold-only), instant (in-process, ~0.1ms/event, no external services), and declarative (YAML configs that ops teams can read).

Source Types

| Source | What it provides |
|---|---|
| Events | Append-only log of every agent action |
| Metrics | Rolling window: event_count, denial_rate, cost_per_minute, avg_latency_ms |
| Baselines | Welford's algorithm: running mean, stddev, z-score |
| Anomalies | Z-score threshold alerts with cooldown |
| Kill switches | Agent/session/global kill with auto-policies |
| Compliance | SOC 2, GDPR, JSON export |
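
The anomaly source combines a z-score threshold, a cooldown, and wildcard agent matching. The sketch below shows one way those three checks could fit together. The AnomalyRule class and its field names are hypothetical and not taken from the library's config schema; shell-style patterns via the standard-library fnmatch module are likewise an assumption about what "wildcard" means here.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class AnomalyRule:
    """Hypothetical rule shape, for illustration only."""
    agent_pattern: str            # e.g. "sales-*" or "*"
    z_threshold: float = 3.0
    cooldown_seconds: float = 60.0
    # Last alert time per agent, used to enforce the cooldown.
    _last_fired: dict = field(default_factory=dict, init=False, repr=False)

    def check(self, agent: str, z_score: float, now: float) -> bool:
        """Return True if an alert should fire for this agent at time `now`."""
        if not fnmatchcase(agent, self.agent_pattern):
            return False  # rule does not apply to this agent
        if abs(z_score) < self.z_threshold:
            return False  # within normal range
        last = self._last_fired.get(agent)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down; suppress the repeat alert
        self._last_fired[agent] = now
        return True


rule = AnomalyRule(agent_pattern="sales-*", z_threshold=3.0, cooldown_seconds=60.0)
results = [
    rule.check("sales-agent", z_score=4.2, now=0.0),     # fires
    rule.check("sales-agent", z_score=5.0, now=30.0),    # suppressed by cooldown
    rule.check("sales-agent", z_score=4.8, now=90.0),    # cooldown elapsed, fires
    rule.check("support-agent", z_score=9.0, now=90.0),  # pattern does not match
    rule.check("sales-agent", z_score=1.0, now=500.0),   # below threshold
]
print(results)
```

Tracking the cooldown per agent rather than per rule means one noisy agent does not silence alerts for others matched by the same pattern.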

Documentation

Full documentation at cohorte-ai.github.io/agent-monitor -- including the config syntax reference, events, metrics & baselines, kill switches, compliance, and integration guide.

Part of the theaios Ecosystem

theaios-agent-monitor is one of the theaios platform components. It works standalone or alongside the other components, such as theaios-guardrails.

License

Apache 2.0 -- see LICENSE.
