Production observability for Agent OS - OpenTelemetry traces, Prometheus metrics, Grafana dashboards
Agent OS Observability
Part of Agent OS - Kernel-level governance for AI agents
Production-ready observability stack for the Agent OS kernel.
Status: Alpha
This package provides metrics, tracing, and dashboards for monitoring Agent OS deployments.
Features
- Prometheus Metrics: Kernel, agent, and CMVK metrics
- OpenTelemetry Tracing: Distributed tracing for agent operations
- Grafana Dashboards: Pre-built dashboards for SOC, ML Ops, and SRE teams
- Prometheus Alerts: Safety, performance, and availability alerts
Quick Start
Install Package
pip install "agent-os-kernel[observability]"
Basic Usage
from agent_os_observability import KernelMetrics, KernelTracer

# Initialize metrics
metrics = KernelMetrics()

# Record policy check
# (policy_engine, action, and agent_id come from your own application)
with metrics.policy_check_latency():
    result = policy_engine.check(action)

# Record violation
if not result.allowed:
    metrics.record_violation(agent_id, action, policy="data-access", severity="high")
    metrics.record_blocked(agent_id, action)

# CMVK metrics
metrics.record_cmvk_verification(
    result="verified",
    confidence=0.95,
    drift_score=0.08,
    duration_seconds=2.3,
    model_count=3,
)

# Expose /metrics endpoint (FastAPI example)
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/metrics")
def get_metrics():
    return Response(
        content=metrics.export(),
        media_type=metrics.content_type(),
    )
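To make the `with metrics.policy_check_latency():` pattern above more concrete, here is a minimal standard-library sketch of a timing context manager. `LatencyRecorder` and `time_block` are hypothetical names invented for this illustration; they are not part of `agent_os_observability`, and the package's own implementation may differ.

```python
import time
from contextlib import contextmanager


class LatencyRecorder:
    """Hypothetical stand-in for a latency histogram (not the package's class)."""

    def __init__(self):
        self.samples = []  # observed durations, in seconds

    @contextmanager
    def time_block(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped code raises.
            self.samples.append(time.perf_counter() - start)


recorder = LatencyRecorder()
with recorder.time_block():
    sum(range(1000))  # placeholder for policy_engine.check(action)

print(f"recorded {len(recorder.samples)} sample(s)")
```

Recording in a `finally` block means a policy check that raises still contributes a latency sample, which is usually what you want when tracking tail latency.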
Full Observability Stack (Docker)
cd agent-governance-python/agent-os/modules/observability
docker-compose up -d
# Open dashboards
open http://localhost:3000 # Grafana (admin/admin)
open http://localhost:16686 # Jaeger
open http://localhost:9090 # Prometheus
Metrics Reference
Kernel Metrics
| Metric | Type | Description |
|---|---|---|
| agent_os_violations_total | Counter | Policy violations by agent, action, policy, severity |
| agent_os_violations_blocked_total | Counter | Violations blocked (SIGKILL issued) |
| agent_os_violation_rate | Gauge | Violations per 1000 requests |
| agent_os_policy_check_duration_seconds | Histogram | Policy check latency |
| agent_os_signals_total | Counter | Signals sent by type and reason |
| agent_os_sigkill_total | Counter | SIGKILL signals by agent and reason |
| agent_os_mttr_seconds | Histogram | Mean Time To Recovery |
| agent_os_kernel_uptime_seconds | Gauge | Kernel uptime |
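The "violations per 1000 requests" unit used by `agent_os_violation_rate` is simple arithmetic, sketched below. The function name and the zero-request handling are assumptions for illustration; the source does not show how the package computes the gauge.

```python
def violations_per_1000(violations: int, requests: int) -> float:
    """Scale a raw violation count to a per-1000-requests rate."""
    if requests <= 0:
        # Assumption: report 0.0 rather than dividing by zero when idle.
        return 0.0
    return 1000.0 * violations / requests


# 12 violations across 1000 requests is a rate of 12 per 1000, i.e. 1.2%.
rate = violations_per_1000(12, 1000)
print(rate, "per 1000 requests =", rate / 10, "%")
```

Note that a 1% violation rate corresponds to 10 violations per 1000 requests, which is useful when comparing this gauge against a percentage threshold.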
CMVK Metrics
| Metric | Type | Description |
|---|---|---|
| agent_os_cmvk_verifications_total | Counter | Verifications by result (verified/flagged/rejected) |
| agent_os_cmvk_consensus_ratio | Gauge | Current model agreement (0.0-1.0) |
| agent_os_cmvk_model_disagreements_total | Counter | Disagreements by model pair |
| agent_os_cmvk_drift_score | Histogram | Drift score distribution |
| agent_os_cmvk_verification_duration_seconds | Histogram | Verification latency |
| agent_os_cmvk_model_latency_seconds | Histogram | Per-model response latency |
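As a rough illustration of what a consensus ratio and per-pair disagreement count could mean, the sketch below treats consensus as the share of models that gave the most common answer. This definition, the function names, and the model names are assumptions; the source does not specify how CMVK derives these values.

```python
from collections import Counter
from itertools import combinations


def consensus_ratio(answers):
    """Share of answers that match the most common answer (0.0-1.0)."""
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def disagreeing_pairs(answers_by_model):
    """List the model pairs whose answers differ, in sorted-name order."""
    return [
        (a, b)
        for a, b in combinations(sorted(answers_by_model), 2)
        if answers_by_model[a] != answers_by_model[b]
    ]


# Hypothetical verification round with three models.
round_answers = {"model_a": "yes", "model_b": "yes", "model_c": "no"}
print(consensus_ratio(list(round_answers.values())))
print(disagreeing_pairs(round_answers))
```

Each pair returned by `disagreeing_pairs` would map naturally onto one increment of a counter labeled by model pair.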
Agent Metrics
| Metric | Type | Description |
|---|---|---|
| agent_os_agent_llm_calls_total | Counter | LLM API calls by agent and model |
| agent_os_agent_errors_total | Counter | Errors by agent and type |
| agent_os_agent_execution_duration_seconds | Histogram | Task execution time |
Dashboards
agent-os-overview (10 panels)
Main dashboard for SOC teams: violation rate, SIGKILL count, latency, throughput.
agent-os-cmvk (12 panels)
ML Ops dashboard: consensus rate, drift scores, model latency, verification results.
agent-os-amb (13 panels)
AMB (Agent Message Bus): throughput, queue depth, backpressure, delivery latency.
agent-os-safety (1 panel)
CISO dashboard: 30-day violation count.
Export Dashboards
python scripts/export_dashboards.py
This creates JSON files in grafana/dashboards/ for Grafana provisioning.
Alerts
Alert rules are defined in alerts/agent-os-alerts.yaml:
Critical Alerts (Page Immediately)
- AgentOSHighViolationRate: Violation rate >1%
- AgentOSSIGKILLSpike: >5 SIGKILL in 5 minutes
- AgentOSKernelCrash: Kernel panic
Warning Alerts
- AgentOSHighPolicyLatency: p99 latency >10ms
- CMVKLowConsensus: Consensus <80%
- CMVKHighDrift: p95 drift >0.25
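The numeric thresholds listed above can be restated as plain comparisons, which may help when sanity-checking values by hand. This is only a sketch: the real rules live in the YAML file and are evaluated by Prometheus, and the `Snapshot` type and its field names are hypothetical. AgentOSKernelCrash is omitted because the source gives no numeric threshold for it.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time values; field names are illustrative only."""
    violation_rate: float        # fraction of requests, 0.0-1.0
    sigkills_last_5m: int        # SIGKILL count over the last 5 minutes
    policy_latency_p99_s: float  # p99 policy check latency, in seconds
    cmvk_consensus: float        # model agreement, 0.0-1.0
    cmvk_drift_p95: float        # p95 drift score


def firing_alerts(s: Snapshot):
    """Return (alert name, severity) pairs whose threshold is exceeded."""
    checks = [
        ("AgentOSHighViolationRate", "critical", s.violation_rate > 0.01),
        ("AgentOSSIGKILLSpike", "critical", s.sigkills_last_5m > 5),
        ("AgentOSHighPolicyLatency", "warning", s.policy_latency_p99_s > 0.010),
        ("CMVKLowConsensus", "warning", s.cmvk_consensus < 0.80),
        ("CMVKHighDrift", "warning", s.cmvk_drift_p95 > 0.25),
    ]
    return [(name, severity) for name, severity, fired in checks if fired]


sample = Snapshot(0.02, 2, 0.004, 0.75, 0.10)
for name, severity in firing_alerts(sample):
    print(f"{severity}: {name}")
```

All comparisons are strict, matching the `>` and `<` signs in the lists above, so a value exactly at a threshold does not fire.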
Architecture
┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
│  ┌──────────────────┐  ┌──────────────────┐                 │
│  │ Agent OS         │  │ KernelMetrics    │                 │
│  │ Kernel           │──│ .export()        │───► /metrics    │
│  └──────────────────┘  └──────────────────┘                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                    Docker Compose Stack                     │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐             │
│  │ Prometheus │─►│  Grafana   │  │   Jaeger   │             │
│  │   :9090    │  │   :3000    │  │   :16686   │             │
│  └────────────┘  └────────────┘  └────────────┘             │
│        │               ▲               ▲                    │
│        ▼               │               │                    │
│  ┌────────────┐        │         ┌────────────┐             │
│  │AlertManager│        │         │    OTEL    │             │
│  │   :9093    │        │         │ Collector  │             │
│  └────────────┘        │         └────────────┘             │
│        │               │               ▲                    │
│        ▼               │               │                    │
│  [Slack/PagerDuty]     └───────────────┘                    │
└─────────────────────────────────────────────────────────────┘
Development
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Export dashboards
python scripts/export_dashboards.py
License
MIT