agent-observe
Framework-agnostic observability, audit, and eval for AI agent applications.
Overview
agent-observe is a lightweight runtime layer that provides:
- Observability - Track agent runs, tool calls, and model invocations
- Audit/Compliance - Policy engine with deny/allow patterns for tools
- Label-free Eval - Automatic risk scoring based on behavioral signals
- Tool Replay - Cache tool results for deterministic testing
- Local Viewer - FastAPI UI for browsing and debugging runs
Designed to be enterprise-safe by default - stores only metadata (hashes, sizes, timings), not raw content.
Installation
# Core package
pip install agent-observe
# With viewer UI (extras are quoted so shells such as zsh do not expand the brackets)
pip install "agent-observe[viewer]"
# With PostgreSQL support
pip install "agent-observe[postgres]"
# All extras
pip install "agent-observe[all]"
Quick Start
from agent_observe import observe, tool, model_call
# Initialize (zero-config, auto-detects environment)
observe.install()
# Define tools
@tool(name="search", kind="http")
def search_web(query: str) -> list[dict]:
    # Your implementation
    return [{"title": "Result", "url": "https://..."}]
@model_call(provider="openai", model="gpt-4")
def call_llm(prompt: str) -> str:
    # Your LLM call
    return "Response..."
# Run your agent
with observe.run("my-agent", task={"goal": "Research topic"}):
    results = search_web("AI agents")
    analysis = call_llm(f"Analyze: {results}")
    observe.emit_artifact("analysis", analysis)
View the results:
agent-observe view
Async Support
Full async/await support for modern agent frameworks:
import httpx
@tool(name="fetch_data", kind="http")
async def fetch_data(url: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        # client.get() returns a Response object; decode the JSON body to get a dict
        return response.json()
@model_call(provider="anthropic", model="claude-3")
async def call_claude(prompt: str) -> str:
    # Your LLM call (`anthropic` is assumed to be an async client created elsewhere)
    return await anthropic.messages.create(...)
# Use async context manager (inside a coroutine)
async with observe.arun("async-agent"):
    data = await fetch_data("https://api.example.com")
    response = await call_claude(f"Analyze: {data}")
Framework Integration
See AGENTS.md for detailed integration examples with:
- OpenAI Function Calling
- Anthropic Claude
- Google Vertex AI / Gemini
- LangChain
- Custom ReAct agents
Features
Zero-Config Defaults
Just call observe.install() - it automatically:
- Selects the right sink based on environment
- Uses SQLite for local dev, Postgres if DATABASE_URL is set
- Captures metadata only (enterprise-safe)
Automatic Sink Selection
| Condition | Sink |
|---|---|
| DATABASE_URL set | PostgreSQL |
| OTEL_EXPORTER_OTLP_ENDPOINT set | OTLP (OpenTelemetry) |
| AGENT_OBSERVE_ENV=dev | SQLite |
| Default | JSONL |
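The selection rules in the table can be sketched as a small function. This is an illustration only, not the library's implementation: the name `select_sink` is hypothetical, and the sketch assumes the rows are checked top to bottom (first match wins) and that an explicit AGENT_OBSERVE_SINK value other than auto overrides auto-detection.

```python
from typing import Mapping


def select_sink(env: Mapping[str, str]) -> str:
    """Pick a sink name from environment-style settings (hypothetical helper)."""
    # Assumption: an explicit AGENT_OBSERVE_SINK other than "auto" overrides detection.
    explicit = env.get("AGENT_OBSERVE_SINK", "auto")
    if explicit != "auto":
        return explicit
    # Assumption: the table rows are evaluated in order and the first match wins.
    if env.get("DATABASE_URL"):
        return "postgres"
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return "otlp"
    if env.get("AGENT_OBSERVE_ENV") == "dev":
        return "sqlite"
    return "jsonl"


print(select_sink({"AGENT_OBSERVE_ENV": "dev"}))  # prints: sqlite
```

Passing `os.environ` as the argument would evaluate the same rules against the real process environment.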
Policy Engine
Create .riff/observe.policy.yml:
tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"
limits:
  max_tool_calls: 100
  max_retries: 10
  max_model_calls: 50
Coming Soon: SQL query validation and network domain restrictions are planned for a future release.
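To make the allow/deny patterns concrete, here is a minimal sketch of how glob-style matching over tool names could work, using Python's standard `fnmatch` module. The helper `is_tool_allowed` is hypothetical, and two behaviors are assumptions rather than documented facts: deny patterns win over allow patterns, and a tool that matches no allow pattern is rejected.

```python
from fnmatch import fnmatchcase
from typing import Mapping, Sequence

# Same patterns as the policy file above, expressed as a plain dict.
POLICY = {
    "allow": ["db.*", "http.*"],
    "deny": ["shell.*", "*.destructive"],
}


def is_tool_allowed(tool_name: str, policy: Mapping[str, Sequence[str]]) -> bool:
    """Return True if the tool name passes the allow/deny patterns (hypothetical helper)."""
    # Assumption: deny patterns take precedence over allow patterns.
    if any(fnmatchcase(tool_name, pattern) for pattern in policy["deny"]):
        return False
    # Assumption: a tool must match at least one allow pattern to be permitted.
    return any(fnmatchcase(tool_name, pattern) for pattern in policy["allow"])


for name in ["db.query", "shell.exec", "db.destructive"]:
    print(name, is_tool_allowed(name, POLICY))
```

Under these assumptions `db.query` passes, while `shell.exec` and `db.destructive` are both rejected by the deny list.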
Risk Scoring
Automatic risk scoring (0-100) based on:
| Signal | Weight | Tag |
|---|---|---|
| Policy violations | +40 | POLICY_VIOLATION |
| Tool success rate < 90% | +25 | TOOL_FAILURE |
| Repeated tool calls (loops) | +15 | LOOP_SUSPECTED |
| 5+ retries | +10 | RETRY_STORM |
| Latency exceeds budget | +10 | LATENCY_BREACH |
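The table can be read as an additive score. The sketch below is one plausible interpretation, not the library's scoring code: the function `risk_score` and its parameters are hypothetical, each signal is assumed to count at most once per run, and loop detection is taken as a precomputed boolean because the detection rule is not specified here.

```python
WEIGHTS = {
    "POLICY_VIOLATION": 40,
    "TOOL_FAILURE": 25,
    "LOOP_SUSPECTED": 15,
    "RETRY_STORM": 10,
    "LATENCY_BREACH": 10,
}


def risk_score(
    policy_violations: int,
    tool_success_rate: float,
    loop_suspected: bool,
    retries: int,
    latency_ms: float,
    latency_budget_ms: float = 20000,
) -> tuple[int, list[str]]:
    """Return (score, tags) for one run; each signal counts at most once (assumption)."""
    signals = {
        "POLICY_VIOLATION": policy_violations > 0,
        "TOOL_FAILURE": tool_success_rate < 0.90,
        "LOOP_SUSPECTED": loop_suspected,
        "RETRY_STORM": retries >= 5,
        "LATENCY_BREACH": latency_ms > latency_budget_ms,
    }
    tags = [tag for tag, fired in signals.items() if fired]
    # The weights already sum to 100; the cap only guards against future additions.
    score = min(100, sum(WEIGHTS[tag] for tag in tags))
    return score, tags


score, tags = risk_score(
    policy_violations=1,
    tool_success_rate=0.8,
    loop_suspected=False,
    retries=2,
    latency_ms=1500,
)
print(score, tags)  # prints: 65 ['POLICY_VIOLATION', 'TOOL_FAILURE']
```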
Capture Modes
| Mode | Description |
|---|---|
| off | Disable observability |
| metadata_only | Store hashes, sizes, timings only (default) |
| evidence_only | Store small blobs with redaction |
| full | Store all content (with caps) |
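To illustrate what storing "hashes and sizes" instead of raw content can look like in metadata_only mode, here is a small sketch. The helper `summarize_payload` is hypothetical, and the choice of SHA-256 over a JSON encoding is an assumption; the library's actual hashing scheme is not documented in this section.

```python
import hashlib
import json


def summarize_payload(payload: object) -> dict:
    """Reduce a payload to a hash and a size, discarding the content (hypothetical helper)."""
    # Assumption: payloads are JSON-serializable; sort_keys keeps the hash
    # stable regardless of dict key order, and default=str covers odd types.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return {
        "sha256": hashlib.sha256(encoded).hexdigest(),
        "size_bytes": len(encoded),
    }


summary = summarize_payload({"query": "AI agents", "limit": 10})
print(sorted(summary))  # prints: ['sha256', 'size_bytes']
```

Two runs that pass identical arguments produce the same digest, so calls can still be compared and deduplicated without the raw text ever being written to the sink.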
Environment Variables
# Core
AGENT_OBSERVE_MODE=metadata_only # off|metadata_only|evidence_only|full
AGENT_OBSERVE_ENV=prod # dev|staging|prod
AGENT_OBSERVE_PROJECT=my-app # Project name
AGENT_OBSERVE_AGENT_VERSION=1.0.0 # Agent version
# Sink selection
AGENT_OBSERVE_SINK=auto # auto|sqlite|jsonl|postgres|otlp
DATABASE_URL=postgresql://... # Enables Postgres sink
# Policy
AGENT_OBSERVE_POLICY_FILE=.riff/observe.policy.yml
AGENT_OBSERVE_FAIL_ON_VIOLATION=0 # 1 to raise on violations
# Replay
AGENT_OBSERVE_REPLAY=off # off|write|read
# Performance
AGENT_OBSERVE_LATENCY_BUDGET_MS=20000
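The AGENT_OBSERVE_REPLAY setting switches between recording tool results (write) and serving them back (read). The sketch below shows the general idea with an in-memory cache. The `ReplayCache` class is hypothetical and is not the library's replay module; the cache key format and the choice to fail on a cache miss in read mode are both assumptions.

```python
import hashlib
import json
from typing import Any, Callable


class ReplayCache:
    """In-memory stand-in for a tool-result cache (hypothetical; a real one would persist)."""

    def __init__(self, mode: str = "off") -> None:
        self.mode = mode  # "off", "write", or "read"
        self._store: dict[str, Any] = {}

    @staticmethod
    def _key(tool_name: str, kwargs: dict) -> str:
        # Hash the tool name and arguments so identical calls map to the same entry.
        raw = json.dumps([tool_name, kwargs], sort_keys=True, default=str)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def call(self, tool_name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        key = self._key(tool_name, kwargs)
        if self.mode == "read":
            # Assumption: a miss in read mode raises KeyError instead of calling the tool.
            return self._store[key]
        result = fn(**kwargs)
        if self.mode == "write":
            self._store[key] = result
        return result


def search(query: str) -> list[str]:
    return [query.upper()]


cache = ReplayCache(mode="write")
cache.call("search", search, query="ai")  # runs the tool and records the result
cache.mode = "read"
print(cache.call("search", search, query="ai"))  # prints: ['AI']
```

In read mode the tool function is never invoked, which is what makes a test run deterministic.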
CLI
# Start the viewer
agent-observe view
agent-observe view --port 8080
# Export to JSONL
agent-observe export-jsonl -o ./export
# With specific database
agent-observe view --db .riff/observe.db
agent-observe view --database-url postgresql://...
API Reference
Core
from agent_observe import observe
# Initialize (reads from environment variables automatically)
observe.install()
# Or with mode override
observe.install(mode="metadata_only")
# Create a run context
with observe.run("agent-name", task={"goal": "..."}) as run:
    pass
# Emit events
observe.emit_event("custom.event", {"key": "value"})
# Emit artifacts
observe.emit_artifact("report", {"data": "..."}, provenance=["tool1", "tool2"])
Explicit Configuration
When you need full control, pass a Config object. Important: You must include
all connection strings explicitly - they are NOT read from environment variables
when using a custom Config:
import os
from agent_observe import observe
from agent_observe.config import Config, CaptureMode, Environment, SinkType
# Option 1: Let the library auto-detect from env vars (recommended)
observe.install()
# Option 2: Explicit config - must include ALL required fields
database_url = os.environ.get("DATABASE_URL")
config = Config(
    mode=CaptureMode.METADATA_ONLY,
    env=Environment.PROD,
    sink_type=SinkType.POSTGRES,
    project="my-agent",
    database_url=database_url,  # Required for Postgres!
)
observe.install(config=config)
Decorators
from agent_observe import tool, model_call
@tool(name="my_tool", kind="db", version="1")
def my_tool(arg: str) -> dict:
    pass
@model_call(provider="openai", model="gpt-4")
def call_model(prompt: str) -> str:
    pass
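For readers curious how a single decorator can serve both sync and async functions, the following sketch shows one common pattern. It is not the source of `@tool` or `@model_call`: the names `timed_tool` and `RECORDS` are hypothetical, and the sketch records only a name, a success flag, and a duration.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable

RECORDS: list[dict] = []  # hypothetical in-memory store for call metadata


def timed_tool(name: str) -> Callable[[Callable], Callable]:
    """Wrap a sync or async function and record its name, outcome, and duration."""

    def decorator(fn: Callable) -> Callable:
        def record(start: float, ok: bool) -> None:
            elapsed_ms = (time.perf_counter() - start) * 1000
            RECORDS.append({"tool": name, "ok": ok, "duration_ms": elapsed_ms})

        if inspect.iscoroutinefunction(fn):
            # Async functions need an async wrapper so the await happens inside the timing.
            @functools.wraps(fn)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    result = await fn(*args, **kwargs)
                except Exception:
                    record(start, ok=False)
                    raise
                record(start, ok=True)
                return result

            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                record(start, ok=False)
                raise
            record(start, ok=True)
            return result

        return sync_wrapper

    return decorator


@timed_tool("add")
def add(a: int, b: int) -> int:
    return a + b


@timed_tool("double")
async def double(x: int) -> int:
    return x * 2


print(add(1, 2), asyncio.run(double(4)))  # prints: 3 8
```

The branch on `inspect.iscoroutinefunction` is the key design choice: it picks the wrapper once at decoration time, so callers keep using `await` only where the original function required it.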
Architecture
agent_observe/
├── observe.py # Core runtime (install, run, emit_*)
├── decorators.py # @tool, @model_call (sync and async)
├── policy.py # YAML policy engine
├── metrics.py # Risk scoring and eval
├── replay.py # Tool result caching
├── sinks/
│ ├── sqlite_sink.py # Local dev
│ ├── jsonl_sink.py # Fallback
│ ├── postgres_sink.py # Production (Neon, Supabase compatible)
│ └── otel_sink.py # OTLP export (Jaeger, Honeycomb, Datadog, etc.)
└── viewer/
└── app.py # FastAPI viewer
PostgreSQL Sink Design
The Postgres sink follows production best practices:
- Parameterized queries - All queries use %s placeholders (SQL injection safe)
- Batch inserts - Uses executemany for efficient bulk writes
- Retry with backoff - Transient connection errors retry with exponential backoff
- Connection timeout - 10-second timeout prevents hanging connections
- Graceful degradation - Works with pre-created tables (no CREATE permission needed)
- Efficient schema checks - Single query to verify all tables exist
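The "retry with backoff" behavior in the list above can be sketched generically. This is not the sink's actual code: the function `retry_with_backoff`, its default attempt count, and its base delay are hypothetical, and the built-in `ConnectionError` stands in for whatever transient errors a real database driver raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying ConnectionError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles after each failure: 0.5s, 1s, 2s, ...
            sleep(base_delay * (2 ** attempt))
    # Only reached when max_attempts is less than 1.
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky_write() -> str:
    # Fails twice, then succeeds, to imitate a transient connection problem.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "written"


delays: list[float] = []
print(retry_with_backoff(flaky_write, sleep=delays.append), delays)  # prints: written [0.5, 1.0]
```

Injecting `sleep` as a parameter keeps the example fast and lets tests check the delay schedule without waiting.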
Roadmap
- Auto-instrumentation for OpenAI SDK
- Auto-instrumentation for Anthropic SDK
- SQL query validation policies
- Network domain restriction policies
- Streaming support for LLM responses
- Sampling for high-volume production
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check .
# Type checking
mypy agent_observe
License
MIT License - see LICENSE file for details.