agent-observe

Framework-agnostic observability, audit, and eval for AI agent applications.

Requires Python 3.9+. License: MIT.

Overview

agent-observe is a lightweight runtime layer that provides:

  • Observability - Track agent runs, tool calls, and model invocations
  • Audit/Compliance - Policy engine with deny/allow patterns for tools
  • Label-free Eval - Automatic risk scoring based on behavioral signals
  • Tool Replay - Cache tool results for deterministic testing
  • Local Viewer - FastAPI UI for browsing and debugging runs

Enterprise-safe by default: only metadata (hashes, sizes, timings) is stored, not raw content.

Installation

# Core package
pip install agent-observe

# With viewer UI
pip install agent-observe[viewer]

# With PostgreSQL support
pip install agent-observe[postgres]

# All extras
pip install agent-observe[all]

Quick Start

from agent_observe import observe, tool, model_call

# Initialize (zero-config, auto-detects environment)
observe.install()

# Define tools
@tool(name="search", kind="http")
def search_web(query: str) -> list[dict]:
    # Your implementation
    return [{"title": "Result", "url": "https://..."}]

@model_call(provider="openai", model="gpt-4")
def call_llm(prompt: str) -> str:
    # Your LLM call
    return "Response..."

# Run your agent
with observe.run("my-agent", task={"goal": "Research topic"}):
    results = search_web("AI agents")
    analysis = call_llm(f"Analyze: {results}")
    observe.emit_artifact("analysis", analysis)

View the results:

agent-observe view

Async Support

Full async/await support for modern agent frameworks:

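import asyncio

import anthropic
import httpx
from agent_observe import observe, tool, model_call

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

@tool(name="fetch_data", kind="http")
async def fetch_data(url: str) -> dict:
    async with httpx.AsyncClient() as http:
        response = await http.get(url)
        return response.json()  # return the parsed body, not the Response object

@model_call(provider="anthropic", model="claude-3")
async def call_claude(prompt: str) -> str:
    message = await client.messages.create(...)  # supply model, max_tokens, messages
    return message.content[0].text

# Use the async context manager inside a coroutine
async def main():
    async with observe.arun("async-agent"):
        data = await fetch_data("https://api.example.com")
        response = await call_claude(f"Analyze: {data}")

asyncio.run(main())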

Framework Integration

See AGENTS.md for detailed integration examples with:

  • OpenAI Function Calling
  • Anthropic Claude
  • Google Vertex AI / Gemini
  • LangChain
  • Custom ReAct agents

Features

Zero-Config Defaults

Just call observe.install() - it automatically:

  • Selects the right sink based on environment
  • Uses SQLite for local dev, Postgres if DATABASE_URL is set
  • Captures metadata only (enterprise-safe)

Automatic Sink Selection

Condition                          Sink
DATABASE_URL set                   PostgreSQL
OTEL_EXPORTER_OTLP_ENDPOINT set    OTLP (OpenTelemetry)
AGENT_OBSERVE_ENV=dev              SQLite
Default                            JSONL
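
The table above can be read as a first-match lookup. The following is a minimal sketch of that reading, not the library's actual code: the function name select_sink is hypothetical, and both the top-to-bottom precedence and the handling of an explicit AGENT_OBSERVE_SINK value are assumptions.

```python
from typing import Mapping


def select_sink(env: Mapping[str, str]) -> str:
    """Pick a sink name from environment variables (illustrative only)."""
    # Assumption: an explicit AGENT_OBSERVE_SINK other than "auto" wins outright.
    explicit = env.get("AGENT_OBSERVE_SINK", "auto")
    if explicit != "auto":
        return explicit
    # Assumption: the table rows are checked top to bottom, first match wins.
    if env.get("DATABASE_URL"):
        return "postgres"
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return "otlp"
    if env.get("AGENT_OBSERVE_ENV") == "dev":
        return "sqlite"
    return "jsonl"
```

Passing os.environ to a function like this makes the decision easy to test without touching the real environment.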

Policy Engine

Create .riff/observe.policy.yml:

tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"

limits:
  max_tool_calls: 100
  max_retries: 10
  max_model_calls: 50

Coming Soon: SQL query validation and network domain restrictions are planned for a future release.
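
To build intuition for how the allow/deny patterns in the policy file might be evaluated, here is a rough sketch using glob matching from the standard library. It is not the library's policy engine: the function is_tool_allowed, the tool names in the usage lines, and the rules that deny takes precedence and unmatched tools are rejected are all assumptions.

```python
from fnmatch import fnmatchcase

# Patterns copied from the example policy file above.
ALLOW = ["db.*", "http.*"]
DENY = ["shell.*", "*.destructive"]


def is_tool_allowed(tool_name, allow=ALLOW, deny=DENY):
    """Return True if the tool name passes the allow/deny patterns."""
    # Assumption: a deny match always wins, even if an allow pattern also matches.
    if any(fnmatchcase(tool_name, pattern) for pattern in deny):
        return False
    # Assumption: anything not matched by an allow pattern is rejected.
    return any(fnmatchcase(tool_name, pattern) for pattern in allow)


# Hypothetical tool names, for illustration only.
print(is_tool_allowed("db.query"))        # allowed by "db.*"
print(is_tool_allowed("db.destructive"))  # denied by "*.destructive"
```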

Risk Scoring

Automatic risk scoring (0-100) based on:

Signal                         Weight   Tag
Policy violations              +40      POLICY_VIOLATION
Tool success rate < 90%        +25      TOOL_FAILURE
Repeated tool calls (loops)    +15      LOOP_SUSPECTED
5+ retries                     +10      RETRY_STORM
Latency exceeds budget         +10      LATENCY_BREACH
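
As a worked illustration of how these weights could combine, the sketch below adds each weight once when its signal fires and caps the total at 100. The RunStats fields and the risk_score function are hypothetical, loop detection is reduced to a boolean flag, and the real scoring in metrics.py may differ.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-run counters; field names are illustrative."""
    policy_violations: int = 0
    tool_calls: int = 0
    tool_failures: int = 0
    loop_suspected: bool = False
    retries: int = 0
    latency_ms: float = 0.0
    latency_budget_ms: float = 20000.0


def risk_score(stats):
    """Return (score, tags) using the weights from the table above."""
    score, tags = 0, []
    if stats.policy_violations > 0:
        score, tags = score + 40, tags + ["POLICY_VIOLATION"]
    if stats.tool_calls > 0:
        success_rate = (stats.tool_calls - stats.tool_failures) / stats.tool_calls
        if success_rate < 0.9:
            score, tags = score + 25, tags + ["TOOL_FAILURE"]
    if stats.loop_suspected:
        score, tags = score + 15, tags + ["LOOP_SUSPECTED"]
    if stats.retries >= 5:
        score, tags = score + 10, tags + ["RETRY_STORM"]
    if stats.latency_ms > stats.latency_budget_ms:
        score, tags = score + 10, tags + ["LATENCY_BREACH"]
    return min(score, 100), tags
```

The five weights sum to exactly 100, so the cap only matters if further signals are added later.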

Capture Modes

Mode             Description
off              Disable observability
metadata_only    Store hashes, sizes, timings only (default)
evidence_only    Store small blobs with redaction
full             Store all content (with caps)
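
The difference between modes is easiest to see on a concrete payload. The sketch below is an assumption about what a captured record could look like, not the library's storage format: the capture function, the record keys, and the 1024-byte cap are all hypothetical, and evidence_only is left out because its redaction rules are not described here.

```python
import hashlib
import json


def capture(payload, mode="metadata_only", max_bytes=1024):
    """Build an illustrative record for a payload under a given capture mode."""
    if mode == "off":
        return None
    if mode not in ("metadata_only", "full"):
        raise ValueError(f"mode not covered by this sketch: {mode}")
    raw = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    # metadata_only: keep a hash and a size, never the content itself.
    record = {"sha256": hashlib.sha256(raw).hexdigest(), "size_bytes": len(raw)}
    if mode == "full":
        # full: keep the content, truncated to a (hypothetical) byte cap.
        record["content"] = raw[:max_bytes].decode("utf-8", errors="ignore")
    return record


meta = capture({"q": "secret"})
full = capture({"q": "secret"}, mode="full")
```

Hashing a canonical serialization (sorted keys) means two runs with the same payload yield the same hash, which allows comparison without storing the content.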

Environment Variables

# Core
AGENT_OBSERVE_MODE=metadata_only    # off|metadata_only|evidence_only|full
AGENT_OBSERVE_ENV=prod              # dev|staging|prod
AGENT_OBSERVE_PROJECT=my-app        # Project name
AGENT_OBSERVE_AGENT_VERSION=1.0.0   # Agent version

# Sink selection
AGENT_OBSERVE_SINK=auto             # auto|sqlite|jsonl|postgres|otlp
DATABASE_URL=postgresql://...     # Enables Postgres sink

# Policy
AGENT_OBSERVE_POLICY_FILE=.riff/observe.policy.yml
AGENT_OBSERVE_FAIL_ON_VIOLATION=0   # 1 to raise on violations

# Replay
AGENT_OBSERVE_REPLAY=off            # off|write|read

# Performance
AGENT_OBSERVE_LATENCY_BUDGET_MS=20000
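
The three AGENT_OBSERVE_REPLAY values listed above (off, write, read) suggest a record-then-replay workflow. The following sketch shows one way such a cache could behave. The ReplayCache class, its in-memory store, and the choice to raise on a cache miss in read mode are assumptions, not the behavior of replay.py.

```python
import hashlib
import json


class ReplayCache:
    """Illustrative tool-result cache keyed by tool name and arguments."""

    def __init__(self, mode="off"):
        self.mode = mode  # "off", "write", or "read"
        self.store = {}   # a real implementation would persist this

    def _key(self, tool_name, args, kwargs):
        blob = json.dumps([tool_name, args, kwargs], sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def call(self, tool_name, fn, *args, **kwargs):
        key = self._key(tool_name, args, kwargs)
        if self.mode == "read":
            # Assumption: a miss in read mode is an error, so tests stay deterministic.
            if key not in self.store:
                raise KeyError(f"no recorded result for {tool_name}")
            return self.store[key]
        result = fn(*args, **kwargs)
        if self.mode == "write":
            self.store[key] = result
        return result


calls = []

def lookup(city):
    calls.append(city)
    return {"city": city, "temp_c": 21}

cache = ReplayCache(mode="write")
first = cache.call("weather", lookup, "Oslo")   # runs the tool and records it
cache.mode = "read"
second = cache.call("weather", lookup, "Oslo")  # served from the cache
```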

CLI

# Start the viewer
agent-observe view
agent-observe view --port 8080

# Export to JSONL
agent-observe export-jsonl -o ./export

# With specific database
agent-observe view --db .riff/observe.db
agent-observe view --database-url postgresql://...

API Reference

Core

from agent_observe import observe

# Initialize (reads from environment variables automatically)
observe.install()

# Or with mode override
observe.install(mode="metadata_only")

# Create a run context
with observe.run("agent-name", task={"goal": "..."}) as run:
    pass

# Emit events
observe.emit_event("custom.event", {"key": "value"})

# Emit artifacts
observe.emit_artifact("report", {"data": "..."}, provenance=["tool1", "tool2"])

Explicit Configuration

When you need full control, pass a Config object. Important: a custom Config does NOT read connection strings from environment variables, so you must set them explicitly:

import os
from agent_observe import observe
from agent_observe.config import Config, CaptureMode, Environment, SinkType

# Option 1: Let the library auto-detect from env vars (recommended)
observe.install()

# Option 2: Explicit config - must include ALL required fields
database_url = os.environ.get("DATABASE_URL")

config = Config(
    mode=CaptureMode.METADATA_ONLY,
    env=Environment.PROD,
    sink_type=SinkType.POSTGRES,
    project="my-agent",
    database_url=database_url,  # Required for Postgres!
)
observe.install(config=config)

Decorators

from agent_observe import tool, model_call

@tool(name="my_tool", kind="db", version="1")
def my_tool(arg: str) -> dict:
    pass

@model_call(provider="openai", model="gpt-4")
def call_model(prompt: str) -> str:
    pass
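
Since the decorators work on both sync and async functions, it may help to see how a single decorator can cover both cases. This is a generic sketch of the pattern, not the code in decorators.py: traced_tool, the CALLS list, and the recorded fields are hypothetical.

```python
import asyncio
import functools
import inspect
import time

CALLS = []  # hypothetical in-memory log; a real sink would persist these records


def traced_tool(name, kind):
    """Hypothetical stand-in for a @tool-style decorator (sync and async)."""

    def decorator(fn):
        def record(start, ok):
            CALLS.append({
                "tool": name,
                "kind": kind,
                "ok": ok,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await fn(*args, **kwargs)
                except Exception:
                    record(start, ok=False)
                    raise
                record(start, ok=True)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                record(start, ok=False)
                raise
            record(start, ok=True)
            return result
        return sync_wrapper

    return decorator


@traced_tool(name="add", kind="compute")
def add(a, b):
    return a + b


@traced_tool(name="slow_add", kind="compute")
async def slow_add(a, b):
    await asyncio.sleep(0)
    return a + b


sync_result = add(1, 2)
async_result = asyncio.run(slow_add(3, 4))
```

Checking inspect.iscoroutinefunction at decoration time keeps the wrapped function's sync or async nature intact, so callers do not need to change how they invoke it.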

Architecture

agent_observe/
├── observe.py      # Core runtime (install, run, emit_*)
├── decorators.py   # @tool, @model_call (sync and async)
├── policy.py       # YAML policy engine
├── metrics.py      # Risk scoring and eval
├── replay.py       # Tool result caching
├── sinks/
│   ├── sqlite_sink.py   # Local dev
│   ├── jsonl_sink.py    # Fallback
│   ├── postgres_sink.py # Production (Neon, Supabase compatible)
│   └── otel_sink.py     # OTLP export (Jaeger, Honeycomb, Datadog, etc.)
└── viewer/
    └── app.py      # FastAPI viewer

PostgreSQL Sink Design

The Postgres sink follows production best practices:

  • Parameterized queries - All queries use %s placeholders (SQL injection safe)
  • Batch inserts - Uses executemany for efficient bulk writes
  • Retry with backoff - Transient connection errors retry with exponential backoff
  • Connection timeout - 10-second timeout prevents hanging connections
  • Graceful degradation - Works with pre-created tables (no CREATE permission needed)
  • Efficient schema checks - Single query to verify all tables exist
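
Three of these practices (parameterized queries, batch inserts, retry with backoff) can be sketched in a few lines. To stay runnable without a database server, the example uses the standard library's sqlite3 as a stand-in, so placeholders are ? rather than %s. The events table, the with_backoff helper, and the retry counts are hypothetical and do not reflect postgres_sink.py.

```python
import sqlite3
import time


def with_backoff(fn, retries=3, base_delay=0.01, retry_on=(sqlite3.OperationalError,)):
    """Call fn, retrying on transient errors with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))


def insert_events(conn, rows):
    """Write many rows in one batch using a parameterized statement."""
    def _write():
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events (run_id, name, duration_ms) VALUES (?, ?, ?)",
                rows,
            )
    with_backoff(_write)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (run_id TEXT, name TEXT, duration_ms REAL)")
insert_events(conn, [
    ("run-1", "search", 12.5),
    ("run-1", "x'); DROP TABLE events;--", 840.0),  # stored as plain data
])
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because values are passed separately from the SQL text, the second row's name is stored verbatim instead of being interpreted as SQL.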

Roadmap

  • Auto-instrumentation for OpenAI SDK
  • Auto-instrumentation for Anthropic SDK
  • SQL query validation policies
  • Network domain restriction policies
  • Streaming support for LLM responses
  • Sampling for high-volume production

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .

# Type checking
mypy agent_observe

License

MIT License - see LICENSE file for details.
