agent-observe

Framework-agnostic observability, audit, and eval for AI agent applications.

Overview

agent-observe is a lightweight runtime layer that provides:

Observability - Track agent runs, tool calls, and model invocations
Audit/Compliance - Policy engine with deny/allow patterns for tools
Label-free Eval - Automatic risk scoring based on behavioral signals
Tool Replay - Cache tool results for deterministic testing
Local Viewer - FastAPI UI for browsing and debugging runs

Designed to be enterprise-safe by default: in the default metadata_only mode it stores only metadata (hashes, sizes, timings), not raw content.

Installation

# Core package
pip install agent-observe

# With viewer UI (quotes stop shells such as zsh from globbing the brackets)
pip install "agent-observe[viewer]"

# With PostgreSQL support
pip install "agent-observe[postgres]"

# All extras
pip install "agent-observe[all]"

Quick Start

from agent_observe import observe, tool, model_call

# Initialize (zero-config, auto-detects environment)
observe.install()

# Define tools
@tool(name="search", kind="http")
def search_web(query: str) -> list[dict]:
    # Your implementation
    return [{"title": "Result", "url": "https://..."}]

@model_call(provider="openai", model="gpt-4")
def call_llm(prompt: str) -> str:
    # Your LLM call
    return "Response..."

# Run your agent
with observe.run("my-agent", task={"goal": "Research topic"}):
    results = search_web("AI agents")
    analysis = call_llm(f"Analyze: {results}")
    observe.emit_artifact("analysis", analysis)

View the results:

agent-observe view

Async Support

Full async/await support for modern agent frameworks:

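import asyncio
import anthropic
import httpx
from agent_observe import observe, tool, model_call

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

@tool(name="fetch_data", kind="http")
async def fetch_data(url: str) -> dict:
    async with httpx.AsyncClient() as http:
        response = await http.get(url)
        return response.json()  # return the parsed body, not the Response object

@model_call(provider="anthropic", model="claude-3")
async def call_claude(prompt: str) -> str:
    message = await client.messages.create(...)
    return message.content[0].text  # text of the first content block

async def main():
    # Use async context manager
    async with observe.arun("async-agent"):
        data = await fetch_data("https://api.example.com")
        response = await call_claude(f"Analyze: {data}")

asyncio.run(main())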

Framework Integration

See AGENTS.md for detailed integration examples with:

OpenAI Function Calling
Anthropic Claude
Google Vertex AI / Gemini
LangChain
Custom ReAct agents

Features

Zero-Config Defaults

Just call observe.install() - it automatically:

Selects the right sink based on environment
Uses SQLite for local dev, Postgres if DATABASE_URL is set
Captures metadata only (enterprise-safe)

Automatic Sink Selection

Condition	Sink
`DATABASE_URL` set	PostgreSQL
`OTEL_EXPORTER_OTLP_ENDPOINT` set	OTLP (OpenTelemetry)
`AGENT_OBSERVE_ENV=dev`	SQLite
Default	JSONL
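
The table can be read as a lookup over environment variables. A minimal sketch follows, assuming the rows are checked from top to bottom; the precedence between `DATABASE_URL` and `OTEL_EXPORTER_OTLP_ENDPOINT` is an assumption, and `select_sink` is a hypothetical helper, not part of the agent-observe API:

```python
import os
from typing import Mapping, Optional


def select_sink(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a sink name for the given environment (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the table rows are evaluated top to bottom.
    if env.get("DATABASE_URL"):
        return "postgres"
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return "otlp"
    if env.get("AGENT_OBSERVE_ENV") == "dev":
        return "sqlite"
    return "jsonl"  # default when nothing else matches


print(select_sink({"AGENT_OBSERVE_ENV": "dev"}))
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test without touching the real environment.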

Policy Engine

Create .riff/observe.policy.yml:

tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"

limits:
  max_tool_calls: 100
  max_retries: 10
  max_model_calls: 50

Coming Soon: SQL query validation and network domain restrictions are planned for a future release.
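
To illustrate how allow/deny patterns like the ones above could be evaluated, here is a minimal sketch. It assumes the patterns are shell-style globs matched against a dotted tool name, that deny rules win over allow rules, and that a tool matching neither list is rejected. These semantics and the `is_tool_allowed` helper are assumptions for illustration, not the library's documented behavior:

```python
from fnmatch import fnmatchcase

# Same patterns as the example policy file above.
POLICY = {
    "allow": ["db.*", "http.*"],
    "deny": ["shell.*", "*.destructive"],
}


def is_tool_allowed(tool_name: str, policy: dict = POLICY) -> bool:
    """Hypothetical check: deny patterns win, then an allow pattern must match."""
    if any(fnmatchcase(tool_name, pattern) for pattern in policy["deny"]):
        return False
    return any(fnmatchcase(tool_name, pattern) for pattern in policy["allow"])


for name in ("db.query", "shell.exec", "db.destructive", "email.send"):
    print(name, is_tool_allowed(name))
```

Under these assumptions `db.destructive` is rejected even though it matches `db.*`, because the deny list is checked first.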

Risk Scoring

Automatic risk scoring (0-100) based on:

Signal	Weight	Tag
Policy violations	+40	`POLICY_VIOLATION`
Tool success rate < 90%	+25	`TOOL_FAILURE`
Repeated tool calls (loops)	+15	`LOOP_SUSPECTED`
5+ retries	+10	`RETRY_STORM`
Latency exceeds budget	+10	`LATENCY_BREACH`
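
The weights in the table sum to 100. A minimal sketch of an additive score, assuming each signal contributes its weight at most once and the total is capped at 100; the `risk_score` function and its parameters are hypothetical and only mirror the table:

```python
WEIGHTS = {
    "POLICY_VIOLATION": 40,
    "TOOL_FAILURE": 25,
    "LOOP_SUSPECTED": 15,
    "RETRY_STORM": 10,
    "LATENCY_BREACH": 10,
}


def risk_score(policy_violations: int, tool_success_rate: float,
               loop_suspected: bool, retries: int,
               latency_ms: float, latency_budget_ms: float = 20000):
    """Return (score, tags) for one run; thresholds follow the table above."""
    tags = []
    if policy_violations > 0:
        tags.append("POLICY_VIOLATION")
    if tool_success_rate < 0.90:
        tags.append("TOOL_FAILURE")
    if loop_suspected:
        tags.append("LOOP_SUSPECTED")
    if retries >= 5:
        tags.append("RETRY_STORM")
    if latency_ms > latency_budget_ms:
        tags.append("LATENCY_BREACH")
    # Assumption: weights are simply added and capped at 100.
    score = min(100, sum(WEIGHTS[tag] for tag in tags))
    return score, tags


print(risk_score(1, 0.8, False, 6, 1000))
```

The default budget of 20000 ms matches the `AGENT_OBSERVE_LATENCY_BUDGET_MS` value shown under Environment Variables.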

Capture Modes

Mode	Description
`off`	Disable observability
`metadata_only`	Store hashes, sizes, timings only (default)
`evidence_only`	Store small blobs with redaction
`full`	Store all content (with caps)
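
The default `metadata_only` mode stores hashes, sizes, and timings. A minimal sketch of what such a record could look like, using only the standard library; `summarize_payload` and `record_call` are hypothetical names, and the JSON serialization and SHA-256 choice are assumptions rather than the library's actual storage format:

```python
import hashlib
import json
import time


def summarize_payload(payload: object) -> dict:
    """Reduce a payload to a hash and a size, keeping no raw content."""
    raw = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return {"sha256": hashlib.sha256(raw).hexdigest(), "size_bytes": len(raw)}


def record_call(func, *args):
    """Call func and return (result, metadata); hypothetical helper."""
    start = time.perf_counter()
    result = func(*args)
    duration_ms = (time.perf_counter() - start) * 1000
    metadata = {
        "input": summarize_payload(args),
        "output": summarize_payload(result),
        "duration_ms": duration_ms,
    }
    return result, metadata


result, meta = record_call(str.upper, "secret prompt")
print(meta)  # hashes, sizes, and a timing; the prompt text itself is absent
```

The caller still receives the real result; only the stored record is reduced to metadata.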

Environment Variables

# Core
AGENT_OBSERVE_MODE=metadata_only    # off|metadata_only|evidence_only|full
AGENT_OBSERVE_ENV=prod              # dev|staging|prod
AGENT_OBSERVE_PROJECT=my-app        # Project name
AGENT_OBSERVE_AGENT_VERSION=1.0.0   # Agent version

# Sink selection
AGENT_OBSERVE_SINK=auto             # auto|sqlite|jsonl|postgres|otlp
DATABASE_URL=postgresql://...     # Enables Postgres sink

# Policy
AGENT_OBSERVE_POLICY_FILE=.riff/observe.policy.yml
AGENT_OBSERVE_FAIL_ON_VIOLATION=0   # 1 to raise on violations

# Replay
AGENT_OBSERVE_REPLAY=off            # off|write|read

# Performance
AGENT_OBSERVE_LATENCY_BUDGET_MS=20000
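
The `off|write|read` values of `AGENT_OBSERVE_REPLAY` suggest a record-then-replay workflow for tool results. A minimal in-memory sketch of that idea, assuming results are keyed by a hash of the tool name and its arguments; the `ReplayCache` class is hypothetical and says nothing about how agent-observe stores replay data:

```python
import hashlib
import json


class ReplayCache:
    """Hypothetical cache illustrating off/write/read replay modes."""

    def __init__(self, mode: str = "off"):
        self.mode = mode
        self.store = {}

    @staticmethod
    def key(tool_name: str, args: tuple, kwargs: dict) -> str:
        # Assumption: identical name + arguments identify the same call.
        blob = json.dumps([tool_name, args, kwargs], sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def call(self, tool_name, func, *args, **kwargs):
        k = self.key(tool_name, args, kwargs)
        if self.mode == "read":
            if k not in self.store:
                raise KeyError(f"no recorded result for {tool_name}")
            return self.store[k]  # replay without executing the tool
        result = func(*args, **kwargs)
        if self.mode == "write":
            self.store[k] = result  # record for later replay
        return result


def search(query):
    return [{"title": f"Result for {query}"}]


cache = ReplayCache("write")
cache.call("search", search, "AI agents")          # executes and records
cache.mode = "read"
print(cache.call("search", search, "AI agents"))   # served from the cache
```

In read mode the tool function is never invoked, which is what makes a test run deterministic.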

CLI

# Start the viewer
agent-observe view
agent-observe view --port 8080

# Export to JSONL
agent-observe export-jsonl -o ./export

# With specific database
agent-observe view --db .riff/observe.db
agent-observe view --database-url postgresql://...

API Reference

Core

from agent_observe import observe

# Initialize (reads from environment variables automatically)
observe.install()

# Or with mode override
observe.install(mode="metadata_only")

# Create a run context
with observe.run("agent-name", task={"goal": "..."}) as run:
    pass

# Emit events
observe.emit_event("custom.event", {"key": "value"})

# Emit artifacts
observe.emit_artifact("report", {"data": "..."}, provenance=["tool1", "tool2"])

Explicit Configuration

When you need full control, pass a Config object. Important: You must include all connection strings explicitly - they are NOT read from environment variables when using a custom Config:

import os
from agent_observe import observe
from agent_observe.config import Config, CaptureMode, Environment, SinkType

# Use one option, not both.

# Option 1: Let the library auto-detect from env vars (recommended)
# observe.install()

# Option 2: Explicit config - must include ALL required fields
database_url = os.environ["DATABASE_URL"]  # raises KeyError early if unset

config = Config(
    mode=CaptureMode.METADATA_ONLY,
    env=Environment.PROD,
    sink_type=SinkType.POSTGRES,
    project="my-agent",
    database_url=database_url,  # Required for Postgres!
)
observe.install(config=config)

Decorators

from agent_observe import tool, model_call

@tool(name="my_tool", kind="db", version="1")
def my_tool(arg: str) -> dict:
    pass

@model_call(provider="openai", model="gpt-4")
def call_model(prompt: str) -> str:
    pass

Architecture

agent_observe/
├── observe.py      # Core runtime (install, run, emit_*)
├── decorators.py   # @tool, @model_call (sync and async)
├── policy.py       # YAML policy engine
├── metrics.py      # Risk scoring and eval
├── replay.py       # Tool result caching
├── sinks/
│   ├── sqlite_sink.py   # Local dev
│   ├── jsonl_sink.py    # Fallback
│   ├── postgres_sink.py # Production (Neon, Supabase compatible)
│   └── otel_sink.py     # OTLP export (Jaeger, Honeycomb, Datadog, etc.)
└── viewer/
    └── app.py      # FastAPI viewer

PostgreSQL Sink Design

The Postgres sink follows production best practices:

Parameterized queries - All queries use %s placeholders (SQL injection safe)
Batch inserts - Uses executemany for efficient bulk writes
Retry with backoff - Transient connection errors retry with exponential backoff
Connection timeout - 10-second timeout prevents hanging connections
Graceful degradation - Works with pre-created tables (no CREATE permission needed)
Efficient schema checks - Single query to verify all tables exist
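
As an illustration of the retry-with-backoff item above, here is a minimal, driver-independent sketch. The retry count, base delay, jitter, and the exception types are assumptions (a real Postgres driver raises its own error classes), and `retry_with_backoff` is a hypothetical helper, not the sink's actual code:

```python
import random
import time


def retry_with_backoff(operation, retries: int = 3, base_delay: float = 0.5,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Run operation, retrying transient errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2x, 4x, ...) plus a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            sleep(delay)


state = {"calls": 0}


def flaky_write():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "written"


print(retry_with_backoff(flaky_write, base_delay=0.01))
```

The `sleep` parameter is injectable so tests can capture the delays instead of actually waiting.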

Roadmap

Auto-instrumentation for OpenAI SDK
Auto-instrumentation for Anthropic SDK
SQL query validation policies
Network domain restriction policies
Streaming support for LLM responses
Sampling for high-volume production

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .

# Type checking
mypy agent_observe

License

MIT License - see LICENSE file for details.

Release history

Version	Released
0.2.0	Jan 10, 2026
0.1.7	Jan 5, 2026
0.1.6	Jan 5, 2026
0.1.4	Jan 4, 2026
0.1.3	Jan 4, 2026
0.1.2	Jan 4, 2026
0.1.1 (this version)	Jan 4, 2026
0.1.0	Jan 4, 2026

Download files

Source Distribution

agent_observe-0.1.1.tar.gz (69.6 kB view details)

Uploaded Jan 4, 2026 Source

Built Distribution

agent_observe-0.1.1-py3-none-any.whl (58.0 kB view details)

Uploaded Jan 4, 2026 Python 3

File details

Details for the file agent_observe-0.1.1.tar.gz.

File metadata

Download URL: agent_observe-0.1.1.tar.gz
Upload date: Jan 4, 2026
Size: 69.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.5 requests-toolbelt/1.0.0 urllib3/2.3.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/1.5.0 colorama/0.4.6 CPython/3.12.9

File hashes

Hashes for agent_observe-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`fa5d2173f3aca451c9b617a6b14b5dc3e9c87746a0119810198cc9e82f3166d5`
MD5	`8be5a1ea005d4d3fcfee77117c761f15`
BLAKE2b-256	`da607b34eba6f09bf066d22f151eed3161ed1402b812c7e7fe2008f6e9853456`

File details

Details for the file agent_observe-0.1.1-py3-none-any.whl.

File metadata

Download URL: agent_observe-0.1.1-py3-none-any.whl
Upload date: Jan 4, 2026
Size: 58.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.5 requests-toolbelt/1.0.0 urllib3/2.3.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/1.5.0 colorama/0.4.6 CPython/3.12.9

File hashes

Hashes for agent_observe-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7da13c6e36917beb8669dbc5be6fbc4acef286067b039123ab7d5c84a4de992a`
MD5	`5e07bbca6faec55a43801c8ff8856cba`
BLAKE2b-256	`28f7650ddcaca2ebb62a4bb6a128d66be67c9d6b66ca9e4217593dc83d969c6c`
