agent-observe

Framework-agnostic observability, audit, and eval for AI agent applications.

Python 3.9+ License: MIT

What is this?

agent-observe is a lightweight runtime layer that wraps your AI agent code to capture:

  • What tools were called and when
  • What LLM calls were made and how long they took
  • Policy violations (blocked operations)
  • Risk scores based on behavioral signals

Designed to be enterprise-safe by default: it stores only metadata (hashes, sizes, timings), not raw content.
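To make the "metadata, not raw content" idea concrete, here is a minimal sketch of a wrapper that records only hashes, sizes, and timings for a function call. It is an illustration, not the library's implementation: `traced_tool`, `RECORDS`, and the record field names are all hypothetical.

```python
import functools
import hashlib
import time

# Hypothetical in-memory store; a real runtime would write to a sink instead.
RECORDS = []


def _digest(value) -> tuple:
    """Return (sha256 hex digest, size in bytes) of a value's repr."""
    data = repr(value).encode("utf-8")
    return hashlib.sha256(data).hexdigest(), len(data)


def traced_tool(name: str):
    """Illustrative stand-in for a tool decorator (not agent-observe's @tool)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            result = None
            try:
                result = func(*args, **kwargs)
                return result
            except Exception:
                status = "error"
                raise
            finally:
                # Only hashes, sizes, and timings are kept; raw values are dropped.
                in_hash, in_size = _digest((args, kwargs))
                out_hash, out_size = _digest(result)
                RECORDS.append({
                    "tool": name,
                    "status": status,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "input_sha256": in_hash,
                    "input_bytes": in_size,
                    "output_sha256": out_hash,
                    "output_bytes": out_size,
                })
        return wrapper
    return decorator


@traced_tool("add")
def add(a: int, b: int) -> int:
    return a + b


add(2, 3)
```

After the call, `RECORDS[-1]` describes what happened (which tool, how long, how large) without holding the arguments or the return value.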

Installation

pip install agent-observe

# With PostgreSQL support (quotes keep shells such as zsh from expanding the brackets)
pip install "agent-observe[postgres]"

# With viewer UI
pip install "agent-observe[viewer]"

Quick Start

import openai
import requests

from agent_observe import observe, tool, model_call

# Initialize (zero-config)
observe.install()

# Wrap your tools
@tool(name="search", kind="http")
def search_web(query: str) -> list:
    # Pass the query via params so it is URL-encoded
    return requests.get("https://api.search.com", params={"q": query}).json()

# Wrap your LLM calls
@model_call(provider="openai", model="gpt-4")
def call_llm(prompt: str) -> str:
    return openai.chat.completions.create(...).choices[0].message.content

# Run your agent
with observe.run("my-agent", task={"goal": "Research AI"}):
    results = search_web("AI agents")
    analysis = call_llm(f"Analyze: {results}")

View results:

agent-observe view
# Open http://localhost:8765

Documentation

Document           Description
Examples           Runnable code examples (basic usage, async, policies)
Data Model         What are Runs, Spans, Events, and Replay Cache?
Capture Modes      What data is stored? Hashes vs full content
Configuration      Environment variables and Config options
Usage Guide        Policies, risk scoring, querying, real-world examples
Integration Guide  How to integrate with OpenAI, Anthropic, LangChain, etc.

Key Concepts

Runs, Spans, and Events

┌─────────────────────────────────────────────────────────────┐
│                        observe.run()                         │
│                           (Run)                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ @tool       │  │ @model_call │  │ emit_event  │          │
│  │  (Span)     │  │   (Span)    │  │  (Event)    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
  • Run = One agent execution (start to finish)
  • Span = One tool or model call within a run
  • Event = Custom occurrence you emit

See Data Model for details.
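The containment relationship in the diagram can be sketched with plain dataclasses. This is only a mental model: the class and field names below are hypothetical and are not the library's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    kind: str           # "tool" or "model"
    name: str
    duration_ms: float


@dataclass
class Event:
    name: str
    payload: dict = field(default_factory=dict)


@dataclass
class Run:
    agent: str
    spans: List[Span] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)


# One run containing two spans and one custom event.
run = Run(agent="my-agent")
run.spans.append(Span(kind="tool", name="search", duration_ms=120.0))
run.spans.append(Span(kind="model", name="gpt-4", duration_ms=950.0))
run.events.append(Event(name="checkpoint", payload={"step": 1}))

total_ms = sum(s.duration_ms for s in run.spans)
```

A run owns its spans and events, so per-run aggregates such as total time spent in tools and model calls fall out of a simple loop over `run.spans`.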

Capture Modes

Mode           What's Stored           Use Case
metadata_only  Hashes, timings         Production (default)
evidence_only  Small content + hashes  Debugging
full           Everything              Development

The default is metadata_only: raw prompts and tool outputs are not stored, which limits the risk of PII leakage.

See Capture Modes for details.
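As a rough illustration of how the three modes differ, the sketch below builds a record for one piece of content under each mode. The `capture` function, the record keys, and the 256-byte cutoff for "small content" are assumptions made for this example. The source does not state the real threshold, or whether larger content is truncated or omitted (this sketch omits it).

```python
import hashlib

# Hypothetical cutoff for what counts as "small content" in evidence_only.
SMALL_CONTENT_BYTES = 256


def capture(content: str, mode: str) -> dict:
    """Build a stored record for `content` under the given capture mode."""
    data = content.encode("utf-8")
    record = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }
    if mode == "metadata_only":
        return record  # hash and size only, never the content itself
    if mode == "evidence_only":
        if len(data) <= SMALL_CONTENT_BYTES:
            record["content"] = content  # small payloads are kept verbatim
        return record
    if mode == "full":
        record["content"] = content
        return record
    raise ValueError(f"unknown capture mode: {mode}")


prod = capture("user email: a@example.com", "metadata_only")
debug = capture("short tool output", "evidence_only")
dev = capture("x" * 1000, "full")
```

The hash still lets two runs be compared for identical inputs or outputs even when the content itself is never written.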

Risk Scoring

Automatic risk scoring (0-100) based on:

Signal                        Weight
Policy violations             +40
Tool success rate < 90%       +25
Repeated tool calls (loops)   +15
5+ retries                    +10
Latency exceeds budget        +10
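The weights above sum to 100, which suggests a simple additive score. The sketch below assumes exactly that: each triggered signal adds its weight once and the total is capped at 100. `RunSignals` and `risk_score` are hypothetical names, and the library's actual formula may differ.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    policy_violations: int = 0
    tool_calls: int = 0
    tool_failures: int = 0
    repeated_tool_calls: bool = False
    retries: int = 0
    latency_ms: float = 0.0
    latency_budget_ms: float = float("inf")


def risk_score(s: RunSignals) -> int:
    """Additive score: each triggered signal contributes its weight once."""
    score = 0
    if s.policy_violations > 0:
        score += 40
    if s.tool_calls > 0:
        success_rate = (s.tool_calls - s.tool_failures) / s.tool_calls
        if success_rate < 0.90:
            score += 25
    if s.repeated_tool_calls:
        score += 15
    if s.retries >= 5:
        score += 10
    if s.latency_ms > s.latency_budget_ms:
        score += 10
    return min(score, 100)


clean = risk_score(RunSignals(tool_calls=10))
risky = risk_score(RunSignals(policy_violations=1, tool_calls=10, tool_failures=2))
```

Here `risky` combines a policy violation (+40) with an 80% tool success rate (+25).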

Configuration

Zero-Config (Recommended)

observe.install()  # Reads from environment variables

Environment Variables

AGENT_OBSERVE_MODE=metadata_only    # Capture mode
AGENT_OBSERVE_ENV=prod              # Environment
DATABASE_URL=postgresql://...       # Enables Postgres sink

See Configuration for all options.

Explicit Config

import os

from agent_observe import observe
from agent_observe.config import Config, CaptureMode, SinkType

config = Config(
    mode=CaptureMode.FULL,
    sink_type=SinkType.POSTGRES,
    database_url=os.environ.get("DATABASE_URL"),
)
observe.install(config=config)

Sinks (Storage Backends)

Sink        Use Case
SQLite      Local development
PostgreSQL  Production
JSONL       Simple fallback
OTLP        OpenTelemetry export (Jaeger, Honeycomb, Datadog)

The sink is auto-selected based on available connections (for example, setting DATABASE_URL enables the Postgres sink).

Policy Engine

Create .riff/observe.policy.yml:

tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"

limits:
  max_tool_calls: 100
  max_model_calls: 50
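To show how allow/deny patterns like these might be evaluated, here is a small sketch using glob matching from the standard library. Several things are assumptions rather than documented behavior: that tool identifiers take the form `kind.name`, that patterns are shell-style globs, that deny rules win over allow rules, and that tools matching neither list are blocked. `is_tool_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The policy file above, written as the dict a YAML loader would produce.
POLICY = {
    "tools": {
        "allow": ["db.*", "http.*"],
        "deny": ["shell.*", "*.destructive"],
    },
    "limits": {"max_tool_calls": 100, "max_model_calls": 50},
}


def is_tool_allowed(tool_name: str, policy: dict) -> bool:
    """Assumed semantics: deny takes precedence, unlisted tools are blocked."""
    tools = policy["tools"]
    if any(fnmatchcase(tool_name, pattern) for pattern in tools["deny"]):
        return False
    return any(fnmatchcase(tool_name, pattern) for pattern in tools["allow"])


def within_limits(tool_calls: int, model_calls: int, policy: dict) -> bool:
    """Check call counters against the policy's limits section."""
    limits = policy["limits"]
    return (tool_calls <= limits["max_tool_calls"]
            and model_calls <= limits["max_model_calls"])


checks = {name: is_tool_allowed(name, POLICY)
          for name in ["http.search", "shell.exec", "db.destructive", "fs.read"]}
```

Under these assumptions `db.destructive` is blocked even though it matches `db.*`, because the `*.destructive` deny pattern is checked first.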

CLI

# Start viewer
agent-observe view

# Export to JSONL
agent-observe export-jsonl -o ./export

Architecture

agent_observe/
├── observe.py      # Core runtime
├── decorators.py   # @tool, @model_call
├── policy.py       # YAML policy engine
├── metrics.py      # Risk scoring
├── replay.py       # Tool result caching
├── sinks/          # Storage backends
└── viewer/         # FastAPI UI

Development

pip install -e ".[dev]"
pytest
ruff check .

License

MIT License
