Framework-agnostic observability, audit, and eval for AI agent applications

Project description

agent-observe

Python 3.9+ | License: MIT

What is this?

agent-observe is a lightweight runtime layer that wraps your AI agent code to capture:

  • What tools were called and when
  • What LLM calls were made and how long they took
  • Policy violations (blocked operations)
  • Risk scores based on behavioral signals

Designed to be enterprise-safe by default: it stores only metadata (hashes, sizes, timings), not raw content.

Installation

pip install agent-observe

# With PostgreSQL support (quote extras so shells like zsh don't expand the brackets)
pip install "agent-observe[postgres]"

# With viewer UI
pip install "agent-observe[viewer]"

Quick Start

import openai
import requests

from agent_observe import observe, tool, model_call

# Initialize (zero-config)
observe.install()

# Wrap your tools
@tool(name="search", kind="http")
def search_web(query: str) -> list:
    # Let requests URL-encode the query instead of interpolating it into the URL
    resp = requests.get("https://api.search.com", params={"q": query}, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Wrap your LLM calls
@model_call(provider="openai", model="gpt-4")
def call_llm(prompt: str) -> str:
    # Request arguments elided; pass your model and messages here
    return openai.chat.completions.create(...).choices[0].message.content

# Run your agent
with observe.run("my-agent", task={"goal": "Research AI"}):
    results = search_web("AI agents")
    analysis = call_llm(f"Analyze: {results}")
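To make the span idea concrete, here is a rough, self-contained sketch of what a decorator like @tool might do around each call: time it, note whether it raised, and append a metadata-only record. The names `traced`, `SpanRecord`, and the in-memory `SPANS` list are hypothetical stand-ins for illustration, not agent-observe's API; whatever the real decorators do beyond this (hashing, policy checks, writing to a sink) is not shown.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SpanRecord:
    # Hypothetical span fields: metadata only, no arguments or return values.
    name: str
    kind: str
    duration_ms: float
    ok: bool
    error: Optional[str] = None


SPANS: List[SpanRecord] = []  # in-memory stand-in for a storage sink


def traced(name: str, kind: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical @tool-style decorator that records one span per call."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            ok, error = True, None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                ok, error = False, type(exc).__name__
                raise  # record the failure, then re-raise unchanged
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                SPANS.append(SpanRecord(name, kind, elapsed_ms, ok, error))

        return wrapper

    return decorator


@traced(name="add", kind="local")
def add(a: int, b: int) -> int:
    return a + b


print(add(2, 3))                   # 5
print(SPANS[0].name, SPANS[0].ok)  # add True
```

Recording in `finally` means a span is written whether the call returns or raises, and the original exception still propagates to the agent.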

View results:

agent-observe view
# Open http://localhost:8765

Documentation

Document | Description
Examples | Runnable code examples (basic usage, async, policies)
Data Model | What are Runs, Spans, Events, and the Replay Cache?
Capture Modes | What data is stored? Hashes vs. full content
Configuration | Environment variables and Config options
Usage Guide | Policies, risk scoring, querying, real-world examples
Integration Guide | How to integrate with OpenAI, Anthropic, LangChain, etc.

Key Concepts

Runs, Spans, and Events

┌─────────────────────────────────────────────────────────────┐
│                        observe.run()                         │
│                           (Run)                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ @tool       │  │ @model_call │  │ emit_event  │          │
│  │  (Span)     │  │   (Span)    │  │  (Event)    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
  • Run = One agent execution (start to finish)
  • Span = One tool or model call within a run
  • Event = Custom occurrence you emit

See Data Model for details.
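One way to picture the hierarchy in plain Python: a run owns a list of spans and a list of events. The dataclasses and field names below (`kind`, `duration_ms`, `attributes`, `tool_success_rate`) are hypothetical and only mirror the diagram; the Data Model document describes the actual schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Span:
    kind: str            # "tool" or "model" (hypothetical values)
    name: str
    duration_ms: float
    ok: bool = True


@dataclass
class Event:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Run:
    agent: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[Span] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def tool_success_rate(self) -> float:
        """Fraction of tool spans that succeeded (1.0 if there were none)."""
        tools = [s for s in self.spans if s.kind == "tool"]
        if not tools:
            return 1.0
        return sum(s.ok for s in tools) / len(tools)


run = Run(agent="my-agent")
run.spans.append(Span(kind="tool", name="search", duration_ms=120.0))
run.spans.append(Span(kind="model", name="gpt-4", duration_ms=850.0))
run.events.append(Event(name="checkpoint", attributes={"step": 1}))
print(len(run.spans), len(run.events), run.tool_success_rate())  # 2 1 1.0
```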

Capture Modes

Mode | What's stored | Use case
metadata_only | Hashes, timings | Production (default)
evidence_only | Small content + hashes | Debugging
full | Everything | Development

The default is metadata_only, which is enterprise-safe: raw content, and therefore any PII it contains, is never stored.

See Capture Modes for details.
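The difference between the three modes can be sketched as one function that always records a hash and a size and adds raw content only when the mode allows it. This assumes SHA-256 as the hash and treats "small content" in evidence_only as content at or under a fixed byte limit (`EVIDENCE_LIMIT`, a hypothetical value); the library's actual hash algorithm and cutoff are not stated here, and timings are left out for brevity.

```python
import hashlib
from typing import Any, Dict

EVIDENCE_LIMIT = 1024  # hypothetical byte limit for "small content"
MODES = ("metadata_only", "evidence_only", "full")


def capture(content: str, mode: str = "metadata_only") -> Dict[str, Any]:
    """Build the stored record for `content` under the given capture mode."""
    if mode not in MODES:
        raise ValueError(f"unknown capture mode: {mode!r}")
    data = content.encode("utf-8")
    record: Dict[str, Any] = {
        "sha256": hashlib.sha256(data).hexdigest(),  # always stored
        "size_bytes": len(data),                     # always stored
    }
    keep_content = mode == "full" or (mode == "evidence_only" and len(data) <= EVIDENCE_LIMIT)
    if keep_content:
        record["content"] = content
    return record


print(sorted(capture("user email: a@example.com")))         # ['sha256', 'size_bytes']
print(sorted(capture("short note", mode="evidence_only")))  # ['content', 'sha256', 'size_bytes']
```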

Risk Scoring

Automatic risk scoring (0-100) based on:

Signal | Weight
Policy violations | +40
Tool success rate < 90% | +25
Repeated tool calls (loops) | +15
5+ retries | +10
Latency exceeds budget | +10
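Read as an additive score, the table translates into a handful of comparisons. This sketch assumes each signal contributes its weight at most once and that loop detection has already happened upstream; `RunSignals` and `risk_score` are hypothetical names, not the metrics.py API. Because the five weights sum to exactly 100, the final `min` is only a safeguard.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    """Hypothetical per-run inputs to the risk score."""
    policy_violations: int = 0
    tool_success_rate: float = 1.0           # fraction of tool calls that succeeded
    loop_detected: bool = False              # repeated identical tool calls
    retries: int = 0
    latency_ms: float = 0.0
    latency_budget_ms: float = float("inf")  # no budget means never exceeded


def risk_score(s: RunSignals) -> int:
    score = 0
    if s.policy_violations > 0:
        score += 40
    if s.tool_success_rate < 0.90:
        score += 25
    if s.loop_detected:
        score += 15
    if s.retries >= 5:
        score += 10
    if s.latency_ms > s.latency_budget_ms:
        score += 10
    return min(score, 100)


print(risk_score(RunSignals()))                                            # 0
print(risk_score(RunSignals(policy_violations=1, tool_success_rate=0.8)))  # 65
```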

Configuration

Zero-Config (Recommended)

observe.install()  # Reads from environment variables

Environment Variables

AGENT_OBSERVE_MODE=metadata_only    # Capture mode
AGENT_OBSERVE_ENV=prod              # Environment
DATABASE_URL=postgresql://...       # Enables Postgres sink

See Configuration for all options.
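A rough sketch of how zero-config resolution could read these variables, assuming metadata_only is the fallback mode (as the Capture Modes section states) and that an unrecognized mode should fail loudly rather than silently widen capture. `EnvConfig` and `config_from_env` are hypothetical; the real option names are in the Configuration document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = {"metadata_only", "evidence_only", "full"}


@dataclass
class EnvConfig:
    mode: str
    env: Optional[str]
    database_url: Optional[str]


def config_from_env(environ: Mapping[str, str] = os.environ) -> EnvConfig:
    """Hypothetical: build settings from AGENT_OBSERVE_* and DATABASE_URL."""
    mode = environ.get("AGENT_OBSERVE_MODE", "metadata_only")
    if mode not in VALID_MODES:
        raise ValueError(f"AGENT_OBSERVE_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return EnvConfig(
        mode=mode,
        env=environ.get("AGENT_OBSERVE_ENV"),
        database_url=environ.get("DATABASE_URL"),
    )


cfg = config_from_env({"AGENT_OBSERVE_ENV": "prod"})
print(cfg.mode, cfg.env, cfg.database_url)  # metadata_only prod None
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.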

Explicit Config

import os

from agent_observe import observe
from agent_observe.config import Config, CaptureMode, SinkType

config = Config(
    mode=CaptureMode.FULL,
    sink_type=SinkType.POSTGRES,
    database_url=os.environ.get("DATABASE_URL"),
)
observe.install(config=config)

Sinks (Storage Backends)

Sink | Use case
SQLite | Local development
PostgreSQL | Production
JSONL | Simple fallback
OTLP | OpenTelemetry export (Jaeger, Honeycomb, Datadog)

The sink is selected automatically based on which connections are available; for example, setting DATABASE_URL enables the Postgres sink.
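The selection logic might look roughly like the following. The priority order (Postgres, then OTLP, then SQLite, then JSONL) and the use of the standard OpenTelemetry variable `OTEL_EXPORTER_OTLP_ENDPOINT` as the OTLP trigger are assumptions for illustration; only the DATABASE_URL trigger for Postgres comes from this README, and `select_sink` is a hypothetical name.

```python
import importlib.util
from typing import Mapping, Optional


def sqlite_available() -> bool:
    # sqlite3 ships with most CPython builds but can be missing from minimal ones.
    return importlib.util.find_spec("sqlite3") is not None


def select_sink(environ: Mapping[str, str], has_sqlite: Optional[bool] = None) -> str:
    """Hypothetical auto-selection: the first available backend wins."""
    if has_sqlite is None:
        has_sqlite = sqlite_available()
    if environ.get("DATABASE_URL", "").startswith(("postgresql://", "postgres://")):
        return "postgres"
    if environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return "otlp"
    if has_sqlite:
        return "sqlite"
    return "jsonl"  # simple fallback that needs only the filesystem


print(select_sink({"DATABASE_URL": "postgresql://user@db/agents"}))  # postgres
print(select_sink({}, has_sqlite=False))                              # jsonl
```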

Policy Engine

Create .riff/observe.policy.yml:

tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"

limits:
  max_tool_calls: 100
  max_model_calls: 50
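A minimal sketch of how such a policy could be evaluated, using glob matching from the standard library's fnmatch. It assumes deny rules take precedence over allow rules, that a tool matching no allow pattern is blocked, and that the limits are inclusive maximums per run; the function names are hypothetical, and the YAML is shown already parsed into Python tuples so the example needs no YAML parser.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# The policy file above, already parsed into Python values.
ALLOW = ("db.*", "http.*")
DENY = ("shell.*", "*.destructive")
MAX_TOOL_CALLS = 100
MAX_MODEL_CALLS = 50


def is_tool_allowed(tool: str, allow: Sequence[str] = ALLOW, deny: Sequence[str] = DENY) -> bool:
    """Deny patterns win; otherwise the tool must match an allow pattern."""
    if any(fnmatchcase(tool, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(tool, pattern) for pattern in allow)


def within_limits(tool_calls: int, model_calls: int) -> bool:
    """Assumes the limits are inclusive maximums per run."""
    return tool_calls <= MAX_TOOL_CALLS and model_calls <= MAX_MODEL_CALLS


for name in ("db.query", "http.get", "shell.exec", "db.destructive", "fs.read"):
    print(name, is_tool_allowed(name))
# db.query True
# http.get True
# shell.exec False
# db.destructive False
# fs.read False
```

Checking deny first means a broad allow pattern such as "db.*" can never re-enable a tool that a deny pattern like "*.destructive" blocks.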

CLI

# Start viewer
agent-observe view

# Export to JSONL
agent-observe export-jsonl -o ./export

Architecture

agent_observe/
├── observe.py      # Core runtime
├── decorators.py   # @tool, @model_call
├── policy.py       # YAML policy engine
├── metrics.py      # Risk scoring
├── replay.py       # Tool result caching
├── sinks/          # Storage backends
└── viewer/         # FastAPI UI
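replay.py is described as tool result caching. A common way to build such a cache is to key results on a hash of the tool name plus canonically serialized arguments, so an identical call returns the stored result instead of re-running the tool. The `ReplayCache` class below is a hypothetical sketch of that technique, not the module's actual interface; it assumes tool arguments are JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class ReplayCache:
    """Hypothetical cache keyed on (tool name, arguments)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def key(tool: str, args: Dict[str, Any]) -> str:
        # sort_keys makes the key independent of keyword-argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        k = self.key(tool, kwargs)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        result = fn(**kwargs)
        self._store[k] = result
        return result


calls: List[str] = []


def fake_search(query: str) -> list:
    calls.append(query)  # count real executions
    return [f"result for {query}"]


cache = ReplayCache()
cache.call("search", fake_search, query="AI agents")
cache.call("search", fake_search, query="AI agents")
print(len(calls), cache.hits)  # 1 1
```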

Development

pip install -e ".[dev]"
pytest
ruff check .

License

MIT License

Download files

Download the file for your platform.

Source Distribution

agent_observe-0.1.3.tar.gz (79.5 kB)

Uploaded Source

Built Distribution

agent_observe-0.1.3-py3-none-any.whl (57.4 kB)

Uploaded Python 3

File details

Details for the file agent_observe-0.1.3.tar.gz.

File metadata

  • Download URL: agent_observe-0.1.3.tar.gz
  • Upload date:
  • Size: 79.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.5 requests-toolbelt/1.0.0 urllib3/2.3.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/1.5.0 colorama/0.4.6 CPython/3.12.9

File hashes

Hashes for agent_observe-0.1.3.tar.gz
Algorithm Hash digest
SHA256 b7e3d5a9bd9b6b61747c02123256aa99e5c1712ac97439b8cfec9572aed95603
MD5 ed0254966e36ee7d27a63339181a9582
BLAKE2b-256 67865bdecee311d042ec99f8c2bbb59646ce8fe25f84b61be2447dadbb401e5a
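To check a downloaded file against the digests listed here, hash it locally and compare. This sketch streams the file through hashlib.sha256 in chunks; the expected digest and filename are taken from the listing above, and the script assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "b7e3d5a9bd9b6b61747c02123256aa99e5c1712ac97439b8cfec9572aed95603"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("agent_observe-0.1.3.tar.gz")
    if archive.exists():
        print("OK" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{archive} not found")
```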

File details

Details for the file agent_observe-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: agent_observe-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 57.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.5 requests-toolbelt/1.0.0 urllib3/2.3.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/1.5.0 colorama/0.4.6 CPython/3.12.9

File hashes

Hashes for agent_observe-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e4c751c5a635b52b2e4bb6db62ed6d8eda09d61e5d5a4f7b0c61c190eab9a74d
MD5 64be656e64daca30d9754e90f48bb9dd
BLAKE2b-256 e7aef57b6017ffe388e0e90614ef48c7acfccf68c4f48f9da0c79a8bbb292f50
