agent-observe

Framework-agnostic observability, audit, and eval for AI agent applications.

Python 3.9+ License: MIT

What is this?

agent-observe is a lightweight runtime layer that wraps your AI agent code to capture:

  • What tools were called and when
  • What LLM calls were made and how long they took
  • Full LLM context - system prompts, message history, tool definitions (v0.1.7+)
  • Run input/output - what the user asked, what the agent responded (v0.1.7+)
  • Session continuity - link runs in a conversation (v0.1.7+)
  • Policy violations (blocked operations)
  • Risk scores based on behavioral signals

As of v0.1.7, the default mode is full capture, which stores complete traces for debugging and audit.

Installation

pip install agent-observe

# With PostgreSQL support
pip install agent-observe[postgres]

# With viewer UI
pip install agent-observe[viewer]

Quick Start

import openai
import requests

from agent_observe import observe, tool, model_call

# Initialize (zero-config, defaults to full capture as of v0.1.7)
observe.install()

# Wrap your tools
@tool(name="search", kind="http")
def search_web(query: str) -> list:
    # Pass the query via params so it is URL-encoded
    return requests.get("https://api.search.com", params={"q": query}).json()

# Wrap your LLM calls
@model_call(provider="openai", model="gpt-4")
def call_llm(messages: list) -> str:
    return openai.chat.completions.create(
        model="gpt-4",
        messages=messages,
    ).choices[0].message.content

# Run your agent with context (v0.1.7+)
with observe.run(
    "my-agent",
    user_id="jane",              # Who triggered this?
    session_id="conv_123",       # Part of which conversation?
) as run:
    run.set_input("Research AI agents")  # Capture user request

    results = search_web("AI agents")
    analysis = call_llm([
        {"role": "system", "content": "You are a research assistant"},
        {"role": "user", "content": f"Analyze: {results}"},
    ])

    run.set_output(analysis)  # Capture final response

View results:

agent-observe view
# Open http://localhost:8765

Documentation

Document            Description
Examples            Runnable code examples (basic usage, async, policies)
Data Model          What are Runs, Spans, Events, and Replay Cache?
Capture Modes       What data is stored? Hashes vs full content
Configuration       Environment variables and Config options
Usage Guide         Policies, risk scoring, querying, real-world examples
Integration Guide   How to integrate with OpenAI, Anthropic, LangChain, etc.

Key Concepts

Runs, Spans, and Events

┌─────────────────────────────────────────────────────────────┐
│                        observe.run()                         │
│                           (Run)                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ @tool       │  │ @model_call │  │ emit_event  │          │
│  │  (Span)     │  │   (Span)    │  │  (Event)    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
  • Run = One agent execution (start to finish)
  • Span = One tool or model call within a run
  • Event = Custom occurrence you emit

See Data Model for details.
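
To make the relationship concrete, here is a minimal sketch of how a Run, its Spans, and its Events could be modeled. The class and field names below (`kind`, `duration_ms`, `payload`) are illustrative assumptions, not the library's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    """One tool or model call within a run."""
    name: str
    kind: str           # "tool" or "model" (hypothetical labels)
    duration_ms: float


@dataclass
class Event:
    """A custom occurrence emitted during a run."""
    name: str
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Run:
    """One agent execution, start to finish."""
    agent: str
    session_id: Optional[str] = None
    spans: List[Span] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)


run = Run(agent="my-agent", session_id="conv_123")
run.spans.append(Span(name="search", kind="tool", duration_ms=120.0))
run.spans.append(Span(name="gpt-4", kind="model", duration_ms=950.0))
run.events.append(Event(name="cache_miss", payload={"key": "AI agents"}))

# Total time spent in tool and model calls for this run
total_ms = sum(span.duration_ms for span in run.spans)
```

A run owns its spans and events, so per-run aggregates such as total call time fall out of a simple pass over `run.spans`.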

Capture Modes

Mode            What's Stored                          Use Case
full            Everything (default as of v0.1.7)      Development, debugging
evidence_only   Small content + hashes (64KB limit)    Production with audit needs
metadata_only   Hashes, timings only                   High-security production

The default is full as of v0.1.7, on the premise that you install observability because you want to see what happened.

For minimal storage: observe.install(mode="metadata_only")

See Capture Modes for details.
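
The difference between the three modes can be sketched as a function that decides what to keep for a single payload. This is an illustration only: the use of SHA-256, the record layout, and applying the 64KB limit per payload are assumptions, and `capture` is a hypothetical helper, not part of agent-observe.

```python
import hashlib

EVIDENCE_LIMIT_BYTES = 64 * 1024  # the 64KB limit listed for evidence_only


def capture(content: str, mode: str) -> dict:
    """Return what would be stored for one payload under the given mode."""
    data = content.encode("utf-8")
    # Every mode keeps a hash and size so the content can be verified later
    record = {"sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
    if mode == "full":
        record["content"] = content
    elif mode == "evidence_only":
        # Keep the content only when it is small enough
        if len(data) <= EVIDENCE_LIMIT_BYTES:
            record["content"] = content
    elif mode != "metadata_only":
        raise ValueError(f"unknown capture mode: {mode!r}")
    return record


small = "What is an AI agent?"
large = "x" * (EVIDENCE_LIMIT_BYTES + 1)

full_record = capture(large, "full")
evidence_small = capture(small, "evidence_only")
evidence_large = capture(large, "evidence_only")
metadata_record = capture(small, "metadata_only")
```

Under this sketch, `evidence_large` and `metadata_record` carry only the hash and size, while `full_record` and `evidence_small` also carry the content.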

Risk Scoring

Automatic risk scoring (0-100) based on:

Signal                         Weight
Policy violations              +40
Tool success rate < 90%        +25
Repeated tool calls (loops)    +15
5+ retries                     +10
Latency exceeds budget         +10
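
As a worked example of the weights above, the following sketch adds up the signals for a run. It assumes the weights are simply summed (they total 100) and that each signal is a yes/no trigger; `RunSignals` and `risk_score` are hypothetical names, and the library's actual formula may differ.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    """Behavioral signals for one run (hypothetical field names)."""
    policy_violations: int = 0
    tool_calls: int = 0
    tool_failures: int = 0
    repeated_tool_calls: bool = False
    retries: int = 0
    latency_ms: float = 0.0
    latency_budget_ms: float = float("inf")


def risk_score(s: RunSignals) -> int:
    """Sum the weight of every triggered signal, capped at 100."""
    score = 0
    if s.policy_violations > 0:
        score += 40
    if s.tool_calls > 0:
        success_rate = (s.tool_calls - s.tool_failures) / s.tool_calls
        if success_rate < 0.90:
            score += 25
    if s.repeated_tool_calls:
        score += 15
    if s.retries >= 5:
        score += 10
    if s.latency_ms > s.latency_budget_ms:
        score += 10
    return min(score, 100)


clean = risk_score(RunSignals(tool_calls=10))
# One policy violation (+40) and an 80% tool success rate (+25)
risky = risk_score(RunSignals(policy_violations=1, tool_calls=10, tool_failures=2))
```

Here `clean` is 0 and `risky` is 65.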

Configuration

Zero-Config (Recommended)

observe.install()  # Reads from environment variables

Environment Variables

AGENT_OBSERVE_MODE=full             # Capture mode (default: full as of v0.1.7)
AGENT_OBSERVE_ENV=prod              # Environment
DATABASE_URL=postgresql://...       # Enables Postgres sink

See Configuration for all options.
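
A rough sketch of how these three variables could be resolved into settings follows. The variable names and the `full` default come from the list above; the `resolve_settings` helper, its return shape, and the validation step are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

VALID_MODES = {"full", "evidence_only", "metadata_only"}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented variables from env (defaults to os.environ)."""
    if env is None:
        env = os.environ
    mode = env.get("AGENT_OBSERVE_MODE", "full")
    if mode not in VALID_MODES:
        raise ValueError(f"AGENT_OBSERVE_MODE must be one of {sorted(VALID_MODES)}")
    return {
        "mode": mode,
        "env": env.get("AGENT_OBSERVE_ENV"),
        "database_url": env.get("DATABASE_URL"),
    }


defaults = resolve_settings({})
prod = resolve_settings({
    "AGENT_OBSERVE_MODE": "metadata_only",
    "AGENT_OBSERVE_ENV": "prod",
})
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the process environment.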

Explicit Config

import os

from agent_observe import observe
from agent_observe.config import Config, CaptureMode, SinkType

config = Config(
    mode=CaptureMode.FULL,
    sink_type=SinkType.POSTGRES,
    database_url=os.environ.get("DATABASE_URL"),
)
observe.install(config=config)

Sinks (Storage Backends)

Sink         Use Case
SQLite       Local development
PostgreSQL   Production
JSONL        Simple fallback
OTLP         OpenTelemetry export (Jaeger, Honeycomb, Datadog)

The sink is auto-selected based on available connections; for example, setting DATABASE_URL enables the Postgres sink.
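
One way auto-selection might work is sketched below. Only the DATABASE_URL-to-Postgres rule is stated in this README; falling back to SQLite when no URL is set is an assumption, `pick_sink` is a hypothetical helper, and the JSONL and OTLP sinks are left out of the sketch.

```python
from typing import Mapping


def pick_sink(env: Mapping[str, str]) -> str:
    """Choose a sink name from the environment (illustrative only)."""
    url = env.get("DATABASE_URL", "")
    # A Postgres connection URL enables the Postgres sink
    if url.startswith(("postgres://", "postgresql://")):
        return "postgres"
    # Assumed default for local development when nothing is configured
    return "sqlite"


production_sink = pick_sink({"DATABASE_URL": "postgresql://localhost/observe"})
local_sink = pick_sink({})
```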

Policy Engine

Create .riff/observe.policy.yml:

tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"

limits:
  max_tool_calls: 100
  max_model_calls: 50
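
To illustrate how such a policy could be evaluated, the sketch below matches tool identifiers against the patterns with shell-style globbing. Several things here are assumptions: that patterns follow `fnmatch` semantics, that deny rules take precedence over allow rules, that unlisted tools are blocked, and that identifiers look like `http.search`. The policy is inlined as a dict to avoid a YAML dependency.

```python
from fnmatch import fnmatchcase

# Same content as the YAML policy above, inlined for the example
POLICY = {
    "tools": {
        "allow": ["db.*", "http.*"],
        "deny": ["shell.*", "*.destructive"],
    },
    "limits": {"max_tool_calls": 100, "max_model_calls": 50},
}


def is_tool_allowed(tool_name: str, policy: dict = POLICY) -> bool:
    """Deny patterns win; otherwise the tool must match an allow pattern."""
    tools = policy["tools"]
    if any(fnmatchcase(tool_name, pattern) for pattern in tools["deny"]):
        return False
    return any(fnmatchcase(tool_name, pattern) for pattern in tools["allow"])


def within_limits(tool_calls: int, model_calls: int, policy: dict = POLICY) -> bool:
    """Check the per-run call counters against the configured limits."""
    limits = policy["limits"]
    return (
        tool_calls <= limits["max_tool_calls"]
        and model_calls <= limits["max_model_calls"]
    )


checks = {
    name: is_tool_allowed(name)
    for name in ["http.search", "shell.exec", "db.destructive", "fs.read"]
}
```

In this sketch `http.search` is allowed, `shell.exec` and `db.destructive` are blocked by deny patterns, and `fs.read` is blocked because it matches no allow pattern.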

CLI

# Start viewer
agent-observe view

# Export to JSONL
agent-observe export-jsonl -o ./export

Architecture

agent_observe/
├── observe.py      # Core runtime
├── decorators.py   # @tool, @model_call
├── policy.py       # YAML policy engine
├── metrics.py      # Risk scoring
├── replay.py       # Tool result caching
├── sinks/          # Storage backends
└── viewer/         # FastAPI UI

Development

pip install -e ".[dev]"
pytest
ruff check .

License

MIT License
