agent-observe
Framework-agnostic observability, audit, and eval for AI agent applications.
What is this?
agent-observe is a lightweight runtime layer that wraps your AI agent code to capture:
- What tools were called and when
- What LLM calls were made and how long they took
- Policy violations (blocked operations)
- Risk scores based on behavioral signals
It is designed to be enterprise-safe by default: it stores only metadata (hashes, sizes, timings), not raw content.
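To make the idea of "metadata, not content" concrete, here is a minimal sketch of a decorator that records only a tool's name, success flag, and duration. It is an illustration, not agent-observe's implementation: `timed_tool` and `RECORDS` are hypothetical names invented for this example.

```python
import functools
import time

# Hypothetical in-memory store; a real sink would persist these records.
RECORDS: list = []


def timed_tool(name: str):
    """Illustrative stand-in for a decorator such as @tool (not the real one)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = fn(*args, **kwargs)
                ok = True
                return result
            finally:
                # Only metadata is recorded: no arguments, no return value.
                RECORDS.append({
                    "tool": name,
                    "ok": ok,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                })
        return wrapper
    return decorator


@timed_tool("add")
def add(a: int, b: int) -> int:
    return a + b


total = add(2, 3)
```

After the call, `RECORDS` holds one entry describing the call, while the arguments and the result never leave the function.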
Installation
pip install agent-observe
# With PostgreSQL support (quoted so shells such as zsh do not expand the brackets)
pip install "agent-observe[postgres]"
# With viewer UI
pip install "agent-observe[viewer]"
Quick Start
import openai
import requests

from agent_observe import observe, tool, model_call

# Initialize (zero-config)
observe.install()

# Wrap your tools
@tool(name="search", kind="http")
def search_web(query: str) -> list:
    # Pass the query via params so it is URL-encoded
    return requests.get("https://api.search.com", params={"q": query}).json()

# Wrap your LLM calls
@model_call(provider="openai", model="gpt-4")
def call_llm(prompt: str) -> str:
    return openai.chat.completions.create(...).choices[0].message.content

# Run your agent
with observe.run("my-agent", task={"goal": "Research AI"}):
    results = search_web("AI agents")
    analysis = call_llm(f"Analyze: {results}")
View results:
agent-observe view
# Open http://localhost:8765
Documentation
| Document | Description |
|---|---|
| Examples | Runnable code examples (basic usage, async, policies) |
| Data Model | What are Runs, Spans, Events, and Replay Cache? |
| Capture Modes | What data is stored? Hashes vs full content |
| Configuration | Environment variables and Config options |
| Usage Guide | Policies, risk scoring, querying, real-world examples |
| Integration Guide | How to integrate with OpenAI, Anthropic, LangChain, etc. |
Key Concepts
Runs, Spans, and Events
┌─────────────────────────────────────────────────────────────┐
│                        observe.run()                        │
│                            (Run)                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   @tool     │  │ @model_call │  │ emit_event  │          │
│  │   (Span)    │  │   (Span)    │  │   (Event)   │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
- Run = One agent execution (start to finish)
- Span = One tool or model call within a run
- Event = Custom occurrence you emit
See Data Model for details.
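The relationship between the three concepts can be sketched with plain dataclasses. This is an illustrative model only: the class and field names below are assumptions made for the example and are not agent-observe's actual schema.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """A custom occurrence emitted during a run (hypothetical fields)."""
    name: str
    payload: dict[str, Any] = field(default_factory=dict)
    ts: float = field(default_factory=time.time)


@dataclass
class Span:
    """One tool or model call within a run (hypothetical fields)."""
    name: str
    kind: str  # "tool" or "model"
    started_at: float
    ended_at: float

    @property
    def duration_ms(self) -> float:
        return (self.ended_at - self.started_at) * 1000.0


@dataclass
class Run:
    """One agent execution, owning its spans and events."""
    name: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)


run = Run(name="my-agent")
run.spans.append(Span(name="search", kind="tool", started_at=0.0, ended_at=0.25))
run.spans.append(Span(name="gpt-4", kind="model", started_at=0.25, ended_at=1.25))
run.events.append(Event(name="checkpoint", payload={"step": 1}))
```

A run owns an ordered list of spans and events, so per-run aggregates such as total latency or call counts can be derived from the spans alone.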
Capture Modes
| Mode | What's Stored | Use Case |
|---|---|---|
| metadata_only | Hashes, timings | Production (default) |
| evidence_only | Small content + hashes | Debugging |
| full | Everything | Development |
The default is metadata_only, which is enterprise-safe: no raw content is stored, so PII in prompts or tool outputs is not written to the sink.
See Capture Modes for details.
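As a rough illustration of how the three modes might differ, the sketch below reduces a piece of content to a stored record. The `Mode` enum, the `capture` function, and the 256-byte cutoff for "small content" are all assumptions made for this example; the library's real thresholds and record layout may differ.

```python
import hashlib
from enum import Enum


class Mode(Enum):
    """Hypothetical mirror of the three capture modes in the table above."""
    METADATA_ONLY = "metadata_only"
    EVIDENCE_ONLY = "evidence_only"
    FULL = "full"


# Assumed cutoff for what counts as "small content"; not taken from the library.
EVIDENCE_MAX_BYTES = 256


def capture(content: str, mode: Mode) -> dict:
    raw = content.encode("utf-8")
    # Every mode keeps the hash and size, so records stay comparable.
    record = {"sha256": hashlib.sha256(raw).hexdigest(), "size": len(raw)}
    if mode is Mode.FULL:
        record["content"] = content
    elif mode is Mode.EVIDENCE_ONLY and len(raw) <= EVIDENCE_MAX_BYTES:
        record["content"] = content
    return record
```

Because the hash is kept in every mode, two runs can still be compared for identical inputs or outputs even when no raw content was stored.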
Risk Scoring
Automatic risk scoring (0-100) based on:
| Signal | Weight |
|---|---|
| Policy violations | +40 |
| Tool success rate < 90% | +25 |
| Repeated tool calls (loops) | +15 |
| 5+ retries | +10 |
| Latency exceeds budget | +10 |
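The weights in the table sum to 100, which suggests an additive score. The sketch below shows one way such a score could be computed, assuming each signal contributes its weight at most once. `RunStats` and `risk_score` are hypothetical names, and how the library detects loops or counts retries is not specified here.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-run counters feeding the score."""
    policy_violations: int = 0
    tool_calls: int = 0
    tool_failures: int = 0
    repeated_tool_calls: bool = False  # assumed to be detected elsewhere
    retries: int = 0
    latency_ms: float = 0.0
    latency_budget_ms: float = float("inf")


def risk_score(s: RunStats) -> int:
    score = 0
    if s.policy_violations > 0:
        score += 40
    if s.tool_calls > 0:
        success_rate = (s.tool_calls - s.tool_failures) / s.tool_calls
        if success_rate < 0.90:
            score += 25
    if s.repeated_tool_calls:
        score += 15
    if s.retries >= 5:
        score += 10
    if s.latency_ms > s.latency_budget_ms:
        score += 10
    return min(score, 100)
```

For example, a run with one policy violation and an 80% tool success rate would score 40 + 25 = 65 under these assumptions.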
Configuration
Zero-Config (Recommended)
observe.install() # Reads from environment variables
Environment Variables
AGENT_OBSERVE_MODE=metadata_only # Capture mode
AGENT_OBSERVE_ENV=prod # Environment
DATABASE_URL=postgresql://... # Enables Postgres sink
See Configuration for all options.
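A zero-config install has to turn these variables into settings somehow. The following is a minimal sketch of that resolution step, not the library's code: `resolve_settings` is a hypothetical helper, and the fallback to SQLite when `DATABASE_URL` is unset is an assumption based on the sinks table rather than documented behavior.

```python
import os
from typing import Mapping


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: derive settings from environment variables."""
    database_url = env.get("DATABASE_URL")
    return {
        # metadata_only is the documented default capture mode.
        "mode": env.get("AGENT_OBSERVE_MODE", "metadata_only"),
        "env": env.get("AGENT_OBSERVE_ENV"),
        # Assumption: Postgres when a URL is present, SQLite otherwise.
        "sink": "postgres" if database_url else "sqlite",
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to test without touching the real environment.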
Explicit Config
import os

from agent_observe import observe
from agent_observe.config import Config, CaptureMode, SinkType

config = Config(
    mode=CaptureMode.FULL,
    sink_type=SinkType.POSTGRES,
    database_url=os.environ.get("DATABASE_URL"),
)
observe.install(config=config)
Sinks (Storage Backends)
| Sink | Use Case |
|---|---|
| SQLite | Local development |
| PostgreSQL | Production |
| JSONL | Simple fallback |
| OTLP | OpenTelemetry export (Jaeger, Honeycomb, Datadog) |
The sink is auto-selected based on available connections; for example, setting DATABASE_URL enables the Postgres sink.
Policy Engine
Create .riff/observe.policy.yml:
tools:
  allow:
    - "db.*"
    - "http.*"
  deny:
    - "shell.*"
    - "*.destructive"
limits:
  max_tool_calls: 100
  max_model_calls: 50
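To show how patterns like these could be evaluated, here is a small sketch using glob matching from the standard library. It assumes that deny rules take precedence over allow rules, that a non-empty allow list is exclusive, and that `limits` is a top-level key; none of these semantics are confirmed by the policy engine's documentation, and `is_tool_allowed` and `within_limits` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# The policy above, written as a dict for illustration.
POLICY = {
    "tools": {
        "allow": ["db.*", "http.*"],
        "deny": ["shell.*", "*.destructive"],
    },
    "limits": {"max_tool_calls": 100, "max_model_calls": 50},
}


def is_tool_allowed(tool_name: str, policy: dict = POLICY) -> bool:
    tools = policy.get("tools", {})
    # Assumption: a deny match always wins over an allow match.
    if any(fnmatchcase(tool_name, p) for p in tools.get("deny", [])):
        return False
    allow = tools.get("allow", [])
    # Assumption: an empty allow list permits anything not denied.
    return not allow or any(fnmatchcase(tool_name, p) for p in allow)


def within_limits(tool_calls: int, model_calls: int, policy: dict = POLICY) -> bool:
    limits = policy.get("limits", {})
    return (
        tool_calls <= limits.get("max_tool_calls", float("inf"))
        and model_calls <= limits.get("max_model_calls", float("inf"))
    )
```

Under these assumptions a tool named `db.destructive` is blocked even though it matches `db.*`, because the `*.destructive` deny pattern is checked first.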
CLI
# Start viewer
agent-observe view
# Export to JSONL
agent-observe export-jsonl -o ./export
Architecture
agent_observe/
├── observe.py # Core runtime
├── decorators.py # @tool, @model_call
├── policy.py # YAML policy engine
├── metrics.py # Risk scoring
├── replay.py # Tool result caching
├── sinks/ # Storage backends
└── viewer/ # FastAPI UI
Development
pip install -e ".[dev]"
pytest
ruff check .
License
MIT License