Zero-config observability for AI agents
Peekr captures every LLM call, tool call, and framework step in your agent — what was sent, what came back, how long it took, and what it cost. Two lines of code, no backend, no account.
import peekr
peekr.instrument()
That's it. Spans stream to traces.jsonl (or SQLite) and to your console. Inspect them with peekr view, find expensive calls with peekr cost, generate a self-contained dashboard with peekr dashboard, and score every output with built-in LLM-as-judge evaluators including RAGAS-style claim decomposition.
Contents
- Install
- Quick start
- What you get
- CLI
- Evaluators
- Dashboard
- Multi-tenant traces
- Storage
- Supported clients
- TypeScript SDK
- Peekr Cloud
- How it works
- Contributing
Install
pip install peekr # base
pip install "peekr[openai]" # with OpenAI
pip install "peekr[anthropic]" # with Anthropic
pip install "peekr[bedrock]" # with AWS Bedrock
pip install "peekr[gemini]" # with Google Gemini
pip install "peekr[langchain]" # with LangChain / LangGraph
pip install "peekr[llamaindex]" # with LlamaIndex
pip install "peekr[crewai]" # with CrewAI
pip install "peekr[otel]" # with OpenTelemetry / OpenInference export
pip install "peekr[all]" # everything
Quick start
1. Instrument once at startup. Patches every supported LLM SDK and agent framework that is installed.
import peekr
peekr.instrument()
2. Trace your tools so they appear in the same tree as LLM calls.
from peekr import trace
@trace
def search_web(query: str) -> list[str]:
    return fetch_results(query)

@trace # async works
async def fetch_user(user_id: int) -> dict:
    return await db.get(user_id)
3. View the trace.
peekr view traces.jsonl # tree view
peekr view --io traces.jsonl # include inputs and outputs
peekr cost traces.jsonl # cost breakdown + top hotspots
Trace a3f2b1c0 1243ms 891tok
────────────────────────────────────────────────
agent.run 1243ms
  └─ tool.search_web 210ms
       in: {"query": "climate policy"}
       out: ["result1", "result2", ...]
  └─ openai.chat.completions [gpt-4o] 1033ms 891tok
       in: [{"role": "user", "content": "..."}]
       out: "Based on recent research..."
What you get
| Capability | API |
|---|---|
| Auto-instrumentation | peekr.instrument() — patches OpenAI, Anthropic, Bedrock, LangChain, LlamaIndex, CrewAI |
| Tool tracing | @peekr.trace on any sync or async function |
| Sessions | with peekr.session(user_id="alice", tenant_id="acme"): ... |
| Multi-tenant schema | tenant_id and retention_class first-class on every span |
| Alerts + Slack/webhook sinks | ErrorRate(0.05).with_sinks(SlackSink(url), WebhookSink(url)) |
| LLM-as-judge eval | instrument(evaluators=[peekr.eval.Rubric("Be concise")]) |
| Hallucination detection | instrument(evaluators=[peekr.eval.Hallucination()]) |
| Claim-level (RAGAS) hallucination | Hallucination(detailed=True) — per-claim verdicts |
| Drift dashboard | peekr dashboard traces.db -o report.html |
| Feedback + fine-tuning export | peekr.feedback(trace_id, rating="good") |
| A/B experiments | @peekr.experiment(variants=["control", "test"]) |
| Trace replay | peekr replay <trace_id> |
| TypeScript SDK | npm install @peekr/sdk — same wire format |
| OpenTelemetry export | add_exporter(peekr.OTelExporter()) — OpenInference-shaped spans into any OTel pipeline |
| Sampling | instrument(sample_rate=0.1) — whole-trace decision; errored spans always kept |
Failure modes peekr catches that timing alone won't
A profiler tells you a function was slow. Peekr also tells you it returned the wrong shape and the LLM had no idea.
agent.run 2100ms
└─ tool.fetch_user 12ms out: null ← tool returned null
└─ openai.chat 2088ms in: "User profile: null..." ← LLM got garbage
Slow steps are obvious in the tree, with the cost broken out:
agent.run 4300ms
└─ tool.search_web 3800ms ← 88% of latency. Cache, don't swap models.
└─ openai.chat 490ms
Token growth across runs surfaces unbounded conversation history:
Trace 1: 18,432 tokens
Trace 2: 21,104 tokens
Trace 3: 24,891 tokens ← summarise after N turns
And prod-vs-local divergence is a tool I/O diff, not guesswork:
local: out: [{"id": 1, "qty": 42}]
prod: out: [] ← upstream pipeline bug, not agent logic
CLI
peekr view
Tree view of every trace, optionally with inputs and outputs.
peekr view traces.jsonl
peekr view --io traces.jsonl
peekr view traces.db # SQLite works the same way
peekr cost
Shows where money and time went, with a top-10 hotspot list ranked by a composite score (60% cost, 40% latency).
peekr cost traces.jsonl
────────────────────────────────────────────────────────────
peekr cost · traces.jsonl
────────────────────────────────────────────────────────────
Total spans : 8,022
LLM calls : 85
Errors : 0
Total input tokens : 130,807
Total output tokens: 10,274
Total LLM time : 161.9s
Total cost (est.) : $0.14574
────────────────────────────────────────────────────────────
Top 10 hottest calls (60% cost · 40% latency):
# Operation In Out Cost ms Model
1 anthropic.messages 5,066 264 $ 0.00511 2965ms claude-haiku-4-5
2 anthropic.messages 3,924 376 $ 0.00464 3458ms claude-haiku-4-5
...
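The 60/40 ranking above can be approximated with a short sketch. How peekr actually scales cost and latency before blending them is not stated here, so the max-based normalisation below is an assumption, and `Call` and `rank_hotspots` are hypothetical names rather than part of the peekr API.

```python
from dataclasses import dataclass


@dataclass
class Call:
    # Hypothetical record; peekr's real span schema may differ.
    name: str
    cost_usd: float
    latency_ms: float


def rank_hotspots(calls, cost_weight=0.6, latency_weight=0.4, top_n=10):
    """Rank calls by a weighted blend of cost and latency, each scaled to 0..1."""
    if not calls:
        return []
    # Assumption: scale each metric by its maximum so the weights are comparable.
    max_cost = max(c.cost_usd for c in calls) or 1.0
    max_latency = max(c.latency_ms for c in calls) or 1.0

    def score(call):
        return (cost_weight * call.cost_usd / max_cost
                + latency_weight * call.latency_ms / max_latency)

    return sorted(calls, key=score, reverse=True)[:top_n]


# Illustrative data, not taken from a real trace file.
calls = [
    Call("llm.small", 0.001, 500),
    Call("tool.slow_search", 0.0, 4000),
    Call("llm.big_prompt", 0.010, 1000),
]
ranked = rank_hotspots(calls)
```

With these weights an expensive but moderately fast LLM call outranks a free but slow tool call, which is why the ranking depends heavily on the normalisation chosen.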
peekr dashboard
Self-contained HTML report — see Dashboard.
peekr replay
Re-run a stored trace through the live SDK, with the same inputs.
peekr replay a3f2b1c0
Evaluators
Score every LLM output for groundedness, conciseness, or any custom rubric. Scores land on the span as attributes.eval_scores.
import peekr
peekr.instrument(evaluators=[
    peekr.eval.Hallucination(), # 0.0 = hallucinated, 1.0 = grounded
    peekr.eval.Rubric("Answer is concise and direct"),
    peekr.eval.NotEmpty(),
    peekr.eval.NoError(),
])
openai.chat [gpt-4o] 843ms 312tok
in: "When was the Eiffel Tower built?"
out: "The Eiffel Tower was built in 1923 by Frank Lloyd Wright."
eval_scores: {Hallucination: 0.0, Rubric: 0.9, NotEmpty: 1.0}
For RAG flows, point Hallucination at the retrieved document instead of the prompt:
peekr.eval.Hallucination(
    context_extractor=lambda span: span.attributes.get("retrieved_docs", "")
)
Claim-level (RAGAS-style) detection
To see why a response scored low, not just what the score was, set detailed=True. The judge decomposes the output into atomic claims and assigns each one a verdict (supported / contradicted / unsupported), the same pipeline RAGAS Faithfulness uses.
peekr.instrument(evaluators=[peekr.eval.Hallucination(detailed=True)])
// span.attributes.hallucination_details
{
  "total": 3, "supported": 1, "contradicted": 2, "unsupported": 0, "score": 0.33,
  "claims": [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"}
  ]
}
Use simple mode for cheap monitoring across many traces; detailed mode for the cases worth investigating. Cost is roughly one judge call per scored span.
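As a rough illustration of how per-claim verdicts could roll up into the summary shown above, here is a minimal sketch. It assumes the score is the fraction of supported claims, which matches the 1-of-3 = 0.33 example but is not confirmed as peekr's exact formula; `summarize_claims` is a hypothetical helper.

```python
from collections import Counter

VERDICTS = ("supported", "contradicted", "unsupported")


def summarize_claims(claims):
    """Aggregate claim verdicts into counts plus a 0..1 score."""
    counts = Counter(claim["verdict"] for claim in claims)
    total = len(claims)
    # Assumption: score = supported / total; None when there is nothing to judge.
    score = round(counts["supported"] / total, 2) if total else None
    return {
        "total": total,
        **{verdict: counts[verdict] for verdict in VERDICTS},
        "score": score,
        "claims": claims,
    }


claims = [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"},
]
details = summarize_claims(claims)
```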
Query the lowest-scoring traces from SQLite to find regressions:
SELECT trace_id,
json_extract(attributes, '$.eval_scores.Hallucination') AS score,
json_extract(attributes, '$.output') AS output
FROM spans
WHERE score IS NOT NULL AND score < 0.5
ORDER BY start_time DESC;
Dashboard
Generate a self-contained HTML observability report. No server, no build step — open the file in a browser, or attach it to a Slack message.
peekr dashboard traces.db -o report.html # SQLite
peekr dashboard traces.jsonl # writes ./dashboard.html
Five tabs (1–5 to switch, / to search, R to clear filters, Esc to close panels):
| Tab | Purpose |
|---|---|
| Overview | Health hero (0–100), narrative summary of what's happening, top 3 action items |
| Traces | Search and filter every trace; click any row for full I/O, claim verdicts, citations |
| Quality | Rolling chart with thresholds, score distribution, channel × time heatmap |
| Diagnose | AI-generated likely causes, severity-tagged action lists, worst-offender cards with side-by-side context vs answer |
| Help | Setup checklist, glossary, evaluator snippets, troubleshooting |
A persistent filter bar (tenant · model · endpoint · time range) refilters every panel across every tab in one click. Tab and filter state live in the URL hash so links are shareable.
To populate the channel breakdown, peekr reads attributes.model automatically and tenant_id from the span schema. Attach an endpoint yourself in your request handler:
from peekr import trace, get_current_span

@trace
def handle_request(req):
    get_current_span().attributes["endpoint"] = req.path
    return call_llm(...)
Full screenshots and tab-by-tab walkthrough → docs.
Multi-tenant traces
Every span carries two first-class fields — tenant_id (the customer org) and retention_class (a storage-tier hint). They're separate from user_id (the end-user) so a B2B agent can tag both without conflict.
import peekr
peekr.instrument(tenant_id="acme", retention_class="default")
with peekr.session(user_id="alice", tenant_id="acme",
                   retention_class="long"):
    run_agent()
Resolution order, highest priority first:
- peekr.session(tenant_id=..., retention_class=...)
- peekr.instrument(tenant_id=..., retention_class=...)
- Env vars PEEKR_TENANT_ID / PEEKR_RETENTION_CLASS
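The resolution order above amounts to a first-non-empty lookup. The sketch below shows one way to express it; `resolve_field` is a hypothetical helper, and treating None as "not set" is an assumption about how peekr distinguishes missing values.

```python
import os


def resolve_field(name, session_value=None, instrument_value=None, environ=None):
    """Return the highest-priority value: session, then instrument(), then env var."""
    env = os.environ if environ is None else environ
    candidates = (
        session_value,                       # peekr.session(...)
        instrument_value,                    # peekr.instrument(...)
        env.get(f"PEEKR_{name.upper()}"),    # PEEKR_TENANT_ID / PEEKR_RETENTION_CLASS
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


fake_env = {"PEEKR_TENANT_ID": "from-env"}
from_session = resolve_field("tenant_id", "acme", "globex", fake_env)
from_instrument = resolve_field("tenant_id", None, "globex", fake_env)
from_env = resolve_field("tenant_id", None, None, fake_env)
```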
Both fields are top-level columns in SQLite (indexed) and top-level keys in JSONL — query without json_extract:
SELECT tenant_id, COUNT(*) FROM spans GROUP BY tenant_id;
SELECT * FROM spans WHERE retention_class = 'long' AND start_time > ?;
retention_class is a free-form string in the OSS SDK. Recommended values are default, short, long, and pii; the meaning of each is enforced by your storage tier (or by Peekr Cloud when you're ready).
Storage
peekr.instrument() # JSONL — default, grep-able
peekr.instrument(storage="sqlite") # SQLite — queryable, multi-process safe
peekr.instrument(storage="both") # both
SQLite uses WAL mode so multiple processes (Docker, CI, parallel agents) can write at the same time. Query across runs:
# slowest tool calls
sqlite3 traces.db "
SELECT name, ROUND(AVG(duration_ms)) avg_ms
FROM spans GROUP BY name ORDER BY avg_ms DESC;"
# token spend by model
sqlite3 traces.db "
SELECT json_extract(attributes,'\$.model') AS model,
SUM(json_extract(attributes,'\$.tokens_total')) AS tokens
FROM spans GROUP BY model;"
# all errors
sqlite3 traces.db "
SELECT name, trace_id, json_extract(attributes,'\$.error') AS msg
FROM spans WHERE status = 'error';"
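The same kind of query can be run from Python with the standard-library sqlite3 module. The sketch below enables WAL mode on a throwaway database and runs the average-duration query; the two-column spans table is a simplified stand-in, not peekr's real schema.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "traces.db"))
    # WAL lets readers and a writer work concurrently; the PRAGMA returns the active mode.
    journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

    # Assumed minimal schema, for illustration only.
    conn.execute("CREATE TABLE spans (name TEXT, duration_ms REAL)")
    conn.executemany(
        "INSERT INTO spans VALUES (?, ?)",
        [("tool.search_web", 3800), ("tool.search_web", 200), ("openai.chat", 490)],
    )
    conn.commit()

    # Slowest span names by average duration.
    slowest = conn.execute(
        "SELECT name, ROUND(AVG(duration_ms)) AS avg_ms "
        "FROM spans GROUP BY name ORDER BY avg_ms DESC"
    ).fetchall()
    conn.close()
```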
Alert routing — Slack, webhooks, PagerDuty
By default, alert messages go to stderr. Attach one or more sinks to route them anywhere:
import peekr
from peekr.alert import ErrorRate, CostSpike, LatencyP95, SlackSink, WebhookSink
peekr.instrument(alerts=[
    ErrorRate(threshold=0.05).with_sinks(
        SlackSink("https://hooks.slack.com/services/T0/B0/abc"),
    ),
    CostSpike(multiplier=3.0).with_sinks(
        WebhookSink(
            "https://events.pagerduty.com/v2/enqueue",
            payload_builder=lambda name, msg: {
                "routing_key": "your-key",
                "event_action": "trigger",
                "payload": {"summary": msg, "source": "peekr", "severity": "warning"},
            },
        ),
    ),
])
Sinks are best-effort — network failures, timeouts, and exceptions inside notify() are swallowed silently so a flaky webhook never breaks the application's tracing path. Use WebhookSink(payload_builder=...) to fit any incident system (PagerDuty Events v2, Opsgenie, OpsLevel, custom routers).
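The best-effort behaviour described above can be pictured as a dispatch loop that never lets a sink failure escape. This is a sketch of the idea rather than peekr's implementation; `notify_all`, `ListSink`, and `FlakySink` are hypothetical names, and only the notify() method name comes from the text.

```python
class ListSink:
    """Hypothetical sink that records alerts in memory."""

    def __init__(self):
        self.messages = []

    def notify(self, name, message):
        self.messages.append((name, message))


class FlakySink:
    """Hypothetical sink that always fails, standing in for a broken webhook."""

    def notify(self, name, message):
        raise ConnectionError("webhook unreachable")


def notify_all(sinks, name, message):
    """Send an alert to every sink; swallow failures so tracing is never interrupted."""
    delivered = 0
    for sink in sinks:
        try:
            sink.notify(name, message)
            delivered += 1
        except Exception:
            # Best-effort: one failing sink must not block the others or the caller.
            continue
    return delivered


good = ListSink()
delivered = notify_all([FlakySink(), good], "ErrorRate", "error rate 7% > 5%")
```

Because the failing sink is listed first, the example also shows that later sinks still receive the alert.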
Sampling
High-traffic agents produce a lot of spans. sample_rate drops a fraction of traces from storage while keeping evaluators and alerts running on the full stream — so your error rate, hallucination score, and cost figures stay accurate.
peekr.instrument(
    sample_rate=0.1, # keep 10% of traces; default 1.0
    keep_errors=True, # errored spans always persisted (default)
)
The decision is made once per trace at root-span creation and inherited by every child, so a trace is never partially captured — you don't get orphan openai.chat.completions spans without their parent.
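A whole-trace sampling decision can be sketched as follows: decide once when the root span starts, remember the answer per trace, and let every child span look it up. `TraceSampler` is a hypothetical class; the in-memory dictionary is an assumption about bookkeeping, not a description of peekr's internals.

```python
import random


class TraceSampler:
    """Hypothetical sampler: one keep/drop decision per trace, errors always kept."""

    def __init__(self, sample_rate=1.0, keep_errors=True, seed=None):
        self.sample_rate = sample_rate
        self.keep_errors = keep_errors
        self._rng = random.Random(seed)
        self._decisions = {}

    def start_trace(self, trace_id):
        # Decided once, at root-span creation.
        keep = self._rng.random() < self.sample_rate
        self._decisions[trace_id] = keep
        return keep

    def should_store(self, trace_id, errored=False):
        # Errored spans bypass sampling when keep_errors is on.
        if errored and self.keep_errors:
            return True
        # Children inherit the root decision; unknown traces default to kept.
        return self._decisions.get(trace_id, True)

    def end_trace(self, trace_id):
        # Drop the bookkeeping entry once the trace is finished.
        self._decisions.pop(trace_id, None)


drop_all = TraceSampler(sample_rate=0.0)
drop_all.start_trace("t1")
child_kept = drop_all.should_store("t1")
error_kept = drop_all.should_store("t1", errored=True)
```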
OpenTelemetry export
Ship peekr spans into any OTel-compatible backend (Datadog, Honeycomb, Grafana Tempo, Arize Phoenix, Langfuse-OTel, etc.) by translating attributes into the OpenInference semantic conventions the LLM observability ecosystem uses.
pip install "peekr[otel]"
import peekr
from peekr.exporters import add_exporter
peekr.instrument()
# Option 1: reuse your app's existing OTel setup
add_exporter(peekr.OTelExporter())
# Option 2: configure the endpoint inline instead
add_exporter(peekr.OTelExporter(endpoint="https://api.honeycomb.io",
                                headers={"x-honeycomb-team": "..."}))
No agent, no collector, no separate process. Peekr writes OpenInference-shaped spans in-process, and any OTel pipeline you already operate consumes them.
Custom exporters
Ship spans to any backend by implementing one method:
import requests
import peekr
from peekr.exporters import add_exporter

class MyExporter:
    def export(self, span):
        # Forward the span to your backend as JSON.
        requests.post("https://my-backend.com/spans", json=span.to_dict())

peekr.instrument()
add_exporter(MyExporter())
@trace options
@trace # auto-names from module.function, captures I/O
@trace(name="tool.search") # custom span name
@trace(capture_io=False) # skip args/output (e.g. secrets)
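For readers curious how one decorator can accept both the bare @trace form and the @trace(...) form while handling sync and async functions, here is a self-contained sketch of the pattern. It is not peekr's source: the `spans` list and the recorded fields are illustrative, and only the option names name and capture_io come from the examples above.

```python
import asyncio
import functools
import inspect

spans = []  # illustrative in-memory store, not peekr's storage


def trace(func=None, *, name=None, capture_io=True):
    """Usable as @trace or @trace(name=..., capture_io=...)."""

    def decorate(fn):
        span_name = name or f"{fn.__module__}.{fn.__qualname__}"

        def record(args, kwargs, output):
            entry = {"name": span_name}
            if capture_io:
                entry["input"] = {"args": args, "kwargs": kwargs}
                entry["output"] = output
            spans.append(entry)

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                output = await fn(*args, **kwargs)
                record(args, kwargs, output)
                return output
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            output = fn(*args, **kwargs)
            record(args, kwargs, output)
            return output
        return sync_wrapper

    # Bare @trace passes the function directly; @trace(...) passes None.
    return decorate(func) if callable(func) else decorate


@trace
def add(a, b):
    return a + b


@trace(name="tool.secret", capture_io=False)
async def read_secret():
    return "s3cr3t"


total = add(1, 2)
secret = asyncio.run(read_secret())
```

With capture_io=False the recorded entry carries only the span name, so the secret never reaches the trace store.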
Supported clients
LLM SDKs
| Provider | SDK | Install |
|---|---|---|
| OpenAI | openai | pip install "peekr[openai]" |
| Anthropic | anthropic | pip install "peekr[anthropic]" |
| AWS Bedrock | boto3 | pip install "peekr[bedrock]" |
| Google Gemini | google-genai (or legacy google-generativeai) | pip install "peekr[gemini]" |
Agent frameworks
| Framework | Package | Install |
|---|---|---|
| LangChain / LangGraph | langchain-core | pip install "peekr[langchain]" |
| LlamaIndex | llama-index-core | pip install "peekr[llamaindex]" |
| CrewAI | crewai | pip install "peekr[crewai]" |
peekr.instrument() detects whichever SDKs and frameworks are installed and patches them. Streaming is supported across all LLM SDKs. Frameworks emit chain / tool / retriever / agent / LLM spans nested in the order they actually executed:
crewai.crew.kickoff 3.4s
└─ crewai.task.execute 3.4s task=plan_trip
└─ crewai.agent.execute_task 3.4s agent=planner
└─ openai.chat.completions 1.2s gpt-4o · 891tok
└─ langchain.tool.search_web 2.1s
TypeScript SDK
npm install @peekr/sdk
import { instrument, wrap, trace, withSession } from "@peekr/sdk";
import OpenAI from "openai";

instrument({ jsonlPath: "./traces.jsonl" });
const openai = wrap(new OpenAI());

await withSession(
  { user_id: "alice", tenant_id: "acme" },
  async () => {
    await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Summarise the docs above" }],
    });
  },
);
The TypeScript SDK writes the same JSONL schema as Python, so a Node app's traces work with peekr view, peekr cost, and peekr dashboard unchanged. Full reference → peekr-ts/README.md.
Peekr Cloud
The OSS SDK runs in your process, writes to local files, and is MIT licensed forever — that's not changing. When a single-process file isn't the right fit any more (multiple services, a team that needs shared dashboards, longer retention, audit-grade trace storage), Peekr Cloud is the managed backend.
Sign up at peekr.cloud.ashwanijha.dev — free up to 10k spans/month, no card required.
Once you have a pk_live_ key from the project settings page:
import peekr
peekr.instrument(
    tenant_id="acme",
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.cloud.ashwanijha.dev",
        api_key="pk_live_…",
    ),
)
HTTPExporter is fully implemented as of v0.5 — batched, retried, flushed at interpreter exit. The spans you already instrument locally ship to the Cloud dashboard unchanged; tenant_id and retention_class are first-class columns.
| Tier | Spans / month | Price |
|---|---|---|
| Free | 10k | $0 |
| Starter | 500k | $29/mo |
| Pro | 5M | $99/mo |
| Scale | 50M | $399/mo |
How it works
instrument() monkey-patches the OpenAI, Anthropic, and Bedrock SDK methods before your code runs. Python looks up attributes at call time, so every subsequent call made through the patched class or module hits the wrapper without any change to your code.
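The patching idea can be shown on a stand-in class. The sketch below replaces a method on a fake client with a timing wrapper; `FakeCompletions`, `patch_method`, and the `captured` list are hypothetical, and real SDK classes are more involved than this.

```python
import functools
import time

captured = []  # illustrative span store


class FakeCompletions:
    """Stand-in for an SDK client class; not a real OpenAI type."""

    def create(self, model, messages):
        return {"model": model, "content": "hello"}


def patch_method(cls, method_name, span_name):
    """Replace cls.method_name with a wrapper that times each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Recorded on success and on exceptions alike.
            captured.append({
                "name": span_name,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })

    setattr(cls, method_name, wrapper)
    return original  # kept so the patch can be undone


client = FakeCompletions()  # instance created before patching
patch_method(FakeCompletions, "create", "fake.chat.completions")
result = client.create(model="demo", messages=[])
```

The instance was created before the patch, yet its call is still captured, because the method is looked up on the class at call time.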
Parent / child span relationships are tracked through contextvars.ContextVar, which propagates correctly across async / await without manual threading. The TypeScript SDK uses Node's AsyncLocalStorage for the same reason.
Contributing
git clone https://github.com/ashwanijha04/peekr
cd peekr
pip install -e ".[dev]"
pytest
Open an issue before large changes. PRs welcome.
Website · Docs · PyPI · TypeScript SDK · MIT License