Zero-config observability for AI agents

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ashwanijha04

These details have not been verified by PyPI

Project description

peekr

Observability and evaluation for AI agents.

Website · Docs · PyPI · TypeScript SDK

Peekr captures every LLM call, tool call, and framework step in your agent — what was sent, what came back, how long it took, and what it cost. Two lines of code, no backend, no account.

import peekr
peekr.instrument()

That's it. Spans stream to traces.jsonl (or SQLite) and to your console. Inspect them with peekr view, find expensive calls with peekr cost, generate a self-contained dashboard with peekr dashboard, and score every output with built-in LLM-as-judge evaluators including RAGAS-style claim decomposition.

Install
Quick start
What you get
CLI
Evaluators
Dashboard
Multi-tenant traces
Storage
Supported clients
TypeScript SDK
Peekr Cloud
How it works
Contributing

Install

pip install peekr                   # base
pip install "peekr[openai]"         # with OpenAI
pip install "peekr[anthropic]"      # with Anthropic
pip install "peekr[bedrock]"        # with AWS Bedrock
pip install "peekr[gemini]"         # with Google Gemini
pip install "peekr[langchain]"      # with LangChain / LangGraph
pip install "peekr[llamaindex]"     # with LlamaIndex
pip install "peekr[crewai]"         # with CrewAI
pip install "peekr[otel]"           # with OpenTelemetry / OpenInference export
pip install "peekr[all]"            # everything

Quick start

1. Instrument once at startup. This patches whichever supported LLM SDKs and agent frameworks are installed.

import peekr
peekr.instrument()

2. Trace your tools so they appear in the same tree as LLM calls.

from peekr import trace

# fetch_results and db below stand in for your own application code.

@trace
def search_web(query: str) -> list[str]:
    return fetch_results(query)

@trace                       # async functions work the same way
async def fetch_user(user_id: int) -> dict:
    return await db.get(user_id)

3. View the trace.

peekr view traces.jsonl          # tree view
peekr view --io traces.jsonl     # include inputs and outputs
peekr cost traces.jsonl          # cost breakdown + top hotspots

Trace a3f2b1c0  1243ms  891tok
────────────────────────────────────────────────
agent.run  1243ms
   └─ tool.search_web  210ms
         in:  {"query": "climate policy"}
         out: ["result1", "result2", ...]
   └─ openai.chat.completions [gpt-4o]  1033ms  891tok
         in:  [{"role": "user", "content": "..."}]
         out: "Based on recent research..."

What you get

Capability	API
Auto-instrumentation	`peekr.instrument()` — patches OpenAI, Anthropic, Bedrock, LangChain, LlamaIndex, CrewAI
Tool tracing	`@peekr.trace` on any sync or async function
Sessions	`with peekr.session(user_id="alice", tenant_id="acme"): ...`
Multi-tenant schema	`tenant_id` and `retention_class` first-class on every span
Alerts + Slack/webhook sinks	`ErrorRate(0.05).with_sinks(SlackSink(url), WebhookSink(url))`
LLM-as-judge eval	`instrument(evaluators=[peekr.eval.Rubric("Be concise")])`
Hallucination detection	`instrument(evaluators=[peekr.eval.Hallucination()])`
Claim-level (RAGAS) hallucination	`Hallucination(detailed=True)` — per-claim verdicts
Drift dashboard	`peekr dashboard traces.db -o report.html`
Feedback + fine-tuning export	`peekr.feedback(trace_id, rating="good")`
A/B experiments	`@peekr.experiment(variants=["control", "test"])`
Trace replay	`peekr replay <trace_id>`
TypeScript SDK	`npm install @peekr/sdk` — same wire format
OpenTelemetry export	`add_exporter(peekr.OTelExporter())` — OpenInference-shaped spans into any OTel pipeline
Sampling	`instrument(sample_rate=0.1)` — whole-trace decision; errored spans always kept

Failure modes peekr catches that timing alone won't

A profiler tells you a function was slow. Peekr also tells you it returned the wrong shape and the LLM had no idea.

agent.run  2100ms
   └─ tool.fetch_user  12ms     out: null         ← tool returned null
   └─ openai.chat       2088ms  in: "User profile: null..."   ← LLM got garbage

Slow steps are obvious in the tree, with each step's share of latency called out:

agent.run  4300ms
   └─ tool.search_web   3800ms  ← 88% of latency. Cache, don't swap models.
   └─ openai.chat        490ms

Token growth across runs surfaces unbounded conversation history:

Trace 1:  18,432 tokens
Trace 2:  21,104 tokens
Trace 3:  24,891 tokens   ← summarise after N turns

And prod-vs-local divergence is a tool I/O diff, not guesswork:

local:  out: [{"id": 1, "qty": 42}]
prod:   out: []   ← upstream pipeline bug, not agent logic

CLI

`peekr view`

Tree view of every trace, optionally with inputs and outputs.

peekr view traces.jsonl
peekr view --io traces.jsonl
peekr view traces.db          # SQLite works the same way

`peekr cost`

Where money and time went, with a top-10 hotspot list ranked by a composite score weighted 60% cost and 40% latency.

peekr cost traces.jsonl

────────────────────────────────────────────────────────────
  peekr cost  ·  traces.jsonl
────────────────────────────────────────────────────────────
  Total spans        : 8,022
  LLM calls          : 85
  Errors             : 0
  Total input tokens : 130,807
  Total output tokens: 10,274
  Total LLM time     : 161.9s
  Total cost (est.)  : $0.14574
────────────────────────────────────────────────────────────

  Top 10 hottest calls  (60% cost · 40% latency):
  #   Operation                In      Out      Cost      ms  Model
  1   anthropic.messages    5,066     264 $ 0.00511   2965ms  claude-haiku-4-5
  2   anthropic.messages    3,924     376 $ 0.00464   3458ms  claude-haiku-4-5
  ...
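The 60% cost / 40% latency weighting shown above could be computed along the following lines. This is only a sketch: it assumes each metric is normalised by the largest value in the file, which this README does not confirm, and `Call` and `rank_hotspots` are hypothetical names with made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class Call:
    operation: str
    cost_usd: float
    duration_ms: float


def rank_hotspots(calls, cost_weight=0.6, latency_weight=0.4, top_n=10):
    """Rank calls by a weighted blend of max-normalised cost and latency."""
    if not calls:
        return []
    # Fall back to 1.0 so an all-zero column does not divide by zero.
    max_cost = max(c.cost_usd for c in calls) or 1.0
    max_ms = max(c.duration_ms for c in calls) or 1.0

    def score(call):
        return (cost_weight * call.cost_usd / max_cost
                + latency_weight * call.duration_ms / max_ms)

    return sorted(calls, key=score, reverse=True)[:top_n]


# Illustrative values only, not taken from a real trace file.
calls = [
    Call("tool.fast", 0.0, 100),
    Call("llm.slow", 0.0020, 4000),
    Call("llm.big_prompt", 0.0100, 1000),
]
ranked = rank_hotspots(calls)
```

With these sample values the expensive call outranks the slow one, because cost carries the larger weight.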

`peekr dashboard`

Self-contained HTML report — see Dashboard.

`peekr replay`

Re-run a stored trace through the live SDK, with the same inputs.

peekr replay a3f2b1c0

Evaluators

Score every LLM output for groundedness, conciseness, or any custom rubric. Scores land on the span as attributes.eval_scores.

import peekr

peekr.instrument(evaluators=[
    peekr.eval.Hallucination(),                  # 0.0 = hallucinated, 1.0 = grounded
    peekr.eval.Rubric("Answer is concise and direct"),
    peekr.eval.NotEmpty(),
    peekr.eval.NoError(),
])

openai.chat [gpt-4o]  843ms  312tok
   in:  "When was the Eiffel Tower built?"
   out: "The Eiffel Tower was built in 1923 by Frank Lloyd Wright."
   eval_scores: {Hallucination: 0.0, Rubric: 0.9, NotEmpty: 1.0}

For RAG flows, point Hallucination at the retrieved document instead of the prompt:

peekr.eval.Hallucination(
    context_extractor=lambda span: span.attributes.get("retrieved_docs", "")
)

Claim-level (RAGAS-style) detection

To see why a response scored low, not just what the score was, set detailed=True. The judge decomposes the output into atomic claims and assigns each one a verdict (supported / contradicted / unsupported), the same pipeline RAGAS Faithfulness uses.

peekr.instrument(evaluators=[peekr.eval.Hallucination(detailed=True)])

// span.attributes.hallucination_details
{
  "total": 3, "supported": 1, "contradicted": 2, "unsupported": 0, "score": 0.33,
  "claims": [
    {"text": "The Eiffel Tower is in Paris",         "verdict": "supported"},
    {"text": "It was built in 1923",                 "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"}
  ]
}
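The summary fields in that payload are consistent with a score equal to supported claims divided by total claims. A minimal sketch of that aggregation, assuming this is the formula in use (the helper name `summarise_claims` is hypothetical):

```python
from collections import Counter


def summarise_claims(claims):
    """Aggregate per-claim verdicts into counts and a supported/total score."""
    counts = Counter(claim["verdict"] for claim in claims)
    total = len(claims)
    # Assumption: score is the fraction of supported claims, rounded to 2 places.
    score = round(counts["supported"] / total, 2) if total else None
    return {
        "total": total,
        "supported": counts["supported"],
        "contradicted": counts["contradicted"],
        "unsupported": counts["unsupported"],
        "score": score,
    }


claims = [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"},
]
summary = summarise_claims(claims)
```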

Use simple mode for cheap monitoring across many traces; detailed mode for the cases worth investigating. Cost is roughly one judge call per scored span.

Query SQLite for spans scoring below 0.5, most recent first, to find regressions:

SELECT trace_id,
       json_extract(attributes, '$.eval_scores.Hallucination') AS score,
       json_extract(attributes, '$.output')                    AS output
FROM spans
WHERE score IS NOT NULL AND score < 0.5
ORDER BY start_time DESC;
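The same lookup can be run from Python with the standard sqlite3 module. The sketch below builds a throwaway in-memory table so it runs on its own; the three-column schema is an assumption for illustration (a real traces.db will have more columns), and `low_scoring` is a hypothetical helper. It needs an SQLite build with the JSON functions enabled.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema, for illustration only.
conn.execute("CREATE TABLE spans (trace_id TEXT, start_time REAL, attributes TEXT)")
rows = [
    ("t1", 1.0, {"output": "grounded answer", "eval_scores": {"Hallucination": 0.9}}),
    ("t2", 2.0, {"output": "made-up answer", "eval_scores": {"Hallucination": 0.0}}),
    ("t3", 3.0, {"output": "no evaluator ran"}),
]
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?)",
    [(trace_id, start, json.dumps(attrs)) for trace_id, start, attrs in rows],
)


def low_scoring(conn, threshold=0.5):
    """Return (trace_id, score, output) for spans scoring below the threshold."""
    query = """
        SELECT trace_id,
               json_extract(attributes, '$.eval_scores.Hallucination') AS score,
               json_extract(attributes, '$.output')                    AS output
        FROM spans
        WHERE json_extract(attributes, '$.eval_scores.Hallucination') < ?
        ORDER BY start_time DESC
    """
    return conn.execute(query, (threshold,)).fetchall()


flagged = low_scoring(conn)
```

Spans without a Hallucination score drop out on their own, because comparing NULL with the threshold is never true.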

Dashboard

Generate a self-contained HTML observability report. No server, no build step — open the file in a browser, or attach it to a Slack message.

peekr dashboard traces.db -o report.html   # SQLite
peekr dashboard traces.jsonl               # writes ./dashboard.html

Five tabs, with keyboard shortcuts (1–5 to switch tabs, / to search, R to clear filters, Esc to close panels):

Tab	Purpose
Overview	Health hero (0–100), narrative summary of what's happening, top 3 action items
Traces	Search and filter every trace; click any row for full I/O, claim verdicts, citations
Quality	Rolling chart with thresholds, score distribution, channel × time heatmap
Diagnose	AI-generated likely causes, severity-tagged action lists, worst-offender cards with side-by-side context vs answer
Help	Setup checklist, glossary, evaluator snippets, troubleshooting

A persistent filter bar (tenant · model · endpoint · time range) refilters every panel across every tab in one click. Tab and filter state live in the URL hash so links are shareable.

To populate the channel breakdown, peekr reads attributes.model automatically and tenant_id from the span schema. Attach an endpoint yourself in your request handler:

from peekr import trace, get_current_span

@trace
def handle_request(req):
    get_current_span().attributes["endpoint"] = req.path
    return call_llm(...)

Full screenshots and tab-by-tab walkthrough → docs.

Multi-tenant traces

Every span carries two first-class fields — tenant_id (the customer org) and retention_class (a storage-tier hint). They're separate from user_id (the end-user) so a B2B agent can tag both without conflict.

import peekr
peekr.instrument(tenant_id="acme", retention_class="default")

with peekr.session(user_id="alice", tenant_id="acme",
                   retention_class="long"):
    run_agent()

Resolution order, highest priority first:

peekr.session(tenant_id=..., retention_class=...)
peekr.instrument(tenant_id=..., retention_class=...)
Env vars PEEKR_TENANT_ID / PEEKR_RETENTION_CLASS
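The resolution order above amounts to "first value that is set wins". A small sketch of that lookup, not peekr's actual code; `resolve_field` is a hypothetical helper, and only the environment variable names come from the list above:

```python
import os


def resolve_field(name, session_value=None, instrument_value=None, env=None):
    """Return the highest-priority value: session, then instrument, then env var."""
    env = os.environ if env is None else env
    candidates = (
        session_value,                      # peekr.session(...)
        instrument_value,                   # peekr.instrument(...)
        env.get(f"PEEKR_{name.upper()}"),   # PEEKR_TENANT_ID / PEEKR_RETENTION_CLASS
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


fake_env = {"PEEKR_TENANT_ID": "from-env", "PEEKR_RETENTION_CLASS": "short"}
tenant = resolve_field("tenant_id", session_value="acme",
                       instrument_value="globex", env=fake_env)
retention = resolve_field("retention_class", env=fake_env)
```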

Both fields are top-level columns in SQLite (indexed) and top-level keys in JSONL — query without json_extract:

SELECT tenant_id, COUNT(*) FROM spans GROUP BY tenant_id;
SELECT * FROM spans WHERE retention_class = 'long' AND start_time > ?;

retention_class is a free-form string in the OSS SDK. Recommended values are default, short, long, and pii; the meaning of each is enforced by your storage tier (or by Peekr Cloud when you're ready).

Storage

peekr.instrument()                    # JSONL — default, grep-able
peekr.instrument(storage="sqlite")    # SQLite — queryable, multi-process safe
peekr.instrument(storage="both")      # both

SQLite runs in WAL mode, so multiple processes (Docker, CI, parallel agents) can write to the same database safely, and readers are not blocked while a write is in progress. Query across runs:

# slowest tool calls
sqlite3 traces.db "
  SELECT name, ROUND(AVG(duration_ms)) avg_ms
  FROM spans GROUP BY name ORDER BY avg_ms DESC;"

# token spend by model
sqlite3 traces.db "
  SELECT json_extract(attributes,'\$.model')        AS model,
         SUM(json_extract(attributes,'\$.tokens_total')) AS tokens
  FROM spans GROUP BY model;"

# all errors
sqlite3 traces.db "
  SELECT name, trace_id, json_extract(attributes,'\$.error') AS msg
  FROM spans WHERE status = 'error';"
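For readers unfamiliar with WAL mode, the sketch below shows what enabling it looks like with the standard sqlite3 module. It illustrates the SQLite feature itself, not peekr's storage code; the two-column table and the `open_wal` helper are made up for the example. SQLite still applies writes one at a time, so a busy timeout lets a second writer wait for its turn rather than fail.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open a connection and switch the database to write-ahead logging."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5 s if another writer holds the lock
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


db_path = os.path.join(tempfile.mkdtemp(), "traces.db")
writer_a, mode = open_wal(db_path)
writer_b, _ = open_wal(db_path)   # stands in for a second process

# Hypothetical two-column table, just enough to show two writers.
writer_a.execute("CREATE TABLE IF NOT EXISTS spans (name TEXT, duration_ms REAL)")
writer_a.commit()
writer_a.execute("INSERT INTO spans VALUES ('tool.search_web', 210)")
writer_a.commit()
writer_b.execute("INSERT INTO spans VALUES ('openai.chat.completions', 1033)")
writer_b.commit()

count = writer_b.execute("SELECT COUNT(*) FROM spans").fetchone()[0]
```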

Alert routing — Slack, webhooks, PagerDuty

By default, alert messages go to stderr. Attach one or more sinks to route them anywhere:

import peekr
from peekr.alert import ErrorRate, CostSpike, LatencyP95, SlackSink, WebhookSink

peekr.instrument(alerts=[
    ErrorRate(threshold=0.05).with_sinks(
        SlackSink("https://hooks.slack.com/services/T0/B0/abc"),
    ),
    CostSpike(multiplier=3.0).with_sinks(
        WebhookSink(
            "https://events.pagerduty.com/v2/enqueue",
            payload_builder=lambda name, msg: {
                "routing_key": "your-key",
                "event_action": "trigger",
                "payload": {"summary": msg, "source": "peekr", "severity": "warning"},
            },
        ),
    ),
])

Sinks are best-effort — network failures, timeouts, and exceptions inside notify() are swallowed silently so a flaky webhook never breaks the application's tracing path. Use WebhookSink(payload_builder=...) to fit any incident system (PagerDuty Events v2, Opsgenie, OpsLevel, custom routers).
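The best-effort behaviour described above can be pictured as a try/except around each sink. This is a sketch of the pattern rather than peekr's dispatch code; the `notify(name, message)` signature and the `FlakySink`, `ListSink`, and `notify_all` names are assumptions made for the example.

```python
class FlakySink:
    """Simulates a webhook that always fails."""

    def notify(self, name, message):
        raise ConnectionError("webhook timed out")


class ListSink:
    """Collects alerts in memory so the example can inspect them."""

    def __init__(self):
        self.received = []

    def notify(self, name, message):
        self.received.append((name, message))


def notify_all(sinks, name, message):
    """Deliver an alert to every sink; one failing sink never stops the others."""
    delivered = 0
    for sink in sinks:
        try:
            sink.notify(name, message)
            delivered += 1
        except Exception:
            # Swallowed on purpose: alert delivery must not break the caller.
            continue
    return delivered


healthy = ListSink()
delivered = notify_all([FlakySink(), healthy], "ErrorRate", "error rate above 5%")
```

The trade-off is that a misconfigured sink fails silently, so it is worth sending a test alert when wiring up a new webhook.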

Sampling

High-traffic agents produce a lot of spans. sample_rate drops a fraction of traces from storage while keeping evaluators and alerts running on the full stream — so your error rate, hallucination score, and cost figures stay accurate.

peekr.instrument(
    sample_rate=0.1,        # keep 10% of traces; default 1.0
    keep_errors=True,       # errored spans always persisted (default)
)

The decision is made once per trace at root-span creation and inherited by every child, so a trace is never partially captured — you don't get orphan openai.chat.completions spans without their parent.
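A whole-trace sampling decision with an error override can be sketched as follows. The class and method names are hypothetical and the per-trace dictionary is just one way to remember the decision; it is not a description of peekr's internals.

```python
import random


class TraceSampler:
    """Decide once per trace whether to persist it; errors can bypass the decision."""

    def __init__(self, sample_rate=1.0, keep_errors=True, rng=None):
        self.sample_rate = sample_rate
        self.keep_errors = keep_errors
        self._rng = rng or random.Random()
        self._decisions = {}

    def start_trace(self, trace_id):
        # Made once, when the root span is created.
        keep = self._rng.random() < self.sample_rate
        self._decisions[trace_id] = keep
        return keep

    def should_persist(self, trace_id, is_error=False):
        # Child spans inherit the root decision unless they errored.
        if is_error and self.keep_errors:
            return True
        return self._decisions.get(trace_id, True)


sampler = TraceSampler(sample_rate=0.0)      # drop everything except errors
sampler.start_trace("trace-1")
child_kept = sampler.should_persist("trace-1")
error_kept = sampler.should_persist("trace-1", is_error=True)
```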

OpenTelemetry export

Ship peekr spans into any OTel-compatible backend (Datadog, Honeycomb, Grafana Tempo, Arize Phoenix, Langfuse-OTel, etc.) by translating attributes into the OpenInference semantic conventions the LLM observability ecosystem uses.

pip install "peekr[otel]"

import peekr
from peekr.exporters import add_exporter

peekr.instrument()
# Option 1: reuse your app's existing OTel setup
add_exporter(peekr.OTelExporter())

# Option 2: configure the endpoint inline instead
add_exporter(peekr.OTelExporter(endpoint="https://api.honeycomb.io",
                                headers={"x-honeycomb-team": "..."}))

No agent, no collector, no separate process. Peekr writes OpenInference-shaped spans in-process, and any OTel pipeline you already operate consumes them.

Custom exporters

Ship spans to any backend by implementing one method:

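import requests

import peekr
from peekr.exporters import add_exporter

class MyExporter:
    def export(self, span):
        # Send the span as JSON; the timeout keeps a slow backend from hanging the call.
        requests.post("https://my-backend.com/spans", json=span.to_dict(), timeout=5)

peekr.instrument()
add_exporter(MyExporter())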

`@trace` options

@trace                        # auto-names from module.function, captures I/O
@trace(name="tool.search")    # custom span name
@trace(capture_io=False)      # skip args/output (e.g. secrets)
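A decorator that accepts both the bare form and the keyword form, and handles sync and async functions, can be written with a small amount of dispatch. The sketch below is a standalone illustration of that pattern, not peekr's implementation; `RECORDED` is a stand-in for a span store and the span dictionary layout is invented.

```python
import asyncio
import functools
import inspect

RECORDED = []  # stand-in for a span store


def trace(func=None, *, name=None, capture_io=True):
    """Usable as @trace or @trace(name=..., capture_io=...)."""

    def decorate(fn):
        span_name = name or f"{fn.__module__}.{fn.__qualname__}"

        def record(args, kwargs, result):
            span = {"name": span_name}
            if capture_io:
                span["in"] = {"args": args, "kwargs": kwargs}
                span["out"] = result
            RECORDED.append(span)

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                result = await fn(*args, **kwargs)
                record(args, kwargs, result)
                return result
            return async_wrapper

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            record(args, kwargs, result)
            return result
        return wrapper

    # Bare @trace passes the function directly; @trace(...) passes None.
    return decorate(func) if callable(func) else decorate


@trace
def add(a, b):
    return a + b


@trace(name="tool.lookup", capture_io=False)
async def lookup(key):
    return {"key": key}


total = add(1, 2)
found = asyncio.run(lookup("secret"))
```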

Supported clients

LLM SDKs

Provider	SDK	Install
OpenAI	`openai`	`pip install "peekr[openai]"`
Anthropic	`anthropic`	`pip install "peekr[anthropic]"`
AWS Bedrock	`boto3`	`pip install "peekr[bedrock]"`
Google Gemini	`google-genai` (or legacy `google-generativeai`)	`pip install "peekr[gemini]"`

Agent frameworks

Framework	Package	Install
LangChain / LangGraph	`langchain-core`	`pip install "peekr[langchain]"`
LlamaIndex	`llama-index-core`	`pip install "peekr[llamaindex]"`
CrewAI	`crewai`	`pip install "peekr[crewai]"`

peekr.instrument() detects whichever SDKs and frameworks are installed and patches them. Streaming is supported across all LLM SDKs. Frameworks emit chain / tool / retriever / agent / LLM spans nested in the order they actually executed:

crewai.crew.kickoff                       3.4s
  └─ crewai.task.execute                  3.4s   task=plan_trip
       └─ crewai.agent.execute_task       3.4s   agent=planner
            └─ openai.chat.completions    1.2s   gpt-4o  · 891tok
            └─ langchain.tool.search_web  2.1s

TypeScript SDK

npm install @peekr/sdk

import { instrument, wrap, trace, withSession } from "@peekr/sdk";
import OpenAI from "openai";

instrument({ jsonlPath: "./traces.jsonl" });
const openai = wrap(new OpenAI());

await withSession(
  { user_id: "alice", tenant_id: "acme" },
  async () => {
    await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Summarise the docs above" }],
    });
  },
);

The TypeScript SDK writes the same JSONL schema as Python, so a Node app's traces work with peekr view, peekr cost, and peekr dashboard unchanged. Full reference → peekr-ts/README.md.

Peekr Cloud

The OSS SDK runs in your process, writes to local files, and is MIT licensed forever — that's not changing. When a single-process file isn't the right fit any more (multiple services, a team that needs shared dashboards, longer retention, audit-grade trace storage), Peekr Cloud is the managed backend.

Sign up at peekr.cloud.ashwanijha.dev — free up to 10k spans/month, no card required.

Once you have a pk_live_ key from the project settings page:

import peekr

peekr.instrument(
    tenant_id="acme",
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.cloud.ashwanijha.dev",
        api_key="pk_live_…",
    ),
)

HTTPExporter is fully implemented as of v0.5 — batched, retried, flushed at interpreter exit. The spans you already instrument locally ship to the Cloud dashboard unchanged; tenant_id and retention_class are first-class columns.

Tier	Spans / month	Price
Free	10k	$0
Starter	500k	$29/mo
Pro	5M	$99/mo
Scale	50M	$399/mo

How it works

instrument() monkey-patches the OpenAI, Anthropic, and Bedrock SDK methods before your code runs. Python looks up a method on its class or module each time it is called, so every call made after patching hits the wrapper without any change to your code.
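The monkey-patching idea can be shown on a toy class. `FakeClient` is a stand-in invented for this example, not a real SDK class, and `patch_method` is a hypothetical helper; the real wrappers record far more than a duration.

```python
import time


class FakeClient:
    """Stand-in for an SDK client; not a real OpenAI or Anthropic class."""

    def complete(self, prompt):
        return f"echo: {prompt}"


SPANS = []


def patch_method(cls, method_name, span_name):
    """Replace cls.method_name with a wrapper that times each call."""
    original = getattr(cls, method_name)

    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            SPANS.append({"name": span_name, "duration_ms": elapsed_ms})

    setattr(cls, method_name, wrapper)
    return original  # keep a handle so the patch can be undone


client = FakeClient()                        # created before patching
patch_method(FakeClient, "complete", "fake.complete")
reply = client.complete("hi")                # looked up at call time, so the wrapper runs
```

One limit of this approach: a bound method saved before patching (for example `fn = client.complete`) keeps pointing at the original, which is a reason to instrument at startup.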

Parent / child span relationships are tracked through contextvars.ContextVar, which propagates correctly across async / await, so you never pass the parent span around by hand. The TypeScript SDK uses Node's AsyncLocalStorage for the same reason.

Contributing

git clone https://github.com/ashwanijha04/peekr
cd peekr
pip install -e ".[dev]"
pytest

Open an issue before large changes. PRs welcome.

Website · Docs · PyPI · TypeScript SDK · MIT License

Release history

0.5.6

May 31, 2026

0.5.5

May 31, 2026

0.5.4

May 31, 2026

0.5.3

May 31, 2026

0.5.2

May 31, 2026

This version

0.5.1

May 31, 2026

0.5.0

May 31, 2026

0.4.0

May 17, 2026

0.3.2

May 15, 2026

0.3.1

May 15, 2026

0.3.0

May 14, 2026

0.2.0

May 11, 2026

0.1.0

May 9, 2026

Download files

Source Distribution

peekr-0.5.1.tar.gz (132.8 kB)

Uploaded May 31, 2026 Source

Built Distribution

peekr-0.5.1-py3-none-any.whl (93.5 kB)

Uploaded May 31, 2026 Python 3

File details

Details for the file peekr-0.5.1.tar.gz.

File metadata

Download URL: peekr-0.5.1.tar.gz
Upload date: May 31, 2026
Size: 132.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for peekr-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`3c5ba6ef2df916a555b3ade6ad66b89a4de7bd0bd93db2e0eafb964671408ac3`
MD5	`b1d6b4a8ee10d0d3779bb5c1f7306c03`
BLAKE2b-256	`6eb41e39fc9297a8ac662af414704d30d407b1778e24512fb9f465fc380f29dc`

Provenance

The following attestation bundles were made for peekr-0.5.1.tar.gz:

Publisher: publish.yml on ashwanijha04/peekr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: peekr-0.5.1.tar.gz
- Subject digest: 3c5ba6ef2df916a555b3ade6ad66b89a4de7bd0bd93db2e0eafb964671408ac3
- Sigstore transparency entry: 1682688775
- Sigstore integration time: May 31, 2026
Source repository:
- Permalink: ashwanijha04/peekr@c06ea4999727d54cad35982b434a413578346979
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/ashwanijha04
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c06ea4999727d54cad35982b434a413578346979
- Trigger Event: push

File details

Details for the file peekr-0.5.1-py3-none-any.whl.

File metadata

Download URL: peekr-0.5.1-py3-none-any.whl
Upload date: May 31, 2026
Size: 93.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for peekr-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d342a85e0e26ffb1341254ef399be2355ec43041b036e0e04f69f73a88a1a09b`
MD5	`aeec3f3a17e7ff3c203ae51cd5b71e3b`
BLAKE2b-256	`49927191c4a8cab2380aa1ce920fe6083f3d326f38e18261b17b26828e26335c`

Provenance

The following attestation bundles were made for peekr-0.5.1-py3-none-any.whl:

Publisher: publish.yml on ashwanijha04/peekr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: peekr-0.5.1-py3-none-any.whl
- Subject digest: d342a85e0e26ffb1341254ef399be2355ec43041b036e0e04f69f73a88a1a09b
- Sigstore transparency entry: 1682688962
- Sigstore integration time: May 31, 2026
Source repository:
- Permalink: ashwanijha04/peekr@c06ea4999727d54cad35982b434a413578346979
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/ashwanijha04
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c06ea4999727d54cad35982b434a413578346979
- Trigger Event: push
