# AgentLab

Universal record-and-replay for LLM agents.

**Status:** pre-alpha; APIs will change.
AgentLab captures model calls, tools, state transitions, and timing into a
trace you can replay without hitting the network. It is built around a
framework-agnostic core and an HTTP capture layer that works with any SDK
that routes requests through httpx.
## Overhead

Per-LLM-call cost of running inside `agentlab.record()`:

| Metric | Baseline | Recorded | Overhead |
|---|---|---|---|
| Latency p50 | 13.5 ms | 14.7 ms | +1.16 ms |
| Latency p99 | 14.4 ms | 15.9 ms | +1.52 ms |

Measured against an in-process loopback HTTP server with a 10 ms upstream delay; this eliminates network jitter, so the delta isolates SDK overhead (HTTP capture, span emit, JSONL write+fsync, matcher, `LLMSpan` build). Real LLM calls land in the 100–2000 ms range, so this works out to under 1% wall-clock overhead in practice.
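The "under 1%" figure is simple arithmetic, e.g. the p50 overhead against a 200 ms call (well within the quoted range; longer calls dilute it further):

```python
overhead_ms = 1.16  # p50 overhead from the table above
call_ms = 200.0     # a typical call duration within the quoted 100-2000 ms range
print(f"{overhead_ms / call_ms:.2%}")  # → 0.58%
```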
Reproduce with:

```shell
uv run python scripts/bench_record_overhead.py --calls 200 --runs 5
```
## Installation

```shell
pip install agentic-lab          # minimal SDK
pip install 'agentic-lab[ui]'    # + Starlette UI server
```

The PyPI distribution is `agentic-lab`; the importable Python module is `agentlab`:

```python
import agentlab as al
```

For local development, this repo is uv-managed:

```shell
git clone https://github.com/ambuj-krishna-agrawal/agent-lab.git
cd agent-lab
uv sync --all-extras --frozen
```

Use `--frozen` by default so your environment matches `uv.lock` and CI.
## Documentation

- Quickstart — five minutes from install to a replayable trace.
- Provider coverage — every supported LLM provider, plus how to add custom ones.
- Error reference — every `AGL-…` code with a remediation sentence (auto-generated from `src/agentlab/errors.py`).
- Changelog — version history.
- `AGENTS.md` — invariants and quality gates contributors must respect.
- `CONTRIBUTING.md` — human-contributor process.
## Configuration

- Secrets live in `.env` (git-ignored). Copy `.env.example` and set the provider keys you use.
- Non-secret defaults live in `src/agentlab/_defaults.toml` and can be overridden by `AGENTLAB_*` environment variables.
- Full typed config lives in `src/agentlab/config.py`.
## Quickstart

Five minutes from `pip install` to a trace you can replay without an API key. The full runnable script lives at `example/quickstart.py`; the inline version:
```python
import os

import openai

import agentlab as al

client = openai.OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
)

# 1. Record.
with (
    al.record(agent_name="quickstart") as recording,
    al.agent(name="quickstart", version="0"),
    al.step(role=al.StepRole.EXECUTE),
):
    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
        max_tokens=16,
    )
print("model said:", response.choices[0].message.content)
print("trace at:  ", recording.directory)

# 2. Replay — no network, no key.
with al.replay(str(recording.directory)) as session:
    replay = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
        max_tokens=16,
    )
print("replay said:", replay.choices[0].message.content)
print("cache hits: ", session.cache_hits)
```
```shell
pip install 'agentic-lab[ui]' openai
export OPENROUTER_API_KEY=sk-or-...
python example/quickstart.py
agentlab serve --root ~/.agentlab/traces
# → http://127.0.0.1:7861/
```
The `with al.agent(...)` and `al.step(...)` envelopes give the auto-emitted `LLMSpan` a typed parent (the V4 schema forbids `LLM` under a bare `RUN`). Production agents normally establish these once near their entrypoints and don't repeat them per call — see `example/workflows/` for that shape.
## Larger example agents

Three reference agents under `example/` cover the Anthropic building-effective-agents shapes:

| Folder | Shape | What it does |
|---|---|---|
| `workflows/` | Workflow (fixed code path) | Decompose → Wikipedia search → cite → LLM-as-judge → revise. |
| `autonomous/` | Autonomous (model picks each step) | LangGraph observe-plan-act loop that triages support tickets. |
| `hybrid/` | Workflow + autonomous sub-agent | Incident-response pipeline with an autonomous investigation step. |

All three use OpenRouter via `langchain-openai`, real (or realistic) tools, and produce traces directly into `example_traces/` that `agentlab serve` can browse.
## Provider coverage

Inside an `agentlab.record()` block AgentLab patches httpx transport methods, so every SDK that routes through httpx (which is most modern Python LLM SDKs) lands its raw exchange in `http.jsonl`. That file is the source of truth for replay; the typed `LLMSpan` is a best-effort view layered on top.
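Because `http.jsonl` is plain JSON Lines, it can be inspected with nothing but the standard library. The record shape below is hypothetical — the field names are illustrative, not AgentLab's actual schema:

```python
import io
import json

# Hypothetical http.jsonl content; real field names may differ.
raw = io.StringIO(
    '{"url": "https://api.openai.com/v1/chat/completions", "status": 200}\n'
    '{"url": "https://openrouter.ai/api/v1/chat/completions", "status": 200}\n'
)

# Pull the host out of each recorded exchange's URL.
hosts = [json.loads(line)["url"].split("/")[2] for line in raw]
print(hosts)  # → ['api.openai.com', 'openrouter.ai']
```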
The built-in matchers turn recognised exchanges into typed `LLMSpan`s out of the box:
| Provider | Endpoint(s) | Stream? |
|---|---|---|
| OpenAI chat completions | `api.openai.com/v1/chat/completions` | yes |
| OpenAI Responses | `api.openai.com/v1/responses` | yes |
| OpenAI Embeddings | `api.openai.com/v1/embeddings` | n/a |
| Azure OpenAI chat completions | `*.openai.azure.com/openai/deployments/<dep>/chat/completions` | yes |
| Anthropic Messages | `api.anthropic.com/v1/messages` | yes |
| AWS Bedrock — Invoke | `bedrock-runtime.<region>.amazonaws.com/model/<id>/invoke[-with-response-stream]` | partial[^1] |
| AWS Bedrock — Converse | `bedrock-runtime.<region>.amazonaws.com/model/<id>/converse[-stream]` | partial[^1] |
| Google Gemini | `generativelanguage.googleapis.com/.../models/<m>:[stream]generateContent` | yes |
| Vertex AI — Gemini | `<region>-aiplatform.googleapis.com/.../models/<m>:[stream]generateContent` | yes |
| Vertex AI — Anthropic (Claude) | `<region>-aiplatform.googleapis.com/.../models/<m>:[stream]rawPredict` | yes |
| OpenRouter | `openrouter.ai/api/v1/chat/completions` | yes |
| Together AI | `api.together.{xyz,ai}/v1/chat/completions` | yes |
| Groq | `api.groq.com/openai/v1/chat/completions` | yes |
| Mistral | `api.mistral.ai/v1/chat/completions` | yes |
| Fireworks | `api.fireworks.ai/inference/v1/chat/completions` | yes |
| DeepInfra | `api.deepinfra.com/v1/openai/chat/completions` | yes |
| Perplexity | `api.perplexity.ai/chat/completions` | yes |
[^1]: Bedrock streaming uses AWS event-stream binary framing. Buffered responses populate every `LLMSpan` field; streamed responses record the request side and a `validation_errors` entry explaining why the response side is empty. The raw bytes are still preserved in `http.jsonl`.
### Adding a custom or self-hosted provider

OpenAI-compatible hosts (vLLM, Ollama, your private gateway) need a single registration call:

```python
import agentlab as al
from agentlab.llm.matchers.openai import HostPathMatcher

al.register_llm_provider(HostPathMatcher(
    name="my-vllm",
    host_suffix="llm.internal.example.com",
    path_prefix="/v1/chat/completions",
))
```
For wholly different body shapes, subclass `agentlab.llm.LLMProviderMatcher`.
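The host/path test that a registration like the one above implies can be sketched in plain Python. The function below is a hypothetical stand-in for illustration, not AgentLab's implementation:

```python
from urllib.parse import urlparse

def matches(url: str, host_suffix: str, path_prefix: str) -> bool:
    """Hypothetical sketch of a host-suffix + path-prefix matcher."""
    parsed = urlparse(url)
    return (parsed.hostname or "").endswith(host_suffix) and parsed.path.startswith(path_prefix)

# A subdomain of the registered host still matches.
print(matches(
    "https://gw.llm.internal.example.com/v1/chat/completions",
    "llm.internal.example.com",
    "/v1/chat/completions",
))  # → True
```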
## Pricing

The SDK is token-only by default — `LLMSpan.cost.usd` stays at `0.0` and the span is annotated with `agentlab.llm.pricing.unknown=True`. Provider list prices change too often to bake into the SDK. Operators who want USD computed on every span install their own table:

```python
from agentlab.llm.pricing import PriceRow, StaticPriceTable, set_price_table

set_price_table(StaticPriceTable(rows=(
    PriceRow("openai", "gpt-4o", 2.50, 10.00),
    PriceRow("anthropic", "claude-3-5-sonnet*", 3.00, 15.00),
)))
```
## Strict mode for unrecognised exchanges

By default, exchanges that don't match any provider matcher log a warning (one per `(trace, host)`) and the raw exchange remains in `http.jsonl`. Power users can opt into stricter behaviour:

```python
with al.record(strict_unknown_provider="raise"):  # or "emit_op"
    ...
```

`"raise"` surfaces the gap as `UnknownLLMProviderError`; `"emit_op"` records the call as a typed `OpSpan` so the trace tree is complete even without a matcher.
## UI and examples

Run the backend UI server against the bundled traces:

```shell
uv run agentlab --root example_traces serve --port 7861
```

Optional frontend dev server with HMR:

```shell
cd frontend
npm install
npm run dev
```

The bundled runnable agents are seeded from `example/` and are available from the Agents page when the server starts successfully.
## Production deployment

The OSS UI server can be hosted on a single EC2 box behind Caddy, with a separate Next.js + Clerk marketing/auth site on Vercel that redirects authenticated users to it. See `deploy/README.md` for the end-to-end runbook.
## UI walkthrough

Screenshots cover: Dashboard, Traces list, Trace detail, Agents, and Settings.
## Development

Run the local quality gate:

```shell
bash scripts/check.sh
```

Equivalent commands:

```shell
uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run pytest tests/unit tests/integration -n auto --dist=worksteal
```
## Testing

Current test tiers:

- `tests/unit/`: hermetic unit tests (no real network).
- `tests/integration/`: in-process integration tests with mocked HTTP where needed.

For live-provider smoke runs, use the runnable examples in `example/` through their CLIs or the UI Agents page.
## Project layout

```text
agentlab/
├── src/agentlab/
│   ├── __init__.py      # public API surface
│   ├── cli.py           # `agentlab` console entry point
│   ├── config.py        # typed settings
│   ├── recorder.py      # public `record()` context manager
│   ├── _defaults.toml   # bundled non-secret defaults
│   ├── _proto/          # generated protobuf bindings (private)
│   ├── bridges/         # export bridges (e.g. OTel GenAI)
│   ├── core/            # recording primitives
│   ├── io/              # trace IO + HTTP capture
│   ├── integrations/    # framework adapters
│   ├── llm/             # provider-agnostic LLM client
│   ├── replay/          # deterministic replay engine
│   ├── storage/         # JSONL + protobuf stores
│   ├── ui/              # Starlette UI server + DTO mapping
│   ├── pytest.py        # pytest plugin
│   └── promote.py       # replay-test scaffold generator
├── frontend/            # React SPA for the UI server
├── example/             # bundled runnable agent seeds
├── proto/agentlab/v1/trace.proto
├── scripts/             # check, proto regen, UI screenshot helpers
├── tests/{unit,integration}/
└── uv.lock
```
## License

Apache 2.0 — see LICENSE.