AgentLab

Universal record-and-replay for LLM agents.

Status: pre-alpha, APIs will change.

AgentLab captures model calls, tools, state transitions, and timing into a trace you can replay without hitting the network. It is built around a framework-agnostic core and an HTTP capture layer that works with any SDK that routes requests through httpx.

Overhead

Per-LLM-call cost of running inside agentlab.record():

metric        baseline   recorded   overhead
latency p50   13.5 ms    14.7 ms    +1.16 ms
latency p99   14.4 ms    15.9 ms    +1.52 ms

Measured against an in-process loopback HTTP server with a 10 ms upstream delay, which eliminates network jitter so the delta isolates SDK overhead (HTTP capture, span emit, JSONL write+fsync, matcher, LLMSpan build). Real LLM calls land in the 100 ms – 2000 ms range, so the added ~1.2 ms works out to roughly 1% of wall-clock time or less in practice.

Reproduce with:

uv run python scripts/bench_record_overhead.py --calls 200 --runs 5

Installation

pip install agentic-lab           # minimal SDK
pip install 'agentic-lab[ui]'     # + Starlette UI server

The PyPI distribution is agentic-lab; the importable Python module is agentlab:

import agentlab as al

For local development, this repo is uv-managed:

git clone https://github.com/ambuj-krishna-agrawal/agent-lab.git
cd agent-lab
uv sync --all-extras --frozen

Use --frozen by default so your environment matches uv.lock and CI.

Documentation

  • Quickstart — five minutes from install to a replayable trace.
  • Provider coverage — every supported LLM provider + how to add custom ones.
  • Error reference — every AGL-… code with a remediation sentence (auto-generated from src/agentlab/errors.py).
  • Changelog — version history.
  • AGENTS.md — invariants and quality gates contributors must respect.
  • CONTRIBUTING.md — human-contributor process.

Configuration

  • Secrets live in .env (git-ignored). Copy .env.example and set the provider keys you use.
  • Non-secret defaults live in src/agentlab/_defaults.toml and can be overridden by AGENTLAB_* environment variables (see the sketch after this list).
  • Full typed config lives in src/agentlab/config.py.
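
For example, a single default can be overridden per-process. A sketch: AGENTLAB_TRACE_ROOT is a hypothetical key shown for illustration (the real names mirror the typed fields in src/agentlab/config.py), and setting it before import assumes config resolves at import time.

import os

# Hypothetical key shown for illustration; the real names mirror the
# typed fields in src/agentlab/config.py.
os.environ["AGENTLAB_TRACE_ROOT"] = "/tmp/agentlab-traces"

import agentlab as al  # noqa: E402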

Quickstart

Five minutes from pip install to a trace you can replay without an API key. The full runnable script lives at example/quickstart.py; the inline version:

import os
import openai
import agentlab as al

client = openai.OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
)

# 1. Record.
with (
    al.record(agent_name="quickstart") as recording,
    al.agent(name="quickstart", version="0"),
    al.step(role=al.StepRole.EXECUTE),
):
    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
        max_tokens=16,
    )
print("model said:", response.choices[0].message.content)
print("trace at:  ", recording.directory)

# 2. Replay — no network, no key.
with al.replay(str(recording.directory)) as session:
    replay = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
        max_tokens=16,
    )
print("replay said:", replay.choices[0].message.content)
print("cache hits: ", session.cache_hits)

Run it end-to-end, then browse the trace in the UI:

pip install 'agentic-lab[ui]' openai
export OPENROUTER_API_KEY=sk-or-...
python example/quickstart.py
agentlab serve --root ~/.agentlab/traces
# → http://127.0.0.1:7861/

The al.agent(...) and al.step(...) envelopes give the auto-emitted LLMSpan a typed parent (the V4 schema forbids an LLM span under a bare RUN). Production agents normally establish these once near their entrypoints rather than repeating them per call; see example/workflows/ for that shape.
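
A minimal sketch of that entrypoint shape (handle_ticket is a hypothetical stand-in for your agent logic):

import agentlab as al

def handle_ticket() -> None:
    ...  # your agent logic; any httpx-based SDK call made here is captured

def main() -> None:
    # Establish the recording and agent envelopes once at the entrypoint...
    with (
        al.record(agent_name="support-bot") as recording,
        al.agent(name="support-bot", version="1"),
    ):
        # ...then open a step per unit of work. LLM calls made inside
        # inherit a typed parent, satisfying the V4 schema.
        with al.step(role=al.StepRole.EXECUTE):
            handle_ticket()
    print("trace at:", recording.directory)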

Larger example agents

Three reference agents under example/ cover the Anthropic building-effective-agents shapes:

Folder       Shape                                What it does
workflows/   Workflow (fixed code path)           Decompose → Wikipedia search → cite → LLM-as-judge → revise.
autonomous/  Autonomous (model picks each step)   LangGraph observe-plan-act loop that triages support tickets.
hybrid/      Workflow + autonomous sub-agent      Incident-response pipeline with an autonomous investigation step.

All three use OpenRouter via langchain-openai, real (or realistic) tools, and produce traces directly into example_traces/ that agentlab serve can browse.

Provider coverage

Inside an agentlab.record() block AgentLab patches httpx transport methods, so every SDK that routes through httpx (which is most modern Python LLM SDKs) lands its raw exchange in http.jsonl. That file is the source of truth for replay; the typed LLMSpan is a best-effort view layered on top.
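
Because the capture sits at the transport layer, even a raw httpx call is recorded. A sketch (the payload and key are illustrative):

import httpx

import agentlab as al

with (
    al.record(agent_name="raw-httpx") as recording,
    al.agent(name="raw-httpx", version="0"),
    al.step(role=al.StepRole.EXECUTE),
):
    # No SDK involved: the patched transport still writes this exchange
    # to http.jsonl inside the trace directory.
    httpx.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer sk-or-..."},
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [{"role": "user", "content": "hi"}],
        },
    )
print("raw exchange in:", recording.directory)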

The built-in matchers turn recognised exchanges into typed LLMSpans out of the box:

Provider                        Endpoint(s)                                                                       Stream?
OpenAI chat completions         api.openai.com/v1/chat/completions                                                yes
OpenAI Responses                api.openai.com/v1/responses                                                       yes
OpenAI Embeddings               api.openai.com/v1/embeddings                                                      n/a
Azure OpenAI chat completions   *.openai.azure.com/openai/deployments/<dep>/chat/completions                      yes
Anthropic Messages              api.anthropic.com/v1/messages                                                     yes
AWS Bedrock — Invoke            bedrock-runtime.<region>.amazonaws.com/model/<id>/invoke[-with-response-stream]   partial[^1]
AWS Bedrock — Converse          bedrock-runtime.<region>.amazonaws.com/model/<id>/converse[-stream]               partial[^1]
Google Gemini                   generativelanguage.googleapis.com/.../models/<m>:[stream]generateContent          yes
Vertex AI — Gemini              <region>-aiplatform.googleapis.com/.../models/<m>:[stream]generateContent         yes
Vertex AI — Anthropic (Claude)  <region>-aiplatform.googleapis.com/.../models/<m>:[stream]rawPredict              yes
OpenRouter                      openrouter.ai/api/v1/chat/completions                                             yes
Together AI                     api.together.{xyz,ai}/v1/chat/completions                                         yes
Groq                            api.groq.com/openai/v1/chat/completions                                           yes
Mistral                         api.mistral.ai/v1/chat/completions                                                yes
Fireworks                       api.fireworks.ai/inference/v1/chat/completions                                    yes
DeepInfra                       api.deepinfra.com/v1/openai/chat/completions                                      yes
Perplexity                      api.perplexity.ai/chat/completions                                                yes

[^1]: Bedrock streaming uses AWS event-stream binary framing. Buffered responses populate every LLMSpan field; streamed responses record the request side and a validation_errors entry explaining why the response side is empty. The raw bytes are still preserved in http.jsonl.

Adding a custom or self-hosted provider

OpenAI-compatible hosts (vLLM, Ollama, your private gateway) need a single registration call:

import agentlab as al
from agentlab.llm.matchers.openai import HostPathMatcher

al.register_llm_provider(HostPathMatcher(
    name="my-vllm",
    host_suffix="llm.internal.example.com",
    path_prefix="/v1/chat/completions",
))

For wholly different body shapes, subclass agentlab.llm.LLMProviderMatcher.

Pricing

The SDK is token-only by default: LLMSpan.cost.usd stays at 0.0 and the span is annotated with agentlab.llm.pricing.unknown=True. Provider list prices change too often to bake into the SDK. Operators who want USD computed on every span install their own table:

from agentlab.llm.pricing import PriceRow, StaticPriceTable, set_price_table

set_price_table(StaticPriceTable(rows=(
    PriceRow("openai", "gpt-4o", 2.50, 10.00),
    PriceRow("anthropic", "claude-3-5-sonnet*", 3.00, 15.00),
)))

Strict mode for unrecognised exchanges

By default, exchanges that don't match any provider matcher log a warning (one per (trace, host)) and the raw exchange remains in http.jsonl. Power users can opt into stricter behaviour:

with al.record(strict_unknown_provider="raise"):  # or "emit_op"
    ...

"raise" surfaces the gap as UnknownLLMProviderError; "emit_op" records the call as a typed OpSpan so the trace tree is complete even without a matcher.

UI and examples

Run the backend UI server against bundled traces:

uv run agentlab --root example_traces serve --port 7861

Optional frontend dev server with HMR:

cd frontend
npm install
npm run dev

The bundled runnable agents are seeded from example/ and are available from the Agents page when the server starts successfully.

Production deployment

The OSS UI server can be hosted on a single EC2 box behind Caddy, with a separate Next.js + Clerk marketing/auth site on Vercel that redirects authenticated users to it. See deploy/README.md for the end-to-end runbook.

UI walkthrough

Screenshots cover the Dashboard, Traces list, Trace detail, Agents, and Settings pages.

Development

Run the local quality gate:

bash scripts/check.sh

Equivalent commands:

uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run pytest tests/unit tests/integration -n auto --dist=worksteal

Testing

Current test tiers:

  • tests/unit/: hermetic unit tests (no real network).
  • tests/integration/: in-process integration tests with mocked HTTP where needed.

For live-provider smoke runs, use the runnable examples in example/ through their CLIs or the UI Agents page.
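
Recorded traces can also back hermetic tests of your own. A sketch, assuming a fixture trace committed at a hypothetical path (promote.py is the project's replay-test scaffold generator):

import agentlab as al

def run_quickstart() -> None:
    ...  # the agent under test; its LLM calls are served from the trace

def test_quickstart_replays_without_network() -> None:
    with al.replay("tests/fixtures/quickstart_trace") as session:  # hypothetical path
        run_quickstart()
    assert session.cache_hits > 0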

Project layout

agentlab/
├── src/agentlab/
│   ├── __init__.py          # public API surface
│   ├── cli.py               # `agentlab` console entry point
│   ├── config.py            # typed settings
│   ├── recorder.py          # public `record()` context manager
│   ├── _defaults.toml       # bundled non-secret defaults
│   ├── _proto/              # generated protobuf bindings (private)
│   ├── bridges/             # export bridges (e.g. OTel GenAI)
│   ├── core/                # recording primitives
│   ├── io/                  # trace IO + HTTP capture
│   ├── integrations/        # framework adapters
│   ├── llm/                 # provider-agnostic LLM client
│   ├── replay/              # deterministic replay engine
│   ├── storage/             # JSONL + protobuf stores
│   ├── ui/                  # Starlette UI server + DTO mapping
│   ├── pytest.py            # pytest plugin
│   └── promote.py           # replay-test scaffold generator
├── frontend/                # React SPA for the UI server
├── example/                 # bundled runnable agent seeds
├── proto/agentlab/v1/trace.proto
├── scripts/                 # check, proto regen, UI screenshot helpers
├── tests/{unit,integration}/
└── uv.lock

License

Apache 2.0 — see LICENSE.
