pytest-agent-observability
pytest plugin that uploads LiveKit-agents eval results to agent-observability.
Each pytest run becomes one eval_run in the dashboard, with every test
function showing up as an eval_case including events, judgments, and failure
detail.
Install
pip install pytest-agent-observability
Requires Python 3.9+ and pytest>=7.0. LiveKit integration is optional — the
plugin works for plain pytest suites too.
Quick start
export AGENT_OBSERVABILITY_URL=http://localhost:9090
export AGENT_OBSERVABILITY_AGENT_ID=my-agent # optional
pytest # that's it
If AGENT_OBSERVABILITY_URL is unset, the plugin no-ops — your tests run
identically.
With LiveKit eval tests
import pytest
from livekit.agents import Agent, AgentSession, inference
from pytest_agent_observability import capture

class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="Be helpful.")

@pytest.mark.asyncio
async def test_greeting():
    async with inference.LLM(model="openai/gpt-4.1-mini") as llm, \
            AgentSession(llm=llm) as sess:
        await sess.start(Assistant())
        result = capture(await sess.run(user_input="Hello"))
        result.expect.next_event().is_message(role="assistant")
        await result.expect.next_event(type="message").judge(
            llm, intent="greets politely",
        )
Auto-capture is on by default. The plugin monkey-patches
AgentSession.run so every RunResult flows into the collector
automatically — in practice you rarely need to call capture(result) at
all. The helper remains exported for RunResults produced outside the
standard .run() path, and it's idempotent so calling it on an
already-captured result is a no-op.
.judge(...) calls on LiveKit's assertion API are intercepted
automatically. Verdict, intent, and reasoning are recorded as a first-class
Judgment event in the dashboard.
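For instance, reusing the Assistant class and imports from the example above, an explicit capture() call on a result that already went through .run() is redundant but harmless (a sketch, not required in normal use):

@pytest.mark.asyncio
async def test_greeting_explicit_capture():
    async with inference.LLM(model="openai/gpt-4.1-mini") as llm, \
            AgentSession(llm=llm) as sess:
        await sess.start(Assistant())
        # Auto-capture already recorded this RunResult when .run() returned.
        result = await sess.run(user_input="Hello")
        # Calling the helper again is an idempotent no-op.
        capture(result)
        result.expect.next_event().is_message(role="assistant")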
Node/TypeScript users: the mirror plugin vitest-agent-observability exposes the same behavior. The manual helper is named captureRunResult(result) there (vs. capture(result) here); the auto-capture and .judge() interception are identical across both sides.
Configuration
| Env var | CLI flag | Purpose |
|---|---|---|
| AGENT_OBSERVABILITY_URL | --agent-observability-url | Base URL of the server (required for upload) |
| AGENT_OBSERVABILITY_AGENT_ID | --agent-observability-agent-id | Free-form agent identifier for the dashboard |
| AGENT_OBSERVABILITY_ACCOUNT_ID | --agent-observability-account-id | Multi-tenant account id |
| AGENT_OBSERVABILITY_USER | — | Basic-auth user (when server enables auth) |
| AGENT_OBSERVABILITY_PASS | — | Basic-auth password |
| AGENT_OBSERVABILITY_TIMEOUT | --agent-observability-timeout | Upload request timeout in seconds (default 10) |
| AGENT_OBSERVABILITY_MAX_RETRIES | --agent-observability-max-retries | Max upload attempts before falling back (default 3) |
| AGENT_OBSERVABILITY_FALLBACK_DIR | --agent-observability-fallback-dir | Directory for failed-upload JSON (defaults to .pytest_cache/agent-observability) |
CI metadata (GitHub / GitLab / CircleCI / Buildkite) is auto-detected from standard env vars. No configuration required.
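One way to keep these settings in the repository instead of the shell is to provide defaults from a root conftest.py; exported shell variables still win because setdefault never overwrites an existing value. A minimal sketch (optional, the values are placeholders), which relies on the root conftest being imported before the plugin reads its configuration:

# conftest.py (repository root): optional local defaults for the plugin.
import os

# Real environment variables take precedence over these defaults.
os.environ.setdefault("AGENT_OBSERVABILITY_URL", "http://localhost:9090")
os.environ.setdefault("AGENT_OBSERVABILITY_AGENT_ID", "my-agent")
os.environ.setdefault("AGENT_OBSERVABILITY_TIMEOUT", "10")

Exporting the variables on the process, as in Quick start, remains the canonical path.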
Behavior
- One POST /observability/evals/v0 at pytest_sessionfinish.
- 10-second timeout, 3 retries with exponential backoff (1s, 2s, 4s).
- On total upload failure: payload is written to .pytest_cache/agent-observability/<run_id>.json for manual inspection.
- Never raises — upload issues won't fail your test suite.
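In pseudocode, the upload lifecycle described above behaves roughly like the sketch below (illustrative only, not the plugin's actual code; post_payload, run_id, and fallback_dir are placeholder names):

import json
import time
from pathlib import Path

def upload_or_fallback(payload: dict, run_id: str, post_payload,
                       fallback_dir: Path, max_retries: int = 3) -> None:
    """Illustrative: one upload attempt plus retries with 1s/2s/4s backoff,
    then a JSON fallback file. Never raises into the test session."""
    for attempt in range(max_retries + 1):
        if attempt:
            time.sleep(2 ** (attempt - 1))  # 1s, 2s, 4s before each retry
        try:
            post_payload("/observability/evals/v0", payload, timeout=10)
            return
        except Exception:
            continue
    # All attempts failed: keep the payload on disk for manual inspection.
    fallback_dir.mkdir(parents=True, exist_ok=True)
    (fallback_dir / f"{run_id}.json").write_text(json.dumps(payload))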
Running evals from a server
You can invoke pytest programmatically from a FastAPI (or any WSGI/ASGI)
server so your evals run on demand — useful for CI webhooks, scheduled
runs, or an internal "re-grade this agent" button. The plugin attaches
the same way it does on the CLI, so each HTTP-triggered run lands in
the dashboard as its own eval_run.
Use pytest.main() in-process:
from fastapi import FastAPI
import pytest

app = FastAPI()

class JsonCollector:
    """Collect per-case outcomes into a list we can serialize."""

    def __init__(self):
        self.cases = []

    def pytest_runtest_logreport(self, report):
        if report.when == "call":
            self.cases.append({
                "name": report.nodeid,
                "outcome": report.outcome,
                "ms": int(report.duration * 1000),
            })

@app.post("/run")
async def run(files: list[str]):
    collector = JsonCollector()
    # pytest.main runs in the current process, picks up pyproject.toml /
    # conftest.py, and activates pytest-agent-observability like any
    # normal invocation. The extra plugins argument registers our
    # in-memory collector alongside it.
    code = pytest.main([*files, "-q"], plugins=[collector])
    return {"passed": code == 0, "cases": collector.cases}
Notes:
- pytest.main() re-uses the current process, so the plugin reads AGENT_OBSERVABILITY_URL from the server's environment and uploads the run. Set those vars on the server process, not per request.
- A single process can only run one pytest.main() at a time — pytest's conftest machinery and plugin registries are global. For a throughput-oriented server, spawn a subprocess per request (subprocess.run(["pytest", …])) or queue requests.
- pytest.main() triggers the asyncio plugin's event-loop setup; if your FastAPI endpoint is already on a loop, run it through asyncio.to_thread(...) to keep the loops separate, as shown in the sketch below.
A working reference server with both /run/pytest (full pytest run)
and /run/scenarios (bypasses pytest, calls the scenario runner
directly) lives at
plugins/examples/pytest/fastapi_runner.py.
Its Node mirror using startVitest from Bun is at
plugins/examples/vitest/bun_runner.ts.
Development
cd plugins/pytest-agent-observability
pip install -e ".[dev]"
pytest
Releasing
Publishing is PR-label triggered — no manual tags or releases.
- Bump version in plugins/pytest-agent-observability/pyproject.toml in a dedicated PR (no feature changes in the same PR).
- Apply labels:
  - release-pytest-plugin — trigger: publishes to PyPI on merge.
  - pytest-agent-observability — (on feature/fix PRs only) filter: include this PR in the next release's notes.
- Merge to main. The Tests workflow runs, then Publish pytest-agent-observability picks up the merged commit, builds plugins/pytest-agent-observability with python -m build, publishes via PyPI trusted publishing, and creates a pytest-plugin-v<version> GitHub Release with notes listing every pytest-agent-observability-labeled PR merged since the previous pytest-plugin-v* tag.
Prerequisite (one-time): configure a PyPI trusted publisher for
pytest-agent-observability pointing at the publish-pytest-plugin.yml
workflow in this repo.