
pytest-agent-observability

pytest plugin that uploads LiveKit-agents eval results to agent-observability.

Each pytest run becomes one eval_run in the dashboard, with every test function showing up as an eval_case including events, judgments, and failure detail.

Install

pip install pytest-agent-observability

Requires Python 3.9+ and pytest>=7.0. LiveKit integration is optional — the plugin works for plain pytest suites too.

Quick start

export AGENT_OBSERVABILITY_URL=http://localhost:9090
export AGENT_OBSERVABILITY_AGENT_ID=my-agent      # optional
pytest                                             # that's it

If AGENT_OBSERVABILITY_URL is unset, the plugin no-ops — your tests run identically.

With LiveKit eval tests

A minimal eval test (the asyncio marker comes from pytest-asyncio):

import pytest
from livekit.agents import Agent, AgentSession, inference
from pytest_agent_observability import capture

class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="Be helpful.")

@pytest.mark.asyncio
async def test_greeting():
    async with inference.LLM(model="openai/gpt-4.1-mini") as llm, \
               AgentSession(llm=llm) as sess:
        await sess.start(Assistant())
        result = capture(await sess.run(user_input="Hello"))
        result.expect.next_event().is_message(role="assistant")
        await result.expect.next_event(type="message").judge(
            llm, intent="greets politely",
        )

Auto-capture is on by default. The plugin monkey-patches AgentSession.run so every RunResult flows into the collector automatically — in practice you rarely need to call capture(result) at all. The helper remains exported for RunResults produced outside the standard .run() path, and it's idempotent so calling it on an already-captured result is a no-op.
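
For the rare manual case, a minimal sketch (my_custom_runner is a hypothetical helper that returns a RunResult outside the patched .run() path):

import pytest
from pytest_agent_observability import capture

@pytest.mark.asyncio
async def test_custom_run_path():
    # Hypothetical helper: yields a RunResult without going through
    # AgentSession.run, so auto-capture never sees it.
    result = await my_custom_runner()
    capture(result)  # record it with the collector
    capture(result)  # already captured: idempotent no-op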

.judge(...) calls on LiveKit's assertion API are intercepted automatically. Verdict, intent, and reasoning are recorded as a first-class Judgment event in the dashboard.

Node/TypeScript users: the mirror plugin vitest-agent-observability exposes the same behavior. The manual helper is named captureRunResult(result) there (vs. capture(result) here); the auto-capture and .judge() interception are identical across both sides.

Configuration

Env var | CLI flag | Purpose
--- | --- | ---
AGENT_OBSERVABILITY_URL | --agent-observability-url | Base URL of the server (required for upload)
AGENT_OBSERVABILITY_AGENT_ID | --agent-observability-agent-id | Free-form agent identifier for the dashboard
AGENT_OBSERVABILITY_ACCOUNT_ID | --agent-observability-account-id | Multi-tenant account id
AGENT_OBSERVABILITY_USER | (env only) | Basic-auth user (when the server enables auth)
AGENT_OBSERVABILITY_PASS | (env only) | Basic-auth password
AGENT_OBSERVABILITY_TIMEOUT | --agent-observability-timeout | Upload request timeout in seconds (default 10)
AGENT_OBSERVABILITY_MAX_RETRIES | --agent-observability-max-retries | Max upload attempts before falling back (default 3)
AGENT_OBSERVABILITY_FALLBACK_DIR | --agent-observability-fallback-dir | Directory for failed-upload JSON (default .pytest_cache/agent-observability)
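
For example, an upload-enabled run configured entirely with CLI flags (values are illustrative):

pytest --agent-observability-url=http://localhost:9090 \
       --agent-observability-agent-id=my-agent \
       --agent-observability-timeout=30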

CI metadata (GitHub / GitLab / CircleCI / Buildkite) is auto-detected from standard env vars. No configuration required.

Behavior

  • One POST to /observability/evals/v0 at pytest_sessionfinish.
  • 10-second timeout and up to 3 retries with exponential backoff (1s, 2s, 4s); both configurable (see Configuration).
  • On total upload failure: payload is written to .pytest_cache/agent-observability/<run_id>.json for manual inspection.
  • Never raises — upload issues won't fail your test suite.
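
The upload policy above is roughly what this sketch does (illustrative only, not the plugin's actual code; upload, payload, and run_id are assumed names):

import json
import pathlib
import time
import urllib.request

def upload(payload: dict, run_id: str, base_url: str, timeout: float = 10.0) -> None:
    req = urllib.request.Request(
        f"{base_url}/observability/evals/v0",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    for attempt in range(4):          # first try + 3 retries
        try:
            urllib.request.urlopen(req, timeout=timeout)
            return
        except OSError:               # URLError and timeouts subclass OSError
            if attempt < 3:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s
    # Total failure: never raise; park the payload on disk instead.
    fallback = pathlib.Path(".pytest_cache/agent-observability")
    fallback.mkdir(parents=True, exist_ok=True)
    (fallback / f"{run_id}.json").write_text(json.dumps(payload))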

Running evals from a server

You can invoke pytest programmatically from a FastAPI (or any WSGI/ASGI) server so your evals run on demand — useful for CI webhooks, scheduled runs, or an internal "re-grade this agent" button. The plugin attaches the same way it does on the CLI, so each HTTP-triggered run lands in the dashboard as its own eval_run.

Use pytest.main() in-process:

import asyncio

import pytest
from fastapi import FastAPI

app = FastAPI()

class JsonCollector:
    """Collect per-case outcomes into a list we can serialize."""
    def __init__(self):
        self.cases = []

    def pytest_runtest_logreport(self, report):
        if report.when == "call":
            self.cases.append({
                "name": report.nodeid,
                "outcome": report.outcome,
                "ms": int(report.duration * 1000),
            })

@app.post("/run")
async def run(files: list[str]):
    collector = JsonCollector()
    # pytest.main runs in the current process, picks up pyproject.toml /
    # conftest.py, and activates pytest-agent-observability like any
    # normal invocation. The plugins argument registers our in-memory
    # collector alongside it. Dispatch through a worker thread so
    # pytest-asyncio can set up its own event loop (see the notes below).
    code = await asyncio.to_thread(pytest.main, [*files, "-q"], plugins=[collector])
    return {"passed": code == 0, "cases": collector.cases}
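
Kicking off a run is then a plain JSON POST (the port depends on how you serve the app):

curl -X POST http://localhost:8000/run \
     -H 'Content-Type: application/json' \
     -d '["tests/test_greeting.py"]'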

Notes:

  • pytest.main() re-uses the current process, so the plugin reads AGENT_OBSERVABILITY_URL from the server's environment and uploads the run. Set those vars on the server process, not per request.
  • A single process can only run one pytest.main() at a time, since pytest's conftest machinery and plugin registries are global. For a throughput-oriented server, spawn a subprocess per request (subprocess.run(["pytest", …]); see the sketch below) or queue requests.
  • pytest.main() triggers the asyncio plugin's event-loop setup, which fails on a thread that already has a running event loop; that is why the endpoint above dispatches through asyncio.to_thread(...), keeping the loops separate.
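
A sketch of the subprocess-per-request alternative from the second note (the /run-isolated route name is illustrative):

import subprocess

@app.post("/run-isolated")
def run_isolated(files: list[str]):
    # A plain `def` endpoint runs in FastAPI's threadpool, so the blocking
    # subprocess call doesn't stall the event loop; each request also gets
    # a fresh interpreter, so pytest's global state never collides.
    proc = subprocess.run(["pytest", *files, "-q"], capture_output=True, text=True)
    return {"passed": proc.returncode == 0, "output": proc.stdout}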

A working reference server with both /run/pytest (full pytest run) and /run/scenarios (which bypasses pytest and calls the scenario runner directly) lives at plugins/examples/pytest/fastapi_runner.py. Its Node mirror, using startVitest and run under Bun, is at plugins/examples/vitest/bun_runner.ts.

Development

cd plugins/pytest-agent-observability
pip install -e ".[dev]"
pytest

Releasing

Publishing is PR-label triggered — no manual tags or releases.

  1. Bump version in plugins/pytest-agent-observability/pyproject.toml in a dedicated PR (no feature changes in the same PR).
  2. Apply labels:
    • release-pytest-plugin — trigger: publishes to PyPI on merge.
    • pytest-agent-observability — (on feature/fix PRs only) filter: include this PR in the next release's notes.
  3. Merge to main. Tests run, then Publish pytest-agent-observability picks up the merged commit, builds plugins/pytest-agent-observability with python -m build, publishes via PyPI trusted publishing, and creates a pytest-plugin-v<version> GitHub Release whose notes list every pytest-agent-observability-labeled PR merged since the previous pytest-plugin-v* tag.

Prerequisite (one-time): configure a PyPI trusted publisher for pytest-agent-observability pointing at the publish-pytest-plugin.yml workflow in this repo.
