Skip to main content

End-to-end agent testing — assert an agent's tools fired, via Revefi LLM observability

Project description

revefi-agent-test

A small Python harness that asserts an AI agent actually called the tools it was supposed to — verified against Revefi's LLM observability spans, not the answer text. Works for any agent instrumented with revefi-llm-sdk.

Per case it: generates a per-run test_prompt_id → POSTs the case body to the agent's url with a test-prompt-id header → polls Revefi's test-span-data API by that id → checks the run's span for the required_tools → prints the result. Exit code is non-zero if any case fails.

Layout

revefi_agent_test/__init__.py   # all the logic + the CLI (run_test, AgentTestCase, RevefiConfig, main)
revefi_agent_test/__main__.py   # lets you do `python -m revefi_agent_test`
examples/raden.yaml             # a worked example config you copy and edit
pyproject.toml                  # package metadata (the `revefi-agent-test` command)

Run it

pip install -e .                 # from this repo (editable; no PyPI needed)
cp examples/raden.yaml my.yaml   # copy the example, then edit the config block + cases
export REVEFI_API_KEY=          # or put api_key in the YAML (env wins over the file)
revefi-agent-test --config my.yaml

The config (one generic format)

A single YAML file: a config block (how to reach Revefi to read spans back) plus cases, each POSTing a body to a url.

config:
  base_url: https://your-revefi-instance.com/v1   # Revefi public API base — used to read spans back
  # api_key: "…"                                   # or set REVEFI_API_KEY (env wins); never commit secrets

cases:
  - name: web search
    url: https://your-agent.example.com/run        # the agent endpoint to drive
    body:                                           # POSTed to `url` verbatim — shape is up to the agent
      input: "Who won the 2024 IPL final? Use web search."
    required_tools:
      - web_search_tool

body is opaque — it's whatever the agent under test expects. required_tools is the whole point of the test.

As a library

from revefi_agent_test import RevefiConfig, AgentTestCase, run_test

cfg = RevefiConfig(base_url="https://your-revefi-instance.com/v1", api_key="…")
cases = [
    AgentTestCase(
        name="web search",
        url="https://your-agent.example.com/run",
        body={"input": "Who won the 2024 IPL final? Use web search."},
        required_tools=["web_search_tool"],
    )
]
assert all(r.passed for r in run_test(cfg, cases))

How test_prompt_id reaches the agent

The harness attaches a per-run test-prompt-id header to every agent call. The agent reads it off its inbound request and forwards it to revefi_llm_sdk.set_request_test_prompt_id(...) so the run's spans get tagged — in the agent's own request handler, or in whatever gateway fronts it. The harness never touches the body, so adopting this needs no change to your agent's request schema.

How tools are detected

Verification reads the run's latest SPAN_KIND_CLIENT span from Revefi's test-span-data API and collects tool names from extractedData.promptsList[*].toolCallsList[*].name (and completionsList[*]), comparing them against required_tools.

CI

Run it in CI by pointing --config at a YAML committed to your repo and injecting the API token via a secret (the REVEFI_API_KEY env var) — never commit the token.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

revefi_agent_test-0.1.0.tar.gz (7.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

revefi_agent_test-0.1.0-py3-none-any.whl (8.3 kB view details)

Uploaded Python 3

File details

Details for the file revefi_agent_test-0.1.0.tar.gz.

File metadata

  • Download URL: revefi_agent_test-0.1.0.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for revefi_agent_test-0.1.0.tar.gz
Algorithm Hash digest
SHA256 33453e00717905ae163137aed88967f21fa01d2614b8e33dc474a977635dfcdd
MD5 317910894443f940023e8eedd8550c4a
BLAKE2b-256 8be110a9c1cd719d68383f871ef9feeb1a052ec025998f2a002f978f78f3baed

See more details on using hashes here.

File details

Details for the file revefi_agent_test-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for revefi_agent_test-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8eb946b5eb07c2af89ba66ca61afa7b09a160013f06f4739bccec2dc95e4f1cd
MD5 58bca05fa719eb3c6be885504fceeed4
BLAKE2b-256 a343c0cd374615a647540f25a01f83b057fa89728d80beb416704e05941ed62b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page