
End-to-end agent testing — assert an agent's tools fired, via Revefi LLM observability


revefi-agent-test

A small Python harness that asserts an AI agent actually called the tools it was supposed to — verified against Revefi's LLM observability spans, not the answer text. Works for any agent instrumented with revefi-llm-sdk.

For each case, the harness:

  1. generates a per-run test_prompt_id;
  2. POSTs the case body to the agent's url with a test-prompt-id header;
  3. polls Revefi's test-span-data API by that id;
  4. checks the run's span for the required_tools;
  5. optionally scores the answer against a baseline_answer via Revefi's baseline-similarity API;
  6. prints the result.

The exit code is non-zero if any case fails.
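The per-case loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the harness's actual code: run_case_sketch and its post_agent, fetch_span and extract_tools hooks are hypothetical stand-ins for the HTTP calls to the agent url and to test-span-data, and the 60 s timeout and 2 s poll interval are assumed values. Baseline scoring and result printing are left out.

```python
import time
import uuid
from typing import Callable, Optional

# Hypothetical transport hooks; the real harness makes HTTP calls here.
PostAgent = Callable[[dict, dict], str]      # (headers, body) -> response text
FetchSpan = Callable[[str], Optional[dict]]  # test_prompt_id -> span, or None if not ingested yet


def run_case_sketch(
    body: dict,
    required_tools: list[str],
    post_agent: PostAgent,
    fetch_span: FetchSpan,
    extract_tools: Callable[[dict], set[str]],
    timeout_s: float = 60.0,   # assumed poll budget
    interval_s: float = 2.0,   # assumed poll interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    # 1. A fresh id per run keeps spans from different runs apart.
    test_prompt_id = str(uuid.uuid4())
    # 2. The body is sent verbatim; the id travels only in the header.
    post_agent({"test-prompt-id": test_prompt_id}, body)
    # 3. Poll until the run's span is readable or the deadline passes.
    deadline = time.monotonic() + timeout_s
    span = fetch_span(test_prompt_id)
    while span is None and time.monotonic() < deadline:
        sleep(interval_s)
        span = fetch_span(test_prompt_id)
    if span is None:
        return False  # never ingested: treat as a failure
    # 4. Pass only if every required tool appears in the span.
    return set(required_tools) <= extract_tools(span)
```

Injecting the transport and the sleep function keeps the control flow testable without a live agent or Revefi instance.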

Two keys (least-privilege)

The harness uses two distinct API keys so each can be least-privilege:

  • config.api_key — the read-only key for Revefi observability reads (test-span-data and baseline-similarity). config.base_url is the Revefi public API base.
  • cases[].api_key — the key used to call that case's agent url (e.g. /raden/ask).

Either key may be a literal token or a ${ENV_VAR} reference expanded from the environment at load time, so CI can inject secrets via env and the YAML never has to commit them. The read key and the agent key must belong to the same tenant; otherwise the span read won't find the agent run's spans.
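A strict version of the ${ENV_VAR} expansion might look like this. expand_key is a hypothetical helper, not part of the package. Unlike os.path.expandvars, which leaves references to unset variables unchanged, it raises, so a missing CI secret fails fast instead of being sent as a literal placeholder string. Treating an empty variable as missing is an assumption.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} references anywhere in the value.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_key(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${VAR} references; raise if a referenced variable is unset or empty."""
    env = os.environ if environ is None else environ

    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if not env.get(name):
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    # Literal tokens contain no ${...} and pass through unchanged.
    return _ENV_REF.sub(_sub, value)
```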

Layout

revefi_agent_test/__init__.py   # all the logic + the CLI (run_test, AgentTestCase, RevefiConfig, main)
revefi_agent_test/__main__.py   # lets you do `python -m revefi_agent_test`
examples/raden.yaml             # a worked example config you copy and edit
pyproject.toml                  # package metadata (the `revefi-agent-test` command)

Run it

pip install -e .                          # from this repo (editable; no PyPI needed)
cp examples/raden.yaml my.yaml            # copy the example, then edit the config block + cases
export REVEFI_OBS_API_KEY="<read-only obs key>" REVEFI_AGENT_API_KEY="<agent key>"   # if the YAML uses ${REVEFI_OBS_API_KEY} / ${REVEFI_AGENT_API_KEY}
revefi-agent-test --config my.yaml

The config (one generic format)

A single YAML file: a config block (the read-only observability connection) plus cases, each a self-contained agent request (url + api_key + body).

config:
  base_url: https://your-revefi-instance.com/v1   # Revefi public API base — observability reads
  api_key: ${REVEFI_OBS_API_KEY}                       # READ-ONLY key; literal or a ${ENV_VAR} reference

cases:
  - name: web search
    url: https://your-agent.example.com/run        # the agent endpoint to drive
    api_key: ${REVEFI_AGENT_API_KEY}                    # key for THIS agent call; literal or ${ENV_VAR}
    body:                                           # POSTed to `url` verbatim — shape is up to the agent
      input: "Who won the 2024 IPL final? Use web search."
    required_tools:
      - web_search_tool

  - name: answer quality                            # optional semantic check via baseline-similarity
    url: https://your-agent.example.com/run
    api_key: ${REVEFI_AGENT_API_KEY}
    body:
      input: "What is the configuration of warehouse REVEFI_DEV_WH?"
    baseline_answer: "REVEFI_DEV_WH is X-SMALL with auto-suspend 60s."
    min_similarity: 0.8                             # fail if cosine(answer, baseline) < this (default 0.8)
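Once the YAML is parsed (for example with PyYAML's yaml.safe_load), each entry under cases can be normalized into a typed record. The sketch below uses a hypothetical CaseSpec dataclass and parse_case helper. The only default it takes from this README is min_similarity = 0.8; rejecting a case that sets neither required_tools nor baseline_answer is an assumption, not confirmed harness behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CaseSpec:  # hypothetical record, mirrors the YAML keys above
    name: str
    url: str
    api_key: str
    body: dict
    required_tools: list[str] = field(default_factory=list)
    baseline_answer: Optional[str] = None
    min_similarity: float = 0.8  # documented default


def parse_case(raw: dict) -> CaseSpec:
    case = CaseSpec(
        name=raw["name"],
        url=raw["url"],
        api_key=raw["api_key"],
        body=raw["body"],  # opaque: passed to the agent verbatim
        required_tools=list(raw.get("required_tools", [])),
        baseline_answer=raw.get("baseline_answer"),
        min_similarity=float(raw.get("min_similarity", 0.8)),
    )
    # Assumed rule: a case must check tools, the answer, or both.
    if not case.required_tools and case.baseline_answer is None:
        raise ValueError(f"case {case.name!r} sets neither required_tools nor baseline_answer")
    # Cosine similarity lies in [-1, 1], so a threshold outside that range is a typo.
    if not -1.0 <= case.min_similarity <= 1.0:
        raise ValueError(f"case {case.name!r}: min_similarity must be in [-1, 1]")
    return case
```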

body is opaque: it is whatever the agent under test expects. A case can assert required_tools (verified via spans), set baseline_answer + min_similarity (verified via baseline-similarity), or both. The similarity candidate is the answer text parsed from the run's span, i.e. the LLM completion content, which is cleaner than the agent's JSON/markup HTTP body. If the span can't be read (for example, it hasn't been ingested yet), the harness falls back to the raw agent response.
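The candidate choice and the threshold rule can be made concrete with a small sketch. The harness delegates scoring to Revefi's baseline-similarity API; the local cosine function here only illustrates the "fail if cosine(answer, baseline) < min_similarity" comparison, and pick_candidate, cosine and passes are hypothetical helpers that assume the embedding vectors are already available.

```python
import math
from typing import Optional, Sequence


def pick_candidate(span_answer: Optional[str], raw_response: str) -> str:
    # Prefer the completion text parsed from the span; fall back to the raw HTTP body.
    return span_answer if span_answer else raw_response


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vector: treat as no similarity


def passes(answer_vec: Sequence[float], baseline_vec: Sequence[float],
           min_similarity: float = 0.8) -> bool:
    # A case fails when cosine(answer, baseline) < min_similarity.
    return cosine(answer_vec, baseline_vec) >= min_similarity
```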

As a library

from revefi_agent_test import RevefiConfig, AgentTestCase, run_test

cfg = RevefiConfig(base_url="https://your-revefi-instance.com/v1", api_key="<read-only obs key>")
cases = [
    AgentTestCase(
        name="web search",
        url="https://your-agent.example.com/run",
        api_key="<agent key>",
        body={"input": "Who won the 2024 IPL final? Use web search."},
        required_tools=["web_search_tool"],
    )
]
assert all(r.passed for r in run_test(cfg, cases))

How test_prompt_id reaches the agent

The harness attaches a per-run test-prompt-id header to every agent call. The agent reads that header from its inbound request and passes it to revefi_llm_sdk.set_request_test_prompt_id(...), either in its own request handler or in whatever gateway fronts it, so the run's spans get tagged with the id. The harness never touches the body, so adopting this needs no change to your agent's request schema.
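If the agent happens to be a WSGI app, forwarding the header could look like the sketch below. forward_test_prompt_id is a hypothetical helper; its set_id hook is where revefi_llm_sdk.set_request_test_prompt_id would go, assuming that function accepts the id as a single string. Following the CGI convention WSGI uses, servers expose an inbound test-prompt-id header as the environ key HTTP_TEST_PROMPT_ID.

```python
from typing import Callable, Iterable


def forward_test_prompt_id(app, set_id: Callable[[str], None]):
    """Wrap a WSGI app so an inbound test-prompt-id header is handed to set_id."""

    def wrapped(environ: dict, start_response) -> Iterable[bytes]:
        # "test-prompt-id" arrives as HTTP_TEST_PROMPT_ID (uppercased, '-' -> '_').
        test_prompt_id = environ.get("HTTP_TEST_PROMPT_ID")
        if test_prompt_id:
            set_id(test_prompt_id)
        # The request body is untouched; the wrapped app sees the same environ.
        return app(environ, start_response)

    return wrapped
```

In a real deployment this would be wired up as forward_test_prompt_id(app, revefi_llm_sdk.set_request_test_prompt_id); ASGI frameworks and gateways read headers their own way.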

How tools are detected

Verification reads the run's latest SPAN_KIND_CLIENT span from Revefi's test-span-data API and collects tool names from extractedData.promptsList[*].toolCallsList[*].name (and completionsList[*]), comparing them against required_tools.
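The extraction step can be sketched as a walk over the two lists named above. tools_in_span and missing_tools are hypothetical helpers; treating completionsList entries as carrying the same toolCallsList[*].name shape as promptsList entries is an assumption, and selecting the latest SPAN_KIND_CLIENT span from the API response is left out because that response shape isn't documented here.

```python
def tools_in_span(span: dict) -> set[str]:
    """Collect tool names from extractedData.{promptsList,completionsList}[*].toolCallsList[*].name."""
    data = span.get("extractedData") or {}
    names: set[str] = set()
    for list_key in ("promptsList", "completionsList"):
        for message in data.get(list_key) or []:
            for call in message.get("toolCallsList") or []:
                name = call.get("name")
                if name:
                    names.add(name)
    return names


def missing_tools(span: dict, required_tools: list[str]) -> list[str]:
    # Keep the order of required_tools so failure messages are stable.
    found = tools_in_span(span)
    return [tool for tool in required_tools if tool not in found]
```

A case passes the tool check when missing_tools(...) returns an empty list.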

CI

Run it in CI by pointing --config at a YAML committed to your repo whose api_key values are ${ENV_VAR} references, and inject the matching secrets via env — never commit the tokens. For example, with config.api_key: ${PUBLIC_API_TOKEN_RO} and cases[].api_key: ${PUBLIC_API_TOKEN}:

env:
  PUBLIC_API_TOKEN_RO: ${{ secrets.PUBLIC_API_TOKEN_RO }}   # read-only observability key
  PUBLIC_API_TOKEN:    ${{ secrets.PUBLIC_API_TOKEN }}      # agent-call key (same tenant)



Download files


Source Distribution

revefi_agent_test-0.2.0.tar.gz (11.3 kB)

Uploaded Source

Built Distribution


revefi_agent_test-0.2.0-py3-none-any.whl (10.5 kB)

Uploaded Python 3

File details

Details for the file revefi_agent_test-0.2.0.tar.gz.

File metadata

  • Download URL: revefi_agent_test-0.2.0.tar.gz
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for revefi_agent_test-0.2.0.tar.gz
Algorithm Hash digest
SHA256 da8deec59f67542fad7d5b0bc7b8bb3531f216e518a698663f900f6e7778f1c7
MD5 956f2e3e72c2eb3bb74eb3e891a9f928
BLAKE2b-256 f2723457cf8c3a2bfa34c8b2248c3186ada59d66bf215dbc1931f9e58c694f04


File details

Details for the file revefi_agent_test-0.2.0-py3-none-any.whl.

File hashes

Hashes for revefi_agent_test-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3dda0a6b411aa4bd64b88d225afbf092cd7b89dbe792d43ee7f645477bfe5b96
MD5 1c94d12749dc7dcb40fec2a839c8e7a9
BLAKE2b-256 6b28727e0c33aefabfa6028b89e4d04de06e0ea041ad509e1d825e75baeaf544

