Testbench to evaluate agents using Ragas

Testbench

Testbench is a Kubernetes-native agent evaluation system. It runs test datasets against agents over the A2A protocol, evaluates the responses with pluggable metrics (RAGAS by default), and publishes the resulting scores via OpenTelemetry.
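
The run / evaluate / publish flow described above can be pictured with a small sketch. This is not the testbench's actual API: `Sample`, `exact_match`, `evaluate`, and the `ask_agent` and `publish` callbacks are hypothetical names chosen for illustration, and the toy metric merely stands in for a real one such as a RAGAS metric.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One hypothetical dataset row: a question and its expected answer."""
    question: str
    reference: str


# A metric maps (agent response, reference answer) to a score in [0, 1].
Metric = Callable[[str, str], float]


def exact_match(response: str, reference: str) -> float:
    """Toy metric (assumption): 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0


def evaluate(
    samples: List[Sample],
    ask_agent: Callable[[str], str],
    metrics: Dict[str, Metric],
    publish: Callable[[str, float], None],
) -> Dict[str, float]:
    """Query the agent for every sample, score each response, publish averages."""
    scores: Dict[str, List[float]] = {name: [] for name in metrics}
    for sample in samples:
        # In the real system this step would be an A2A request to the agent.
        response = ask_agent(sample.question)
        for name, metric in metrics.items():
            scores[name].append(metric(response, sample.reference))
    averages = {
        name: (mean(values) if values else 0.0) for name, values in scores.items()
    }
    for name, value in averages.items():
        # In the real system this step would emit the score via OpenTelemetry.
        publish(name, value)
    return averages
```

Because metrics are plain callables keyed by name, swapping or adding one only changes the `metrics` dictionary passed to `evaluate`; the loop itself stays the same.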

Documentation

Full documentation is available at docs.agentic-layer.ai.

Install

Run the testbench standalone (without Kubernetes / Testkube) on any system:

pip install agentic-layer-testbench
testbench config.yaml

See config.example.yaml for the available configuration options.

Prerequisites

  • Python 3.12+ and uv
  • Kubernetes cluster (e.g. kind) with Tilt
  • Testkube CLI
  • GOOGLE_API_KEY for LLM-as-a-judge evaluation via Gemini models
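
A quick way to verify the prerequisites above before starting is a small check script. The sketch below is not part of the project. It assumes the Testkube CLI is reachable as the kubectl plugin binary `kubectl-testkube` (kubectl resolves `kubectl testkube` to a binary of that name on PATH), and it looks for GOOGLE_API_KEY in the process environment only, so a key kept solely in a .env file will be reported as missing.

```python
import os
import shutil
import sys

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["uv", "kubectl", "kubectl-testkube", "tilt"]
REQUIRED_ENV = ["GOOGLE_API_KEY"]
MIN_PYTHON = (3, 12)


def missing_prerequisites(
    tools=REQUIRED_TOOLS,
    env_vars=REQUIRED_ENV,
    environ=None,
    which=shutil.which,
    python_version=None,
):
    """Return human-readable descriptions of every unmet prerequisite."""
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info[:2]) if python_version is None else python_version
    missing = []
    if version < MIN_PYTHON:
        missing.append("Python 3.12+")
    # A tool counts as missing when no executable of that name is on PATH.
    missing += [f"{tool} (not on PATH)" for tool in tools if which(tool) is None]
    # An empty value is treated the same as an unset variable.
    missing += [f"{var} (not set)" for var in env_vars if not environ.get(var)]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    for problem in problems:
        print(f"missing: {problem}")
    sys.exit(1 if problems else 0)
```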

Getting Started

# 1. Start local infrastructure (AI Gateway, OTLP collector, sample agents, Testkube)
#    Create a .env file with GOOGLE_API_KEY=your-key first
tilt up

# 2. Run the example evaluation workflow
kubectl testkube run tw example-workflow --watch

See the how-to guide for detailed pipeline usage, including dataset format, metric configuration, and custom workflows.

Development

Command              Description
uv run poe check     Run all quality checks (tests, mypy, bandit, ruff)
uv run poe test      Unit tests
uv run poe format    Format with Ruff
uv run poe lint      Lint and auto-fix with Ruff
uv run poe ruff      Both format and lint
uv run poe mypy      Static type checking
uv run poe bandit    Security vulnerability scanning

E2E Testing

Requires a running Tilt environment (tilt up).

# Configure (optional — defaults target the Tilt environment)
export E2E_DATASET_URL="http://data-server.data-server:8000/dataset.csv"
export E2E_AGENT_URL="http://weather-agent.sample-agents:8000"
export E2E_MODEL="gemini-2.5-flash-lite"

# Run
uv run poe test_e2e
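
The "optional, with defaults" behaviour of these variables can be sketched as follows. This is an illustration, not the project's test code: `E2ESettings` and `load_e2e_settings` are hypothetical names, and treating the three values shown above as the built-in defaults is an assumption based on the comment in the snippet.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed defaults: the values from the export lines above.
DEFAULTS = {
    "E2E_DATASET_URL": "http://data-server.data-server:8000/dataset.csv",
    "E2E_AGENT_URL": "http://weather-agent.sample-agents:8000",
    "E2E_MODEL": "gemini-2.5-flash-lite",
}


@dataclass(frozen=True)
class E2ESettings:
    """Hypothetical container for the three E2E settings."""
    dataset_url: str
    agent_url: str
    model: str


def load_e2e_settings(environ: Optional[Mapping[str, str]] = None) -> E2ESettings:
    """Read each setting from the environment, falling back to its default."""
    env = os.environ if environ is None else environ
    return E2ESettings(
        dataset_url=env.get("E2E_DATASET_URL", DEFAULTS["E2E_DATASET_URL"]),
        agent_url=env.get("E2E_AGENT_URL", DEFAULTS["E2E_AGENT_URL"]),
        model=env.get("E2E_MODEL", DEFAULTS["E2E_MODEL"]),
    )
```

Under this scheme, exporting only E2E_AGENT_URL would point the run at a different agent while the dataset and model keep their default values.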

Contributing

See the Contribution Guide for details on contributing and the process for submitting pull requests.
