Skip to main content

Testbench to evaluate agents using Ragas

Project description

Testbench

Kubernetes-native agent evaluation system that executes test datasets via the A2A protocol, scores responses with pluggable metrics (RAGAS by default), and publishes scores via OpenTelemetry.

📖 Documentation: https://docs.agentic-layer.ai/testbench/

Run standalone

For evaluating an agent without deploying into Kubernetes / Testkube:

pip install agentic-layer-testbench
testworkflow config.yaml

See config.example.yaml for the available configuration options.

Development

Prerequisites

  • Python
  • uv
  • Tilt and a local Kubernetes cluster (e.g. kind)
  • Testkube CLI
  • GOOGLE_API_KEY for LLM-as-a-judge evaluation via Gemini

Build and run locally

# Install Python dependencies
uv sync
# Provide the LLM-as-a-judge API key
echo "GOOGLE_API_KEY=<key>" > .env
# Start the local stack (AI gateway, OTLP collector, sample agents, Testkube)
tilt up

Test

uv run poe ruff      # format and lint
uv run poe mypy      # static type checking
uv run poe bandit    # security scanning
uv run poe test      # unit tests
uv run poe check     # all of the above
uv run poe test_e2e  # E2E tests (requires `tilt up`)

E2E defaults target the Tilt environment. Override with E2E_DATASET_URL, E2E_AGENT_URL, E2E_MODEL if needed.

Verify the local deploy

Run the example workflow against the sample weather agent:

kubectl testkube run tw example-workflow --watch

The full walkthrough — defining experiments, configuring metrics, viewing reports — is in the first-workflow how-to.

Contributing

See the Contribution Guide.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentic_layer_testbench-0.9.2.tar.gz (38.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentic_layer_testbench-0.9.2-py3-none-any.whl (49.8 kB view details)

Uploaded Python 3

File details

Details for the file agentic_layer_testbench-0.9.2.tar.gz.

File metadata

  • Download URL: agentic_layer_testbench-0.9.2.tar.gz
  • Upload date:
  • Size: 38.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentic_layer_testbench-0.9.2.tar.gz
Algorithm Hash digest
SHA256 b4bb33582106f13801baa6b3e9c1939d2d91c7d86ee0afd21dcdc8cfab350864
MD5 dc31f9ea1a7f7edef772776e01969085
BLAKE2b-256 9a63c9a01d8d38fbfc6051db6810686b4be8b590149e3dbc881c07f3a578d78c

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentic_layer_testbench-0.9.2.tar.gz:

Publisher: publish.yml on agentic-layer/testbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentic_layer_testbench-0.9.2-py3-none-any.whl.

File metadata

File hashes

Hashes for agentic_layer_testbench-0.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2566503ab1bafc1f90e2782e11a73c91b8a6344a5a2e2f0be8388ffce393ff6a
MD5 c9246a89879377a79729e4b9a809ca88
BLAKE2b-256 3d67132f90f25dff91749f720f2c81e3168b58a7ebd425eca9e9dac1be5e5892

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentic_layer_testbench-0.9.2-py3-none-any.whl:

Publisher: publish.yml on agentic-layer/testbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page