A generative AI-powered framework for testing virtual agents.
Agent Evaluation - Weni Fork
Note: This is a fork of the original Agent Evaluation framework by AWS Labs. This fork adds support for testing Weni conversational AI agents while maintaining all the original functionality for AWS services.
Agent Evaluation is a generative AI-powered framework for testing virtual agents.
Internally, Agent Evaluation implements an LLM agent (the evaluator) that orchestrates conversations with your own agent (the target) and evaluates its responses as the conversation proceeds.
✨ Key features
- 🆕 Weni Agent Support: Built-in support for testing Weni conversational AI agents through their API and WebSocket interface.
- Built-in support for popular AWS services including Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. You can also bring your own agent to test using Agent Evaluation.
- Orchestrate concurrent, multi-turn conversations with your agent while evaluating its responses.
- Define hooks to perform additional tasks such as integration testing.
- Integrates into CI/CD pipelines to shorten time to delivery while keeping agents stable in production environments.
🚀 Quick Start with Weni
Installation
Install the package from PyPI:
pip install weni-agenteval
Alternative: Install from source
If you want to install from source for development:
git clone https://github.com/weni-ai/agent-evaluation.git
cd agent-evaluation
pip install -e .
Prerequisites for Weni Target
⚠️ Important: You need both AWS and Weni credentials to run evaluations!
To test Weni agents, you'll need:
- AWS Credentials: Required for the evaluator (Claude model via Bedrock)
  - AWS Access Key ID
  - AWS Secret Access Key
  - AWS Session Token
- A Weni Project: An active project in the Weni platform
- Weni Authentication: Choose one of the following methods:
🚀 Option 1: Weni CLI (Recommended)
Install and authenticate with the Weni CLI:
# Install Weni CLI
pip install weni-cli
# Authenticate with Weni
weni login
# Select your project
weni project use [your-project-uuid]
Get the Weni CLI from: https://github.com/weni-ai/weni-cli
📋 Option 2: Environment Variables
Set these environment variables manually:
- WENI_PROJECT_UUID: Your project's unique identifier
- WENI_BEARER_TOKEN: Your authentication bearer token
⚙️ Option 3: Configuration File
Provide credentials directly in your test configuration file.
Setting up Environment Variables
💡 Note: If you're using Weni CLI (Option 1 above), you only need to set AWS credentials. The Weni credentials will be handled automatically by the CLI.
macOS/Linux:
# AWS Credentials (required for evaluator)
export AWS_ACCESS_KEY_ID="your-aws-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
export AWS_SESSION_TOKEN="your-aws-session-token"
# Weni Credentials (only needed if NOT using Weni CLI)
export WENI_PROJECT_UUID="your-project-uuid-here"
export WENI_BEARER_TOKEN="your-bearer-token-here"
Windows (Command Prompt):
REM AWS Credentials (required for evaluator)
set AWS_ACCESS_KEY_ID=your-aws-access-key-id
set AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
set AWS_SESSION_TOKEN=your-aws-session-token
REM Weni Credentials (only needed if NOT using Weni CLI)
set WENI_PROJECT_UUID=your-project-uuid-here
set WENI_BEARER_TOKEN=your-bearer-token-here
Windows (PowerShell):
# AWS Credentials (required for evaluator)
$env:AWS_ACCESS_KEY_ID="your-aws-access-key-id"
$env:AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
$env:AWS_SESSION_TOKEN="your-aws-session-token"
# Weni Credentials (only needed if NOT using Weni CLI)
$env:WENI_PROJECT_UUID="your-project-uuid-here"
$env:WENI_BEARER_TOKEN="your-bearer-token-here"
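Before starting a run, it can help to confirm that the variables above are actually visible to the process. The following is a minimal Python sketch and is not part of weni-agenteval: the helper name missing_credentials and its using_weni_cli flag are hypothetical, and only the variable names come from this README.

```python
import os

# Variable names as listed in this README.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
WENI_VARS = ("WENI_PROJECT_UUID", "WENI_BEARER_TOKEN")


def missing_credentials(env=None, using_weni_cli=False):
    """Return the names of expected variables that are unset or empty.

    Hypothetical helper: when using_weni_cli is True, only the AWS
    variables are checked, since the CLI is said to handle Weni credentials.
    """
    env = os.environ if env is None else env
    required = AWS_VARS if using_weni_cli else AWS_VARS + WENI_VARS
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.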
Basic Usage
Create a test configuration file agenteval.yml:
evaluator:
  model: claude-haiku-3_5-us
  aws_region: us-east-1
target:
  type: weni
  timeout: 30 # Optional: max seconds to wait for response
tests:
  greeting:
    steps:
      - Send a greeting "Olá, bom dia!"
      - Ask what "com oq vc pode me ajudar?"
    expected_results:
      - Agent responds with a friendly greeting
      - Agent shows a menu with options to help the user
  purchase_outside_postal_code:
    steps:
      - Ask information "quero comprar arroz"
      - Give the postal code "04538-132"
    expected_results:
      - Agent responds asking for postal code
      - Agent says it doesn't deliver to this postal code
Run the evaluation:
weni-agenteval run
Note: The tool automatically looks for agenteval.yml in the current directory. You can also specify a different directory with --plan-dir if needed.
Additional CLI options:
# Run with verbose output
weni-agenteval run --verbose
# Run specific tests only
weni-agenteval run --filter "greeting,purchase_outside_postal_code"
# Run from a different directory
weni-agenteval run --plan-dir /path/to/test/directory
# Initialize a new test plan template
weni-agenteval init
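When the CLI is driven from a script, for example in a CI job, the options above can be assembled programmatically. The sketch below is one possible wrapper, not something shipped with the package: build_run_command and run_evaluation are hypothetical names, only the flags shown above are used, and the assumption that a failed run produces a non-zero exit code is not confirmed by this README.

```python
import subprocess


def build_run_command(plan_dir=None, tests=None, verbose=False):
    """Assemble a `weni-agenteval run` command from the flags shown above."""
    command = ["weni-agenteval", "run"]
    if verbose:
        command.append("--verbose")
    if tests:
        # --filter takes a comma-separated list of test names.
        command += ["--filter", ",".join(tests)]
    if plan_dir:
        command += ["--plan-dir", str(plan_dir)]
    return command


def run_evaluation(**options):
    """Run the CLI and return its exit code.

    Assumption: a non-zero exit code signals a failed run.
    """
    return subprocess.run(build_run_command(**options)).returncode


if __name__ == "__main__":
    # Only prints the command; call run_evaluation(...) to execute it.
    print(" ".join(build_run_command(tests=["greeting"], verbose=True)))
```

Building the argument list separately from executing it keeps the command easy to log and inspect before anything is run.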
Configuration Options for Weni Target
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | Yes | - | Must be "weni" |
| timeout | integer | No | 30 | Max seconds to wait for response |
| weni_project_uuid | string | No | - | Project UUID (use Weni CLI or env var instead) |
| weni_bearer_token | string | No | - | Bearer token (use Weni CLI or env var instead) |
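The table above can be read as a small set of rules for the target section. As an illustration only, the Python sketch below applies those rules to an already-parsed mapping. It is not the validation the tool itself performs: normalize_weni_target is a hypothetical helper, and requiring timeout to be positive is an assumption beyond what the table states.

```python
def normalize_weni_target(target):
    """Check a parsed `target` mapping against the table and apply defaults."""
    if target.get("type") != "weni":
        raise ValueError('target.type must be "weni"')

    timeout = target.get("timeout", 30)  # documented default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        raise ValueError("target.timeout must be a positive integer (seconds)")

    normalized = {"type": "weni", "timeout": timeout}
    # Optional credentials; the README recommends the Weni CLI or env vars instead.
    for key in ("weni_project_uuid", "weni_bearer_token"):
        if target.get(key):
            normalized[key] = str(target[key])
    return normalized


if __name__ == "__main__":
    print(normalize_weni_target({"type": "weni"}))
```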
Advanced Example
Here's a more comprehensive test plan:
evaluator:
  model: claude-haiku-3_5-us
  aws_region: us-east-1
target:
  type: weni
  timeout: 45
tests:
  # Basic greeting test
  greeting:
    steps:
      - Send a greeting "Olá, como você está?"
    expected_results:
      - Agent responds politely
      - Agent asks how it can help
  # Multi-turn conversation
  product_inquiry:
    steps:
      - Ask "Quais produtos vocês têm?"
      - Follow up with "Qual é o preço do arroz?"
      - Ask "Vocês entregam em São Paulo?"
    expected_results:
      - Agent provides product information
      - Agent gives pricing details
      - Agent confirms delivery area
  # Error handling
  unclear_input:
    steps:
      - Send unclear text "xyz123 !!!"
    expected_results:
      - Agent handles gracefully
      - Agent asks for clarification
      - No error messages shown to user
  # Context maintenance
  context_test:
    steps:
      - Say "Quero comprar feijão"
      - Ask "Qual o prazo de entrega?"
      - Ask "E o frete?"
    expected_results:
      - Agent remembers the product context
      - Agent provides delivery timeframe
      - Agent gives shipping cost information
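As plans grow, a quick structural check can catch a test that is missing its steps or expected results before any evaluation is run. The Python sketch below works on a plan that has already been parsed into a dictionary (for instance with a YAML parser of your choice). The rules it checks are inferred from the examples in this README, not taken from the tool's actual schema, and find_plan_problems is a hypothetical name.

```python
def find_plan_problems(plan):
    """Return a list of human-readable problems found in a parsed test plan."""
    problems = []
    for section in ("evaluator", "target", "tests"):
        if section not in plan:
            problems.append(f"missing top-level section: {section}")

    for name, test in (plan.get("tests") or {}).items():
        for field in ("steps", "expected_results"):
            items = test.get(field) if isinstance(test, dict) else None
            if not isinstance(items, list) or not items:
                problems.append(f"test '{name}' needs a non-empty '{field}' list")
    return problems


if __name__ == "__main__":
    plan = {
        "evaluator": {"model": "claude-haiku-3_5-us", "aws_region": "us-east-1"},
        "target": {"type": "weni", "timeout": 45},
        "tests": {
            "greeting": {
                "steps": ['Send a greeting "Olá, como você está?"'],
                "expected_results": ["Agent responds politely"],
            },
            "unclear_input": {"steps": ['Send unclear text "xyz123 !!!"']},
        },
    }
    for problem in find_plan_problems(plan):
        print(problem)
```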
📚 Documentation
To get started with the original framework features, please visit the full documentation here.
For Weni-specific documentation, see the Weni Target Documentation.
To contribute, please refer to CONTRIBUTING.md
🔧 Troubleshooting Weni Target
Common Issues
AWS Authentication Errors
- Verify your AWS environment variables are set correctly (ACCESS_KEY_ID, SECRET_ACCESS_KEY, SESSION_TOKEN)
- Ensure you have access to Amazon Bedrock in your specified region
- Check that your AWS credentials have the necessary Bedrock permissions
- Verify the aws_region in your configuration matches your AWS account's region access
Weni Authentication Errors
- Using Weni CLI (Recommended): Run weni login to re-authenticate, then weni project use [project-uuid] to select your project
- Using Environment Variables: Verify your WENI_BEARER_TOKEN is valid and not expired
- Check that the WENI_PROJECT_UUID matches your actual project
- Ensure the bearer token has the necessary permissions for the project
- Get Weni CLI at: https://github.com/weni-ai/weni-cli
Connection Issues
- Verify the Weni API endpoints are accessible from your network
- Check for any firewall or proxy settings blocking HTTPS/WSS connections
- Ensure your internet connection is stable
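A basic reachability test can separate network problems from configuration problems. The Python sketch below only checks that a TCP connection can be opened; it does not verify TLS or the WebSocket handshake. The host in the usage example is a placeholder, since this README does not list the Weni endpoint hostnames, and can_connect is a hypothetical helper.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with the endpoint your project actually uses.
    host = "weni-endpoint.example.invalid"
    print(f"{host}:443 reachable: {can_connect(host, timeout=2.0)}")
```

If this returns False from inside a corporate network but True elsewhere, a firewall or proxy rule is a likely cause.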
Timeout Errors
- Increase the timeout parameter if your agent requires more processing time
- Check if the agent is properly configured and active in the Weni platform
- Verify the agent is not stuck in a processing loop
WebSocket Connection Failures
- Ensure the websocket-client package is properly installed
- Check for any proxy configurations that might interfere with WebSocket connections
- Verify the WebSocket endpoint URL is correct for your project
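To confirm the first item in the list above, you can ask Python whether the module is importable in the active environment. This is a minimal sketch with a hypothetical helper name. Note that the websocket-client distribution is imported as websocket, and that another installed package providing a module with the same name would also make this check pass.

```python
import importlib.util


def is_module_available(module_name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The websocket-client distribution installs the `websocket` module.
    if is_module_available("websocket"):
        print("websocket-client appears to be installed.")
    else:
        print("websocket module not found; try: pip install websocket-client")
```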
🆚 Differences from Original
This fork maintains full compatibility with the original AWS Labs Agent Evaluation framework while adding:
- Weni Target Support: Native integration with Weni conversational AI platform
- WebSocket Communication: Real-time bidirectional communication with Weni agents
- Session Isolation: Each test case uses unique contact identifiers for proper conversation isolation
All original AWS targets (Bedrock, Q Business, SageMaker, etc.) continue to work exactly as in the original repository.
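To illustrate the session isolation idea listed above, the Python sketch below derives a fresh identifier for each test case so that no two conversations share state. The identifier format is an assumption made for illustration; this README does not describe how the fork actually builds its contact identifiers, and new_contact_id is a hypothetical name.

```python
import uuid


def new_contact_id(test_name):
    """Build a unique identifier for one test conversation.

    Hypothetical format: the test name followed by a random UUID4 hex string.
    """
    return f"{test_name}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    for name in ("greeting", "product_inquiry"):
        print(new_contact_id(name))
```

Because each identifier contains a random component, rerunning the same test starts a new conversation rather than resuming an old one.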
🤝 Contributing
We welcome contributions! This fork follows the same contribution guidelines as the original project. Please see CONTRIBUTING.md for details.
For Weni-specific contributions:
- Test your changes with actual Weni agents
- Update the Weni target documentation if needed
- Ensure backward compatibility with existing configurations