A generative AI-powered framework for testing virtual agents.

Agent Evaluation - Weni Fork

Note: This is a fork of the original Agent Evaluation framework by AWS Labs. This fork adds support for testing Weni conversational AI agents while maintaining all the original functionality for AWS services.

Agent Evaluation is a generative AI-powered framework for testing virtual agents.

Internally, Agent Evaluation implements an LLM agent (the evaluator) that orchestrates conversations with your own agent (the target) and evaluates the target's responses as the conversation progresses.
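The evaluator/target split can be illustrated with a small, deterministic sketch. It is not the weni-agenteval implementation: `run_conversation`, `judge`, and `toy_target` are hypothetical, each test step is sent verbatim (the real evaluator uses an LLM to phrase the user turns), and plain Python checks stand in for the LLM's judgment of each expected result.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical target signature: takes a user message, returns the agent's reply.
Target = Callable[[str], str]


@dataclass
class Transcript:
    # Ordered (user_message, agent_reply) pairs for one test conversation.
    turns: List[Tuple[str, str]] = field(default_factory=list)


def run_conversation(steps: List[str], target: Target) -> Transcript:
    """Send one user turn per step and record the target's replies."""
    transcript = Transcript()
    for user_message in steps:
        # The real evaluator generates each user turn with an LLM;
        # this sketch sends the step text unchanged.
        transcript.turns.append((user_message, target(user_message)))
    return transcript


def judge(transcript: Transcript, checks: List[Callable[[Transcript], bool]]) -> bool:
    """Stand-in for the LLM verdict: the test passes only if every check holds."""
    return all(check(transcript) for check in checks)


def toy_target(message: str) -> str:
    # Hypothetical agent: greets back, otherwise asks for clarification.
    if "olá" in message.lower():
        return "Olá! Como posso ajudar?"
    return "Desculpe, não entendi. Pode repetir?"


transcript = run_conversation(["Olá, bom dia!", "xyz123 !!!"], toy_target)
passed = judge(
    transcript,
    [
        lambda t: "Olá" in t.turns[0][1],      # agent greets back
        lambda t: "repetir" in t.turns[1][1],  # agent asks for clarification
    ],
)
print(passed)  # True
```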

✨ Key features

- 🆕 Weni Agent Support: Built-in support for testing Weni conversational AI agents through the Weni API and WebSocket interface.
- Built-in support for popular AWS services including Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. You can also bring your own agent to test using Agent Evaluation.
- Orchestrate concurrent, multi-turn conversations with your agent while evaluating its responses.
- Define hooks to perform additional tasks such as integration testing.
- Integrate into CI/CD pipelines to shorten time to delivery while keeping agents in production stable.
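Concurrent, multi-turn execution can be sketched with the standard library's `concurrent.futures`: each test runs its own conversation in a worker thread, so slow agent replies overlap instead of adding up. This only illustrates the idea and is not how weni-agenteval schedules tests internally; `fake_agent` and `run_test` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fake_agent(session_id: str, message: str) -> str:
    # Hypothetical agent call; a real target would perform network I/O here.
    return f"[{session_id}] received: {message}"


def run_test(name: str, steps: List[str]) -> List[str]:
    # Each test gets its own session so conversations never share history.
    session_id = f"session-{name}"
    return [fake_agent(session_id, step) for step in steps]


tests: Dict[str, List[str]] = {
    "greeting": ["Olá, bom dia!", "com oq vc pode me ajudar?"],
    "purchase_outside_postal_code": ["quero comprar arroz", "04538-132"],
}

# Tests run in parallel; turns within a single test stay sequential.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {name: pool.submit(run_test, name, steps) for name, steps in tests.items()}
    replies = {name: future.result() for name, future in futures.items()}

for name, turns in replies.items():
    print(name, len(turns))
```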

🚀 Quick Start with Weni

Installation

Install the package from PyPI:

```
pip install weni-agenteval
```

Alternative: Install from source

If you want to install from source for development:

```
git clone https://github.com/weni-ai/agent-evaluation.git
cd agent-evaluation
pip install -e .
```

Prerequisites for Weni Target

⚠️ Important: You need both AWS and Weni credentials to run evaluations!

To test Weni agents, you'll need:

- AWS Credentials: Required for the evaluator (a Claude model accessed through Amazon Bedrock)
  - AWS Access Key ID
  - AWS Secret Access Key
  - AWS Session Token
- A Weni Project: An active project on the Weni platform
- Weni Authentication: Choose one of the following methods:

🚀 Option 1: Weni CLI (Recommended)

Install and authenticate with the Weni CLI:
```
# Install Weni CLI
pip install weni-cli

# Authenticate with Weni
weni login

# Select your project
weni project use [your-project-uuid]
```
Get the Weni CLI from: https://github.com/weni-ai/weni-cli

📋 Option 2: Environment Variables

Set these environment variables manually:
- WENI_PROJECT_UUID: Your project's unique identifier
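- WENI_BEARER_TOKEN: Your authentication bearer token

⚙️ Option 3: Configuration File

Provide credentials directly in your test configuration file through the `weni_project_uuid` and `weni_bearer_token` target parameters (see Configuration Options for Weni Target below).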

Setting up Environment Variables

💡 Note: If you're using the Weni CLI (Option 1 above), you only need to set the AWS credentials; the CLI handles the Weni credentials automatically.

macOS/Linux:

```
# AWS Credentials (required for evaluator)
export AWS_ACCESS_KEY_ID="your-aws-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
export AWS_SESSION_TOKEN="your-aws-session-token"

# Weni Credentials (only needed if NOT using Weni CLI)
export WENI_PROJECT_UUID="your-project-uuid-here"
export WENI_BEARER_TOKEN="your-bearer-token-here"
```

Windows (Command Prompt):

```
REM AWS Credentials (required for evaluator)
set AWS_ACCESS_KEY_ID=your-aws-access-key-id
set AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
set AWS_SESSION_TOKEN=your-aws-session-token

REM Weni Credentials (only needed if NOT using Weni CLI)
set WENI_PROJECT_UUID=your-project-uuid-here
set WENI_BEARER_TOKEN=your-bearer-token-here
```

Windows (PowerShell):

```
# AWS Credentials (required for evaluator)
$env:AWS_ACCESS_KEY_ID="your-aws-access-key-id"
$env:AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
$env:AWS_SESSION_TOKEN="your-aws-session-token"

# Weni Credentials (only needed if NOT using Weni CLI)
$env:WENI_PROJECT_UUID="your-project-uuid-here"
$env:WENI_BEARER_TOKEN="your-bearer-token-here"
```
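A missing variable often only shows up once a run has started, so a small preflight check can save time. The sketch below mirrors the rules in this section: the AWS variables are always required, and the Weni variables only when you are not using the Weni CLI. `missing_credentials` is a hypothetical helper, not part of weni-agenteval.

```python
import os
from typing import List, Mapping, Optional

AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
WENI_VARS = ("WENI_PROJECT_UUID", "WENI_BEARER_TOKEN")


def missing_credentials(using_weni_cli: bool, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    required = list(AWS_VARS)
    if not using_weni_cli:
        # Weni CLI users get project and token from `weni login` / `weni project use`.
        required.extend(WENI_VARS)
    return [name for name in required if not env.get(name)]


missing = missing_credentials(using_weni_cli=False)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required credentials are set.")
```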

Basic Usage

Create a test configuration file agenteval.yml:

```
evaluator:
  model: claude-haiku-4_5-global  # or claude-sonnet-4_5-global, claude-haiku-3_5-us
  aws_region: us-east-1

target:
  type: weni

tests:
  greeting:
    steps:
      - Send a greeting "Olá, bom dia!"  # "Hello, good morning!"
      - Ask "com oq vc pode me ajudar?"  # "What can you help me with?"
    expected_results:
      - Agent responds with a friendly greeting
      - Agent shows a menu of options to help the user

  purchase_outside_postal_code:
    steps:
      - Say "quero comprar arroz"  # "I want to buy rice"
      - Give the postal code "04538-132"
    expected_results:
      - Agent responds asking for a postal code
      - Agent says it doesn't deliver to this postal code
```
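If you generate plans programmatically, a quick structural check can catch mistakes before any model calls are made. The sketch below validates a plan that has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`). `validate_plan` is hypothetical, and the rules it enforces (an `evaluator.model`, a `target.type`, and non-empty `steps` and `expected_results` in every test) are inferred from the examples in this README rather than taken from the tool's own schema.

```python
from typing import Any, Dict, List


def validate_plan(plan: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return problems found; an empty list means the shape looks fine."""
    problems: List[str] = []
    if not (plan.get("evaluator") or {}).get("model"):
        problems.append("evaluator.model is missing")
    if not (plan.get("target") or {}).get("type"):
        problems.append("target.type is missing")
    tests = plan.get("tests") or {}
    if not tests:
        problems.append("no tests defined")
    for name, test in tests.items():
        for key in ("steps", "expected_results"):
            value = test.get(key) if isinstance(test, dict) else None
            if not isinstance(value, list) or not value:
                problems.append(f"tests.{name}.{key} must be a non-empty list")
    return problems


plan = {
    "evaluator": {"model": "claude-haiku-4_5-global", "aws_region": "us-east-1"},
    "target": {"type": "weni"},
    "tests": {
        "greeting": {
            "steps": ['Send a greeting "Olá, bom dia!"'],
            "expected_results": ["Agent responds with a friendly greeting"],
        },
        "broken": {"steps": ["Say hi"]},  # expected_results forgotten
    },
}
print(validate_plan(plan))  # ['tests.broken.expected_results must be a non-empty list']
```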

Run the evaluation:

```
weni-agenteval run
```

Note: The tool automatically looks for agenteval.yml in the current directory. You can also specify a different directory with --plan-dir if needed.

Additional CLI options:

```
# Run with verbose output
weni-agenteval run --verbose

# Run specific tests only
weni-agenteval run --filter "greeting,purchase_outside_postal_code"

# Run from a different directory
weni-agenteval run --plan-dir /path/to/test/directory

# Run in watch mode for real-time conversation monitoring
weni-agenteval run --watch

# Combine watch mode with other options
weni-agenteval run --watch --filter "greeting" --verbose

# Initialize a new test plan template
weni-agenteval init
```
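For CI/CD, the documented flags can be assembled from a script. The sketch uses only the flags listed above (`--plan-dir`, `--filter`, `--verbose`, `--watch`); `build_run_command` and `run_evaluation` are hypothetical helpers, and treating a non-zero exit code as a failed evaluation is an assumption that this README does not confirm.

```python
import subprocess
from typing import List, Optional, Sequence


def build_run_command(
    plan_dir: Optional[str] = None,
    tests: Optional[Sequence[str]] = None,
    verbose: bool = False,
    watch: bool = False,
) -> List[str]:
    """Hypothetical helper: build the argv list for `weni-agenteval run`."""
    cmd = ["weni-agenteval", "run"]
    if plan_dir:
        cmd += ["--plan-dir", plan_dir]
    if tests:
        # --filter takes a comma-separated list of test names.
        cmd += ["--filter", ",".join(tests)]
    if verbose:
        cmd.append("--verbose")
    if watch:
        cmd.append("--watch")
    return cmd


def run_evaluation(**options) -> int:
    """Run the CLI and return its exit code (assumed non-zero when tests fail)."""
    return subprocess.run(build_run_command(**options), check=False).returncode


print(build_run_command(tests=["greeting", "purchase_outside_postal_code"], verbose=True))
```

Watch mode is intended for interactive use, so a CI job would normally leave `watch=False`.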

🔍 Watch Mode

For real-time monitoring of your tests, use the --watch flag to see conversations as they happen:

```
weni-agenteval run --watch
```

Watch mode provides:

- Real-time conversation display: See user messages and agent responses as they occur
- Immediate feedback: User prompts appear instantly when sent to the agent
- Visual test results: Clear ✅ PASS / ❌ FAIL indicators for each test
- Sequential execution: Tests run one at a time for readable output
- Progress tracking: Shows current test progress and overall completion

Perfect for development, debugging, and demonstrations!

Configuration Options for Weni Target

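| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `type` | string | Yes | - | Must be `"weni"` |
| `timeout` | integer | No | `30` | Maximum time, in seconds, to wait for an agent response |
| `weni_project_uuid` | string | No | - | Project UUID (prefer the Weni CLI or the `WENI_PROJECT_UUID` environment variable) |
| `weni_bearer_token` | string | No | - | Bearer token (prefer the Weni CLI or the `WENI_BEARER_TOKEN` environment variable) |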
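The table maps naturally onto a small typed config object. A sketch, assuming (this README does not say) that values in the file take precedence over the `WENI_PROJECT_UUID` and `WENI_BEARER_TOKEN` environment variables; `WeniTargetConfig` is hypothetical, and credentials supplied by the Weni CLI are not modeled.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class WeniTargetConfig:
    """Hypothetical typed view of the Weni target options from the table above."""

    type: str = "weni"
    timeout: int = 30  # seconds to wait for an agent response
    weni_project_uuid: Optional[str] = None
    weni_bearer_token: Optional[str] = None

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any], env: Optional[Mapping[str, str]] = None) -> "WeniTargetConfig":
        env = os.environ if env is None else env
        if raw.get("type") != "weni":
            raise ValueError('target.type must be "weni"')
        timeout = raw.get("timeout", 30)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
            raise ValueError("target.timeout must be a positive integer (seconds)")
        return cls(
            type="weni",
            timeout=timeout,
            # Assumed precedence: value in the file first, then the environment.
            weni_project_uuid=raw.get("weni_project_uuid") or env.get("WENI_PROJECT_UUID"),
            weni_bearer_token=raw.get("weni_bearer_token") or env.get("WENI_BEARER_TOKEN"),
        )


cfg = WeniTargetConfig.from_mapping({"type": "weni", "timeout": 45}, env={})
print(cfg.timeout, cfg.weni_project_uuid)  # 45 None
```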

Advanced Example

Here's a more comprehensive test plan:

```
evaluator:
  model: claude-haiku-4_5-global  # Recommended for fast, cost-effective evaluations
  aws_region: us-east-1

target:
  type: weni
  timeout: 45

tests:
  # Basic greeting test
  greeting:
    steps:
      - Send a greeting "Olá, como você está?"  # "Hello, how are you?"
    expected_results:
      - Agent responds politely
      - Agent asks how it can help

  # Multi-turn conversation
  product_inquiry:
    steps:
      - Ask "Quais produtos vocês têm?"  # "What products do you have?"
      - Follow up with "Qual é o preço do arroz?"  # "What is the price of rice?"
      - Ask "Vocês entregam em São Paulo?"  # "Do you deliver to São Paulo?"
    expected_results:
      - Agent provides product information
      - Agent gives pricing details
      - Agent confirms delivery area

  # Error handling
  unclear_input:
    steps:
      - Send unclear text "xyz123 !!!"
    expected_results:
      - Agent handles the input gracefully
      - Agent asks for clarification
      - No error messages are shown to the user

  # Context maintenance
  context_test:
    steps:
      - Say "Quero comprar feijão"  # "I want to buy beans"
      - Ask "Qual o prazo de entrega?"  # "What is the delivery time?"
      - Ask "E o frete?"  # "And the shipping cost?"
    expected_results:
      - Agent remembers the product context
      - Agent provides delivery timeframe
      - Agent gives shipping cost information
```

📚 Documentation

📖 Full Documentation: Visit our comprehensive documentation at https://weni-ai.github.io/weni-agent-evaluation/

The documentation includes:

- Complete installation guide with authentication setup
- Step-by-step user guide with Weni examples
- Detailed Weni target configuration
- CLI reference and troubleshooting guides
- CI/CD integration examples

For the original AWS Labs framework features, you can also visit https://awslabs.github.io/agent-evaluation/.

To contribute, please refer to CONTRIBUTING.md

🔧 Troubleshooting Weni Target

Common Issues

AWS Authentication Errors

- Verify that AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN are set correctly
- Ensure you have access to Amazon Bedrock, including the evaluator model, in your specified region
- Check that your AWS credentials have the necessary Bedrock permissions
- Verify that the aws_region in your configuration is a region your AWS account can access

Weni Authentication Errors

- Using the Weni CLI (recommended): Run `weni login` to re-authenticate, then `weni project use [project-uuid]` to select your project
- Using environment variables: Verify that your WENI_BEARER_TOKEN is valid and not expired
- Check that WENI_PROJECT_UUID matches your actual project
- Ensure the bearer token has the necessary permissions for the project
- Get the Weni CLI at: https://github.com/weni-ai/weni-cli

Connection Issues

- Verify the Weni API endpoints are accessible from your network
- Check for any firewall or proxy settings blocking HTTPS/WSS connections
- Ensure your internet connection is stable
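To separate network problems from agent problems, you can test whether a TCP port (443 for HTTPS/WSS) is reachable from the machine running the evaluation. This README does not list the Weni endpoint hostnames, so you must supply the host yourself; `can_reach` is a hypothetical helper built on the standard `socket.create_connection`.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False


# Usage (placeholder host, replace with the Weni API host your project uses):
# print(can_reach("your-weni-api-host", 443))
```

A `True` result only shows that the port accepts connections; a proxy that blocks WebSocket upgrades can still cause failures later.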

Timeout Errors

- Increase the timeout parameter if your agent requires more processing time
- Check that the agent is properly configured and active on the Weni platform
- Verify that the agent is not stuck in a processing loop

WebSocket Connection Failures

- Ensure the websocket-client package is properly installed
- Check for any proxy configurations that might interfere with WebSocket connections
- Verify the WebSocket endpoint URL is correct for your project
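The WebSocket client is distributed on PyPI as `websocket-client` but imported as `websocket`, so checking for the import name confirms it is available in the active environment. A sketch with a hypothetical helper name; note that an unrelated package that also installs a `websocket` module would satisfy this check too.

```python
import importlib.util


def websocket_client_installed() -> bool:
    """Hypothetical helper: True if the `websocket` module (from websocket-client) is importable."""
    return importlib.util.find_spec("websocket") is not None


if websocket_client_installed():
    print("websocket-client is installed")
else:
    print("websocket-client is missing; try: pip install websocket-client")
```

Run it with the same interpreter that runs `weni-agenteval`, since each virtual environment has its own packages.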

🆚 Differences from Original

This fork maintains full compatibility with the original AWS Labs Agent Evaluation framework while adding:

- Weni Target Support: Native integration with the Weni conversational AI platform
- WebSocket Communication: Real-time bidirectional communication with Weni agents
- Session Isolation: Each test case uses unique contact identifiers for proper conversation isolation
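This README does not document the format of these contact identifiers. Purely to illustrate the idea, a fresh random UUID per test case keeps conversation histories from colliding across tests and runs; `new_contact_id` and its `agenteval-` prefix are hypothetical.

```python
import uuid


def new_contact_id(test_name: str) -> str:
    """Hypothetical helper: return a contact identifier that is unique for every call."""
    # uuid4 is random, so two runs of the same test never reuse a contact.
    return f"agenteval-{test_name}-{uuid.uuid4().hex}"


first = new_contact_id("greeting")
second = new_contact_id("greeting")
print(first != second)  # True
```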

All original AWS targets (Bedrock, Q Business, SageMaker, etc.) continue to work exactly as in the original repository.

🤝 Contributing

We welcome contributions! This fork follows the same contribution guidelines as the original project. Please see CONTRIBUTING.md for details.

For Weni-specific contributions:

- Test your changes with actual Weni agents
- Update the Weni target documentation if needed
- Ensure backward compatibility with existing configurations

Release history

| Version | Release date |
| --- | --- |
| 1.0.11 (this version) | Nov 29, 2025 |
| 1.0.10 | Nov 29, 2025 |
| 1.0.9 | Nov 29, 2025 |
| 1.0.8 | Nov 29, 2025 |
| 1.0.7 | Sep 12, 2025 |
| 1.0.6 | Sep 12, 2025 |
| 1.0.5 | Sep 12, 2025 |
| 1.0.4 | Sep 8, 2025 |
| 1.0.3 | Sep 5, 2025 |
| 1.0.2 | Sep 5, 2025 |
| 1.0.1 | Sep 5, 2025 |
| 1.0.0 | Sep 5, 2025 |
| 0.4.1 | Sep 5, 2025 |

Download files

Source distribution

weni_agenteval-1.0.11.tar.gz (44.0 kB), uploaded Nov 29, 2025 (Source)

Built distribution

weni_agenteval-1.0.11-py3-none-any.whl (58.5 kB), uploaded Nov 29, 2025 (Python 3)

File details for weni_agenteval-1.0.11.tar.gz

- Download URL: weni_agenteval-1.0.11.tar.gz
- Upload date: Nov 29, 2025
- Size: 44.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

Hashes for weni_agenteval-1.0.11.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e51c24f88f9477b28009d181c6b3b4e2b1cfe4a1f110702de0875e7d56b9a397` |
| MD5 | `e629d705c3063456ce03f375e584034c` |
| BLAKE2b-256 | `fe07505ade0219d857dded94668be2f0eae538d39334bd7c2f100a8d3b678b8f` |

File details for weni_agenteval-1.0.11-py3-none-any.whl

- Download URL: weni_agenteval-1.0.11-py3-none-any.whl
- Upload date: Nov 29, 2025
- Size: 58.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

Hashes for weni_agenteval-1.0.11-py3-none-any.whl:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `2322cba160cf67f940cb80d443f5906d3abaa10a5c95fb8e7315f4106f152dd9` |
| MD5 | `6bf8d66c5223908027d62fb9b24c25aa` |
| BLAKE2b-256 | `0904e691564718c526f5f469b34b86a9f2db531c02d520bbeaa176eba2a2a03a` |
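To check a downloaded file against the published SHA256 digests, compute the digest locally and compare. A minimal sketch using Python's `hashlib`; `sha256_of` and `verify_sha256` are hypothetical helpers, and the file is assumed to be in the current directory. pip can also enforce hashes on its own in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the 1.0.11 release files.
EXPECTED_SHA256 = {
    "weni_agenteval-1.0.11.tar.gz": "e51c24f88f9477b28009d181c6b3b4e2b1cfe4a1f110702de0875e7d56b9a397",
    "weni_agenteval-1.0.11-py3-none-any.whl": "2322cba160cf67f940cb80d443f5906d3abaa10a5c95fb8e7315f4106f152dd9",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Hypothetical helper: compare the file's digest with the expected hex string."""
    return sha256_of(path) == expected.lower()


wheel = Path("weni_agenteval-1.0.11-py3-none-any.whl")
if wheel.exists():
    print("OK" if verify_sha256(wheel, EXPECTED_SHA256[wheel.name]) else "HASH MISMATCH")
```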
