A generative AI-powered framework for testing virtual agents.

Agent Evaluation

Agent Evaluation is a generative AI-powered framework for testing virtual agents.

Internally, Agent Evaluation implements an LLM agent (the evaluator) that orchestrates conversations with your own agent (the target) and evaluates the target's responses as the conversation proceeds.
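
The evaluator/target loop described above can be pictured with a minimal sketch. Everything in it is hypothetical: `run_conversation`, `scripted_evaluator`, and `echo_target` are illustrative stand-ins, not Agent Evaluation's API, and a real evaluator would call an LLM instead of following a fixed script.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration only; not part of Agent Evaluation's API.
Target = Callable[[str], str]  # takes a user prompt, returns the agent's reply


@dataclass
class Transcript:
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (prompt, reply)
    passed: bool = False


def scripted_evaluator(steps: List[str], expected: str):
    """Stand-in for the LLM evaluator: walks a fixed list of prompts and
    checks whether the expected phrase appears in any reply so far."""

    def next_prompt(transcript: Transcript) -> Optional[str]:
        i = len(transcript.turns)
        return steps[i] if i < len(steps) else None

    def judge(transcript: Transcript) -> bool:
        return any(expected.lower() in reply.lower() for _, reply in transcript.turns)

    return next_prompt, judge


def run_conversation(target: Target, next_prompt, judge, max_turns: int = 5) -> Transcript:
    """Drive a multi-turn conversation, evaluating after every turn."""
    transcript = Transcript()
    for _ in range(max_turns):
        prompt = next_prompt(transcript)
        if prompt is None:  # the evaluator has nothing left to ask
            break
        reply = target(prompt)
        transcript.turns.append((prompt, reply))
        if judge(transcript):  # stop as soon as the expectation is met
            transcript.passed = True
            break
    return transcript


def echo_target(prompt: str) -> str:
    """Toy target agent used in place of a real virtual agent."""
    if "weather" in prompt.lower():
        return "It is sunny in Seattle today."
    return "How can I help you?"


next_prompt, judge = scripted_evaluator(
    ["Hello", "What is the weather in Seattle?"], expected="sunny"
)
result = run_conversation(echo_target, next_prompt, judge)
print(result.passed, len(result.turns))
```

The sketch keeps the two roles separate: the target only answers prompts, while the evaluator decides what to ask next and when the conversation has met its expectation.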

✨ Key features

  • Built-in support for popular AWS services including Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. You can also bring your own agent to test using Agent Evaluation.
  • Orchestrate concurrent, multi-turn conversations with your agent while evaluating its responses.
  • Define hooks to perform additional tasks such as integration testing.
  • Integrate with CI/CD pipelines to shorten time to delivery while keeping agents stable in production environments.
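
As a rough illustration of the concurrency and hook features listed above, the sketch below runs two multi-turn tests in parallel and wraps each one in setup and teardown callbacks. All names (`run_test`, `pre_hook`, `post_hook`, `toy_agent`) are assumptions made for this example and do not reflect Agent Evaluation's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical names for illustration; not Agent Evaluation's API.
events: List[str] = []  # records hook calls so they can be inspected afterwards


def toy_agent(prompt: str) -> str:
    """Toy target agent that uppercases the prompt."""
    return prompt.upper()


def pre_hook(name: str) -> None:
    """Runs before a test, e.g. to seed a database for an integration check."""
    events.append(f"setup:{name}")


def post_hook(name: str, passed: bool) -> None:
    """Runs after a test, e.g. to verify a side effect and clean up."""
    events.append(f"teardown:{name}:{passed}")


def run_test(name: str, prompts: List[str], agent: Callable[[str], str]) -> bool:
    pre_hook(name)
    replies = [agent(p) for p in prompts]  # one reply per conversation turn
    passed = all(r == p.upper() for p, r in zip(prompts, replies))
    post_hook(name, passed)
    return passed


tests: Dict[str, List[str]] = {
    "greeting": ["hello", "thanks"],
    "order": ["buy a book"],
}

# Each test conversation runs in its own worker thread.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {
        name: pool.submit(run_test, name, prompts, toy_agent)
        for name, prompts in tests.items()
    }
    results = {name: future.result() for name, future in futures.items()}

print(results)
```

In a CI/CD job, a wrapper script could exit with a non-zero status whenever any value in `results` is `False`, so that a failing conversation blocks the deployment.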

📚 Documentation

To get started, see the full documentation. To contribute, refer to CONTRIBUTING.md.

Download files

Source Distribution

agent_evaluation-0.4.1.tar.gz (28.1 kB)

Uploaded Source

Built Distribution

agent_evaluation-0.4.1-py3-none-any.whl (46.5 kB)

Uploaded Python 3

File details

Details for the file agent_evaluation-0.4.1.tar.gz.

File metadata

  • Download URL: agent_evaluation-0.4.1.tar.gz
  • Size: 28.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for agent_evaluation-0.4.1.tar.gz:

  • SHA256: 279412bd2074540818e7785c1969b17770bf71c67760c2dc51bcfac6d699fe5b
  • MD5: d321ddd444482c37882bce49c595bbe6
  • BLAKE2b-256: 09975b31e0125b8dbc86cca20e4d668ce67713728900312beea04acdb652a234
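
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `verify` are made up for the example, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest published for the source distribution.
EXPECTED_SDIST_SHA256 = "279412bd2074540818e7785c1969b17770bf71c67760c2dc51bcfac6d699fe5b"

# Assumed location of the downloaded archive; adjust to your download path.
sdist = Path("agent_evaluation-0.4.1.tar.gz")
if sdist.exists():
    print("OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of archive size. SHA256 is the digest to rely on here; MD5 is not suitable for detecting tampering.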

File details

Details for the file agent_evaluation-0.4.1-py3-none-any.whl.

File hashes

Hashes for agent_evaluation-0.4.1-py3-none-any.whl:

  • SHA256: 62665e60358f62637876821565a7c19254953860cbf052b4ada29840675ea303
  • MD5: d3162ab491d23a22308824040df03cf1
  • BLAKE2b-256: 5d8178aa63d8c9982cb2a99470fb26f3a3a7b81334306591169f3f6425cf7378
