A pytest-like testing framework for AI agents and prompts

AgentTest 🧪

AgentTest is a testing framework for AI agents. It provides evaluation, logging, and regression tracking behind a pytest-like interface.

🚀 Key Features

🤖 Intelligent Auto Test Generation: Analyzes your code and generates test cases with the correct imports and function calls
🧪 Pytest-like Interface: Familiar CLI and decorator-based testing
🧠 Smart Code Analysis: Understands project structure, classes, and functions, and generates realistic test data
📊 Multiple Evaluation Engines: String similarity, LLM-as-judge, regex, and more
🔄 Git-Aware Logging: Track test results with commit information for regression analysis
🔗 Framework Agnostic: Works with LangChain, LlamaIndex, OpenAI, Anthropic, and custom agents
📈 Regression Tracking: Compare test results across commits and branches
⚡ CI/CD Integration: Built for continuous integration workflows

📦 Installation

# Basic installation
pip install agenttest

🏁 Quick Start

1. Initialize a New Project

# Initialize in current directory
agenttest init

# Or use a framework template
agenttest init --template langchain
agenttest init --template llamaindex

2. Write Your First Test

Create tests/test_my_agent.py:

from agenttest import agent_test

def simple_agent(input_text: str) -> str:
    """A simple agent that echoes the input with a prefix."""
    return f"Agent response: {input_text}"

@agent_test(criteria=["similarity", "llm_judge"])
def test_simple_agent():
    """Test the simple agent with basic input."""
    input_text = "Hello, world!"
    expected = "Agent response: Hello, world!"

    actual = simple_agent(input_text)

    return {
        "input": input_text,
        "expected": expected,
        "actual": actual
    }

@agent_test(criteria=["llm_judge"], tags=["edge_case"])
def test_empty_input():
    """Test agent with empty input."""
    actual = simple_agent("")

    return {
        "input": "",
        "actual": actual,
        "evaluation_criteria": {
            "robustness": "Agent should handle empty input gracefully"
        }
    }

3. Run Tests

# Run all tests
agenttest run

# Run with verbose output
agenttest run --verbose

# Run specific tests by tag
agenttest run --tag edge_case

# Run in CI mode (exit with error on failures)
agenttest run --ci
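
The CI behavior noted above (exit with an error on failures) follows the usual convention of a non-zero process exit code. A minimal sketch of that convention, assuming results are available as a mapping from test name to pass/fail; `exit_code` and the results format are hypothetical and not part of AgentTest's API:

```python
def exit_code(results: dict[str, bool]) -> int:
    """Return 0 when every test passed, 1 otherwise."""
    return 0 if all(results.values()) else 1

# Hypothetical results: test name -> passed?
results = {"test_simple_agent": True, "test_empty_input": False}

# A runner would hand this value to sys.exit() so the CI job fails.
code = exit_code(results)
```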

4. Generate Tests Automatically ✨

AgentTest can automatically analyze your code and generate comprehensive test cases:

# Auto-generate tests for a specific file
agenttest generate examples/agents_sample.py --count 5

# Generate tests with specific format
agenttest generate examples/agents_sample.py --format python --count 3

# Generate tests for multiple files
agenttest generate examples/*.py --count 2

# Save generated tests to a file
agenttest generate agents/my_agent.py --output tests/generated_tests.py

What makes it intelligent?

🔍 Analyzes project structure to generate correct imports
🎯 Understands functions and classes to create proper test calls
📝 Generates realistic test data based on parameter names and types
🧪 Creates multiple test scenarios (basic, edge cases, error handling)
🏗️ Handles class instantiation automatically for method testing

Example generated test:

@agent_test(
    criteria=["execution", "output_type", "functionality"],
    tags=["basic", "function"]
)
def test_handle_customer_query_basic():
    """Test basic functionality of handle_customer_query"""
    input_data = {
        "query": "test query",
        "customer_type": "premium",
        "urgency": "high"
    }

    # Automatically generated function call
    actual = handle_customer_query(**input_data)

    return {
        "input": input_data,
        "actual": actual,
        "evaluation_criteria": {
            "execution": "Function should execute without errors",
            "output_type": "Should return appropriate type"
        }
    }
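
The kind of code analysis described above can be approximated with Python's standard `ast` module. The following is a rough sketch, not AgentTest's actual implementation; `discover_functions` is a hypothetical helper that lists top-level functions and their parameter names, which is the minimum a generator would need to build a call like `handle_customer_query(**input_data)`.

```python
import ast

def discover_functions(source: str) -> dict[str, list[str]]:
    """Map each top-level function name to its positional parameter names."""
    tree = ast.parse(source)
    found = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found[node.name] = [arg.arg for arg in node.args.args]
    return found

# Sample module text standing in for a real agent file
SAMPLE = '''
def handle_customer_query(query, customer_type, urgency):
    return query
'''

functions = discover_functions(SAMPLE)
```

Parsing the source instead of importing it means the analyzed module is never executed, which matters when agent code makes API calls at import time.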

5. View Test History

# Show recent test runs
agenttest log

# Show runs for specific commit
agenttest log --commit abc123

# Compare results between commits
agenttest compare HEAD~1 HEAD
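
Conceptually, comparing two runs comes down to finding tests whose outcome changed. A minimal sketch of that idea, assuming each run is reduced to a mapping from test name to pass/fail; this format and the `find_regressions` helper are hypothetical and do not describe AgentTest's stored results:

```python
def find_regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Return tests that passed in `before` but fail in `after`."""
    return sorted(
        name
        for name, passed in before.items()
        if passed and after.get(name) is False
    )

# Hypothetical results for two commits
head_parent = {"test_simple_agent": True, "test_empty_input": True}
head = {"test_simple_agent": True, "test_empty_input": False}

regressions = find_regressions(head_parent, head)
```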

📖 Detailed Usage

Test Decorators

The @agent_test decorator supports various options:

@agent_test(
    criteria=["similarity", "llm_judge", "regex"],  # Evaluation methods
    tags=["integration", "slow"],                   # Test categorization
    timeout=30,                                     # Test timeout in seconds
    retry_count=2                                   # Retries on failure
)
def test_complex_agent():
    # Your test logic here
    pass
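
To make the `retry_count` option concrete, here is a small standalone sketch of retry-on-failure behavior. It assumes that `retry_count=2` means up to two additional attempts after the first failure; AgentTest's exact semantics may differ, and `with_retries` is a hypothetical helper, not part of the library.

```python
import functools

def with_retries(retry_count: int):
    """Re-run the wrapped function up to `retry_count` extra times on exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _attempt in range(retry_count + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    last_error = error
            # Every attempt failed: surface the most recent error
            raise last_error
        return wrapper
    return decorator

calls = {"count": 0}

@with_retries(retry_count=2)
def flaky():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = flaky()
```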

Evaluation Criteria

AgentTest supports multiple built-in evaluators:

String Similarity

@agent_test(criteria=["similarity"])
def test_with_similarity():
    # `my_agent` stands in for your own agent callable
    agent_response = my_agent("What is AI?")
    return {
        "input": "What is AI?",
        "expected": "Artificial Intelligence is...",
        "actual": agent_response
    }
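
One common way to score string similarity is cosine similarity over word counts, which is also the `method` named in the sample configuration. The sketch below illustrates that technique with a 0.8 pass threshold; it is an assumption about how such an evaluator could work, not AgentTest's actual scoring code, and `cosine_similarity` is a hypothetical helper.

```python
import math
import re
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[token] * vb[token] for token in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)

THRESHOLD = 0.8  # assumed pass mark, mirroring the sample config

expected = "Agent response: Hello, world!"
actual = "Agent response: Hello, world!"
score = cosine_similarity(expected, actual)
passed = score >= THRESHOLD
```

Word-count vectors ignore word order, so this measure suits short factual answers better than long free-form text, where an LLM judge is usually the better fit.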

LLM-as-Judge

@agent_test(criteria=["llm_judge"])
def test_with_llm_judge():
    # `my_agent` stands in for your own agent callable
    agent_response = my_agent("Summarize this article")
    return {
        "input": "Summarize this article",
        "actual": agent_response,
        "evaluation_criteria": {
            "accuracy": "Summary should capture key points",
            "conciseness": "Should be 2-3 sentences"
        }
    }
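
An LLM-as-judge evaluator typically turns the `evaluation_criteria` mapping into a grading prompt for the judge model. The following sketch shows one way that prompt could be assembled; the wording and the `build_judge_prompt` helper are hypothetical, and no model is called here.

```python
def build_judge_prompt(test_input: str, actual: str, criteria: dict[str, str]) -> str:
    """Assemble a grading prompt from the test input, output, and criteria."""
    lines = [
        "You are grading the output of an AI agent.",
        f"Input: {test_input}",
        f"Output: {actual}",
        "Criteria:",
    ]
    lines += [f"- {name}: {description}" for name, description in criteria.items()]
    lines.append("Answer PASS or FAIL for each criterion, with a one-line reason.")
    return "\n".join(lines)

prompt = build_judge_prompt(
    "Summarize this article",
    "The article argues that ...",  # placeholder agent output
    {
        "accuracy": "Summary should capture key points",
        "conciseness": "Should be 2-3 sentences",
    },
)
```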

Custom Evaluators

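@agent_test(criteria=["regex"])
def test_with_regex():
    # `my_agent` stands in for your own agent callable
    agent_response = my_agent("Generate a phone number")
    return {
        "input": "Generate a phone number",
        "actual": agent_response,
        "pattern": r"\d{3}-\d{3}-\d{4}"  # Phone number pattern
    }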
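
The check behind a regex criterion can be pictured with Python's `re` module. This sketch assumes the pattern may match anywhere in the output (`re.search`); whether AgentTest searches or requires a full match is not stated here, and `matches_pattern` is a hypothetical helper.

```python
import re

def matches_pattern(actual: str, pattern: str) -> bool:
    """Return True when `pattern` occurs anywhere in `actual`."""
    return re.search(pattern, actual) is not None

PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"

found = matches_pattern("You can reach us at 555-123-4567.", PHONE_PATTERN)
missing = matches_pattern("No number available.", PHONE_PATTERN)
```

If the whole output must be a phone number and nothing else, `re.fullmatch` is the stricter alternative.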

Configuration

Edit .agenttest/config.yaml to customize your setup:

version: '1.0'
project_name: 'My AI Project'

llm:
  provider: 'openai' # or "anthropic" or "gemini"
  model: 'gpt-3.5-turbo' # or 'claude-3-sonnet-20240229' or 'gemini-pro'
  temperature: 0.0

evaluators:
  - name: 'similarity'
    type: 'string_similarity'
    config:
      method: 'cosine'
      threshold: 0.8

  - name: 'llm_judge'
    type: 'llm_as_judge'
    config:
      criteria: ['accuracy', 'relevance']

testing:
  test_dirs: ['tests']
  test_patterns: ['test_*.py', '*_test.py']
  parallel: false
  timeout: 300

logging:
  level: 'INFO'
  git_aware: true
  results_dir: '.agenttest/results'
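
The `test_patterns` entries above look like shell-style globs. Assuming that is how they are interpreted, file discovery could be sketched with the standard `fnmatch` module; `is_test_file` is a hypothetical helper for illustration, not AgentTest's discovery code.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

TEST_PATTERNS = ["test_*.py", "*_test.py"]  # copied from the sample config

def is_test_file(path: str, patterns=TEST_PATTERNS) -> bool:
    """Return True when the file name matches any configured glob pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)

candidates = [
    "tests/test_my_agent.py",
    "tests/summarizer_test.py",
    "tests/helpers.py",
]
selected = [path for path in candidates if is_test_file(path)]
```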

Framework Integration

LangChain Example

from agenttest import agent_test
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI  # requires the langchain-openai package

# Your LangChain agent
def create_summarizer():
    llm = OpenAI(temperature=0)
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Summarize: {text}"
    )
    # Compose prompt and model; this replaces the deprecated LLMChain
    return prompt | llm

@agent_test(criteria=["llm_judge"])
def test_langchain_summarizer():
    chain = create_summarizer()
    text = "Long article content..."
    result = chain.invoke({"text": text})

    return {
        "input": text,
        "actual": result,
        "evaluation_criteria": {
            "conciseness": "Summary should be brief",
            "accuracy": "Should capture main points"
        }
    }

Custom Agent Example

import openai
from agenttest import agent_test

class CustomAgent:
    def __init__(self):
        self.client = openai.OpenAI()

    def process(self, query: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": query}]
        )
        return response.choices[0].message.content

# Gemini Agent Example
import os

import google.generativeai as genai

class GeminiAgent:
    def __init__(self):
        # Read the key from the environment instead of hardcoding it
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        self.model = genai.GenerativeModel('gemini-pro')

    def process(self, query: str) -> str:
        response = self.model.generate_content(query)
        return response.text

@agent_test(criteria=["similarity", "llm_judge"])
def test_custom_agent():
    agent = CustomAgent()
    query = "What is the capital of France?"
    result = agent.process(query)

    return {
        "input": query,
        "actual": result,
        "expected": "Paris",
        "evaluation_criteria": {
            "factuality": "Answer should be factually correct"
        }
    }

@agent_test(criteria=["llm_judge"])
def test_gemini_agent():
    agent = GeminiAgent()
    query = "Explain photosynthesis in simple terms"
    result = agent.process(query)

    return {
        "input": query,
        "actual": result,
        "evaluation_criteria": {
            "clarity": "Explanation should be clear and simple",
            "accuracy": "Scientific information should be accurate",
            "completeness": "Should cover the main aspects of photosynthesis"
        }
    }

🔧 Advanced Features

Git Integration

AgentTest automatically tracks git information with each test run:

Commit hash and branch
Changed files
Author and timestamp
Test result history

# View test history
agenttest log --limit 20

# Compare between branches
agenttest compare main feature/new-model

# Compare specific commits
agenttest compare abc123 def456
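
The commit metadata listed above can be read from git with a single `git log` call. The sketch below shows one way to collect it using `subprocess`; the dictionary keys and both helper names are hypothetical and do not reflect AgentTest's stored format.

```python
import subprocess

# %H = commit hash, %an = author name, %aI = author date (strict ISO 8601)
GIT_FORMAT = "%H%n%an%n%aI"

def parse_git_log(output: str) -> dict[str, str]:
    """Split three-line `git log` output into named fields."""
    commit, author, timestamp = output.strip().split("\n")
    return {"commit": commit, "author": author, "timestamp": timestamp}

def current_commit_info() -> dict[str, str]:
    """Read metadata for HEAD; must be run inside a git repository."""
    completed = subprocess.run(
        ["git", "log", "-1", f"--format={GIT_FORMAT}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_git_log(completed.stdout)

# Parsing a sample output, so the example runs outside a repository too
sample = "abc123\nJane Doe\n2025-06-27T12:00:00+00:00\n"
info = parse_git_log(sample)
```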

CI/CD Integration

GitHub Actions Example

name: AgentTest CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          pip install agenttest[all]
          pip install -r requirements.txt

      - name: Run AgentTest
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
        run: |
          agenttest run --ci --verbose

      - name: Upload test results
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: .agenttest/results/
          # .agenttest is a hidden directory; v4 skips hidden files by default
          include-hidden-files: true

Test Generation

AgentTest can automatically generate test cases by analyzing your agent code:

# Generate tests for all discovered agents
agenttest generate

# Generate for specific agent with custom count
agenttest generate --agent agents/my_agent.py --count 10

# Generate in different formats
agenttest generate --agent agents/my_agent.py --format yaml
agenttest generate --agent agents/my_agent.py --format json

📚 Documentation

AgentTest includes comprehensive documentation built with MkDocs and hosted on GitHub Pages.

🌐 Online Documentation

Visit the full documentation at: https://your-username.github.io/your-repo-name/

🏠 Local Documentation

You can also run the documentation locally:

# Install documentation dependencies
pip install -e ".[docs]"

# Serve documentation locally
mkdocs serve
# Or use the helper script
./scripts/docs.sh serve

The documentation includes:

Installation & Setup: Complete installation guide
Quick Start: Get started in 5 minutes
Auto Test Generation: Comprehensive guide to intelligent test generation
User Guide: Configuration, writing tests, CLI commands
Evaluators: Detailed guide for all evaluation methods
Examples: Practical examples and tutorials
API Reference: Complete API documentation
Git Integration: Advanced git-aware features

📝 Documentation Development

To contribute to documentation:

# Build documentation
./scripts/docs.sh build

# Build with strict mode (fail on warnings)
./scripts/docs.sh build-strict

# Deploy to GitHub Pages
./scripts/docs.sh deploy

See README_DOCS.md for detailed documentation setup instructions.

📚 API Reference

Core Functions

@agent_test(): Decorator to mark test functions
run_test(): Utility to run individual tests programmatically

CLI Commands

agenttest init: Initialize new project
agenttest run: Run tests
agenttest generate: Generate test cases
agenttest log: View test history
agenttest compare: Compare test results

Configuration Classes

Config: Main configuration management
LLMConfig: LLM provider settings
EvaluatorConfig: Evaluator configurations

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Clone the repository
git clone https://github.com/Nihal-Srivastava05/agent-test
cd agent-test

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
black agenttest/
isort agenttest/
flake8 agenttest/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Inspired by pytest's excellent design
Built for the AI agent development community
Special thanks to all contributors

🆘 Support

AgentTest - Making AI agent testing as easy as pytest 🧪✨

Release history

This version

0.1.0

Jun 27, 2025

Download files

Source Distribution

agenttest-0.1.0.tar.gz (118.7 kB view details)

Uploaded Jun 27, 2025 Source

Built Distribution

agenttest-0.1.0-py3-none-any.whl (55.8 kB view details)

Uploaded Jun 27, 2025 Python 3

File details

Details for the file agenttest-0.1.0.tar.gz.

File metadata

Download URL: agenttest-0.1.0.tar.gz
Upload date: Jun 27, 2025
Size: 118.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for agenttest-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3487d77193673a155edffaaa7b51b1729221d8a55d73cb6e97535e648b272d07`
MD5	`234ea4c879f8fcc18459e034d0c2d8ea`
BLAKE2b-256	`64962e8f029cbdabb7633bb679d1026e74315d24e63dd63359062b23eaa63f1e`

File details

Details for the file agenttest-0.1.0-py3-none-any.whl.

File metadata

Download URL: agenttest-0.1.0-py3-none-any.whl
Upload date: Jun 27, 2025
Size: 55.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for agenttest-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ccd22ac09beccbd0ada1d18c4790d889bcf6ca84722829c6f5a9b114ddee8ce`
MD5	`f16ec0ef36f961c5cac03b99b9276933`
BLAKE2b-256	`85eb0a97407999fbb36599a60f92097e9f51133b86f0dbb5b6932f92ad244d08`
