AgentSprint TestKit - Universal AI agent benchmarking and testing framework

AgentSprint TestKit (ASTK)

Universal AI agent benchmarking and testing framework

ASTK is a comprehensive testing framework for AI agents that evaluates performance, intelligence, and capabilities through diverse scenarios. Test your agents against real-world tasks like file analysis, code comprehension, and complex reasoning.

Features

  • Intelligent Benchmarks: 8 diverse scenarios testing different AI capabilities
  • Performance Metrics: Response time, success rate, and quality analysis
  • Easy Installation: Simple pip install from PyPI
  • Universal Testing: Works with CLI agents, REST APIs, Python modules, and more
  • Agent Ready: Compatible with LangChain, OpenAI, and custom agents
  • Built-in Examples: File Q&A agent and project templates
  • GitHub Actions: Ready-to-use CI/CD workflow templates

Quick Start

1. Install from PyPI

pip install agent-sprint-testkit

2. Verify Installation

astk --help

3. Set API Key

export OPENAI_API_KEY="your-api-key-here"

4. Initialize a Project

astk init my-agent-tests
cd my-agent-tests

5. Run Your First Benchmark

# Benchmark an example agent
astk benchmark examples/agents/file_qa_agent.py

# Or run directly from your project
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Installation Options

Option 1: Global Installation (Recommended)

pip install agent-sprint-testkit
astk --version

Option 2: Development Setup

# Clone repository
git clone https://github.com/your-org/astk.git
cd astk

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e .

CLI Commands

Core Commands

# Initialize new project with templates
astk init <project-name>

# Run intelligent benchmarks
astk benchmark <agent-path>

# Generate detailed reports
astk report <results-dir>

# Show example usage
astk examples

Legacy Script Commands (still supported)

# Run intelligent benchmark
python scripts/simple_benchmark.py <agent-path>

# Quick agent runner
python scripts/simple_run.py <agent-path>

Available Agents

File Q&A Agent (examples/agents/file_qa_agent.py)

A LangChain-powered agent that can:

  • List files in directories
  • Read and summarize file contents
  • Answer questions about file data
  • Navigate directory structures

Example Usage:

# Direct agent usage
python examples/agents/file_qa_agent.py "What Python files are in this project?"

# Run with simple runner
python scripts/simple_run.py examples/agents/file_qa_agent.py

# Run intelligent benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Benchmark Scenarios

The intelligent benchmark tests 8 diverse scenarios:

Scenario               | Test                              | Capability
File Discovery         | Find Python files and entry points | File system navigation
Config Analysis        | Analyze configuration files        | Technical comprehension
README Comprehension   | Read and explain project           | Document analysis
Code Structure         | Analyze directory structure        | Architecture understanding
Documentation Search   | Explore documentation              | Information retrieval
Dependency Analysis    | Analyze requirements/dependencies  | Technical analysis
Example Exploration    | Discover example code              | Code comprehension
Test Discovery         | Find testing framework             | Development understanding
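
As a rough illustration of how one scenario could be run and timed against a CLI agent, the sketch below invokes an agent command with a query and records success, duration, and response length. It is not the code in scripts/simple_benchmark.py: the `Scenario` class, the `run_scenario` function, the result field names, and the query wording are all hypothetical, and it assumes the agent accepts the query as a command-line argument and prints its answer to stdout.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical scenario record: a name plus the query sent to the agent."""
    name: str
    query: str


def run_scenario(agent_cmd, scenario, timeout=60.0):
    """Run one query against a CLI agent and time it.

    agent_cmd is the command prefix, e.g. [sys.executable, "my_agent.py"].
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            agent_cmd + [scenario.query],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        response = proc.stdout.strip()
        # Count the scenario as successful only if the agent exited cleanly
        # and printed something.
        success = proc.returncode == 0 and bool(response)
    except subprocess.TimeoutExpired:
        success, response = False, ""
    return {
        "scenario": scenario.name,
        "success": success,
        "duration_seconds": time.perf_counter() - start,
        "response_length": len(response),
    }


if __name__ == "__main__":
    # Stand-in "agent": a one-line Python program that echoes its argument.
    echo_agent = [sys.executable, "-c", "import sys; print('Agent:', sys.argv[1])"]
    print(run_scenario(echo_agent, Scenario("File Discovery", "Find Python files")))
```

Looping `run_scenario` over a list of `Scenario` objects would yield per-scenario records from which aggregate figures such as a success rate can be derived.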

Results & Metrics

Benchmarks generate comprehensive results:

{
  "success_rate": 1.0,
  "total_duration_seconds": 25.4,
  "average_scenario_duration": 3.2,
  "average_response_length": 847,
  "scenarios": [...]
}

Metrics Include:

  • Success Rate: Share of scenarios completed, reported as a fraction (1.0 = 100%)
  • Response Time: Duration for each scenario
  • Response Quality: Length and content analysis
  • Scenario Details: Individual query results
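
To show how the aggregate figures relate to per-scenario data, here is a minimal sketch that recomputes them from a results dictionary. The top-level key names follow the JSON example above; the shape of each entry in "scenarios" is not shown in this document, so the per-scenario fields `success` and `duration_seconds` and the `summarize` function are assumptions made for illustration.

```python
import json


def summarize(results):
    """Recompute aggregate metrics from per-scenario entries.

    Each entry is assumed (hypothetically) to carry "success" and
    "duration_seconds" fields.
    """
    scenarios = results.get("scenarios", [])
    if not scenarios:
        return {
            "success_rate": 0.0,
            "total_duration_seconds": 0.0,
            "average_scenario_duration": 0.0,
        }
    total = sum(s["duration_seconds"] for s in scenarios)
    passed = sum(1 for s in scenarios if s["success"])
    return {
        "success_rate": passed / len(scenarios),
        "total_duration_seconds": round(total, 2),
        "average_scenario_duration": round(total / len(scenarios), 2),
    }


if __name__ == "__main__":
    sample = json.loads("""
    {"scenarios": [
        {"success": true,  "duration_seconds": 3.0},
        {"success": false, "duration_seconds": 5.0}
    ]}
    """)
    print(summarize(sample))
```

In the example output above, 25.4 seconds over 8 scenarios gives roughly 3.2 seconds per scenario, which matches the reported average.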

Available Tools

ASTK CLI (Primary Interface)

# Initialize project with templates
astk init my-project

# Run intelligent benchmarks
astk benchmark <agent-path>

# Generate HTML/JSON reports
astk report <results-dir>

# View usage examples
astk examples

Legacy Script Runners (Still Supported)

# Direct benchmark execution
python scripts/simple_benchmark.py <agent-path>

# Basic agent runner
python scripts/simple_run.py <agent-path>

Project Structure

ASTK/
├── examples/agents/          # Example AI agents
│   └── file_qa_agent.py      # LangChain File Q&A agent
├── scripts/                  # Benchmark and utility scripts
│   ├── simple_benchmark.py   # Intelligent benchmark runner
│   ├── simple_run.py         # Basic agent runner
│   └── astk.py               # Advanced CLI (WIP)
├── astk/                     # Core ASTK framework
│   ├── benchmarks/           # Benchmark modules
│   ├── cli.py                # Command-line interface
│   └── *.py                  # Core modules
├── benchmark_results/        # Generated benchmark results
├── config/                   # Configuration files
└── docs/                     # Documentation

Usage Examples

Run Agent Directly

python examples/agents/file_qa_agent.py "Analyze the setup.py file"

Quick Agent Test

python scripts/simple_run.py examples/agents/file_qa_agent.py

Full Intelligence Benchmark

python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Custom Queries

python examples/agents/file_qa_agent.py "What is the purpose of the astk directory?"

Troubleshooting

Common Issues

Installation Problems

# Update pip and reinstall
pip install --upgrade pip
pip install --upgrade agent-sprint-testkit

# Verify installation
astk --version
which astk

OpenAI API Issues

# Verify API key is set
echo $OPENAI_API_KEY

# Set API key
export OPENAI_API_KEY="sk-..."
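
The same check can be done from Python before launching an agent. This is a minimal sketch: the `check_api_key` helper is hypothetical and only tests whether the variable is present, not whether OpenAI accepts the key.

```python
import os


def check_api_key(env=None):
    """Return (ok, message) describing whether OPENAI_API_KEY is set."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return False, "OPENAI_API_KEY is not set"
    # Never print the full key; show only a short prefix.
    return True, f"OPENAI_API_KEY is set ({key[:3]}...)"


if __name__ == "__main__":
    ok, message = check_api_key()
    print(message)
    raise SystemExit(0 if ok else 1)
```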

Development Environment Issues

# For development setup
git clone https://github.com/your-org/astk.git
cd astk
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .

Agent Compatibility

The framework supports multiple agent types:

  • CLI agents: Accept queries as command-line arguments
  • Python modules: Have a chat() method
  • REST APIs: Expose /chat endpoint
  • Custom formats: Use adapter patterns as needed
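
The sketch below illustrates two of these shapes: a Python-module agent with a chat() method, and an adapter that gives a CLI agent the same interface. The exact interface ASTK expects is not specified here, so the `chat(query: str) -> str` signature and the `EchoAgent` and `CLIAgentAdapter` names are assumptions for illustration only.

```python
import subprocess
import sys


class EchoAgent:
    """Minimal Python-module agent exposing a chat() method."""

    def chat(self, query: str) -> str:
        return f"You asked: {query}"


class CLIAgentAdapter:
    """Hypothetical adapter giving a CLI agent the same chat() interface."""

    def __init__(self, command):
        # Command prefix, e.g. [sys.executable, "path/to/your_agent.py"]
        self.command = list(command)

    def chat(self, query: str) -> str:
        proc = subprocess.run(
            self.command + [query],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return proc.stdout.strip()


if __name__ == "__main__":
    agents = [
        EchoAgent(),
        # Stand-in CLI agent: a one-line program that upper-cases its argument.
        CLIAgentAdapter([sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]),
    ]
    for agent in agents:
        print(agent.chat("hello"))
```

With an adapter like this, code that drives agents only needs to call chat(), regardless of how the underlying agent is launched.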

Creating Your Own Agent

Create a new agent that responds to command-line arguments:

#!/usr/bin/env python3
import asyncio
import sys


async def main():
    if len(sys.argv) > 1:
        query = " ".join(sys.argv[1:])
        # Process the query here; this placeholder simply echoes it back
        response = f"You asked: {query}"
        print(f"Agent: {response}")
    else:
        # Default behavior when no query is given
        print("Agent: Ready!")


if __name__ == "__main__":
    asyncio.run(main())

Then benchmark it:

python scripts/simple_benchmark.py path/to/your_agent.py

Performance Tips

  • Faster Responses: Use GPT-3.5-turbo for speed
  • Better Intelligence: Use GPT-4 for complex reasoning
  • Cost Optimization: Monitor token usage in results
  • Custom Scenarios: Modify scripts/simple_benchmark.py for specific tests

Contributing

  1. Create new agents in examples/agents/
  2. Add benchmark scenarios in scripts/simple_benchmark.py
  3. Test with: python scripts/simple_benchmark.py your_agent.py

License

Apache 2.0 License - See LICENSE file for details.


Ready to benchmark your AI agents? Start with:

# Install globally
pip install agent-sprint-testkit

# Run your first benchmark
astk benchmark examples/agents/file_qa_agent.py

# Or use the legacy script
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Get started in 3 commands:

pip install agent-sprint-testkit
astk init my-tests
astk examples
