AgentSprint TestKit - Universal AI agent benchmarking and testing framework
AgentSprint TestKit (ASTK)
Universal AI agent benchmarking and testing framework
ASTK is a comprehensive testing framework for AI agents that evaluates performance, intelligence, and capabilities through diverse scenarios. Test your agents against real-world tasks like file analysis, code comprehension, and complex reasoning.
Features
- Intelligent Benchmarks: 8 diverse scenarios testing different AI capabilities
- Performance Metrics: Response time, success rate, and quality analysis
- Easy Installation: Simple pip install from PyPI
- Universal Testing: Works with CLI agents, REST APIs, Python modules, and more
- Agent Ready: Compatible with LangChain, OpenAI, and custom agents
- Built-in Examples: File Q&A agent and project templates
- GitHub Actions: Ready-to-use CI/CD workflow templates
Quick Start
1. Install from PyPI
pip install agent-sprint-testkit
2. Verify Installation
astk --help
3. Set API Key
export OPENAI_API_KEY="your-api-key-here"
4. Initialize a Project
astk init my-agent-tests
cd my-agent-tests
5. Run Your First Benchmark
# Benchmark an example agent
astk benchmark examples/agents/file_qa_agent.py
# Or run directly from your project
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Installation Options
Option 1: Global Installation (Recommended)
pip install agent-sprint-testkit
astk --version
Option 2: Development Setup
# Clone repository
git clone https://github.com/your-org/astk.git
cd astk
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode
pip install -e .
CLI Commands
Core Commands
# Initialize new project with templates
astk init <project-name>
# Run intelligent benchmarks
astk benchmark <agent-path>
# Generate detailed reports
astk report <results-dir>
# Show example usage
astk examples
Legacy Script Commands (still supported)
# Run intelligent benchmark
python scripts/simple_benchmark.py <agent-path>
# Quick agent runner
python scripts/simple_run.py <agent-path>
Available Agents
File Q&A Agent (examples/agents/file_qa_agent.py)
A LangChain-powered agent that can:
- List files in directories
- Read file contents and summarize them
- Answer questions about file data
- Navigate directory structures
Example Usage:
# Direct agent usage
python examples/agents/file_qa_agent.py "What Python files are in this project?"
# Run with simple runner
python scripts/simple_run.py examples/agents/file_qa_agent.py
# Run intelligent benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Benchmark Scenarios
The intelligent benchmark tests 8 diverse scenarios:
| Scenario | Test | Capability |
|---|---|---|
| File Discovery | Find Python files and entry points | File system navigation |
| Config Analysis | Analyze configuration files | Technical comprehension |
| README Comprehension | Read and explain project | Document analysis |
| Code Structure | Analyze directory structure | Architecture understanding |
| Documentation Search | Explore documentation | Information retrieval |
| Dependency Analysis | Analyze requirements/dependencies | Technical analysis |
| Example Exploration | Discover example code | Code comprehension |
| Test Discovery | Find testing framework | Development understanding |
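One way to picture the table above is as plain data that a runner feeds to an agent. The sketch below is illustrative only and is not ASTK's implementation: the `Scenario` class, the `run_scenarios` function, the success rule, and the query wording are all assumptions, and the agent is modeled as any callable that takes a query string and returns text.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record for one benchmark scenario (not an ASTK class)."""
    name: str
    query: str
    capability: str


# Three rows from the table above; the query wording is assumed.
SCENARIOS = [
    Scenario("File Discovery", "Find Python files and entry points", "File system navigation"),
    Scenario("Config Analysis", "Analyze configuration files", "Technical comprehension"),
    Scenario("README Comprehension", "Read and explain project", "Document analysis"),
]


def run_scenarios(agent: Callable[[str], str], scenarios: list[Scenario]) -> list[dict]:
    """Run each scenario query through the agent and record the outcome."""
    results = []
    for scenario in scenarios:
        start = time.perf_counter()
        try:
            response = agent(scenario.query)
            # Assumed rule: any non-empty reply counts as a success.
            success = bool(response.strip())
        except Exception as exc:
            # A crashing agent fails this scenario, not the whole run.
            response, success = f"error: {exc}", False
        results.append({
            "scenario": scenario.name,
            "success": success,
            "duration_seconds": time.perf_counter() - start,
            "response_length": len(response),
        })
    return results


def echo_agent(query: str) -> str:
    """Stand-in agent used only for this example."""
    return f"Received: {query}"


if __name__ == "__main__":
    for row in run_scenarios(echo_agent, SCENARIOS):
        print(row["scenario"], row["success"], f'{row["duration_seconds"]:.4f}s')
```

Keeping scenarios as data rather than code makes it cheap to add a row for a new capability without touching the runner.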
Results & Metrics
Benchmarks generate comprehensive results:
{
  "success_rate": 1.0,
  "total_duration_seconds": 25.4,
  "average_scenario_duration": 3.2,
  "average_response_length": 847,
  "scenarios": [...]
}
Metrics Include:
- Success Rate: Share of scenarios completed (the sample above reports it as a fraction, 1.0)
- Response Time: Duration of each scenario, in seconds
- Response Quality: Length and content analysis
- Scenario Details: Individual query results
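A result object with the fields shown above can be summarized in a few lines. This is a sketch under assumptions: the `summarize` helper is hypothetical, `success_rate` is treated as a fraction because the sample value is 1.0, and the README does not say where ASTK writes result files or how each entry in `scenarios` is structured, so the example works on an in-memory dict and ignores that list.

```python
import json


def summarize(results: dict) -> str:
    """Format the top-level benchmark fields as a one-line summary."""
    rate = results["success_rate"] * 100  # assumed fraction: 1.0 means 100%
    return (
        f"{rate:.0f}% success, "
        f"{results['total_duration_seconds']:.1f}s total, "
        f"{results['average_scenario_duration']:.1f}s per scenario, "
        f"average response length {results['average_response_length']}"
    )


if __name__ == "__main__":
    # Values copied from the sample JSON; "scenarios" is left empty here.
    sample = json.loads("""
    {
      "success_rate": 1.0,
      "total_duration_seconds": 25.4,
      "average_scenario_duration": 3.2,
      "average_response_length": 847,
      "scenarios": []
    }
    """)
    print(summarize(sample))
```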
Available Tools
ASTK CLI (Primary Interface)
# Initialize project with templates
astk init my-project
# Run intelligent benchmarks
astk benchmark <agent-path>
# Generate HTML/JSON reports
astk report <results-dir>
# View usage examples
astk examples
Legacy Script Runners (Still Supported)
# Direct benchmark execution
python scripts/simple_benchmark.py <agent-path>
# Basic agent runner
python scripts/simple_run.py <agent-path>
Project Structure
ASTK/
├── examples/agents/          # Example AI agents
│   └── file_qa_agent.py      # LangChain File Q&A agent
├── scripts/                  # Benchmark and utility scripts
│   ├── simple_benchmark.py   # Intelligent benchmark runner
│   ├── simple_run.py         # Basic agent runner
│   └── astk.py               # Advanced CLI (WIP)
├── astk/                     # Core ASTK framework
│   ├── benchmarks/           # Benchmark modules
│   ├── cli.py                # Command-line interface
│   └── *.py                  # Core modules
├── benchmark_results/        # Generated benchmark results
├── config/                   # Configuration files
└── docs/                     # Documentation
Usage Examples
Run Agent Directly
python examples/agents/file_qa_agent.py "Analyze the setup.py file"
Quick Agent Test
python scripts/simple_run.py examples/agents/file_qa_agent.py
Full Intelligence Benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Custom Queries
python examples/agents/file_qa_agent.py "What is the purpose of the astk directory?"
Troubleshooting
Common Issues
Installation Problems
# Update pip and reinstall
pip install --upgrade pip
pip install --upgrade agent-sprint-testkit
# Verify installation
astk --version
which astk
OpenAI API Issues
# Verify API key is set
echo $OPENAI_API_KEY
# Set API key
export OPENAI_API_KEY="sk-..."
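The same check can be done from Python before a benchmark run, which avoids echoing the full key to the terminal. A minimal sketch: the `describe_api_key` helper is hypothetical and only checks that the variable is set, not that the key is valid.

```python
import os
from typing import Mapping, Optional


def describe_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether OPENAI_API_KEY is set, showing only a masked preview."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    # Show the first three characters only so the secret never lands in logs.
    return f"OPENAI_API_KEY is set ({key[:3]}..., {len(key)} characters)"


if __name__ == "__main__":
    print(describe_api_key())
```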
Development Environment Issues
# For development setup
git clone https://github.com/your-org/astk.git
cd astk
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .
Agent Compatibility
The framework supports multiple agent types:
- CLI agents: Accept queries as command-line arguments
- Python modules: Have a chat() method
- REST APIs: Expose a /chat endpoint
- Custom formats: Use adapter patterns as needed
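As an illustration of the adapter idea, a CLI agent can be wrapped so that it exposes a `chat()` method like a Python-module agent. This is a sketch, not ASTK code: the `CliAgentAdapter` class and the `chat(query) -> str` signature are assumptions, since the README does not spell out the exact interface.

```python
import subprocess
import sys


class CliAgentAdapter:
    """Hypothetical adapter: gives a command-line agent a chat() method."""

    def __init__(self, command: list[str], timeout: float = 60.0):
        self.command = command  # e.g. [sys.executable, "path/to/your_agent.py"]
        self.timeout = timeout

    def chat(self, query: str) -> str:
        """Pass the query as one command-line argument and return stdout."""
        completed = subprocess.run(
            [*self.command, query],
            capture_output=True,
            text=True,
            timeout=self.timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return completed.stdout.strip()


if __name__ == "__main__":
    # Stand-in agent: a one-line Python program that echoes its arguments.
    echo = [sys.executable, "-c", "import sys; print('Agent:', ' '.join(sys.argv[1:]))"]
    print(CliAgentAdapter(echo).chat("What Python files are in this project?"))
```

Passing the command as a list, rather than a shell string, keeps queries containing quotes or spaces from being reinterpreted by the shell.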
Creating Your Own Agent
Create a new agent that responds to command-line arguments:
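#!/usr/bin/env python3
"""Minimal CLI agent: reads the query from command-line arguments."""
import asyncio
import sys

async def main():
    if len(sys.argv) > 1:
        query = " ".join(sys.argv[1:])
        # Process the query and build a response.
        # Placeholder logic: replace this echo with your agent's real work.
        response = f"You asked: {query}"
        print(f"Agent: {response}")
    else:
        # Default behavior when no query is given
        print("Agent: Ready!")

if __name__ == "__main__":
    asyncio.run(main())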
Then benchmark it:
python scripts/simple_benchmark.py path/to/your_agent.py
Performance Tips
- Faster Responses: Use GPT-3.5-turbo for speed
- Better Intelligence: Use GPT-4 for complex reasoning
- Cost Optimization: Monitor token usage in results
- Custom Scenarios: Modify scripts/simple_benchmark.py for specific tests
Contributing
- Create new agents in examples/agents/
- Add benchmark scenarios in scripts/simple_benchmark.py
- Test with: python scripts/simple_benchmark.py your_agent.py
License
Apache 2.0 License - See LICENSE file for details.
Ready to benchmark your AI agents? Start with:
# Install globally
pip install agent-sprint-testkit
# Run your first benchmark
astk benchmark examples/agents/file_qa_agent.py
# Or use the legacy script
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Get started in 3 commands:
pip install agent-sprint-testkit
astk init my-tests
astk examples