AgentSprint TestKit - Universal AI agent benchmarking and testing framework
AgentSprint TestKit (ASTK)
Benchmark your AI agents with intelligent, diverse test scenarios
ASTK is a comprehensive testing framework for AI agents that evaluates performance, intelligence, and capabilities through diverse scenarios. Test your agents against real-world tasks like file analysis, code comprehension, and complex reasoning.
Features
- Intelligent Benchmarks: 8 diverse scenarios testing different AI capabilities
- Performance Metrics: Response time, success rate, and quality analysis
- Easy Setup: Simple Python environment with minimal dependencies
- Agent Ready: Works with LangChain, OpenAI, and custom agents
- File Q&A Agent: Built-in example agent for testing
Quick Start
1. Prerequisites
- Python 3.11+
- OpenAI API Key
2. Set Up Environment
# Clone/navigate to ASTK directory
cd /path/to/ASTK
# Create and activate virtual environment
python3.11 -m venv .venv311
source .venv311/bin/activate # On Windows: .venv311\Scripts\activate
# Install dependencies
pip install langchain langchain-openai langchain-core pydantic click psutil
3. Set API Key
export OPENAI_API_KEY="your-api-key-here"
4. Run Your First Benchmark
# Run the intelligent benchmark on the example agent
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Available Agents
File Q&A Agent (examples/agents/file_qa_agent.py)
A LangChain-powered agent that can:
- List files in directories
- Read file contents and summarize them
- Answer questions about file data
- Navigate directory structures
Example Usage:
# Direct agent usage
python examples/agents/file_qa_agent.py "What Python files are in this project?"
# Run with simple runner
python scripts/simple_run.py examples/agents/file_qa_agent.py
# Run intelligent benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Benchmark Scenarios
The intelligent benchmark tests 8 diverse scenarios:
| Scenario | Test | Capability |
|---|---|---|
| File Discovery | Find Python files and entry points | File system navigation |
| Config Analysis | Analyze configuration files | Technical comprehension |
| README Comprehension | Read and explain project | Document analysis |
| Code Structure | Analyze directory structure | Architecture understanding |
| Documentation Search | Explore documentation | Information retrieval |
| Dependency Analysis | Analyze requirements/dependencies | Technical analysis |
| Example Exploration | Discover example code | Code comprehension |
| Test Discovery | Find testing framework | Development understanding |
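The scenario table can be mirrored in code as plain data. The following is a minimal sketch, not the actual contents of scripts/simple_benchmark.py: the `Scenario` class, its field names, and the `capabilities` helper are hypothetical, and the query strings simply reuse the wording of the "Test" column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for one benchmark scenario."""
    name: str
    query: str        # illustrative wording, not the query ASTK actually sends
    capability: str


# Two rows of the scenario table, expressed as data.
SCENARIOS = [
    Scenario("File Discovery",
             "Find Python files and entry points",
             "File system navigation"),
    Scenario("Config Analysis",
             "Analyze configuration files",
             "Technical comprehension"),
]


def capabilities(scenarios):
    """Return the distinct capabilities covered, in first-seen order."""
    seen = []
    for scenario in scenarios:
        if scenario.capability not in seen:
            seen.append(scenario.capability)
    return seen


if __name__ == "__main__":
    for scenario in SCENARIOS:
        print(f"{scenario.name}: {scenario.query} -> {scenario.capability}")
```

Keeping scenarios as data rather than hard-coded calls would make it straightforward to add or remove a row without touching the runner logic.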
Results & Metrics
Each benchmark run generates a results summary such as:
{
  "success_rate": 1.0,
  "total_duration_seconds": 25.4,
  "average_scenario_duration": 3.2,
  "average_response_length": 847,
  "scenarios": [...]
}
Metrics Include:
- Success Rate: Fraction of scenarios completed successfully (1.0 means all of them)
- Response Time: Duration of each scenario
- Response Quality: Length and content analysis
- Scenario Details: Individual query results
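A results summary with this shape can be condensed into a single line using only the standard library. This is a sketch under assumptions: the key names are taken from the sample output, the `summarize` helper is hypothetical, and the `scenarios` list is left empty because the format of its items is not documented here.

```python
import json

# Sample payload using the top-level keys from the example output.
# "scenarios" is left empty: the format of its items is not documented here.
SAMPLE = """
{
  "success_rate": 1.0,
  "total_duration_seconds": 25.4,
  "average_scenario_duration": 3.2,
  "average_response_length": 847,
  "scenarios": []
}
"""


def summarize(results: dict) -> str:
    """Format the headline metrics as a one-line summary."""
    return (
        f"success {results['success_rate']:.0%}, "
        f"total {results['total_duration_seconds']:.1f}s, "
        f"avg {results['average_scenario_duration']:.1f}s per scenario, "
        f"avg response length {results['average_response_length']}"
    )


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

To use it on a real run, replace `SAMPLE` with the contents of a file from your results directory; the exact file name depends on your setup.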
Available Tools
Simple Benchmark Runner
python scripts/simple_benchmark.py <agent_path>
Runs 8 intelligent scenarios and generates detailed performance reports.
Simple Agent Runner
python scripts/simple_run.py <agent_path>
Runs an agent directly with basic output capture.
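Running an agent with basic output capture amounts to launching the script as a child process and recording what it prints. The sketch below shows one way to do that with `subprocess`. It is an assumption about the general approach, not the code of scripts/simple_run.py; the `run_agent` helper and its 60-second default timeout are hypothetical.

```python
import subprocess
import sys
import time


def run_agent(agent_path, query, timeout=60.0):
    """Run an agent script with one query; return (success, output, seconds)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, agent_path, query],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The agent did not finish in time: report failure with no output.
        return False, "", time.perf_counter() - start
    elapsed = time.perf_counter() - start
    # Treat a zero exit code as success; a real benchmark may apply stricter checks.
    return proc.returncode == 0, proc.stdout.strip(), elapsed


if __name__ == "__main__":
    ok, output, seconds = run_agent(
        "examples/agents/file_qa_agent.py",
        "What Python files are in this project?",
    )
    print(f"success={ok} duration={seconds:.1f}s")
    print(output)
```

Using `sys.executable` keeps the child process inside the same virtual environment as the runner, so the agent sees the same installed packages.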
ASTK CLI (Advanced)
# Initialize project structure
python scripts/astk.py init
# Run advanced benchmarks (once package issues are resolved)
python scripts/astk.py run <agent_path>
# View results
python scripts/astk.py view <results_dir>
Project Structure
ASTK/
├── examples/agents/          # Example AI agents
│   └── file_qa_agent.py      # LangChain File Q&A agent
├── scripts/                  # Benchmark and utility scripts
│   ├── simple_benchmark.py   # Intelligent benchmark runner
│   ├── simple_run.py         # Basic agent runner
│   └── astk.py               # Advanced CLI (WIP)
├── astk/                     # Core ASTK framework
│   ├── benchmarks/           # Benchmark modules
│   ├── cli.py                # Command-line interface
│   └── *.py                  # Core modules
├── benchmark_results/        # Generated benchmark results
├── config/                   # Configuration files
└── docs/                     # Documentation
Usage Examples
Run Agent Directly
python examples/agents/file_qa_agent.py "Analyze the setup.py file"
Quick Agent Test
python scripts/simple_run.py examples/agents/file_qa_agent.py
Full Intelligence Benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py
Custom Queries
python examples/agents/file_qa_agent.py "What is the purpose of the astk directory?"
Troubleshooting
Common Issues
Virtual Environment Problems
# Recreate environment
deactivate
rm -rf .venv311
python3.11 -m venv .venv311
source .venv311/bin/activate
pip install langchain langchain-openai langchain-core pydantic click psutil
OpenAI API Issues
# Verify API key is set
echo $OPENAI_API_KEY
# Set API key
export OPENAI_API_KEY="sk-..."
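Note that `echo` prints the key in full, which is easy to leak into a shared terminal or a log. As an alternative, here is a small sketch that reports only whether the variable is set. The `api_key_status` helper is hypothetical and not part of ASTK.

```python
import os


def api_key_status(env=None):
    """Describe whether OPENAI_API_KEY is set, without revealing its value."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    # Report only the length so the secret never reaches logs or terminals.
    return f"OPENAI_API_KEY is set ({len(key)} characters)"


if __name__ == "__main__":
    print(api_key_status())
```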
Import Errors
# Install missing packages
pip install langchain langchain-openai langchain-core pydantic click psutil
# Verify installation
python -c "from langchain_openai import ChatOpenAI; print('LangChain installed')"
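To see which packages are missing in one pass, rather than hitting import errors one at a time, you can probe each module with `importlib.util.find_spec`. This is a minimal sketch: the module list mirrors the packages from the install command, while the `missing_modules` helper is hypothetical.

```python
import importlib.util

# Import names for the packages installed in the setup step.
REQUIRED_MODULES = [
    "langchain",
    "langchain_openai",
    "langchain_core",
    "pydantic",
    "click",
    "psutil",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable")
```

Note that import names use underscores (`langchain_openai`) while the pip package names use hyphens (`langchain-openai`).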
Creating Your Own Agent
Create a new agent that responds to command-line arguments:
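#!/usr/bin/env python3
import asyncio
import sys


async def answer(query: str) -> str:
    # Placeholder logic: replace with your own LLM or tool calls
    return f"You asked: {query}"


async def main() -> None:
    if len(sys.argv) > 1:
        query = " ".join(sys.argv[1:])
        # Process the query and print the response
        response = await answer(query)
        print(f"Agent: {response}")
    else:
        # Default behavior when no query is given
        print("Agent: Ready!")


if __name__ == "__main__":
    asyncio.run(main())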
Then benchmark it:
python scripts/simple_benchmark.py path/to/your_agent.py
Performance Tips
- Faster Responses: Use GPT-3.5-turbo for speed
- Better Intelligence: Use GPT-4 for complex reasoning
- Cost Optimization: Monitor token usage in results
- Custom Scenarios: Modify scripts/simple_benchmark.py for specific tests
Contributing
- Create new agents in examples/agents/
- Add benchmark scenarios in scripts/simple_benchmark.py
- Test with: python scripts/simple_benchmark.py your_agent.py
License
Apache 2.0 License - See LICENSE file for details.
Ready to benchmark your AI agents? Start with:
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py