
AgentSprint TestKit - Universal AI agent benchmarking and testing framework


AgentSprint TestKit (ASTK) 🚀

Benchmark your AI agents with intelligent, diverse test scenarios

ASTK is a comprehensive testing framework for AI agents that evaluates performance, intelligence, and capabilities through diverse scenarios. Test your agents against real-world tasks like file analysis, code comprehension, and complex reasoning.

🎯 Features

  • 🧠 Intelligent Benchmarks: 8 diverse scenarios testing different AI capabilities
  • 📊 Performance Metrics: Response time, success rate, and quality analysis
  • 🔧 Easy Setup: Simple Python environment with minimal dependencies
  • 🤖 Agent Ready: Works with LangChain, OpenAI, and custom agents
  • 📁 File Q&A Agent: Built-in example agent for testing

📋 Quick Start

1. Prerequisites

  • Python 3.11+
  • OpenAI API Key
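To confirm both prerequisites before installing anything, a small pre-flight script like the one below may help. It is a sketch, not a file shipped with ASTK: the function names python_ok and api_key_present are hypothetical, and the key check only confirms that OPENAI_API_KEY is set and non-empty, not that OpenAI accepts it.

```python
#!/usr/bin/env python3
"""Hypothetical pre-flight check for the ASTK prerequisites (not part of ASTK)."""
import os
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 11)) -> bool:
    # Compare only (major, minor) against the documented minimum version
    return tuple(version_info[:2]) >= minimum


def api_key_present(env=os.environ) -> bool:
    # Only checks that the variable is set and non-blank; the key is not validated
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    print("Python 3.11+:", "OK" if python_ok() else f"found {sys.version.split()[0]}")
    print("OPENAI_API_KEY set:", "OK" if api_key_present() else "missing")
```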

2. Setup Environment

# Clone/navigate to ASTK directory
cd /path/to/ASTK

# Create and activate virtual environment
python3.11 -m venv .venv311
source .venv311/bin/activate  # On Windows: .venv311\Scripts\activate

# Install dependencies
pip install langchain langchain-openai langchain-core pydantic click psutil

3. Set API Key

export OPENAI_API_KEY="your-api-key-here"

4. Run Your First Benchmark

# Run the intelligent benchmark on the example agent
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

🤖 Available Agents

File Q&A Agent (examples/agents/file_qa_agent.py)

A LangChain-powered agent that can:

  • 📁 List files in directories
  • 📖 Read file contents and summarize
  • 🔍 Answer questions about file data
  • 🧭 Navigate directory structures
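For a sense of what the first two capabilities involve, here is a plain-Python sketch of the kind of file tools such an agent might expose to the model. It is not the code in examples/agents/file_qa_agent.py: the helpers list_files and read_file and the 4000-character cap are illustrative assumptions, and wrapping them as LangChain tools is omitted.

```python
from pathlib import Path


def list_files(directory: str = ".", pattern: str = "*") -> list[str]:
    # Hypothetical tool: non-recursive listing of regular files matching a glob,
    # sorted so the model sees a stable order
    return sorted(p.name for p in Path(directory).glob(pattern) if p.is_file())


def read_file(path: str, max_chars: int = 4000) -> str:
    # Hypothetical tool: read text and truncate it so it fits comfortably in a prompt
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text[:max_chars]
```

An agent would typically call list_files first to discover candidates (for example list_files(".", "*.py") for "What Python files are in this project?") and then read_file on the most relevant ones before summarizing.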

Example Usage:

# Direct agent usage
python examples/agents/file_qa_agent.py "What Python files are in this project?"

# Run with simple runner
python scripts/simple_run.py examples/agents/file_qa_agent.py

# Run intelligent benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

🧪 Benchmark Scenarios

The intelligent benchmark tests 8 diverse scenarios:

Scenario | Test | Capability
--- | --- | ---
📁 File Discovery | Find Python files and entry points | File system navigation
⚙️ Config Analysis | Analyze configuration files | Technical comprehension
📖 README Comprehension | Read and explain the project | Document analysis
🏗️ Code Structure | Analyze directory structure | Architecture understanding
📚 Documentation Search | Explore documentation | Information retrieval
🔗 Dependency Analysis | Analyze requirements/dependencies | Technical analysis
💡 Example Exploration | Discover example code | Code comprehension
🧪 Test Discovery | Find testing framework | Development understanding

📊 Results & Metrics

Each benchmark run produces a JSON summary like the following:

{
  "success_rate": 1.0,
  "total_duration_seconds": 25.4,
  "average_scenario_duration": 3.2,
  "average_response_length": 847,
  "scenarios": [...]
}

Metrics include:

  • ✅ Success Rate: Fraction of scenarios completed successfully (1.0 = all scenarios completed)
  • ⏱️ Response Time: Duration of each scenario
  • 📝 Response Quality: Length and content analysis
  • 🎯 Scenario Details: Individual query results
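As a sketch of how these summary fields can be derived from per-scenario records, the function below aggregates a list of results into the same top-level keys shown in the JSON example. The record keys success, duration_seconds and response are assumptions, not ASTK's actual schema; here total_duration_seconds is simply the sum of scenario durations, whereas the real runner may measure wall-clock time including overhead.

```python
from statistics import mean


def summarize(scenarios: list[dict]) -> dict:
    """Aggregate per-scenario records into summary metrics.

    Each record is assumed (hypothetically) to have "success" (bool),
    "duration_seconds" (float) and "response" (str) keys.
    """
    if not scenarios:
        return {"success_rate": 0.0, "total_duration_seconds": 0.0,
                "average_scenario_duration": 0.0, "average_response_length": 0,
                "scenarios": []}
    durations = [s["duration_seconds"] for s in scenarios]
    return {
        # Fraction (0.0 to 1.0) of scenarios that completed successfully
        "success_rate": sum(1 for s in scenarios if s["success"]) / len(scenarios),
        "total_duration_seconds": round(sum(durations), 1),
        "average_scenario_duration": round(mean(durations), 1),
        # Mean response length in characters, rounded to a whole number
        "average_response_length": round(mean(len(s["response"]) for s in scenarios)),
        "scenarios": scenarios,
    }
```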

🛠️ Available Tools

🚀 Simple Benchmark Runner

python scripts/simple_benchmark.py <agent_path>

Runs 8 intelligent scenarios and generates detailed performance reports.
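The runner treats each agent as a command-line program. Below is a minimal sketch of that pattern, assuming the agent takes its query as command-line arguments and prints its answer to stdout, as described under Creating Your Own Agent. run_scenario is a hypothetical helper, not the function in scripts/simple_benchmark.py, and treating a zero exit code as success is an assumption.

```python
import subprocess
import sys
import time


def run_scenario(agent_path: str, query: str, timeout: float = 120.0) -> dict:
    """Run an agent script once with `query` as its argument and time the call."""
    start = time.perf_counter()
    try:
        # Use the current interpreter so the agent runs inside the same virtualenv
        proc = subprocess.run(
            [sys.executable, agent_path, query],
            capture_output=True, text=True, timeout=timeout,
        )
        success = proc.returncode == 0
        response = proc.stdout.strip()
    except subprocess.TimeoutExpired:
        # A hung agent counts as a failed scenario rather than stalling the run
        success, response = False, ""
    return {
        "query": query,
        "success": success,
        "duration_seconds": time.perf_counter() - start,
        "response": response,
    }
```

Calling run_scenario("examples/agents/file_qa_agent.py", "What Python files are in this project?") once per scenario query and collecting the returned records yields the per-scenario timings and success flags that the summary metrics are built from.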

🔧 Simple Agent Runner

python scripts/simple_run.py <agent_path>

Runs an agent directly with basic output capture.

📋 ASTK CLI (Advanced)

# Initialize project structure
python scripts/astk.py init

# Run advanced benchmarks (available once outstanding packaging issues are resolved)
python scripts/astk.py run <agent_path>

# View results
python scripts/astk.py view <results_dir>

๐Ÿ—๏ธ Project Structure

ASTK/
โ”œโ”€โ”€ ๐Ÿค– examples/agents/          # Example AI agents
โ”‚   โ””โ”€โ”€ file_qa_agent.py         # LangChain File Q&A agent
โ”œโ”€โ”€ ๐Ÿ“Š scripts/                  # Benchmark and utility scripts
โ”‚   โ”œโ”€โ”€ simple_benchmark.py      # Intelligent benchmark runner โญ
โ”‚   โ”œโ”€โ”€ simple_run.py            # Basic agent runner
โ”‚   โ””โ”€โ”€ astk.py                  # Advanced CLI (WIP)
โ”œโ”€โ”€ ๐Ÿง  astk/                     # Core ASTK framework
โ”‚   โ”œโ”€โ”€ benchmarks/              # Benchmark modules
โ”‚   โ”œโ”€โ”€ cli.py                   # Command-line interface
โ”‚   โ””โ”€โ”€ *.py                     # Core modules
โ”œโ”€โ”€ ๐Ÿ“ benchmark_results/        # Generated benchmark results
โ”œโ”€โ”€ โš™๏ธ config/                   # Configuration files
โ””โ”€โ”€ ๐Ÿ“– docs/                     # Documentation

🎮 Usage Examples

Run Agent Directly

python examples/agents/file_qa_agent.py "Analyze the setup.py file"

Quick Agent Test

python scripts/simple_run.py examples/agents/file_qa_agent.py

Full Intelligence Benchmark

python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Custom Queries

python examples/agents/file_qa_agent.py "What is the purpose of the astk directory?"

🔧 Troubleshooting

Common Issues

🐍 Virtual Environment Problems

# Recreate environment
deactivate
rm -rf .venv311
python3.11 -m venv .venv311
source .venv311/bin/activate
pip install langchain langchain-openai langchain-core pydantic click psutil

🔑 OpenAI API Issues

# Verify API key is set
echo $OPENAI_API_KEY

# Set API key
export OPENAI_API_KEY="sk-..."

📦 Import Errors

# Install missing packages
pip install langchain langchain-openai langchain-core pydantic click psutil

# Verify installation
python -c "from langchain_openai import ChatOpenAI; print('✅ LangChain installed')"
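To check every dependency at once rather than one import at a time, the sketch below uses importlib.util.find_spec from the standard library, which looks a module up without importing it. The pip-to-import name mapping reflects the packages in the install command above; the script itself and its missing_packages helper are hypothetical, not part of ASTK.

```python
import importlib.util

# pip package name -> top-level import name (they differ for the langchain-* packages)
REQUIRED = {
    "langchain": "langchain",
    "langchain-openai": "langchain_openai",
    "langchain-core": "langchain_core",
    "pydantic": "pydantic",
    "click": "click",
    "psutil": "psutil",
}


def missing_packages(required=REQUIRED) -> list[str]:
    # find_spec returns None when a top-level module cannot be located
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found")
```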

🚀 Creating Your Own Agent

Create a new agent that responds to command-line arguments:

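#!/usr/bin/env python3
import asyncio
import sys


async def handle_query(query: str) -> str:
    # Placeholder: replace with your agent's logic (LLM call, tool use, etc.)
    return f"You asked: {query}"


async def main():
    if len(sys.argv) > 1:
        # Treat all command-line arguments as a single query string
        query = " ".join(sys.argv[1:])
        response = await handle_query(query)
        print(f"Agent: {response}")
    else:
        # Default behavior when no query is given
        print("Agent: Ready!")


if __name__ == "__main__":
    asyncio.run(main())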

Then benchmark it:

python scripts/simple_benchmark.py path/to/your_agent.py

📈 Performance Tips

  • ⚡ Faster Responses: Use GPT-3.5-turbo for speed
  • 🧠 Better Intelligence: Use GPT-4 for complex reasoning
  • 💰 Cost Optimization: Monitor token usage in results
  • 🔧 Custom Scenarios: Modify scripts/simple_benchmark.py for specific tests

🤝 Contributing

  1. Create new agents in examples/agents/
  2. Add benchmark scenarios in scripts/simple_benchmark.py
  3. Test with: python scripts/simple_benchmark.py your_agent.py

📄 License

Apache 2.0 License - See LICENSE file for details.


🎯 Ready to benchmark your AI agents? Start with:

python scripts/simple_benchmark.py examples/agents/file_qa_agent.py



Download files


Source Distribution

agent_sprint_testkit-0.1.0.tar.gz (30.9 kB)

Uploaded Source

Built Distribution


agent_sprint_testkit-0.1.0-py3-none-any.whl (34.3 kB)

Uploaded Python 3

File details

Details for the file agent_sprint_testkit-0.1.0.tar.gz.

File metadata

  • Download URL: agent_sprint_testkit-0.1.0.tar.gz
  • Size: 30.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for agent_sprint_testkit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7342c1baeb05d3efddc3c2350a374d1300a34fd3b93962be2b4d819d6926e0f2
MD5 768d87220f7471f6df0c4f0c7f4ef11a
BLAKE2b-256 d3603c925129ee3e1874c9afb286d51201c5c1a9fb4c7c8ef092b6f3c9a8b2b7


File details

Details for the file agent_sprint_testkit-0.1.0-py3-none-any.whl.


File hashes

Hashes for agent_sprint_testkit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a34b7a11b3b91db24f429ca2fb0ab31b3cac0b07793aa0490acd74dfb5f72a10
MD5 1b48f93bef035a71f0040bc7ff687a7d
BLAKE2b-256 4d4e5e3bb199174180e6ba682d364f893ae849793440621218c97f90762b9eed

