AgentSprint TestKit - Universal AI agent benchmarking and testing framework

AgentSprint TestKit (ASTK) 🚀

Benchmark your AI agents with intelligent, diverse test scenarios

ASTK is a testing framework for AI agents. It measures response time, success rate, and response quality across a set of benchmark scenarios built on real-world tasks such as file analysis, code comprehension, and complex reasoning.

🎯 Features

🧠 Intelligent Benchmarks: 8 diverse scenarios testing different AI capabilities
📊 Performance Metrics: Response time, success rate, and quality analysis
🔧 Easy Setup: Simple Python environment with minimal dependencies
🤖 Agent Ready: Works with LangChain, OpenAI, and custom agents
📁 File Q&A Agent: Built-in example agent for testing

📋 Quick Start

1. Prerequisites

Python 3.11+
OpenAI API Key

2. Setup Environment

# Clone/navigate to ASTK directory
cd /path/to/ASTK

# Create and activate virtual environment
python3.11 -m venv .venv311
source .venv311/bin/activate  # On Windows: .venv311\Scripts\activate

# Install dependencies
pip install langchain langchain-openai langchain-core pydantic click psutil

3. Set API Key

export OPENAI_API_KEY="your-api-key-here"

4. Run Your First Benchmark

# Run the intelligent benchmark on the example agent
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

🤖 Available Agents

File Q&A Agent (`examples/agents/file_qa_agent.py`)

A LangChain-powered agent that can:

📁 List files in directories
📖 Read file contents and summarize
🔍 Answer questions about file data
🧭 Navigate directory structures

Example Usage:

# Direct agent usage
python examples/agents/file_qa_agent.py "What Python files are in this project?"

# Run with simple runner
python scripts/simple_run.py examples/agents/file_qa_agent.py

# Run intelligent benchmark
python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

🧪 Benchmark Scenarios

The intelligent benchmark tests 8 diverse scenarios:

Scenario	Test	Capability
📁 File Discovery	Find Python files and entry points	File system navigation
⚙️ Config Analysis	Analyze configuration files	Technical comprehension
📖 README Comprehension	Read and explain project	Document analysis
🏗️ Code Structure	Analyze directory structure	Architecture understanding
📚 Documentation Search	Explore documentation	Information retrieval
🔗 Dependency Analysis	Analyze requirements/dependencies	Technical analysis
💡 Example Exploration	Discover example code	Code comprehension
🧪 Test Discovery	Find testing framework	Development understanding
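
The table above could be represented in code roughly as follows. This is a minimal sketch, not the contents of scripts/simple_benchmark.py: the `Scenario` class, the `scenarios_for` helper, and the query wording are all hypothetical, and only the scenario names and capabilities come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One benchmark scenario (hypothetical structure, not ASTK's own)."""
    name: str
    query: str       # prompt sent to the agent; wording is illustrative only
    capability: str  # capability column from the table above


SCENARIOS = [
    Scenario("File Discovery", "Find the Python files and entry points.", "File system navigation"),
    Scenario("Config Analysis", "Analyze the configuration files.", "Technical comprehension"),
    Scenario("README Comprehension", "Read and explain this project.", "Document analysis"),
    Scenario("Code Structure", "Analyze the directory structure.", "Architecture understanding"),
    Scenario("Documentation Search", "Explore the documentation.", "Information retrieval"),
    Scenario("Dependency Analysis", "Analyze the requirements and dependencies.", "Technical analysis"),
    Scenario("Example Exploration", "Discover the example code.", "Code comprehension"),
    Scenario("Test Discovery", "Find the testing framework.", "Development understanding"),
]


def scenarios_for(capability: str) -> list[Scenario]:
    """Return the scenarios that exercise the given capability."""
    return [s for s in SCENARIOS if s.capability == capability]


if __name__ == "__main__":
    for scenario in SCENARIOS:
        print(f"{scenario.name}: {scenario.capability}")
```

Keeping scenarios as plain data makes it straightforward to add or filter tests without touching the code that runs them.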

📊 Results & Metrics

Benchmarks generate comprehensive results:

{
  "success_rate": 1.0,
  "total_duration_seconds": 25.4,
  "average_scenario_duration": 3.2,
  "average_response_length": 847,
  "scenarios": [...]
}

Metrics Include:

✅ Success Rate: Fraction of scenarios completed (1.0 in the example above means every scenario completed)
⏱️ Response Time: Duration for each scenario
📝 Response Quality: Length and content analysis
🎯 Scenario Details: Individual query results
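
As an illustration of how the summary fields above relate to per-scenario data, here is a minimal sketch. The `ScenarioResult` class and the `summarize` function are assumptions made for this example, not ASTK's actual code; only the summary key names come from the JSON shown earlier.

```python
from dataclasses import asdict, dataclass


@dataclass
class ScenarioResult:
    """Outcome of one scenario (hypothetical structure)."""
    name: str
    success: bool
    duration_seconds: float
    response: str


def summarize(results: list[ScenarioResult]) -> dict:
    """Aggregate per-scenario results into the summary fields shown above."""
    if not results:
        raise ValueError("at least one scenario result is required")
    count = len(results)
    total = sum(r.duration_seconds for r in results)
    return {
        # Fraction of scenarios that completed successfully
        "success_rate": sum(r.success for r in results) / count,
        "total_duration_seconds": round(total, 1),
        "average_scenario_duration": round(total / count, 1),
        # Mean response length in characters
        "average_response_length": round(sum(len(r.response) for r in results) / count),
        "scenarios": [asdict(r) for r in results],
    }


if __name__ == "__main__":
    demo = [
        ScenarioResult("File Discovery", True, 2.0, "abcd"),
        ScenarioResult("Config Analysis", False, 4.0, "ab"),
    ]
    print(summarize(demo))
```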

🛠️ Available Tools

🚀 Simple Benchmark Runner

python scripts/simple_benchmark.py <agent_path>

Runs 8 intelligent scenarios and generates detailed performance reports.

🔧 Simple Agent Runner

python scripts/simple_run.py <agent_path>

Runs an agent directly with basic output capture.
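
Running an agent script and capturing its output, as both runners above do in some form, might look like the sketch below. It is not the source of scripts/simple_run.py: the `run_agent` function, its return keys, and the 120-second default timeout are assumptions for illustration.

```python
import subprocess
import sys
import time


def run_agent(agent_path: str, query: str, timeout: float = 120.0) -> dict:
    """Run an agent script with one query and capture its output and timing."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, agent_path, query],  # same interpreter as the caller
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        success = proc.returncode == 0
        response = proc.stdout.strip()
    except subprocess.TimeoutExpired:
        # Treat a timeout as a failed run with no usable output
        success, response = False, ""
    return {
        "success": success,
        "duration_seconds": time.perf_counter() - start,
        "response": response,
    }


if __name__ == "__main__" and len(sys.argv) > 2:
    print(run_agent(sys.argv[1], sys.argv[2]))
```

Using `sys.executable` keeps the agent inside the active virtual environment, so it sees the same installed packages as the runner.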

📋 ASTK CLI (Advanced)

# Initialize project structure
python scripts/astk.py init

# Run advanced benchmarks (once the packaging issues are resolved)
python scripts/astk.py run <agent_path>

# View results
python scripts/astk.py view <results_dir>

🏗️ Project Structure

ASTK/
├── 🤖 examples/agents/          # Example AI agents
│   └── file_qa_agent.py         # LangChain File Q&A agent
├── 📊 scripts/                  # Benchmark and utility scripts
│   ├── simple_benchmark.py      # Intelligent benchmark runner ⭐
│   ├── simple_run.py            # Basic agent runner
│   └── astk.py                  # Advanced CLI (WIP)
├── 🧠 astk/                     # Core ASTK framework
│   ├── benchmarks/              # Benchmark modules
│   ├── cli.py                   # Command-line interface
│   └── *.py                     # Core modules
├── 📁 benchmark_results/        # Generated benchmark results
├── ⚙️ config/                   # Configuration files
└── 📖 docs/                     # Documentation

🎮 Usage Examples

Run Agent Directly

python examples/agents/file_qa_agent.py "Analyze the setup.py file"

Quick Agent Test

python scripts/simple_run.py examples/agents/file_qa_agent.py

Full Intelligence Benchmark

python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Custom Queries

python examples/agents/file_qa_agent.py "What is the purpose of the astk directory?"

🔧 Troubleshooting

Common Issues

🐍 Virtual Environment Problems

# Recreate environment
deactivate
rm -rf .venv311
python3.11 -m venv .venv311
source .venv311/bin/activate
pip install langchain langchain-openai langchain-core pydantic click psutil

🔑 OpenAI API Issues

# Verify API key is set
echo $OPENAI_API_KEY

# Set API key
export OPENAI_API_KEY="sk-..."
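
If you would rather not echo the full key to the terminal, the same check can be done from Python. This is a small sketch; the `api_key_status` helper is hypothetical and only reads the `OPENAI_API_KEY` environment variable used above.

```python
import os


def api_key_status(env=None) -> str:
    """Report whether OPENAI_API_KEY is set, without printing the whole key."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "missing"
    # Show only the last four characters so the key is not exposed in logs
    return f"set (ends with ...{key[-4:]})"


if __name__ == "__main__":
    print(f"OPENAI_API_KEY: {api_key_status()}")
```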

📦 Import Errors

# Install missing packages
pip install langchain langchain-openai langchain-core pydantic click psutil

# Verify installation
python -c "from langchain_openai import ChatOpenAI; print('✅ LangChain installed')"
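
To check every dependency at once rather than one import at a time, a sketch like the following can help. The `missing_modules` helper and the `REQUIRED_MODULES` list are assumptions for this example; the module names are the import names of the packages installed above.

```python
from importlib.util import find_spec

# Import names for the packages installed in the setup step
REQUIRED_MODULES = [
    "langchain",
    "langchain_openai",
    "langchain_core",
    "pydantic",
    "click",
    "psutil",
]


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed")
```

`find_spec` only locates a module without importing it, so the check stays fast even when the packages are large.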

🚀 Creating Your Own Agent

Create a new agent that responds to command-line arguments:

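#!/usr/bin/env python3
import asyncio
import sys


async def main():
    if len(sys.argv) > 1:
        # Join all command-line arguments into a single query string
        query = " ".join(sys.argv[1:])
        # Placeholder: replace this line with your agent's real query handling
        response = f"Received query: {query}"
        print(f"Agent: {response}")
    else:
        # Default behavior when no query is given
        print("Agent: Ready!")


if __name__ == "__main__":
    asyncio.run(main())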

Then benchmark it:

python scripts/simple_benchmark.py path/to/your_agent.py

📈 Performance Tips

⚡ Faster Responses: Use GPT-3.5-turbo for speed
🧠 Better Intelligence: Use GPT-4 for complex reasoning
💰 Cost Optimization: Monitor token usage in results
🔧 Custom Scenarios: Modify scripts/simple_benchmark.py for specific tests

🤝 Contributing

Create new agents in examples/agents/
Add benchmark scenarios in scripts/simple_benchmark.py
Test with: python scripts/simple_benchmark.py your_agent.py

📄 License

Apache 2.0 License - See LICENSE file for details.

🎯 Ready to benchmark your AI agents? Start with:

python scripts/simple_benchmark.py examples/agents/file_qa_agent.py

Release history

0.3.1 (Jun 6, 2025)
0.2.0 (Jun 6, 2025)
0.1.3 (Jun 6, 2025)
0.1.2 (Jun 6, 2025)
0.1.1 (Jun 6, 2025)
0.1.0 (Jun 6, 2025, this version)

Download files

Source Distribution

agent_sprint_testkit-0.1.0.tar.gz (30.9 kB)

Uploaded Jun 6, 2025 Source

Built Distribution

agent_sprint_testkit-0.1.0-py3-none-any.whl (34.3 kB)

Uploaded Jun 6, 2025 Python 3

File details

Details for the file agent_sprint_testkit-0.1.0.tar.gz.

File metadata

Download URL: agent_sprint_testkit-0.1.0.tar.gz
Upload date: Jun 6, 2025
Size: 30.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for agent_sprint_testkit-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7342c1baeb05d3efddc3c2350a374d1300a34fd3b93962be2b4d819d6926e0f2`
MD5	`768d87220f7471f6df0c4f0c7f4ef11a`
BLAKE2b-256	`d3603c925129ee3e1874c9afb286d51201c5c1a9fb4c7c8ef092b6f3c9a8b2b7`

File details

Details for the file agent_sprint_testkit-0.1.0-py3-none-any.whl.

File metadata

Download URL: agent_sprint_testkit-0.1.0-py3-none-any.whl
Upload date: Jun 6, 2025
Size: 34.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for agent_sprint_testkit-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a34b7a11b3b91db24f429ca2fb0ab31b3cac0b07793aa0490acd74dfb5f72a10`
MD5	`1b48f93bef035a71f0040bc7ff687a7d`
BLAKE2b-256	`4d4e5e3bb199174180e6ba682d364f893ae849793440621218c97f90762b9eed`
