
AgentSprint TestKit - Professional AI agent evaluation with OpenAI Evals integration


ASTK Package Usage Guide 📖

Step-by-step instructions for using AgentSprint TestKit

This guide shows you exactly how to install and use ASTK to test your AI agents. No technical background required!

🚀 What is ASTK?

ASTK is a tool that tests your AI chatbots and agents to see how well they work. Think of it like a "test suite" for your AI - it asks your agent different questions and measures how good the responses are.

📦 Step 1: Install ASTK

Open your terminal/command prompt and run:

pip install agent-sprint-testkit

✅ Check it worked:

python -m astk.cli --help

You should see a help menu. If you get an error, see Troubleshooting below.

🔑 Step 2: Set Up OpenAI API Key

ASTK uses OpenAI to help evaluate your agent's responses. You need an API key:

  1. Get an API key from OpenAI
  2. Set the key in your terminal:
# On Mac/Linux:
export OPENAI_API_KEY="sk-your-key-here"

# On Windows (Command Prompt):
set OPENAI_API_KEY=sk-your-key-here

# On Windows (PowerShell):
$env:OPENAI_API_KEY = "sk-your-key-here"

๐Ÿ Step 3: Your First Test

Option A: Test the Example Agent

ASTK comes with a built-in example agent for testing:

python -m astk.cli init my-first-test
cd my-first-test
python -m astk.cli benchmark examples/agents/file_qa_agent.py

This will:

  • ✅ Create a test project
  • ✅ Run 8 different scenarios
  • ✅ Generate a detailed report
  • ✅ Show you how well the agent performed

Option B: Test Your Own Agent

If you have your own AI agent, you can test it:

python -m astk.cli benchmark path/to/your-agent.py

Your agent must accept questions as command-line arguments:

python your-agent.py "What is 2+2?"
# Should output: "Agent: 4" or similar

📊 Understanding Results

After running a benchmark, you'll see results like:

{
  "success_rate": 0.67,           // 67% of tests passed
  "complexity_score": 0.58,       // 58% difficulty-weighted score
  "total_duration_seconds": 45.2, // Took 45 seconds total
  "average_response_length": 1247, // Average response was 1,247 characters
  "difficulty_breakdown": {
    "intermediate": {"success_rate": 1.0, "scenarios": "2/2"},
    "advanced": {"success_rate": 0.6, "scenarios": "3/5"},
    "expert": {"success_rate": 0.4, "scenarios": "2/5"}
  },
  "category_breakdown": {
    "reasoning": {"success_rate": 0.67, "scenarios": "2/3"},
    "creativity": {"success_rate": 0.5, "scenarios": "1/2"},
    "ethics": {"success_rate": 1.0, "scenarios": "2/2"}
  },
  "scenarios": [...]              // Details for each test
}

🎯 What this means:

Core Metrics

  • Success Rate: Percentage of scenarios completed successfully
  • Complexity Score: Difficulty-weighted performance (Expert = 3x, Advanced = 2x, Intermediate = 1x)
  • Duration: How fast your agent responds to complex challenges
  • Response Length: How detailed and comprehensive the answers are
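
A difficulty-weighted score like the one above can be sketched as a weighted pass rate. This is an illustration only, assuming the score is (sum of weights of passed scenarios) / (sum of all weights) with the 1x/2x/3x weights listed; ASTK's actual formula may differ:

```python
# Illustrative difficulty-weighted score, NOT ASTK's exact implementation.
# Weights follow the guide: Intermediate = 1x, Advanced = 2x, Expert = 3x.
WEIGHTS = {"intermediate": 1, "advanced": 2, "expert": 3}

def complexity_score(results):
    """results: iterable of (difficulty, passed) pairs."""
    results = list(results)
    total = sum(WEIGHTS[d] for d, _ in results)
    earned = sum(WEIGHTS[d] for d, passed in results if passed)
    return earned / total if total else 0.0

# A made-up run: 1 Intermediate pass, 1 Advanced pass, 1 Advanced fail,
# 1 Expert fail -> earned 3 of 8 weighted points.
runs = [("intermediate", True), ("advanced", True),
        ("advanced", False), ("expert", False)]
print(round(complexity_score(runs), 2))  # → 0.38
```

Note how a single failed Expert scenario costs three times as many points as a failed Intermediate one.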

Advanced Analytics

  • 🎓 Difficulty Breakdown: Performance across challenge levels
    • 📘 Intermediate: Basic problem-solving tasks
    • 📙 Advanced: Complex multi-step reasoning
    • 📕 Expert: Cutting-edge AI capabilities
  • 🏷️ Category Performance: Strengths across different domains
    • 🧠 Reasoning: Logic and problem-solving
    • 🎨 Creativity: Innovation and design thinking
    • ⚖️ Ethics: Responsible AI practices
    • 🔗 Integration: System architecture skills

🌟 AI Capability Ratings

Based on your Complexity Score:

  • 🌟 Exceptional AI (80%+): Expert-level reasoning across multiple domains
  • 🔥 Advanced AI (60-79%): Strong performance on sophisticated tasks
  • 💡 Competent AI (40-59%): Good basic capabilities, room for advanced improvement
  • 📚 Developing AI (<40%): Focus on improving reasoning and problem-solving
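
These tiers amount to a simple threshold lookup. A sketch based on the ranges above (not ASTK's own code):

```python
# Map a complexity score (0.0-1.0) to the capability tiers listed above.
def capability_rating(score: float) -> str:
    if score >= 0.80:
        return "Exceptional AI"
    if score >= 0.60:
        return "Advanced AI"
    if score >= 0.40:
        return "Competent AI"
    return "Developing AI"

print(capability_rating(0.58))  # → Competent AI
```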

🧪 What Tests Does ASTK Run?

ASTK automatically tests your agent with 12 sophisticated scenarios across multiple categories:

🧠 Reasoning & Problem-Solving

  • Multi-step Reasoning (📙 Advanced): Can your agent analyze complex problems, identify security vulnerabilities, and provide detailed solutions?
  • Edge Case Analysis (📘 Intermediate): How well does it handle unusual situations, errors, and unexpected inputs?
  • Performance Optimization (📙 Advanced): Can it analyze code for bottlenecks and suggest detailed performance improvements?

🎨 Creativity & Innovation

  • Creative Problem Solving (📕 Expert): Can your agent design new features and architectures from scratch with implementation details?
  • Adaptive Learning Assessment (📕 Expert): Can it design self-improving systems and machine learning approaches?

🔗 System Integration & Architecture

  • Cross-domain Integration (📕 Expert): How well can it design complete DevOps and CI/CD strategies?
  • Failure Recovery Design (📙 Advanced): Can it create comprehensive error handling and reliability systems?
  • Scalability Architecture (📕 Expert): Can it redesign systems for massive scale (100k+ concurrent users)?

โš–๏ธ Ethics & Compliance

Test What it checks Difficulty
Ethical AI Evaluation Does it understand AI bias, fairness, and responsible AI practices? ๐Ÿ“™ Advanced
Regulatory Compliance Can it design systems that meet GDPR, CCPA, and AI regulations? ๐Ÿ“™ Advanced

💼 Strategic & Future-Tech Analysis

  • Competitive Analysis (📘 Intermediate): Can it analyze markets, competitive positioning, and business strategy?
  • Quantum Computing Readiness (📕 Expert): Does it understand emerging technologies and future-tech implications?

📊 New Metrics You'll Get:

  • 🧠 Complexity Score: Difficulty-weighted performance (Expert tasks count 3x more than Intermediate)
  • 🎓 Difficulty Breakdown: How well your agent handles Intermediate vs Advanced vs Expert challenges
  • 🏷️ Category Performance: Which areas your agent excels in (Reasoning, Creativity, Ethics, etc.)
  • 🏆 Best Category: Your agent's strongest capability area
  • 🌟 AI Capability Assessment: Overall intelligence rating from "Developing" to "Exceptional"

🎯 Common Use Cases

Testing a Simple Chatbot

# Your chatbot file: my_bot.py
#!/usr/bin/env python3
import sys

def main():
    if len(sys.argv) > 1:
        question = " ".join(sys.argv[1:])
        # Your chatbot logic here
        answer = f"Bot says: {question}"
        print(answer)

if __name__ == "__main__":
    main()

Test it:

python -m astk.cli benchmark my_bot.py

Testing Different Agent Types

CLI Agent (takes command line arguments):

python -m astk.cli benchmark my_cli_agent.py

Python Module Agent (has a chat method):

# ASTK will automatically detect and use the chat() method
python -m astk.cli benchmark my_module_agent.py
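
For illustration, a module-style agent could look like the sketch below. The `chat()` name comes from the note above, but the exact signature ASTK detects is an assumption; check the ASTK documentation for your version before relying on it.

```python
# my_module_agent.py - sketch of a module-style agent.
# Assumption: ASTK detects a callable chat(message: str) -> str.

class EchoAgent:
    """Toy agent that returns a canned reply to any question."""

    def chat(self, message: str) -> str:
        return f"Agent: my answer to '{message}'"

# Module-level chat() entry point delegating to the class above.
_agent = EchoAgent()

def chat(message: str) -> str:
    return _agent.chat(message)

if __name__ == "__main__":
    print(chat("What is 2+2?"))  # → Agent: my answer to 'What is 2+2?'
```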

REST API Agent:

# ASTK will try to use the /chat endpoint
python -m astk.cli benchmark http://localhost:8000
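
A REST agent can be served with nothing but the standard library. In this sketch the /chat route comes from the comment above, but the JSON payload shape ({"message": ...} in, {"response": ...} out) is an assumption; verify the exact contract against the ASTK documentation.

```python
# Minimal HTTP agent sketch using only the Python standard library.
# Assumption: the benchmark POSTs JSON like {"message": "..."} to /chat
# and reads a {"response": "..."} reply.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_reply(payload: dict) -> dict:
    """Build the (assumed) response body for one question."""
    question = payload.get("message", "")
    return {"response": f"Agent: you asked '{question}'"}

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(make_reply(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Serve on the address used in the benchmark command above.
    HTTPServer(("localhost", 8000), ChatHandler).serve_forever()
```

Run the server in one terminal, then point the benchmark at http://localhost:8000 in another.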

📋 All Available Commands

# Initialize a new test project
python -m astk.cli init <project-name>

# Run benchmark tests
python -m astk.cli benchmark <agent-path>

# Generate detailed reports
python -m astk.cli report <results-directory>

# Show examples and help
python -m astk.cli examples

# Show version
python -m astk.cli --version

🔧 Troubleshooting

โŒ "Command not found: astk"

Problem: Package not installed properly

Solution:

pip install --upgrade pip
pip install agent-sprint-testkit

Still not working? Try:

python -m pip install agent-sprint-testkit

โŒ "OpenAI API key not found"

Problem: API key not set

Solution:

# Check if it's set:
echo $OPENAI_API_KEY

# Set it:
export OPENAI_API_KEY="sk-your-key-here"

โŒ "Agent failed to respond"

Problem: Your agent doesn't accept command-line arguments

Solution: Make sure your agent works like this:

python your-agent.py "test question"
# Should print something back

Example working agent:

#!/usr/bin/env python3
import sys

if len(sys.argv) > 1:
    question = " ".join(sys.argv[1:])
    print(f"Agent: Here's my response to '{question}'")
else:
    print("Agent: Please ask me a question!")

โŒ Permission errors

Problem: Can't install or run commands

Solution:

# Try with user installation:
pip install --user agent-sprint-testkit

# Add to PATH if needed:
export PATH=$PATH:~/.local/bin

🎮 Quick Examples

1. Basic Test Run

pip install agent-sprint-testkit
export OPENAI_API_KEY="your-key"
python -m astk.cli init test-project
cd test-project
python -m astk.cli benchmark examples/agents/file_qa_agent.py

2. Test Your Own Agent

# Create simple agent
echo '#!/usr/bin/env python3
import sys
if len(sys.argv) > 1:
    print(f"Bot: {sys.argv[1]}")' > my_bot.py

chmod +x my_bot.py

# Test it
python -m astk.cli benchmark my_bot.py

3. Multiple Tests

# Test different agents
python -m astk.cli benchmark agent1.py
python -m astk.cli benchmark agent2.py
python -m astk.cli benchmark http://localhost:8000

# Compare results
python -m astk.cli report benchmark_results/
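
To compare runs side by side yourself, you can read the success_rate field out of each saved result file. The file layout assumed here (a JSON object with a top-level "success_rate", as in the example output earlier in this guide) and the benchmark_results/*.json naming are assumptions; adjust them to match your actual output.

```python
# Rank saved benchmark result files by success rate.
# Assumes each file is a JSON object with a top-level "success_rate".
import json
from pathlib import Path

def load_success_rate(path) -> float:
    return json.loads(Path(path).read_text()).get("success_rate", 0.0)

def rank_results(paths):
    """Return (path, rate) pairs sorted best-first."""
    rates = [(str(p), load_success_rate(p)) for p in paths]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical results directory; change to wherever ASTK wrote yours.
    for path, rate in rank_results(Path("benchmark_results").glob("*.json")):
        print(f"{path}: {rate:.0%}")
```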

📈 Improving Your Agent

Based on ASTK results, you can improve your agent:

  • Low success rate? Make sure your agent handles different question types
  • Slow responses? Optimize your agent's processing speed
  • Short responses? Add more detailed explanations
  • Failed scenarios? Test your agent with the specific question types ASTK uses

💡 Tips for Best Results

  1. Test regularly - Run ASTK after every major change to your agent
  2. Check all scenarios - Make sure your agent handles different types of questions
  3. Monitor performance - Watch response times and success rates
  4. Use the reports - ASTK generates detailed reports to help you improve

🚀 Next Steps

  1. Install ASTK: pip install agent-sprint-testkit
  2. Set API key: export OPENAI_API_KEY="your-key"
  3. Run first test: python -m astk.cli init test && cd test && python -m astk.cli examples
  4. Test your agent: python -m astk.cli benchmark your-agent.py
  5. Review results and improve your agent!

🎯 Ready to test your AI agent?

pip install agent-sprint-testkit && python -m astk.cli --help

Need help? Check the main documentation or open an issue.
