AgentSprint TestKit - Universal AI agent benchmarking and testing framework

ASTK Package Usage Guide 📖

Step-by-step instructions for using AgentSprint TestKit

This guide shows you exactly how to install and use ASTK to test your AI agents. No technical background required!

🚀 What is ASTK?

ASTK is a tool that tests your AI chatbots and agents to see how well they work. Think of it like a "test suite" for your AI - it asks your agent different questions and measures how good the responses are.

📦 Step 1: Install ASTK

Open your terminal/command prompt and run:

pip install agent-sprint-testkit

✅ Check it worked:

astk --help

You should see a help menu. If you get an error, see Troubleshooting below.

🔑 Step 2: Set Up OpenAI API Key

ASTK uses OpenAI to help evaluate your agent's responses. You need an API key:

  1. Get an API key from OpenAI
  2. Set the key in your terminal:
# On Mac/Linux:
export OPENAI_API_KEY="sk-your-key-here"

# On Windows (Command Prompt):
set OPENAI_API_KEY=sk-your-key-here

# On Windows (PowerShell):
$env:OPENAI_API_KEY="sk-your-key-here"
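
Before running a benchmark, it can help to confirm from Python that the key is actually visible in your environment. The sketch below only inspects the environment and never prints the key. The helper name `describe_api_key` is hypothetical, and the `sk-` prefix check is an assumption based on the placeholder above, not a rule ASTK is known to enforce.

```python
import os


def describe_api_key(env=None):
    """Return a short status for OPENAI_API_KEY without revealing the key."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "missing"
    # Assumption: keys start with "sk-", as in the placeholder above.
    if not key.startswith("sk-"):
        return "set, but unexpected format"
    return "set"


if __name__ == "__main__":
    print(f"OPENAI_API_KEY is {describe_api_key()}")
```

If this reports "missing" right after you set the key, you are most likely running it in a different terminal session from the one where the variable was exported.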

Step 3: Your First Test

Option A: Test the Example Agent

ASTK comes with a built-in example agent for testing:

astk init my-first-test
cd my-first-test
astk benchmark examples/agents/file_qa_agent.py

This will:

  • ✅ Create a test project
  • ✅ Run 12 different scenarios
  • ✅ Generate a detailed report
  • ✅ Show you how well the agent performed

Option B: Test Your Own Agent

If you have your own AI agent, you can test it:

astk benchmark path/to/your-agent.py

Your agent must accept questions as command-line arguments:

python your-agent.py "What is 2+2?"
# Should output: "Agent: 4" or similar
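
To check this contract before running a full benchmark, you can call your agent the same way from a small script. This is a minimal sketch of a command-line invocation, not ASTK's actual runner; the helper name `ask_agent` and the 60-second timeout are assumptions made for illustration.

```python
import subprocess
import sys


def ask_agent(agent_path, question, timeout=60):
    """Run the agent once with the question as one argument and return its stdout."""
    result = subprocess.run(
        [sys.executable, agent_path, question],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"agent exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_agent.py path/to/your-agent.py
    print(ask_agent(sys.argv[1], "What is 2+2?"))
```

If this prints an answer, the agent reads its question from the command line and writes its reply to standard output, which is the behavior described above.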

📊 Understanding Results

After running a benchmark, you'll see a results summary like:

{
  "success_rate": 0.58,           // 58% of tests passed (7 of 12)
  "complexity_score": 0.52,       // 52% difficulty-weighted score
  "total_duration_seconds": 45.2, // Took 45 seconds total
  "average_response_length": 1247, // Average response was 1,247 characters
  "difficulty_breakdown": {
    "intermediate": {"success_rate": 1.0, "scenarios": "2/2"},
    "advanced": {"success_rate": 0.6, "scenarios": "3/5"},
    "expert": {"success_rate": 0.4, "scenarios": "2/5"}
  },
  "category_breakdown": {
    "reasoning": {"success_rate": 0.67, "scenarios": "2/3"},
    "creativity": {"success_rate": 0.5, "scenarios": "1/2"},
    "ethics": {"success_rate": 1.0, "scenarios": "2/2"}
  },
  "scenarios": [...]              // Details for each test
}

🎯 What this means:

Core Metrics

  • Success Rate: Percentage of scenarios completed successfully
  • Complexity Score: Difficulty-weighted performance (Expert = 3x, Advanced = 2x, Intermediate = 1x)
  • Duration: Total time your agent took to answer all scenarios, in seconds
  • Response Length: Average response size in characters, a rough indicator of how detailed the answers are
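
The two headline numbers can be worked through by hand. The sketch below assumes the Complexity Score is the weighted number of passed scenarios divided by the maximum weighted total, using the 1x/2x/3x weights listed above. The exact formula ASTK uses is not stated in this guide, so treat `complexity_score` as an illustrative helper rather than the library's implementation.

```python
# Weights taken from the Complexity Score description above.
WEIGHTS = {"intermediate": 1, "advanced": 2, "expert": 3}


def complexity_score(passed_by_level, total_by_level):
    """Weighted passes divided by the maximum achievable weighted total."""
    earned = sum(
        WEIGHTS[level] * passed_by_level.get(level, 0) for level in total_by_level
    )
    possible = sum(WEIGHTS[level] * count for level, count in total_by_level.items())
    return earned / possible if possible else 0.0


if __name__ == "__main__":
    # Pass counts per difficulty level, as in a difficulty_breakdown.
    passed = {"intermediate": 2, "advanced": 3, "expert": 2}
    total = {"intermediate": 2, "advanced": 5, "expert": 5}
    print(f"success rate:     {sum(passed.values()) / sum(total.values()):.2f}")
    print(f"complexity score: {complexity_score(passed, total):.2f}")
```

Under this assumption, a failed Expert scenario lowers the score three times as much as a failed Intermediate one, so the Complexity Score can sit below the plain Success Rate when most failures are on the hardest tasks.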

Advanced Analytics

  • 🎓 Difficulty Breakdown: Performance across challenge levels
    • 📘 Intermediate: Basic problem-solving tasks
    • 📙 Advanced: Complex multi-step reasoning
    • 📕 Expert: Cutting-edge AI capabilities
  • 🏷️ Category Performance: Strengths across different domains
    • 🧠 Reasoning: Logic and problem-solving
    • 🎨 Creativity: Innovation and design thinking
    • ⚖️ Ethics: Responsible AI practices
    • 🔗 Integration: System architecture skills

🌟 AI Capability Ratings

Based on your Complexity Score:

  • 🌟 Exceptional AI (80%+): Expert-level reasoning across multiple domains
  • 🔥 Advanced AI (60-79%): Strong performance on sophisticated tasks
  • 💡 Competent AI (40-59%): Good basic capabilities, room for advanced improvement
  • 📚 Developing AI (<40%): Focus on improving reasoning and problem-solving
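
The rating bands above amount to a simple threshold lookup. This sketch assumes the score is a fraction between 0 and 1 (as in the results summary) and that each band starts at its lower bound; how ASTK rounds values that fall between bands, such as 79.5%, is not stated here. The function name `capability_rating` is hypothetical.

```python
def capability_rating(complexity_score):
    """Map a complexity score in [0, 1] to the rating bands listed above."""
    if complexity_score >= 0.8:
        return "Exceptional AI"
    if complexity_score >= 0.6:
        return "Advanced AI"
    if complexity_score >= 0.4:
        return "Competent AI"
    return "Developing AI"


if __name__ == "__main__":
    for score in (0.85, 0.65, 0.45, 0.25):
        print(f"{score:.0%}: {capability_rating(score)}")
```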

🧪 What Tests Does ASTK Run?

ASTK automatically tests your agent with 12 sophisticated scenarios across multiple categories:

🧠 Reasoning & Problem-Solving

Test | What it checks | Difficulty
Multi-step Reasoning | Can your agent analyze complex problems, identify security vulnerabilities, and provide detailed solutions? | 📙 Advanced
Edge Case Analysis | How well does it handle unusual situations, errors, and unexpected inputs? | 📘 Intermediate
Performance Optimization | Can it analyze code for bottlenecks and suggest detailed performance improvements? | 📙 Advanced

🎨 Creativity & Innovation

Test | What it checks | Difficulty
Creative Problem Solving | Can your agent design new features and architectures from scratch with implementation details? | 📕 Expert
Adaptive Learning Assessment | Can it design self-improving systems and machine learning approaches? | 📕 Expert

🔗 System Integration & Architecture

Test | What it checks | Difficulty
Cross-domain Integration | How well can it design complete DevOps and CI/CD strategies? | 📕 Expert
Failure Recovery Design | Can it create comprehensive error handling and reliability systems? | 📙 Advanced
Scalability Architecture | Can it redesign systems for massive scale (100k+ concurrent users)? | 📕 Expert

⚖️ Ethics & Compliance

Test | What it checks | Difficulty
Ethical AI Evaluation | Does it understand AI bias, fairness, and responsible AI practices? | 📙 Advanced
Regulatory Compliance | Can it design systems that meet GDPR, CCPA, and AI regulations? | 📙 Advanced

💼 Strategic & Future-Tech Analysis

Test | What it checks | Difficulty
Competitive Analysis | Can it analyze markets, competitive positioning, and business strategy? | 📘 Intermediate
Quantum Computing Readiness | Does it understand emerging technologies and future-tech implications? | 📕 Expert

📊 New Metrics You'll Get:

  • 🧠 Complexity Score: Difficulty-weighted performance (Expert tasks count 3x as much as Intermediate tasks)
  • 🎓 Difficulty Breakdown: How well your agent handles Intermediate vs Advanced vs Expert challenges
  • 🏷️ Category Performance: Which areas your agent excels in (Reasoning, Creativity, Ethics, etc.)
  • 🏆 Best Category: Your agent's strongest capability area
  • 🌟 AI Capability Assessment: Overall rating from "Developing" to "Exceptional"

🎯 Common Use Cases

Testing a Simple Chatbot

#!/usr/bin/env python3
# Your chatbot file: my_bot.py
import sys

def main():
    if len(sys.argv) > 1:
        question = " ".join(sys.argv[1:])
        # Your chatbot logic here
        answer = f"Bot says: {question}"
        print(answer)

if __name__ == "__main__":
    main()

Test it:

astk benchmark my_bot.py

Testing Different Agent Types

CLI Agent (takes command line arguments):

astk benchmark my_cli_agent.py

Python Module Agent (has a chat method):

# ASTK will automatically detect and use the chat() method
astk benchmark my_module_agent.py
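
This guide does not spell out the exact interface ASTK looks for in a module agent. As a minimal sketch, the file below assumes a class exposing `chat(message: str) -> str`; the class name `EchoAgent` is hypothetical, and the real detection rules may differ.

```python
import sys


class EchoAgent:
    """Hypothetical module-style agent exposing a chat() method."""

    def chat(self, message: str) -> str:
        # Replace this with a call to your model or agent logic.
        return f"Agent: You asked '{message}'"


def main(argv):
    # Falling back to command-line arguments keeps the same file usable as a CLI agent.
    if len(argv) > 1:
        print(EchoAgent().chat(" ".join(argv[1:])))
    else:
        print("Agent: Please ask me a question!")


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the command-line entry point alongside `chat()` means the same file still works with the CLI approach if module detection does not pick it up.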

REST API Agent:

# ASTK will try to use the /chat endpoint
astk benchmark http://localhost:8000
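
For local experiments, a REST agent can be stood up with the Python standard library alone. The sketch below serves `POST /chat`; the request body `{"message": ...}` and the reply `{"response": ...}` are assumptions, since the payload format ASTK sends is not documented in this guide. `ChatHandler` and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChatHandler(BaseHTTPRequestHandler):
    """Answers POST /chat with a JSON reply (payload shape is an assumption)."""

    def do_POST(self):
        if self.path != "/chat":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "Body must be JSON")
            return
        message = payload.get("message", "") if isinstance(payload, dict) else ""
        # Replace this line with your agent logic.
        reply = {"response": f"Agent: You asked '{message}'"}
        body = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep benchmark output free of per-request access logs.
        pass


def make_server(port=8000):
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), ChatHandler)
```

Calling `make_server().serve_forever()` starts the agent on http://localhost:8000, the address used in the command above.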

📋 All Available Commands

# Initialize a new test project
astk init <project-name>

# Run benchmark tests
astk benchmark <agent-path>

# Generate detailed reports
astk report <results-directory>

# Show examples and help
astk examples

# Show version
astk --version

🔧 Troubleshooting

❌ "Command not found: astk"

Problem: Package not installed properly

Solution:

pip install --upgrade pip
pip install agent-sprint-testkit

Still not working? Try:

python -m pip install agent-sprint-testkit
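
If the install succeeds but the command is still not found, the script directory is probably missing from your PATH. The sketch below uses only the standard library to report where `astk` resolves, if anywhere; `locate_command` is a hypothetical helper, and the directory it prints applies to the interpreter running the script (a `--user` install uses a different location).

```python
import shutil
import sysconfig


def locate_command(name="astk"):
    """Return the full path of the command, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    found = locate_command()
    if found:
        print(f"astk found at {found}")
    else:
        # Default script directory for packages installed into this interpreter.
        print("astk is not on PATH; this interpreter's script directory is:")
        print(sysconfig.get_path("scripts"))
```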

❌ "OpenAI API key not found"

Problem: API key not set

Solution:

# Check if it's set:
echo $OPENAI_API_KEY

# Set it:
export OPENAI_API_KEY="sk-your-key-here"

❌ "Agent failed to respond"

Problem: Your agent doesn't accept command-line arguments

Solution: Make sure your agent works like this:

python your-agent.py "test question"
# Should print something back

Example working agent:

#!/usr/bin/env python3
import sys

if len(sys.argv) > 1:
    question = " ".join(sys.argv[1:])
    print(f"Agent: Here's my response to '{question}'")
else:
    print("Agent: Please ask me a question!")

❌ Permission errors

Problem: Can't install or run commands

Solution:

# Try with user installation:
pip install --user agent-sprint-testkit

# Add to PATH if needed:
export PATH="$PATH:$HOME/.local/bin"

🎮 Quick Examples

1. Basic Test Run

pip install agent-sprint-testkit
export OPENAI_API_KEY="your-key"
astk init test-project
cd test-project
astk benchmark examples/agents/file_qa_agent.py

2. Test Your Own Agent

# Create simple agent
echo '#!/usr/bin/env python3
import sys
if len(sys.argv) > 1:
    print(f"Bot: {sys.argv[1]}")' > my_bot.py

chmod +x my_bot.py

# Test it
astk benchmark my_bot.py

3. Multiple Tests

# Test different agents
astk benchmark agent1.py
astk benchmark agent2.py
astk benchmark http://localhost:8000

# Compare results
astk report benchmark_results/

📈 Improving Your Agent

Based on ASTK results, you can improve your agent:

  • Low success rate? Make sure your agent handles different question types
  • Slow responses? Optimize your agent's processing speed
  • Short responses? Add more detailed explanations
  • Failed scenarios? Test your agent with the specific question types ASTK uses
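
To decide where to focus first, you can rank categories by success rate. The sketch below works on a dictionary shaped like the `category_breakdown` in the sample results; the helper name `weakest_areas` and the 0.7 threshold are illustrative assumptions, not part of ASTK.

```python
def weakest_areas(breakdown, threshold=0.7):
    """Return (name, success_rate) pairs below the threshold, worst first."""
    low = [
        (name, stats["success_rate"])
        for name, stats in breakdown.items()
        if stats["success_rate"] < threshold
    ]
    return sorted(low, key=lambda item: item[1])


if __name__ == "__main__":
    # Values copied from the sample category_breakdown shown earlier.
    category_breakdown = {
        "reasoning": {"success_rate": 0.67, "scenarios": "2/3"},
        "creativity": {"success_rate": 0.5, "scenarios": "1/2"},
        "ethics": {"success_rate": 1.0, "scenarios": "2/2"},
    }
    for name, rate in weakest_areas(category_breakdown):
        print(f"{name}: {rate:.0%}")
```

The same function can be applied to a `difficulty_breakdown`, since both use a `success_rate` field per entry.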

💡 Tips for Best Results

  1. Test regularly - Run ASTK after every major change to your agent
  2. Check all scenarios - Make sure your agent handles different types of questions
  3. Monitor performance - Watch response times and success rates
  4. Use the reports - ASTK generates detailed reports to help you improve

🚀 Next Steps

  1. Install ASTK: pip install agent-sprint-testkit
  2. Set API key: export OPENAI_API_KEY="your-key"
  3. Run first test: astk init test && cd test && astk benchmark examples/agents/file_qa_agent.py
  4. Test your agent: astk benchmark your-agent.py
  5. Review results and improve your agent!

🎯 Ready to test your AI agent?

pip install agent-sprint-testkit && astk --help

Need help? Check the main documentation or open an issue.

Download files

Source Distribution

agent_sprint_testkit-0.1.2.tar.gz (32.7 kB)

  • Size: 32.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

Hashes for agent_sprint_testkit-0.1.2.tar.gz

Algorithm | Hash digest
SHA256 | 93e5bac5532b0924caeffbf1d9e374403c95edf9cf84d58f13bd4b2f5de27b8b
MD5 | 05a68ef97b0a4305af82c98585304452
BLAKE2b-256 | d13f4c56df4d3913b271fd7f438dd06b4badb1eb13da75dbca19bfa542e9ab0a

Built Distribution

agent_sprint_testkit-0.1.2-py3-none-any.whl (31.4 kB)

  • Tags: Python 3

Hashes for agent_sprint_testkit-0.1.2-py3-none-any.whl

Algorithm | Hash digest
SHA256 | 23c441acfd3919f6e88cb91a8f5117e3aeda0b4719cd0a068e782db7d967edf8
MD5 | 6cc458c295dc0fca5f06c0e1ce7ad924
BLAKE2b-256 | 15d7b79224f58411b673214be0beb2a090f2715922419e0ed23b3a6c06e0e48c
