AITest
The world's easiest way to test anything using AI.
AITest is a simple, pytest-like AI testing framework powered by PraisonAI Agents.
Installation
```bash
# Using uv (recommended)
uv add aitest

# Using pip
pip install aitest
```
Quick Start
One Line Test
```python
from aitest import test

result = test("The capital of France is Paris", criteria="factually correct")
assert result.passed
```
Accuracy Testing
```python
from aitest import accuracy

result = accuracy("4", expected="4")
assert result.score >= 9.0
```
Criteria Testing
```python
from aitest import criteria

result = criteria("Hello, how can I help you?", criteria="is a friendly greeting")
assert result.passed
```
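Each helper returns a `TestResult`, which (as the Custom Judges and API Reference sections below show) carries a `score`, a `passed` flag, and the judge's `reasoning`. A minimal sketch of inspecting one:

```python
from aitest import test

result = test("The capital of France is Paris", criteria="factually correct")

# TestResult fields (see the API Reference below)
print(result.score)      # judge score on the 1-10 scale used by the pass threshold
print(result.passed)     # True when the score meets the threshold
print(result.reasoning)  # the judge's explanation
```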
CLI Usage
```bash
# Basic test
aitest "The capital of France is Paris" --criteria "factually correct"

# Accuracy test
aitest accuracy "4" --expected "4"

# Criteria test
aitest criteria "Hello world" --criteria "is a greeting"

# With options
aitest "Output to test" --criteria "is correct" --threshold 8.0 --verbose
```
pytest-like Features
AITest replicates pytest's speed, robustness, and power:
Test Discovery
```bash
# Discover tests without running them (like pytest --collect-only)
aitest collect tests/
aitest collect . --pattern "test_*.py"
```
Caching (100x Speedup)
```bash
# Caching is enabled by default - identical tests skip LLM calls
aitest "test output" --criteria "is correct"  # First run: LLM call
aitest "test output" --criteria "is correct"  # Second run: cached!

# Cache management
aitest cache-stats   # Show cache statistics
aitest cache-clear   # Clear the cache
```
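The cache can be observed from Python as well. Here is a minimal sketch that times two identical calls with the `Instant` timer from the Timing section below, assuming the Python API shares the CLI's default-on cache:

```python
from aitest import test, Instant

# First run: hits the LLM (assuming the Python API uses the same cache as the CLI)
start = Instant()
test("test output", criteria="is correct")
print(f"uncached: {start.elapsed()}")

# Second identical run: should be answered from the cache
start = Instant()
test("test output", criteria="is correct")
print(f"cached: {start.elapsed()}")
```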
Parallel Execution
```python
from aitest import test, run_parallel, ParallelRunner

outputs = ["output one", "output two", "output three"]

# Run multiple tests in parallel (LLM calls are I/O-bound)
runner = ParallelRunner(workers=4)
results = runner.map(lambda x: test(x, criteria="is correct"), outputs)
```
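Each mapped call returns a `TestResult`, so a run can be summarized directly from the result fields; a small sketch:

```python
# Summarize the parallel run using TestResult fields
passed = sum(1 for r in results if r.passed)
print(f"{passed}/{len(results)} tests passed")
for r in results:
    if not r.passed:
        print(f"score={r.score}: {r.reasoning}")
```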
Timing & Performance
```python
from aitest import Instant, Duration

start = Instant()
# ... run tests ...
duration = start.elapsed()
print(f"Tests completed in {duration}")  # "Tests completed in 1.23s"
```
Test Outcomes (skip, fail, xfail)
```python
from aitest import skip, fail, xfail, importorskip

# Skip a test
skip("Not implemented yet")

# Fail explicitly
fail("This should not happen")

# Expected failure
xfail("Known bug #123")

# Skip if a module is not available
np = importorskip("numpy")
```
Conditional Markers
```python
import sys

from aitest import mark

@mark.skipif(sys.platform == "win32", reason="Not supported on Windows")
def test_unix_only():
    pass

@mark.xfail(reason="Known bug #123", strict=False)
def test_known_bug():
    pass
```
Run Command with Durations
```bash
# Run tests with duration reporting
aitest run tests/ --durations=5

# Run with fail-fast
aitest run tests/ -x

# Verbose mode
aitest run tests/ -v
```
Assertion Helpers
```python
import warnings

from aitest import approx, raises, warns, deprecated_call

# Approximate comparisons (essential for AI scores)
# result here comes from a previous aitest call, e.g. test(...)
assert result.score == approx(7.5, abs=0.5)
assert 0.1 + 0.2 == approx(0.3)
assert [0.1, 0.2] == approx([0.1, 0.2])

# Exception testing
with raises(ValueError, match="invalid"):
    raise ValueError("invalid input")

# Warning testing
with warns(UserWarning):
    warnings.warn("test", UserWarning)

# Deprecation testing
with deprecated_call():
    warnings.warn("deprecated", DeprecationWarning)
```
Parametrize (Data-Driven Testing)
```python
from aitest import mark, param

# Test with multiple inputs
@mark.parametrize("x, y, expected", [
    (1, 2, 3),
    (4, 5, 9),
    param(10, 20, 30, id="large_numbers"),
])
def test_add(x, y, expected):
    assert x + y == expected

# Single argument
@mark.parametrize("prompt", [
    "Hello",
    "What is AI?",
    "Explain quantum computing",
])
def test_ai_responses(prompt):
    result = get_ai_response(prompt)  # your own function under test
    assert len(result) > 0
```
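Parametrization also composes with the one-line `test()` helper from the Quick Start; a minimal sketch:

```python
from aitest import test, mark

@mark.parametrize("statement", [
    "Water boils at 100 degrees Celsius at sea level",
    "The capital of France is Paris",
])
def test_facts(statement):
    result = test(statement, criteria="factually correct")
    assert result.passed
```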
Fixtures
```python
from aitest import fixture

@fixture
def ai_client():
    """Create an AI client for testing."""
    return AIClient()  # your own client class

@fixture(scope="session")
def expensive_resource():
    """Session-scoped fixture - created once per session."""
    resource = create_expensive_resource()  # your own setup function
    yield resource
    resource.cleanup()
```
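Assuming fixtures are injected by parameter name, as in pytest (this page does not show it explicitly), a test might consume them like this; `ask()` is a hypothetical method on your own client class:

```python
def test_with_client(ai_client, expensive_resource):
    # ai_client is recreated per test; expensive_resource is shared per session
    response = ai_client.ask("What is AI?")  # hypothetical method on your AIClient
    assert len(response) > 0
```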
Pytest-like Decorators
```python
from aitest import mark

@mark.criteria("output is helpful and accurate")
def test_helpfulness():
    return "Hello! I'm here to help you with any questions."

@mark.accuracy(expected="4")
def test_math():
    return "4"
```

Each decorated test returns the output to be judged against the marked criteria or expected value.
Judge Modules
AITest includes specialized judges for different testing scenarios:
```python
from aitest.judges import (
    AccuracyJudge,   # Compare output to expected
    CriteriaJudge,   # Evaluate against criteria
    CodeJudge,       # Evaluate code quality
    APIJudge,        # Test API responses
    SafetyJudge,     # Detect harmful content
)

# Code quality testing
code_judge = CodeJudge()
result = code_judge.judge(
    "def add(a, b): return a + b",
    criteria="correct implementation"
)

# API response testing
api_judge = APIJudge()
result = api_judge.judge(
    '{"status": "ok", "data": [1, 2, 3]}',
    expected_fields=["status", "data"]
)

# Safety testing
safety_judge = SafetyJudge()
result = safety_judge.judge("Hello, how can I help you?")
assert result.passed  # Safe content
```
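`AccuracyJudge` and `CriteriaJudge` are not demonstrated above. Assuming their `judge()` methods mirror the keyword arguments of the top-level `accuracy()` and `criteria()` helpers (an assumption, not documented here), usage might look like:

```python
from aitest.judges import AccuracyJudge, CriteriaJudge

# Assumed signatures, mirroring the accuracy()/criteria() helpers
accuracy_judge = AccuracyJudge()
result = accuracy_judge.judge("4", expected="4")

criteria_judge = CriteriaJudge()
result = criteria_judge.judge("Hello!", criteria="is a friendly greeting")
```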
Configuration
```python
from aitest import TestConfig, AITest

# Custom configuration
config = TestConfig(
    model="gpt-4o",    # LLM model
    threshold=8.0,     # Pass threshold (1-10)
    temperature=0.1,   # LLM temperature
    verbose=True,      # Verbose output
)

tester = AITest(config=config)
result = tester.run("Test output", criteria="is correct")
```
Environment Variables
```bash
export AITEST_MODEL="gpt-4o-mini"  # Default model
export OPENAI_API_KEY="sk-..."     # OpenAI API key
```
Custom Judges
Register your own judges:
```python
from aitest import add_judge, get_judge, TestResult

class MyCustomJudge:
    def run(self, output, **kwargs):
        return TestResult(
            score=10.0,
            passed=True,
            reasoning="Custom evaluation"
        )

add_judge("custom", MyCustomJudge)

# Use it
judge = get_judge("custom")()
result = judge.run("test output")
```
API Reference
Core Functions
| Function | Description |
|---|---|
| `test(output, criteria=None, expected=None)` | Test any output |
| `accuracy(output, expected)` | Compare to expected output |
| `criteria(output, criteria)` | Evaluate against criteria |
Classes
| Class | Description |
|---|---|
| `AITest` | Main testing class |
| `TestResult` | Test result with `score`, `passed`, `reasoning` |
| `TestConfig` | Configuration options |
Decorators
| Decorator | Description |
|---|---|
| `@mark.criteria(criteria)` | Mark a test with criteria |
| `@mark.accuracy(expected)` | Mark a test for accuracy |
| `@mark.skip(reason)` | Skip a test |
How It Works
AITest wraps the PraisonAI Agents evaluation framework, providing:

- Simple API - one function to test anything
- Protocol-driven - extends `JudgeProtocol` for compatibility (see the sketch below)
- Multiple judges - specialized testing for different scenarios
- CLI support - test from the command line
- pytest integration - familiar decorator syntax
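This page does not reproduce `JudgeProtocol` itself. Based on the Custom Judges section above, where a judge only needs a `run(self, output, **kwargs)` method that returns a `TestResult`, a structural sketch might look like this (an inferred sketch, not the actual source):

```python
from typing import Any, Protocol

from aitest import TestResult

# Hypothetical sketch of JudgeProtocol, inferred from the
# Custom Judges example above; the real definition may differ
class JudgeProtocol(Protocol):
    def run(self, output: str, **kwargs: Any) -> TestResult:
        ...
```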
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please read our contributing guidelines first.