
AITest

The world's easiest way to test anything using AI.

Python 3.11+ · MIT License

AITest is a simple, pytest-like AI testing framework powered by PraisonAI Agents.

Installation

# Using uv (recommended)
uv add aitest

# Using pip
pip install aitest

Quick Start

One-Line Test

from aitest import test

result = test("The capital of France is Paris", criteria="factually correct")
assert result.passed
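
Every call returns a TestResult carrying a score, a passed flag, and the judge's reasoning (see the API reference below):

print(result.score)      # score on a 1-10 scale
print(result.passed)     # True when the score clears the pass threshold
print(result.reasoning)  # the judge's explanation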

Accuracy Testing

from aitest import accuracy

result = accuracy("4", expected="4")
assert result.score >= 9.0

Criteria Testing

from aitest import criteria

result = criteria("Hello, how can I help you?", criteria="is a friendly greeting")
assert result.passed

CLI Usage

# Basic test
aitest "The capital of France is Paris" --criteria "factually correct"

# Accuracy test
aitest accuracy "4" --expected "4"

# Criteria test
aitest criteria "Hello world" --criteria "is a greeting"

# With options
aitest "Output to test" --criteria "is correct" --threshold 8.0 --verbose

pytest-like Features

AITest replicates pytest's speed, robustness, and power:

Test Discovery

# Discover tests without running (like pytest --collect-only)
aitest collect tests/
aitest collect . --pattern "test_*.py"

Caching (100x Speedup)

# Cache is enabled by default - identical tests skip LLM calls
aitest "test output" --criteria "is correct"  # First run: LLM call
aitest "test output" --criteria "is correct"  # Second run: cached!

# Cache management
aitest cache-stats    # Show cache statistics
aitest cache-clear    # Clear the cache

Parallel Execution

from aitest import test, run_parallel, ParallelRunner

# Outputs to evaluate (example data)
outputs = ["Paris is the capital of France", "2 + 2 = 4"]

# Run multiple tests in parallel (LLM calls are I/O-bound)
runner = ParallelRunner(workers=4)
results = runner.map(lambda x: test(x, criteria="is correct"), outputs)

Timing & Performance

from aitest import Instant, Duration

start = Instant()
# ... run tests ...
duration = start.elapsed()
print(f"Tests completed in {duration}")  # "Tests completed in 1.23s"

Test Outcomes (skip, fail, xfail)

from aitest import skip, fail, xfail, importorskip

# Skip a test
skip("Not implemented yet")

# Fail explicitly
fail("This should not happen")

# Expected failure
xfail("Known bug #123")

# Skip if module not available
np = importorskip("numpy")
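
These helpers are intended to run inside a test body, as in pytest. A minimal sketch, assuming pytest-like semantics:

def test_numpy_addition():
    np = importorskip("numpy")  # the whole test is skipped if numpy is missing
    assert np.add(2, 2) == 4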

Conditional Markers

from aitest import mark
import sys

@mark.skipif(sys.platform == 'win32', reason="Not supported on Windows")
def test_unix_only():
    pass

@mark.xfail(reason="Known bug #123", strict=False)
def test_known_bug():
    pass

Run Command with Durations

# Run tests with duration reporting
aitest run tests/ --durations=5

# Run with fail-fast
aitest run tests/ -x

# Verbose mode
aitest run tests/ -v

Assertion Helpers

import warnings

from aitest import approx, raises, warns, deprecated_call

# Approximate comparisons (essential for AI scores)
assert result.score == approx(7.5, abs=0.5)
assert 0.1 + 0.2 == approx(0.3)
assert [0.1, 0.2] == approx([0.1, 0.2])

# Exception testing
with raises(ValueError, match="invalid"):
    raise ValueError("invalid input")

# Warning testing
with warns(UserWarning):
    warnings.warn("test", UserWarning)

# Deprecation testing
with deprecated_call():
    warnings.warn("deprecated", DeprecationWarning)

Parametrize (Data-Driven Testing)

from aitest import mark, param

# Test with multiple inputs
@mark.parametrize("x, y, expected", [
    (1, 2, 3),
    (4, 5, 9),
    param(10, 20, 30, id="large_numbers"),
])
def test_add(x, y, expected):
    assert x + y == expected

# Single argument
@mark.parametrize("prompt", [
    "Hello",
    "What is AI?",
    "Explain quantum computing",
])
def test_ai_responses(prompt):
    result = get_ai_response(prompt)
    assert len(result) > 0

Fixtures

from aitest import fixture

@fixture
def ai_client():
    """Create an AI client for testing."""
    return AIClient()

@fixture(scope="session")
def expensive_resource():
    """Session-scoped fixture - created once."""
    resource = create_expensive_resource()
    yield resource
    resource.cleanup()
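
As in pytest, fixtures are presumably injected into tests by parameter name. A hypothetical test consuming the ai_client fixture above (AIClient and its ask method are placeholders, not part of aitest):

def test_client_greets(ai_client):
    response = ai_client.ask("Hello")  # placeholder method on the example client
    assert len(response) > 0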

Pytest-like Decorators

from aitest import mark

@mark.criteria("output is helpful and accurate")
def test_helpfulness():
    return "Hello! I'm here to help you with any questions."

@mark.accuracy(expected="4")
def test_math():
    return "4"

Judge Modules

AITest includes specialized judges for different testing scenarios:

from aitest.judges import (
    AccuracyJudge,   # Compare output to expected
    CriteriaJudge,   # Evaluate against criteria
    CodeJudge,       # Evaluate code quality
    APIJudge,        # Test API responses
    SafetyJudge,     # Detect harmful content
)

# Code quality testing
code_judge = CodeJudge()
result = code_judge.judge(
    "def add(a, b): return a + b",
    criteria="correct implementation"
)

# API response testing
api_judge = APIJudge()
result = api_judge.judge(
    '{"status": "ok", "data": [1, 2, 3]}',
    expected_fields=["status", "data"]
)

# Safety testing
safety_judge = SafetyJudge()
result = safety_judge.judge("Hello, how can I help you?")
assert result.passed  # Safe content

Configuration

from aitest import TestConfig, AITest

# Custom configuration
config = TestConfig(
    model="gpt-4o",           # LLM model
    threshold=8.0,            # Pass threshold (1-10)
    temperature=0.1,          # LLM temperature
    verbose=True,             # Verbose output
)

tester = AITest(config=config)
result = tester.run("Test output", criteria="is correct")

Environment Variables

export AITEST_MODEL="gpt-4o-mini"      # Default model
export OPENAI_API_KEY="sk-..."         # OpenAI API key
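
These can also be set from Python via os.environ before the tester is created; a minimal sketch, assuming aitest reads the variables at run time:

import os

os.environ["AITEST_MODEL"] = "gpt-4o-mini"  # default model, as above
os.environ["OPENAI_API_KEY"] = "sk-..."     # key read by the OpenAI client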

Custom Judges

Register your own judges:

from aitest import add_judge, get_judge

class MyCustomJudge:
    def run(self, output, **kwargs):
        from aitest import TestResult
        return TestResult(
            score=10.0,
            passed=True,
            reasoning="Custom evaluation"
        )

add_judge("custom", MyCustomJudge)

# Use it
judge = get_judge("custom")()
result = judge.run("test output")

API Reference

Core Functions

Function                                     Description
test(output, criteria=None, expected=None)   Test any output
accuracy(output, expected)                   Compare to expected output
criteria(output, criteria)                   Evaluate against criteria

Classes

Class        Description
AITest       Main testing class
TestResult   Test result with score, passed, reasoning
TestConfig   Configuration options

Decorators

Decorator                   Description
@mark.criteria(criteria)    Mark a test with criteria
@mark.accuracy(expected)    Mark a test for accuracy
@mark.skip(reason)          Skip a test

How It Works

AITest wraps the PraisonAI Agents evaluation framework, providing:

  1. Simple API - One function to test anything
  2. Protocol-driven - Extends JudgeProtocol for compatibility (see the sketch after this list)
  3. Multiple judges - Specialized testing for different scenarios
  4. CLI support - Test from the command line
  5. pytest integration - Use familiar decorator syntax
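
JudgeProtocol itself is not shown in this description. As a rough sketch, a structural protocol along the lines below would let any object with a matching run method act as a judge; the method name and signature follow the Custom Judges example above, and everything else is an assumption:

from typing import Protocol, runtime_checkable

from aitest import TestResult

@runtime_checkable
class JudgeProtocol(Protocol):
    """Sketch of the judge interface, inferred from the Custom Judges example.

    Any class with a compatible run() method satisfies the protocol
    structurally; no subclassing is required.
    """

    def run(self, output: str, **kwargs) -> TestResult:
        ...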

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please read our contributing guidelines first.
