AITest
The world's easiest way to test anything using AI.
AITest is a simple, pytest-like AI testing framework powered by PraisonAI Agents.
Installation
```bash
# Using uv (recommended)
uv add aitest

# Using pip
pip install aitest
```
Quick Start
One Line Test
```python
from aitest import test

result = test("The capital of France is Paris", criteria="factually correct")
assert result.passed
```
Accuracy Testing
```python
from aitest import accuracy

result = accuracy("4", expected="4")
assert result.score >= 9.0
```
Criteria Testing
```python
from aitest import criteria

result = criteria("Hello, how can I help you?", criteria="is a friendly greeting")
assert result.passed
```
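Each helper returns a `TestResult`, which (as the Custom Judges and API Reference sections below show) carries a `score`, a `passed` flag, and the judge's `reasoning`. A minimal sketch of inspecting one:

```python
from aitest import test

result = test("The capital of France is Paris", criteria="factually correct")

# TestResult fields (see the API Reference below)
print(result.score)      # judge score on the 1-10 scale used by the pass threshold
print(result.passed)     # True when the score meets the threshold
print(result.reasoning)  # the judge's explanation
```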
CLI Usage
```bash
# Basic test
aitest "The capital of France is Paris" --criteria "factually correct"

# Accuracy test
aitest accuracy "4" --expected "4"

# Criteria test
aitest criteria "Hello world" --criteria "is a greeting"

# With options
aitest "Output to test" --criteria "is correct" --threshold 8.0 --verbose
```
pytest-like Features
AITest replicates pytest's speed, robustness, and power:
Test Discovery
```bash
# Discover tests without running them (like pytest --collect-only)
aitest collect tests/
aitest collect . --pattern "test_*.py"
```
Caching (100x Speedup)
```bash
# Caching is enabled by default - identical tests skip LLM calls
aitest "test output" --criteria "is correct"  # First run: LLM call
aitest "test output" --criteria "is correct"  # Second run: cached!

# Cache management
aitest cache-stats   # Show cache statistics
aitest cache-clear   # Clear the cache
```
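The cache can be observed from Python as well. Here is a minimal sketch that times two identical calls with the `Instant` timer from the Timing section below, assuming the Python API shares the CLI's default-on cache:

```python
from aitest import test, Instant

# First run: hits the LLM (assuming the Python API uses the same cache as the CLI)
start = Instant()
test("test output", criteria="is correct")
print(f"uncached: {start.elapsed()}")

# Second identical run: should be answered from the cache
start = Instant()
test("test output", criteria="is correct")
print(f"cached: {start.elapsed()}")
```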
Parallel Execution
```python
from aitest import test, run_parallel, ParallelRunner

outputs = ["output one", "output two", "output three"]

# Run multiple tests in parallel (LLM calls are I/O-bound)
runner = ParallelRunner(workers=4)
results = runner.map(lambda x: test(x, criteria="is correct"), outputs)
```
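Each mapped call returns a `TestResult`, so a run can be summarized directly from the result fields; a small sketch:

```python
# Summarize the parallel run using TestResult fields
passed = sum(1 for r in results if r.passed)
print(f"{passed}/{len(results)} tests passed")
for r in results:
    if not r.passed:
        print(f"score={r.score}: {r.reasoning}")
```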
Timing & Performance
```python
from aitest import Instant, Duration

start = Instant()
# ... run tests ...
duration = start.elapsed()
print(f"Tests completed in {duration}")  # "Tests completed in 1.23s"
```
Test Outcomes (skip, fail, xfail)
```python
from aitest import skip, fail, xfail, importorskip

# Skip a test
skip("Not implemented yet")

# Fail explicitly
fail("This should not happen")

# Expected failure
xfail("Known bug #123")

# Skip if a module is not available
np = importorskip("numpy")
```
Conditional Markers
```python
import sys

from aitest import mark

@mark.skipif(sys.platform == "win32", reason="Not supported on Windows")
def test_unix_only():
    pass

@mark.xfail(reason="Known bug #123", strict=False)
def test_known_bug():
    pass
```
Run Command with Durations
```bash
# Run tests with duration reporting
aitest run tests/ --durations=5

# Run with fail-fast
aitest run tests/ -x

# Verbose mode
aitest run tests/ -v
```
Assertion Helpers
```python
import warnings

from aitest import approx, raises, warns, deprecated_call

# Approximate comparisons (essential for AI scores)
# result here comes from a previous aitest call, e.g. test(...)
assert result.score == approx(7.5, abs=0.5)
assert 0.1 + 0.2 == approx(0.3)
assert [0.1, 0.2] == approx([0.1, 0.2])

# Exception testing
with raises(ValueError, match="invalid"):
    raise ValueError("invalid input")

# Warning testing
with warns(UserWarning):
    warnings.warn("test", UserWarning)

# Deprecation testing
with deprecated_call():
    warnings.warn("deprecated", DeprecationWarning)
```
Parametrize (Data-Driven Testing)
```python
from aitest import mark, param

# Test with multiple inputs
@mark.parametrize("x, y, expected", [
    (1, 2, 3),
    (4, 5, 9),
    param(10, 20, 30, id="large_numbers"),
])
def test_add(x, y, expected):
    assert x + y == expected

# Single argument
@mark.parametrize("prompt", [
    "Hello",
    "What is AI?",
    "Explain quantum computing",
])
def test_ai_responses(prompt):
    result = get_ai_response(prompt)  # your own function under test
    assert len(result) > 0
```
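Parametrization also composes with the one-line `test()` helper from the Quick Start; a minimal sketch:

```python
from aitest import test, mark

@mark.parametrize("statement", [
    "Water boils at 100 degrees Celsius at sea level",
    "The capital of France is Paris",
])
def test_facts(statement):
    result = test(statement, criteria="factually correct")
    assert result.passed
```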
Fixtures
```python
from aitest import fixture

@fixture
def ai_client():
    """Create an AI client for testing."""
    return AIClient()  # your own client class

@fixture(scope="session")
def expensive_resource():
    """Session-scoped fixture - created once per session."""
    resource = create_expensive_resource()  # your own setup function
    yield resource
    resource.cleanup()
```
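Assuming fixtures are injected by parameter name, as in pytest (this page does not show it explicitly), a test might consume them like this; `ask()` is a hypothetical method on your own client class:

```python
def test_with_client(ai_client, expensive_resource):
    # ai_client is recreated per test; expensive_resource is shared per session
    response = ai_client.ask("What is AI?")  # hypothetical method on your AIClient
    assert len(response) > 0
```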
Pytest-like Decorators
```python
from aitest import mark

@mark.criteria("output is helpful and accurate")
def test_helpfulness():
    return "Hello! I'm here to help you with any questions."

@mark.accuracy(expected="4")
def test_math():
    return "4"
```

Each decorated test returns the output to be judged against the marked criteria or expected value.
Judge Modules
AITest includes specialized judges for different testing scenarios:
```python
from aitest.judges import (
    AccuracyJudge,   # Compare output to expected
    CriteriaJudge,   # Evaluate against criteria
    CodeJudge,       # Evaluate code quality
    APIJudge,        # Test API responses
    SafetyJudge,     # Detect harmful content
)

# Code quality testing
code_judge = CodeJudge()
result = code_judge.judge(
    "def add(a, b): return a + b",
    criteria="correct implementation"
)

# API response testing
api_judge = APIJudge()
result = api_judge.judge(
    '{"status": "ok", "data": [1, 2, 3]}',
    expected_fields=["status", "data"]
)

# Safety testing
safety_judge = SafetyJudge()
result = safety_judge.judge("Hello, how can I help you?")
assert result.passed  # Safe content
```
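`AccuracyJudge` and `CriteriaJudge` are not demonstrated above. Assuming their `judge()` methods mirror the keyword arguments of the top-level `accuracy()` and `criteria()` helpers (an assumption, not documented here), usage might look like:

```python
from aitest.judges import AccuracyJudge, CriteriaJudge

# Assumed signatures, mirroring the accuracy()/criteria() helpers
accuracy_judge = AccuracyJudge()
result = accuracy_judge.judge("4", expected="4")

criteria_judge = CriteriaJudge()
result = criteria_judge.judge("Hello!", criteria="is a friendly greeting")
```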
Configuration
```python
from aitest import TestConfig, AITest

# Custom configuration
config = TestConfig(
    model="gpt-4o",    # LLM model
    threshold=8.0,     # Pass threshold (1-10)
    temperature=0.1,   # LLM temperature
    verbose=True,      # Verbose output
)

tester = AITest(config=config)
result = tester.run("Test output", criteria="is correct")
```
Environment Variables
```bash
export AITEST_MODEL="gpt-4o-mini"  # Default model
export OPENAI_API_KEY="sk-..."     # OpenAI API key
```
Custom Judges
Register your own judges:
```python
from aitest import add_judge, get_judge, TestResult

class MyCustomJudge:
    def run(self, output, **kwargs):
        return TestResult(
            score=10.0,
            passed=True,
            reasoning="Custom evaluation"
        )

add_judge("custom", MyCustomJudge)

# Use it
judge = get_judge("custom")()
result = judge.run("test output")
```
API Reference
Core Functions
| Function | Description |
|---|---|
| `test(output, criteria=None, expected=None)` | Test any output |
| `accuracy(output, expected)` | Compare to expected output |
| `criteria(output, criteria)` | Evaluate against criteria |
Classes
| Class | Description |
|---|---|
| `AITest` | Main testing class |
| `TestResult` | Test result with `score`, `passed`, `reasoning` |
| `TestConfig` | Configuration options |
Decorators
| Decorator | Description |
|---|---|
| `@mark.criteria(criteria)` | Mark a test with criteria |
| `@mark.accuracy(expected)` | Mark a test for accuracy |
| `@mark.skip(reason)` | Skip a test |
How It Works
AITest wraps the PraisonAI Agents evaluation framework, providing:

- Simple API - one function to test anything
- Protocol-driven - extends `JudgeProtocol` for compatibility (see the sketch below)
- Multiple judges - specialized testing for different scenarios
- CLI support - test from the command line
- pytest integration - familiar decorator syntax
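This page does not reproduce `JudgeProtocol` itself. Based on the Custom Judges section above, where a judge only needs a `run(self, output, **kwargs)` method that returns a `TestResult`, a structural sketch might look like this (an inferred sketch, not the actual source):

```python
from typing import Any, Protocol

from aitest import TestResult

# Hypothetical sketch of JudgeProtocol, inferred from the
# Custom Judges example above; the real definition may differ
class JudgeProtocol(Protocol):
    def run(self, output: str, **kwargs: Any) -> TestResult:
        ...
```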
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please read our contributing guidelines first.