Testing Framework for AI Software

Rue

License: MIT | Python 3.12+

Rue is a Python testing framework for AI projects. It follows pytest syntax and culture while introducing components essential for testing AI software: metrics, typed datasets, semantic predicates (LLM-as-a-Judge), and OTEL traces.


Installation

uv add rue

Rue 101

Follow pytest habits...

  • Create 'rue_*.py' files
  • Write 'def test_*' functions
  • Use 'rue.resource' instead of 'pytest.fixture'
  • Add 'assert' expressions within the functions
  • Run 'uv run rue test'

...while leveraging Rue APIs.

  • Use the 'with metrics()' context manager to turn failed assertions into quality metrics
  • Use 'has_facts()' and other semantic predicates to assert on natural-language output
  • Access OTEL span data and assert on it with the 'follows_policy()' predicate
  • Parse datasets into clearly typed and validated data objects

Example

import rue
from rue import Case, Metric, metrics
from rue.predicates import has_unsupported_facts, follows_policy

from pydantic import BaseModel

@rue.sut
def store_chatbot(prompt: str) -> str:
    # call_llm is a placeholder for your own LLM client call
    return call_llm(prompt)

@rue.metric
def accuracy():
    metric = Metric()
    yield metric

    assert metric.mean > 0.8
    yield metric.mean

class Refs(BaseModel):
    kb: str
    expected_tool: str | None = None

cases = [
    Case(
        sut_input_values={"prompt": "When are you open?"},
        references=Refs(kb="Store hours: 9 AM - 6 PM, Monday-Saturday. Closed Sundays."),
    ),
    Case(
        sut_input_values={"prompt": "Return policy?"},
        references=Refs(kb="30-day returns with receipt."),
    ),
    Case(
        sut_input_values={"prompt": "How much for the Nike Air Max?"},
        references=Refs(kb="Nike Air Max: $129.99", expected_tool="offer_product"),
    ),
]

@rue.iter_cases(cases)
@rue.repeat(3)
async def test_chatbot_no_hallucinations(
    case: Case[Refs],
    store_chatbot,
    accuracy: Metric,
    trace_context,
):
    """The AI agent relies on the knowledge base and uses tool calls for transactional questions."""
    response = store_chatbot(**case.sut_input_values)

    # Verify the answer doesn't contain any unsupported facts
    with metrics(accuracy):
        assert not await has_unsupported_facts(response, case.references.kb)

    # Verify tool was called when expected
    if expected_tool := case.references.expected_tool:
        sut_spans = trace_context.get_sut_spans(name="store_chatbot")
        tool_names = [
            s.attributes.get("llm.request.functions.0.name")
            for s in trace_context.get_llm_calls()
            if s.attributes
        ]
        assert expected_tool in tool_names

Run it:

rue test --trace

Use a custom run UUID when you need stable correlation IDs:

rue test --trace --run-id 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d

Sample output (the file and test names shown differ from the example above):

Rue Test Runner
=================

Collected 1 test

test_example.py::test_chatbot_responds ✓

==================== 1 passed in 0.08s ====================

License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files

Download the file for your platform.

Source Distribution

rue-0.1.0.tar.gz (186.6 kB)

Uploaded Source

Built Distribution

rue-0.1.0-py3-none-any.whl (96.3 kB)

Uploaded Python 3

File details

Details for the file rue-0.1.0.tar.gz.

File metadata

  • Download URL: rue-0.1.0.tar.gz
  • Size: 186.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for rue-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       5091ace878c741d7698053950d5ec57b288ca5c5a0a80f46d5937bc87cec9beb
MD5          2e4db7e296f50e4f9dec0347142dee9f
BLAKE2b-256  ef210daf323ecbb06ad2e03d67aa15a6ec2d1dfd5732a68f5cc15f67cf6ebf7e

File details

Details for the file rue-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: rue-0.1.0-py3-none-any.whl
  • Size: 96.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for rue-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       0f702053b5c9ac37df581282da7a1e06e81f188c40a1d440719e32c91f085810
MD5          9db02b7c54fe2bb8ec1f86a21262bdd3
BLAKE2b-256  c1a3a7a5439578c60364bfa5fc084aff3cb3a58cc6e6a8027edaafe82105a217
