Testing Framework for AI Software
Rue
Rue is a Python testing framework for AI projects. It follows pytest syntax and culture while introducing components essential for testing AI software: metrics, typed datasets, semantic predicates (LLM-as-a-Judge), and OTEL traces.
Installation
uv add rue
Rue 101
Follow pytest habits...
- Create 'rue_*.py' files
- Write 'def test_*' functions
- Use 'rue.resource' instead of 'pytest.fixture'
- Add 'assert' expressions within the functions
- Run 'uv run rue test'
...while leveraging Rue APIs.
- Use the 'with metrics()' context to turn failed assertions into quality metrics
- Use 'has_facts()' and other semantic predicates to assert on natural-language output
- Access OTEL span data and assert on it with the 'follows_policy()' predicate
- Parse datasets into typed, validated data objects
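The idea of turning failed assertions into a quality metric can be illustrated in plain Python. The sketch below is not Rue's implementation: `ScoreRecorder` and `soft_assert` are hypothetical names invented for this illustration, and the only assumption is that each guarded block contributes one pass/fail data point instead of aborting the test.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ScoreRecorder:
    """Hypothetical stand-in for a metric object; not Rue's Metric class."""

    values: list[float] = field(default_factory=list)

    @property
    def mean(self) -> float:
        # An empty recorder reports 0.0 rather than raising ZeroDivisionError.
        return sum(self.values) / len(self.values) if self.values else 0.0


@contextmanager
def soft_assert(recorder: ScoreRecorder):
    """Record 1.0 if the block passes, 0.0 if it raises AssertionError."""
    try:
        yield
    except AssertionError:
        recorder.values.append(0.0)  # swallow the failure, keep the data point
    else:
        recorder.values.append(1.0)


recorder = ScoreRecorder()
for answer in ["9 AM - 6 PM", "open 24/7", "9 AM - 6 PM"]:
    with soft_assert(recorder):
        assert "9 AM" in answer

print(round(recorder.mean, 2))  # 0.67
```

A threshold check on the aggregated mean can then decide whether the run as a whole passes, which suits non-deterministic outputs better than failing on the first bad sample.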
Example
import rue
from rue import Case, Metric, metrics
from rue.predicates import has_unsupported_facts, follows_policy
from pydantic import BaseModel


@rue.sut
def store_chatbot(prompt: str) -> str:
    # call_llm stands in for your own LLM client call; it is not defined here
    return call_llm(prompt)


@rue.metric
def accuracy():
    metric = Metric()
    yield metric
    assert metric.mean > 0.8
    yield metric.mean


class Refs(BaseModel):
    kb: str
    expected_tool: str | None = None


cases = [
    Case(
        sut_input_values={"prompt": "When are you open?"},
        references=Refs(kb="Store hours: 9 AM - 6 PM, Monday-Saturday. Closed Sundays."),
    ),
    Case(
        sut_input_values={"prompt": "Return policy?"},
        references=Refs(kb="30-day returns with receipt."),
    ),
    Case(
        sut_input_values={"prompt": "How much for the Nike Air Max?"},
        references=Refs(kb="Nike Air Max: $129.99", expected_tool="offer_product"),
    ),
]


@rue.iter_cases(cases)
@rue.repeat(3)
async def test_chatbot_no_hallucinations(
    case: Case[Refs],
    store_chatbot,
    accuracy: Metric,
    trace_context,
):
    """AI agent relies on knowledge base and tool calls for transactional questions"""
    response = store_chatbot(**case.sut_input_values)

    # Verify the answer doesn't contain any unsupported facts
    with metrics(accuracy):
        assert not await has_unsupported_facts(response, case.references.kb)

    # Verify the tool was called when expected
    if expected_tool := case.references.expected_tool:
        sut_spans = trace_context.get_sut_spans(name="store_chatbot")
        tool_names = [
            s.attributes.get("llm.request.functions.0.name")
            for s in trace_context.get_llm_calls()
            if s.attributes
        ]
        assert expected_tool in tool_names
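The tool check in the example reads only the function at index 0 of each LLM call. If a request offers several tools, a small helper can collect all of them. This is a sketch in plain Python that does not use Rue's span API: it assumes span attributes behave like mappings and that tool names follow the `llm.request.functions.<index>.name` key pattern seen in the example. `requested_tool_names` is a hypothetical helper name.

```python
import re

# Assumed key pattern, generalised from the single key used in the example.
_FUNCTION_NAME_KEY = re.compile(r"^llm\.request\.functions\.(\d+)\.name$")


def requested_tool_names(attribute_sets):
    """Collect every tool name found in a sequence of span attribute mappings."""
    names = []
    for attributes in attribute_sets:
        if not attributes:  # a span may carry no attributes at all
            continue
        indexed = []
        for key, value in attributes.items():
            match = _FUNCTION_NAME_KEY.match(key)
            if match:
                indexed.append((int(match.group(1)), value))
        # Keep the tools in the order they were listed in the request.
        names.extend(value for _, value in sorted(indexed))
    return names


# Plain dicts standing in for span attributes.
spans = [
    {
        "llm.request.functions.0.name": "search_kb",
        "llm.request.functions.1.name": "offer_product",
    },
    None,
    {"llm.request.model": "some-model"},
]
assert "offer_product" in requested_tool_names(spans)
```

In a test, the helper could be fed `s.attributes` for each span returned by `trace_context.get_llm_calls()`, assuming those attributes expose `.items()`.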
Run the tests with tracing enabled:
rue test --trace
Use a custom run UUID when you need stable correlation IDs:
rue test --trace --run-id 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d
Output:
Rue Test Runner
=================
Collected 1 test
test_example.py::test_chatbot_responds ✓
==================== 1 passed in 0.08s ====================
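For the `--run-id` option, one way to get a stable correlation ID is to derive the UUID deterministically from something you already have, such as a CI build label. The sketch below uses the standard library's `uuid.uuid5`. The namespace string and build label are made up for illustration, and it is an assumption that Rue accepts a version 5 UUID (the example ID above has the version 4 layout).

```python
import uuid

# Hypothetical namespace for this project; any fixed UUID works.
RUN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "rue-runs.example.com")


def run_id_for(build_label: str) -> str:
    """Return the same UUID string every time for the same build label."""
    return str(uuid.uuid5(RUN_NAMESPACE, build_label))


print(run_id_for("main-build-1042"))
```

The printed value can be passed on the command line as `rue test --trace --run-id <value>`, so reruns of the same build share one ID.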
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file rue-0.1.0.tar.gz.
File metadata
- Download URL: rue-0.1.0.tar.gz
- Upload date:
- Size: 186.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5091ace878c741d7698053950d5ec57b288ca5c5a0a80f46d5937bc87cec9beb |
| MD5 | 2e4db7e296f50e4f9dec0347142dee9f |
| BLAKE2b-256 | ef210daf323ecbb06ad2e03d67aa15a6ec2d1dfd5732a68f5cc15f67cf6ebf7e |
File details
Details for the file rue-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rue-0.1.0-py3-none-any.whl
- Upload date:
- Size: 96.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f702053b5c9ac37df581282da7a1e06e81f188c40a1d440719e32c91f085810 |
| MD5 | 9db02b7c54fe2bb8ec1f86a21262bdd3 |
| BLAKE2b-256 | c1a3a7a5439578c60364bfa5fc084aff3cb3a58cc6e6a8027edaafe82105a217 |