AI Testing Framework
Merit
Merit is a Python testing framework for AI projects. It follows pytest syntax and culture while introducing components essential for testing AI software: metrics, typed datasets, semantic predicates (LLM-as-a-Judge), and OTEL traces.
Installation
uv add appmerit
Merit 101
Follow pytest habits...
- Create 'merit_*.py' files
- Write 'def merit_*' functions
- Use 'merit.resource' instead of 'pytest.fixture'
- Add 'assert' expressions within the functions
- Run 'uv run merit test'
...while leveraging Merit APIs.
- Use 'with metrics()' context to turn failed assertions into quality metrics
- Use 'has_facts()' and other semantic predicates for asserting natural language
- Access OTEL span data and assert it with 'follows_policy()' predicate
- Parse datasets into clearly typed and validated data objects
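As a rough illustration of the last point, the sketch below shows what "typed and validated data objects" can mean for a dataset row. It uses only the standard library and is not Merit's own parsing code; `RefsSketch` and `parse_refs` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RefsSketch:
    """Hypothetical typed container for one row of reference data."""
    kb: str
    expected_tool: Optional[str] = None


def parse_refs(row: dict) -> RefsSketch:
    """Validate one raw dataset row and return a typed object.

    Raises ValueError when a field is missing or has the wrong type.
    """
    kb: Any = row.get("kb")
    if not isinstance(kb, str) or not kb:
        raise ValueError(f"'kb' must be a non-empty string, got {kb!r}")
    tool: Any = row.get("expected_tool")
    if tool is not None and not isinstance(tool, str):
        raise ValueError(f"'expected_tool' must be a string or None, got {tool!r}")
    return RefsSketch(kb=kb, expected_tool=tool)


# Raw rows as they might come out of a JSON or CSV file
rows = [
    {"kb": "30-day returns with receipt."},
    {"kb": "Nike Air Max: $129.99", "expected_tool": "offer_product"},
]
parsed = [parse_refs(row) for row in rows]
```

Validating rows up front means a malformed dataset fails at load time with a clear message instead of surfacing later as a confusing assertion error inside a merit function.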
Example
import merit
from merit import Case, Metric, metrics
from merit.predicates import has_unsupported_facts, follows_policy
from pydantic import BaseModel


@merit.sut
def store_chatbot(prompt: str) -> str:
    # call_llm is a placeholder for your own LLM client call
    return call_llm(prompt)


@merit.metric
def accuracy():
    metric = Metric()
    yield metric
    assert metric.mean > 0.8
    yield metric.mean


class Refs(BaseModel):
    kb: str
    expected_tool: str | None = None


cases = [
    Case(sut_input_values={"prompt": "When are you open?"}, references=Refs(kb="Store hours: 9 AM - 6 PM, Monday-Saturday. Closed Sundays.")),
    Case(sut_input_values={"prompt": "Return policy?"}, references=Refs(kb="30-day returns with receipt.")),
    Case(sut_input_values={"prompt": "How much for the Nike Air Max?"}, references=Refs(kb="Nike Air Max: $129.99", expected_tool="offer_product")),
]


@merit.iter_cases(cases)
@merit.repeat(3)
async def merit_chatbot_no_hallucinations(
    case: Case[Refs],
    store_chatbot,
    accuracy: Metric,
    trace_context,
):
    """AI agent relies on knowledge base and tool calls for transactional questions"""
    response = store_chatbot(**case.sut_input_values)

    # Verify the answer doesn't contain any unsupported facts
    with metrics([accuracy]):
        assert not await has_unsupported_facts(response, case.references.kb)

    # Verify the tool was called when expected
    if expected_tool := case.references.expected_tool:
        spans = trace_context.get_sut_spans(store_chatbot)
        tool_called = spans[1].attributes.get("llm.request.functions.0.name")
        assert tool_called == expected_tool
Run it:
uv run merit test
Output:
Merit Test Runner
=================
Collected 1 test
test_example.py::merit_chatbot_responds ✓
==================== 1 passed in 0.08s ====================
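To make the idea behind `with metrics([accuracy])` more concrete, here is a standard-library sketch of a context manager that records a failed assertion as a data point instead of stopping the run. It is a conceptual illustration only, not Merit's implementation: `SketchMetric` and `record_assertions` are hypothetical names, and the assumption that a pass is recorded as 1.0 and a failure as 0.0 is made here purely for the example.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SketchMetric:
    """Hypothetical metric that stores one value per observed assertion."""
    values: list = field(default_factory=list)

    @property
    def mean(self) -> float:
        # An empty metric reports 0.0 rather than dividing by zero
        return sum(self.values) / len(self.values) if self.values else 0.0


@contextmanager
def record_assertions(targets):
    """Record 1.0 on success and 0.0 on AssertionError for every target metric.

    The AssertionError is swallowed on purpose, so one bad case does not
    abort the remaining cases. Other exception types still propagate.
    """
    try:
        yield
    except AssertionError:
        for metric in targets:
            metric.values.append(0.0)
    else:
        for metric in targets:
            metric.values.append(1.0)


accuracy = SketchMetric()
for answer_ok in [True, True, False, True]:
    with record_assertions([accuracy]):
        assert answer_ok

print(accuracy.mean)  # 0.75: three of the four assertions held
```

Under these assumptions, a threshold such as `metric.mean > 0.8` is evaluated once over all recorded values, so the suite tolerates occasional failures from a non-deterministic model while still enforcing an overall quality bar.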
Documentation
Full documentation: docs.appmerit.com
Getting Started:
- Quick Start - Get up and running in 5 minutes
Usage:
- Writing Merits - How to define a proper merit suite
- Running Merits - How to execute suites and merits
Concepts:
- Merit - Like test but better
- Resource - Like fixtures but better
- Case - Container for parsed dataset entities
- Metric - Aggregating assertions
- Semantic Predicates - Asserting language and logs
- SUT (System Under Test) - Collecting and accessing traces
API Reference:
- Merit Definitions APIs - Tune discovery and execution
- Merit Predicates APIs - Build your own semantic predicates
- Merit Metric APIs - Build complex metric systems
- Merit Tracing APIs - OpenTelemetry integration
Contributing
We welcome contributions! To get started:
- Fork the repository
- Clone your fork:
  git clone https://github.com/YOUR_USERNAME/merit.git
- Create a branch:
  git checkout -b your-feature-name
- Install dependencies:
  uv sync
- Make your changes
- Run tests:
  uv run merit test
- Run lints:
  uv run ruff check .
- Submit a pull request
For more details, see CONTRIBUTING.md.
Development Setup:
# Clone the repository
git clone https://github.com/appMerit/merit.git
cd merit
# Install dependencies
uv sync
# Run tests
uv run merit test
# Run lints
uv run ruff check .
uv run mypy .
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: docs.appmerit.com
- GitHub Issues: github.com/appMerit/merit/issues
- Email: support@appmerit.com