Merit

AI Testing Framework

Merit is a Python testing framework for AI projects. It follows pytest syntax and culture while introducing components essential for testing AI software: metrics, typed datasets, semantic predicates (LLM-as-a-Judge), and OTEL traces.


Installation

uv add appmerit

Merit 101

Follow pytest habits...

  • Create 'merit_*.py' files
  • Write 'def merit_*' functions
  • Use 'merit.resource' instead of 'pytest.fixture'
  • Add 'assert' expressions within the functions
  • Run 'uv run merit test'

...while leveraging Merit APIs.

  • Use the 'with metrics()' context to turn failed assertions into quality metrics
  • Use 'has_facts()' and other semantic predicates to assert on natural-language output
  • Access OTEL span data and assert on it with the 'follows_policy()' predicate
  • Parse datasets into clearly typed, validated data objects
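The `with metrics()` idea can be sketched in plain Python. This is an illustrative guess at the mechanism, not Merit's actual implementation: a context manager that catches a failed `assert` and records it as a score instead of aborting the test.

```python
from contextlib import contextmanager

class Metric:
    """Toy stand-in for merit's Metric: collects pass/fail scores."""
    def __init__(self):
        self.scores = []

    def record(self, passed: bool):
        self.scores.append(1.0 if passed else 0.0)

    @property
    def mean(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

@contextmanager
def metrics(metric: Metric):
    try:
        yield
    except AssertionError:
        metric.record(False)  # failed assertion becomes a 0 score, not a test abort
    else:
        metric.record(True)   # block that passes becomes a 1 score

m = Metric()
with metrics(m):
    assert 1 + 1 == 2  # passes, records 1.0
with metrics(m):
    assert False       # swallowed, records 0.0
print(m.mean)  # 0.5
```

The useful property is that one flaky LLM response lowers a score instead of failing the whole run, which is what makes threshold assertions like `metric.mean > 0.8` possible.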

Example

import merit
from merit import Case, Metric, metrics
from merit.predicates import has_unsupported_facts

from pydantic import BaseModel

@merit.sut
def store_chatbot(prompt: str) -> str:
    return call_llm(prompt)  # call_llm: your LLM client, defined elsewhere

@merit.metric
def accuracy():
    metric = Metric()
    yield metric  # collected while the test runs

    # evaluated after all runs complete
    assert metric.mean > 0.8
    yield metric.mean

class Refs(BaseModel):
    kb: str
    expected_tool: str | None = None

cases = [
    Case(
        sut_input_values={"prompt": "When are you open?"},
        references=Refs(kb="Store hours: 9 AM - 6 PM, Monday-Saturday. Closed Sundays."),
    ),
    Case(
        sut_input_values={"prompt": "Return policy?"},
        references=Refs(kb="30-day returns with receipt."),
    ),
    Case(
        sut_input_values={"prompt": "How much for the Nike Air Max?"},
        references=Refs(kb="Nike Air Max: $129.99", expected_tool="offer_product"),
    ),
]

@merit.iter_cases(cases)
@merit.repeat(3)
async def merit_chatbot_no_hallucinations(
    case: Case[Refs],
    store_chatbot,
    accuracy: Metric,
    trace_context):
    """AI agent relies on knowledge base and tool calls for transactional questions"""
    response = store_chatbot(**case.sut_input_values)

    # Verify the answer doesn't contain any unsupported facts
    with metrics(accuracy):
        assert not await has_unsupported_facts(response, case.references.kb)

    # Verify the expected tool was called
    if expected_tool := case.references.expected_tool:
        tool_names = [
            s.attributes.get("llm.request.functions.0.name")
            for s in trace_context.get_llm_calls()
            if s.attributes
        ]
        assert expected_tool in tool_names
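A quick sanity check on the numbers above (plain arithmetic, no Merit APIs, and assuming `@merit.iter_cases` and `@merit.repeat` compose multiplicatively): 3 cases with `@merit.repeat(3)` produce 9 runs feeding the `accuracy` metric, so the `metric.mean > 0.8` threshold tolerates at most one failed assertion.

```python
cases = 3
repeats = 3
runs = cases * repeats  # 9 runs feed the accuracy metric

# Largest failure count that still keeps mean accuracy above 0.8
max_failures = max(f for f in range(runs + 1) if (runs - f) / runs > 0.8)
print(f"{runs} runs; at most {max_failures} failure(s) keep mean > 0.8")
```

In other words, the example's threshold effectively allows one hallucinating response out of nine before the metric assertion fails.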

Run it:

merit test --trace

Use a custom run UUID when you need stable correlation IDs:

merit test --trace --run-id 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d

Output:

Merit Test Runner
=================

Collected 1 test

merit_chatbot.py::merit_chatbot_no_hallucinations ✓

==================== 1 passed in 0.08s ====================

Documentation

Full documentation: docs.appmerit.com



Contributing

We welcome contributions! To get started:

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/YOUR_USERNAME/merit.git
  3. Create a branch: git checkout -b your-feature-name
  4. Install dependencies: uv sync
  5. Make your changes
  6. Run tests: uv run merit test
  7. Run lints: uv run ruff check .
  8. Submit a pull request

For more details, see CONTRIBUTING.md.

Development Setup:

# Clone the repository
git clone https://github.com/appMerit/merit.git
cd merit

# Install dependencies
uv sync

# Run tests
uv run merit test

# Run lints
uv run ruff check .
uv run mypy .

License

This project is licensed under the MIT License - see the LICENSE file for details.

