AI Testing Framework

Project description

Merit

License: MIT · Python 3.12+ · Tests · Checks

Merit is a Python testing framework for AI projects. It follows pytest syntax and culture while introducing components essential for testing AI software: metrics, typed datasets, semantic predicates (LLM-as-a-Judge), and OTEL traces.


Installation

uv add appmerit

Merit 101

Follow pytest habits...

  • Create 'merit_*.py' files
  • Write 'def merit_*' functions
  • Use 'merit.resource' instead of 'pytest.fixture'
  • Add 'assert' expressions within the functions
  • Run 'uv run merit test'

...while leveraging Merit APIs.

  • Use 'with metrics()' context to turn failed assertions into quality metrics
  • Use 'has_facts()' and other semantic predicates for asserting natural language
  • Access OTEL span data and assert it with 'follows_policy()' predicate
  • Parse datasets into clearly typed and validated data objects
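Merit ships these semantic predicates ready to use. As a mental model only (this is not Merit's implementation), an LLM-as-a-Judge predicate is roughly an async function that poses a yes/no question to a judge model and maps the verdict onto a bool you can `assert`. The `has_facts` signature and `stub_judge` below are hypothetical, made up for this sketch:

```python
# Illustrative only -- NOT Merit's source. A sketch of the LLM-as-a-Judge
# idea: ask a judge model a yes/no question and return a bool for `assert`.
import asyncio
from typing import Awaitable, Callable

Judge = Callable[[str], Awaitable[str]]

async def has_facts(response: str, reference: str, judge: Judge) -> bool:
    """True if the judge says every claim in `response` is supported by `reference`."""
    prompt = (
        "Answer yes or no: is every claim in the RESPONSE supported by the REFERENCE?\n"
        f"REFERENCE: {reference}\n"
        f"RESPONSE: {response}"
    )
    verdict = await judge(prompt)
    # A real predicate would parse the judge's answer more defensively.
    return verdict.strip().lower().startswith("yes")

async def stub_judge(prompt: str) -> str:
    # Stand-in for a real LLM call, so the sketch runs offline.
    return "Yes." if "RESPONSE: We open at 9 AM." in prompt else "No."

print(asyncio.run(has_facts("We open at 9 AM.", "Store hours: 9 AM - 6 PM", stub_judge)))  # -> True
```

Because the predicate returns a plain bool, it composes with ordinary `assert` statements, which is what lets Merit fold judge verdicts into the usual pytest-style pass/fail flow.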

Example

import merit
from merit import Case, Metric, metrics
from merit.predicates import has_unsupported_facts

from pydantic import BaseModel

@merit.sut
def store_chatbot(prompt: str) -> str:
    return call_llm(prompt)

@merit.metric
def accuracy():
    metric = Metric()
    yield metric  # first yield: hand the Metric to tests so they can record outcomes

    assert metric.mean > 0.8  # runs after all cases: fail below 80% accuracy
    yield metric.mean  # second yield: report the aggregate value

class Refs(BaseModel):
    kb: str
    expected_tool: str | None = None

cases = [
    Case(sut_input_values={"prompt": "When are you open?"}, references=Refs(kb="Store hours: 9 AM - 6 PM, Monday-Saturday. Closed Sundays.")),
    Case(sut_input_values={"prompt": "Return policy?"}, references=Refs(kb="30-day returns with receipt.")),
    Case(sut_input_values={"prompt": "How much for the Nike Air Max?"}, references=Refs(kb="Nike Air Max: $129.99", expected_tool="offer_product")),
]

@merit.iter_cases(cases)
@merit.repeat(3)
async def merit_chatbot_no_hallucinations(
    case: Case[Refs],
    store_chatbot,
    accuracy: Metric,
    trace_context):
    """AI agent relies on knowledge base and tool calls for transactional questions"""
    response = store_chatbot(**case.sut_input_values)

    # Verify the answer doesn't contain any unsupported facts
    with metrics(accuracy):
        assert not await has_unsupported_facts(response, case.references.kb)

    # Verify tool was called when expected
    if expected_tool := case.references.expected_tool:
        tool_names = [
            s.attributes.get("llm.request.functions.0.name")
            for s in trace_context.get_llm_calls()
            if s.attributes
        ]
        assert expected_tool in tool_names

Run it:

uv run merit test --trace

Use a custom run UUID when you need stable correlation IDs:

uv run merit test --trace --run-id 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d

Output:

Merit Test Runner
=================

Collected 1 test

merit_chatbot.py::merit_chatbot_no_hallucinations ✓

==================== 1 passed in 0.08s ====================

Documentation

Full documentation: docs.appmerit.com

The docs cover Getting Started, Usage, Concepts, and an API Reference.


Contributing

We welcome contributions! To get started:

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/YOUR_USERNAME/merit.git
  3. Create a branch: git checkout -b your-feature-name
  4. Install dependencies: uv sync
  5. Make your changes
  6. Run tests: uv run merit test
  7. Run lints: uv run ruff check .
  8. Submit a pull request

For more details, see CONTRIBUTING.md.

Development Setup:

# Clone the repository
git clone https://github.com/appMerit/merit.git
cd merit

# Install dependencies
uv sync

# Run tests
uv run merit test

# Run lints
uv run ruff check .
uv run mypy .

License

This project is licensed under the MIT License - see the LICENSE file for details.




Download files

Download the file for your platform.

Source Distribution

appmerit-0.1.4.tar.gz (602.4 kB)

Uploaded Source

Built Distribution


appmerit-0.1.4-py3-none-any.whl (97.5 kB)

Uploaded Python 3

File details

Details for the file appmerit-0.1.4.tar.gz.

File metadata

  • Download URL: appmerit-0.1.4.tar.gz
  • Size: 602.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for appmerit-0.1.4.tar.gz
Algorithm Hash digest
SHA256 f32e4776088f924db933c9ecbececb6107c978ddcd15040bc0fe0bc1851af296
MD5 4dd825cb2984ef5b5f1992697cc5abfb
BLAKE2b-256 6d491fb54c80be3b85e19d674d8969db8ec0b2e9d3ad01a8245d5187ab062797


Provenance

The following attestation bundles were made for appmerit-0.1.4.tar.gz:

Publisher: publish.yml on appMerit/merit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file appmerit-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: appmerit-0.1.4-py3-none-any.whl
  • Size: 97.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for appmerit-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 4ad58507aef2d12f454e0dc74e952ed17b00f0d6c087f80e2c6f85c76584f4b1
MD5 e795108210f628663d149ac1f55b5a48
BLAKE2b-256 edbe68e6ca06417172619359e5ada56d35a9d81e76855b28267a30cf0572dea5


Provenance

The following attestation bundles were made for appmerit-0.1.4-py3-none-any.whl:

Publisher: publish.yml on appMerit/merit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
