Skip to main content

A simple package for LLM application evaluation.

Project description

maibench Python SDK

Python toolings to help run evaluations on large langauge model chains.

Getting started

First, install the package.

pip install maibenchai

Next, create a completion function. This maps different inputs your LLM application response. Here's an example.

 from maibench import CompletionFn

 class ExampleCompletionFn(CompletionFn):
    @staticmethod
    def complete(**kwargs) -> str:
        return "This is an example response from an LLM application."

Next, create some text cases. These test cases should each contain the input kwargs of your completion function in additon to an expected output and a grade_fn. Supported grade functions are currently match, includes, fuzzy_match, not_match, not_includes, and not_fuzzy_match. More complex grading functions and model-based grading is coming soon.

EXAMPLE_TEST_CASES = [
    {
        "input": "This is an example input.",
        "expected": "This is an example response from an LLM application.",
        "grade_fn": "match",
    },
    {
        "input": "This is an example input 2.",
        "expected": "This is an example response from an LLM application.",
        "grade_fn": "match",
    },
]

Create and run.

EXAMPLE_TEST_SUITE = TestSuite(name="maibench.example_test_suite", completion_fn=ExampleCompletionFn, test_cases=EXAMPLE_TEST_CASES)
results, id = EXAMPLE_TEST_SUITE.run()

See results at https://maibench.ai/individual-result?id=[INSERT ID]

We hope you enjoy!

Contact support@maibench.ai with any questions.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

maibenchai-0.2.tar.gz (7.5 kB view hashes)

Uploaded Source

Built Distribution

maibenchai-0.2-py3-none-any.whl (8.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page