FlexEval is a tool for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems.

Maintainers

BaptisteMP levon003 thomas-christie

Project description

FlexEval LLM Evals

Documentation: https://digitalharborfoundation.github.io/FlexEval

Additional details about FlexEval can be found in our paper at the Educational Data Mining 2024 conference.

Usage

Basic usage:

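import flexeval
from flexeval.schema import Config, Eval, EvalRun, FileDataSource, FunctionItem, Metrics

# Conversations to score, stored as JSONL (one JSON record per line).
data_sources = [FileDataSource(path="vignettes/conversations.jsonl")]
# Request a single function metric: Flesch reading ease.
# Named eval_spec rather than eval to avoid shadowing Python's built-in eval().
eval_spec = Eval(metrics=Metrics(function=[FunctionItem(name="flesch_reading_ease")]))
config = Config(clear_tables=True)
# Bundle the inputs, the output database, the evaluation, and the configuration.
eval_run = EvalRun(
    data_sources=data_sources,
    database_path="eval_results.db",
    eval=eval_spec,
    config=config,
)
# Execute the run; metric values are written to eval_results.db.
flexeval.run(eval_run)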

This example computes Flesch reading ease for every turn in a list of conversations provided in JSONL format. The metric values are stored in an SQLite database called eval_results.db.
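
For context on the metric itself, Flesch reading ease is conventionally defined as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), with higher scores indicating easier text. The sketch below is a rough standalone approximation. The helper names and the vowel-group syllable heuristic are assumptions made for illustration, not FlexEval's implementation, so its scores may differ from the values FlexEval stores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Approximate Flesch reading ease using the standard formula."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with an internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease(
    "Incomprehensible institutional methodologies necessitate reconsideration."
)
print(f"easy={easy:.1f} hard={hard:.1f}")
```

Short sentences made of short words score high, while long, polysyllabic sentences score low and can even go negative.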

See additional usage examples in the vignettes.
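
Because the input is JSONL, a single malformed line can derail a run. The sketch below is a generic pre-flight check that only confirms each non-blank line parses as JSON. It makes no assumption about the conversation schema FlexEval expects, and `check_jsonl` is a hypothetical helper name, not part of FlexEval.

```python
import json


def check_jsonl(path: str):
    """Return (non_blank_line_count, problems) for a JSONL file.

    problems is a list of (line_number, message) for lines that fail to parse.
    """
    count = 0
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            count += 1
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_number, f"invalid JSON: {err.msg}"))
    return count, problems
```

Calling `check_jsonl("vignettes/conversations.jsonl")` before a run returns the number of records and any line numbers that need fixing.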

Installation

FlexEval is on PyPI as python-flexeval. See the Installation section in the Getting Started guide.

Using pip:

pip install python-flexeval
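
Note that the distribution name on PyPI (python-flexeval) differs from the import name (flexeval). A minimal sketch for confirming which version is installed, using only the standard library; `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "python-flexeval"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```

This prints the installed version, or `None` if the package is not installed in the current environment.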

Basic functionality

FlexEval is designed to be "batteries included" for many basic use cases. It supports the following out of the box:

- Scoring historical conversations, which is useful for monitoring live systems
- Scoring LLMs:
  - locally hosted models served via an endpoint, using something like LM Studio
  - LLMs reachable over the network via a REST endpoint
  - any OpenAI LLM
- A set of useful rubrics
- A set of useful Python functions

Evaluation results are saved in an SQLite database. See the Metric Analysis vignette for a sample analysis demonstrating the structure and utility of the data saved by FlexEval.
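
Before turning to the vignette, it can help to see what a results database contains. The sketch below uses Python's standard sqlite3 module to list every table and its row count. It deliberately assumes nothing about FlexEval's table names or schema; `summarize_tables` is a hypothetical helper, and `eval_results.db` is the path from the Usage example.

```python
import os
import sqlite3


def summarize_tables(db_path: str) -> dict:
    """Return {table_name: row_count} for every user table in an SQLite file."""
    if not os.path.exists(db_path):
        # sqlite3.connect would silently create an empty file otherwise.
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        # Skip SQLite's internal bookkeeping tables.
        names = [name for (name,) in rows if not name.startswith("sqlite_")]
        summary = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
            summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return summary
    finally:
        conn.close()


if os.path.exists("eval_results.db"):
    for table, count in summarize_tables("eval_results.db").items():
        print(f"{table}: {count} rows")
```

From there, individual tables can be loaded into a notebook or queried directly with SQL.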

Cite this work

If this work is useful to you, please cite our EDM 2024 paper:

S. Thomas Christie, Baptiste Moreau-Pernet, Yu Tian, & John Whitmer. (2024). FlexEval: a customizable tool for chatbot performance evaluation and dialogue analysis. Proceedings of the 17th International Conference on Educational Data Mining, 903-908. Atlanta, Georgia, USA, July 2024. https://doi.org/10.5281/zenodo.12729993

Development

Pull requests are welcome. Feel free to contribute:

- New rubrics or functions
- Bug fixes
- New features

See DEVELOPMENT.md.

Release history

- 0.4.0: May 21, 2026
- 0.3.0: Sep 15, 2025
- 0.2.0: Sep 9, 2025 (this version)
- 0.1.5: Aug 1, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python_flexeval-0.2.0.tar.gz (680.6 kB)

Uploaded Sep 9, 2025 Source

Built Distribution

python_flexeval-0.2.0-py3-none-any.whl (72.4 kB)

Uploaded Sep 9, 2025 Python 3

File details

Details for the file python_flexeval-0.2.0.tar.gz.

File metadata

Download URL: python_flexeval-0.2.0.tar.gz
Upload date: Sep 9, 2025
Size: 680.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for python_flexeval-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c5337a9007c4e60505752b5a2f2d4916de432938038347732703601b88b3b0d5`
MD5	`04a6a23c0faf4856d1220d4457a9bf39`
BLAKE2b-256	`deab6e9017a32383f3257f6f9ec13287abf2f166e63a1adbd66ed9e7b4fc87d7`
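
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: the function names are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the table above; the path is wherever the file was saved.
    expected = "c5337a9007c4e60505752b5a2f2d4916de432938038347732703601b88b3b0d5"
    local_path = "python_flexeval-0.2.0.tar.gz"
    if os.path.exists(local_path):
        print("OK" if matches_digest(local_path, expected) else "MISMATCH")
```

A mismatch means the file differs from the one that was published, so it should not be installed.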

Provenance

The following attestation bundles were made for python_flexeval-0.2.0.tar.gz:

Publisher: deploy-to-pypi.yml on DigitalHarborFoundation/FlexEval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: python_flexeval-0.2.0.tar.gz
- Subject digest: c5337a9007c4e60505752b5a2f2d4916de432938038347732703601b88b3b0d5
- Sigstore transparency entry: 488555683
- Sigstore integration time: Sep 9, 2025
Source repository:
- Permalink: DigitalHarborFoundation/FlexEval@b2103da1e0d51d5954be0edcaf88c28490ee0572
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DigitalHarborFoundation
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: deploy-to-pypi.yml@b2103da1e0d51d5954be0edcaf88c28490ee0572
- Trigger Event: push

File details

Details for the file python_flexeval-0.2.0-py3-none-any.whl.

File metadata

Download URL: python_flexeval-0.2.0-py3-none-any.whl
Upload date: Sep 9, 2025
Size: 72.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for python_flexeval-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b5a87ad3f1431c421d2debf0d0ad156c6ad280a1bb2cafa7d902e763ee185be`
MD5	`34a531104363de06851f3f348e1b1701`
BLAKE2b-256	`21415727be6e4033fa15c80be69f8a3e0ccab81510309aaa40a4ef669d7fb753`

Provenance

The following attestation bundles were made for python_flexeval-0.2.0-py3-none-any.whl:

Publisher: deploy-to-pypi.yml on DigitalHarborFoundation/FlexEval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: python_flexeval-0.2.0-py3-none-any.whl
- Subject digest: 3b5a87ad3f1431c421d2debf0d0ad156c6ad280a1bb2cafa7d902e763ee185be
- Sigstore transparency entry: 488555693
- Sigstore integration time: Sep 9, 2025
Source repository:
- Permalink: DigitalHarborFoundation/FlexEval@b2103da1e0d51d5954be0edcaf88c28490ee0572
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DigitalHarborFoundation
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: deploy-to-pypi.yml@b2103da1e0d51d5954be0edcaf88c28490ee0572
- Trigger Event: push
