FlexEval LLM Evals
FlexEval is a tool for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems.
Documentation: https://digitalharborfoundation.github.io/FlexEval
Additional details about FlexEval can be found in our paper at the Educational Data Mining 2024 conference.
Usage
Basic usage:
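import flexeval
from flexeval.schema import Config, Eval, EvalRun, FileDataSource, FunctionItem, Metrics

# Conversations to evaluate, stored as JSONL (one JSON object per line).
data_sources = [FileDataSource(path="vignettes/conversations.jsonl")]
# Apply the built-in flesch_reading_ease function metric.
# Named eval_spec rather than eval to avoid shadowing Python's built-in eval().
eval_spec = Eval(metrics=Metrics(function=[FunctionItem(name="flesch_reading_ease")]))
config = Config(clear_tables=True)
# Bundle the inputs, output database path, eval, and config into a single run.
eval_run = EvalRun(
    data_sources=data_sources,
    database_path="eval_results.db",
    eval=eval_spec,
    config=config,
)
flexeval.run(eval_run)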
This example computes Flesch reading ease for every turn in a list of conversations provided in JSONL format. The metric values are stored in an SQLite database called eval_results.db.
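For reference, Flesch reading ease is a fixed formula over average sentence length and average syllables per word: 206.835 - 1.015 x (words / sentences) - 84.6 x (syllables / words), where higher scores mean easier text. The sketch below is a self-contained approximation that uses a crude vowel-group syllable counter. It is not FlexEval's implementation; FlexEval's built-in `flesch_reading_ease` may count syllables and sentences differently, so scores will not match exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Crude estimate: one syllable per run of vowels (y counts as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "make") usually adds no syllable; keep "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Approximate Flesch reading ease of a text; higher means easier to read."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text must contain at least one word")
    # Treat runs of '.', '!' and '?' as sentence boundaries; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

With this heuristic, `flesch_reading_ease("The cat sat on the mat.")` returns about 116.1 (six one-syllable words in one sentence), while dense polysyllabic prose can score below zero.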
See additional usage examples in the vignettes.
Installation
FlexEval is on PyPI as python-flexeval. See the Installation section in the Getting Started guide.
Using pip:
pip install python-flexeval
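Note that the distribution name on PyPI (python-flexeval) differs from the import name used in the usage example (flexeval). A minimal sketch for checking which version is installed, using only the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, not part of FlexEval.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "python-flexeval") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"python-flexeval {v}" if v else "python-flexeval is not installed")
```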
Basic functionality
FlexEval is designed to be "batteries included" for many basic use cases. It supports the following out-of-the-box:
- scoring historical conversations, which is useful for monitoring live systems
- scoring LLMs:
  - locally hosted LLMs served via an endpoint, using a tool such as LM Studio
  - LLMs reachable over the network via a REST endpoint
  - any OpenAI LLM
- a set of useful rubrics
- a set of useful Python functions
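Scoring historical conversations amounts to reading stored transcripts and applying a metric to each turn. A minimal plain-Python sketch of that loop, assuming (hypothetically) that each JSONL line is an object whose "input" key holds a list of {"role", "content"} messages; check the vignettes for the schema FlexEval actually expects. `score_turns` and `word_count` are illustrative names, not part of FlexEval's API, and FlexEval itself handles this loop when you call `flexeval.run`.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, Tuple


def score_turns(
    path: Path, metric: Callable[[str], float]
) -> Iterator[Tuple[int, int, str, float]]:
    """Yield (conversation_index, turn_index, role, score) for every message.

    Assumed line format (hypothetical): {"input": [{"role": ..., "content": ...}, ...]}
    """
    with open(path, encoding="utf-8") as f:
        # Skip blank lines so conversation indices stay contiguous.
        non_blank = (line for line in f if line.strip())
        for conv_idx, line in enumerate(non_blank):
            conversation = json.loads(line)
            for turn_idx, message in enumerate(conversation["input"]):
                yield conv_idx, turn_idx, message["role"], metric(message["content"])


def word_count(text: str) -> float:
    """Hypothetical example metric: number of whitespace-separated tokens."""
    return float(len(text.split()))
```

Any function of the form `str -> float`, such as a readability score, can be passed as `metric`.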
Evaluation results are saved in an SQLite database. See the Metric Analysis vignette for a sample analysis demonstrating the structure and utility of the data saved by FlexEval.
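To get a first look at what a run produced, a small helper built on Python's standard `sqlite3` module can list every table in the results file along with its columns. It opens the database read-only and makes no assumption about FlexEval's table names or schema; `describe_database` is a hypothetical helper, not part of FlexEval.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List


def describe_database(db_path: str) -> Dict[str, List[str]]:
    """Map each user table in an SQLite file to its column names."""
    # Read-only URI so a mistyped path raises an error instead of creating a new file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            quoted = table.replace('"', '""')  # escape for use as a quoted identifier
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            schema[table] = [col[1] for col in conn.execute(f'PRAGMA table_info("{quoted}")')]
        return schema
    finally:
        conn.close()
```

For example, `describe_database("eval_results.db")` after the basic usage run shows which tables and columns to query in your own analysis.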
Read more in the Getting Started guide.
Cite this work
If this work is useful to you, please cite our EDM 2024 paper:
S. Thomas Christie, Baptiste Moreau-Pernet, Yu Tian, & John Whitmer. (2024). FlexEval: a customizable tool for chatbot performance evaluation and dialogue analysis. Proceedings of the 17th International Conference on Educational Data Mining, 903-908. Atlanta, Georgia, USA, July 2024. https://doi.org/10.5281/zenodo.12729993
Development
Pull requests are welcome. Feel free to contribute:
- New rubrics or functions
- Bug fixes
- New features
See DEVELOPMENT.md.