HuggingFace community-driven open-source library of evaluation

Project description

Tip: For more recent evaluation approaches, for example for evaluating LLMs, we recommend our newer and more actively maintained library LightEval.

🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.

It currently contains:

implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include metrics specific to particular datasets. With a simple command like accuracy = evaluate.load("accuracy"), get any of these metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX).
comparisons and measurements: comparisons are used to measure the difference between models and measurements are tools to evaluate datasets.
an easy way of adding new evaluation modules to the 🤗 Hub: you can create new evaluation modules and push them to a dedicated Space in the 🤗 Hub with evaluate-cli create [metric name], which allows you to easily compare different metrics and their outputs for the same sets of references and predictions.

🎓 Documentation

🔎 Find a metric, comparison, measurement on the Hub

🌟 Add a new evaluation module

🤗 Evaluate also has lots of useful features like:

Type checking: the input types are checked to make sure that you are using the right input formats for each metric
Metric cards: each metric comes with a card that describes the values, their ranges and limitations, as well as providing examples of their usage and usefulness.
Community metrics: metrics live on the Hugging Face Hub, and you can easily add your own metrics for your project or to collaborate with others.

Installation

With pip

🤗 Evaluate can be installed from PyPI. Install it inside a virtual environment (venv or conda, for instance):

pip install evaluate

Usage

🤗 Evaluate's main methods are:

evaluate.list_evaluation_modules() to list the available metrics, comparisons and measurements
evaluate.load(module_name, **kwargs) to instantiate an evaluation module
results = module.compute(**kwargs) to compute the result of an evaluation module

Adding a new evaluation module

First install the necessary dependencies to create a new metric with the following command:

pip install "evaluate[template]"

Then you can get started with the following command which will create a new folder for your metric and display the necessary steps:

evaluate-cli create "Awesome Metric"

See this step-by-step guide in the documentation for detailed instructions.

Credits

Thanks to @marella for letting us use the evaluate namespace on PyPI previously used by his library.

Release history

0.4.6 (this version): Sep 18, 2025
0.4.5: Jul 10, 2025
0.4.4: Jun 20, 2025
0.4.3: Sep 11, 2024
0.4.2: Apr 30, 2024
0.4.1: Oct 13, 2023
0.4.0: Dec 13, 2022
0.3.0: Oct 13, 2022
0.2.2: Jul 29, 2022
0.2.1: Jul 28, 2022
0.2.0: Jul 25, 2022
0.1.2: Jun 16, 2022
0.1.1: Jun 8, 2022
0.1.0: May 31, 2022

Download files

Source Distribution

evaluate-0.4.6.tar.gz (65.7 kB)

Uploaded Sep 18, 2025 Source

Built Distribution

evaluate-0.4.6-py3-none-any.whl (84.1 kB)

Uploaded Sep 18, 2025 Python 3

File details

Details for the file evaluate-0.4.6.tar.gz.

File metadata

Download URL: evaluate-0.4.6.tar.gz
Upload date: Sep 18, 2025
Size: 65.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for evaluate-0.4.6.tar.gz
Algorithm	Hash digest
SHA256	`e07036ca12b3c24331f83ab787f21cc2dbf3631813a1631e63e40897c69a3f21`
MD5	`f41264e8168ce9fc0abe8591acffe5ee`
BLAKE2b-256	`add00c17a8e6e8dc7245f22dea860557c32bae50fc4d287ae030cb0e8ab8720f`

See more details on using hashes here.
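
The published digests can be checked locally before installing. This is a minimal sketch using Python's standard hashlib; it assumes the source distribution was downloaded into the current directory under its published file name, and the helper names are hypothetical. pip can also enforce hashes itself through its --require-hashes mode.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Assumes the sdist sits in the current directory; skipped if it is absent.
    sdist = Path("evaluate-0.4.6.tar.gz")
    expected = "e07036ca12b3c24331f83ab787f21cc2dbf3631813a1631e63e40897c69a3f21"
    if sdist.exists():
        print("OK" if verify_download(sdist, expected) else "MISMATCH")
```
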

File details

Details for the file evaluate-0.4.6-py3-none-any.whl.

File metadata

Download URL: evaluate-0.4.6-py3-none-any.whl
Upload date: Sep 18, 2025
Size: 84.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for evaluate-0.4.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bca85bc294f338377b7ac2f861e21c308b11b2a285f510d7d5394d5df437db29`
MD5	`3175e5149720d3715ea74dd1ae36a759`
BLAKE2b-256	`3eaf3e990d8d4002bbc9342adb4facd59506e653da93b2417de0fa6027cb86b1`
