
HuggingFace community-driven open-source library of evaluation

Project description




🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.

It currently contains:

  • implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics. With a simple command like accuracy = load("accuracy"), you get any of these metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX); a short example follows this list.
  • comparisons and measurements: comparisons are used to measure the difference between two models, while measurements are tools for evaluating datasets.
  • an easy way of adding new evaluation modules to the 🤗 Hub: you can create new evaluation modules and push them to a dedicated Space on the 🤗 Hub with evaluate-cli create [metric name], which allows you to easily compare different metrics and their outputs for the same sets of references and predictions.
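
For example, a metric loaded from the Hub can be used in a couple of lines. A minimal sketch (the toy labels below are made up for illustration):

import evaluate

# load the accuracy metric from the Hub and score a few predictions
accuracy = evaluate.load("accuracy")
results = accuracy.compute(references=[0, 1, 1, 1], predictions=[0, 1, 0, 1])
print(results)  # {'accuracy': 0.75}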

🎓 Documentation

🔎 Find a metric, comparison, measurement on the Hub

🌟 Add a new evaluation module

🤗 Evaluate also has lots of useful features like:

  • Type checking: the input types are checked to make sure that you are using the right input formats for each metric.
  • Metric cards: each metric comes with a card that describes the values, limitations and their ranges, as well as providing examples of their usage and usefulness (see the sketch after this list).
  • Community metrics: Metrics live on the Hugging Face Hub and you can easily add your own metrics for your project or to collaborate with others.
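
As a rough illustration of the metric cards and type checking mentioned above, a loaded module exposes its card content as attributes, and compute() validates inputs against the declared features. A minimal sketch (attribute names assumed from the library's documentation, so double-check them against your installed version):

import evaluate

accuracy = evaluate.load("accuracy")

# metric card information is exposed on the loaded module
print(accuracy.description)
print(accuracy.features)  # the declared input types used for type checking

# compute() checks predictions/references against these features before scoring
print(accuracy.compute(references=[0, 1], predictions=[0, 1]))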

Installation

With pip

🤗 Evaluate can be installed from PyPI and should be installed in a virtual environment (venv or conda, for instance):

pip install evaluate

With conda

🤗 Evaluate can be installed using conda as follows:

conda install -c huggingface -c conda-forge evaluate

For more details on installation, check the installation page in the documentation: https://huggingface.co/docs/evaluate/installation

Usage

🤗 Evaluate's main methods are:

  • evaluate.list_evaluation_modules() to list the available metrics, comparisons and measurements
  • evaluate.load(module_name, **kwargs) to instantiate an evaluation module
  • results = module.compute(**kwargs) to compute the result of an evaluation module, as sketched below
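
A minimal end-to-end sketch of these three calls (the f1 metric and the toy labels are just an example):

import evaluate

# list the available metrics, comparisons and measurements
modules = evaluate.list_evaluation_modules()
print(len(modules), modules[:5])

# instantiate an evaluation module and compute a result
f1 = evaluate.load("f1")
results = f1.compute(references=[0, 1, 1, 0], predictions=[1, 1, 1, 0])
print(results)  # {'f1': 0.8}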

Adding a new evaluation module

First install the necessary dependencies to create a new metric with the following command:

pip install evaluate[template]

Then you can get started with the following command which will create a new folder for your metric and display the necessary steps:

evaluate-cli create "Awesome Metric"
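
Once the folder has been created, the new module can be loaded from its local script for a quick test before pushing it to the Hub. A minimal sketch, assuming the command above generated a folder awesome_metric containing awesome_metric.py (adjust the path to whatever was actually created):

import evaluate

# load the freshly generated module from its local script (path is an assumption)
awesome_metric = evaluate.load("awesome_metric/awesome_metric.py")

# the generated template typically includes a placeholder compute(), so this call
# should run end to end even before you implement the real metric logic
print(awesome_metric.compute(references=[0, 1], predictions=[0, 1]))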

See this step-by-step guide in the documentation for detailed instructions.

Credits

Thanks to @marella for letting us use the evaluate namespace on PyPI, previously used by his library.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

evaluate-0.1.0.tar.gz (50.4 kB)


Built Distribution

evaluate-0.1.0-py3-none-any.whl (68.7 kB)


File details

Details for the file evaluate-0.1.0.tar.gz.

File metadata

  • Download URL: evaluate-0.1.0.tar.gz
  • Upload date:
  • Size: 50.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for evaluate-0.1.0.tar.gz
  • SHA256: 9afb60958184366613a9956658a6f8bec75a199573dd70e7c0c058111e21fed4
  • MD5: 4880f054bec3862bf21cd5c4a35bed43
  • BLAKE2b-256: d97d32efd61097363a7037723a2abbcac5b3be9f619b1cd5596077441f28c2a0


File details

Details for the file evaluate-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: evaluate-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 68.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for evaluate-0.1.0-py3-none-any.whl
  • SHA256: 06553e7953a2b40973d21b9e7d6bf02df8d5b4ae3548d1c2cad0fe9eb4f34967
  • MD5: af2e0d8aebcf02a3edc5e71ea7e0912f
  • BLAKE2b-256: b9334a6ab48b78b185ab7b128dbebf20ab42aef62bc160c1a51bd442ff20f08c

