Evaluation of pretrained language models on mono- or multilingual language tasks.

Installation

To install the package, run the following command in your terminal:

$ pip install scandeval

Quickstart

Benchmarking from the Command Line

The easiest way to benchmark pretrained models is via the command line interface. After having installed the package, you can benchmark your favorite model like so:

$ scandeval --model <model-id>

Here <model-id> is the Hugging Face model ID, which can be found on the Hugging Face Hub. By default, this benchmarks the model on all available tasks. To benchmark on a particular task, use the --task argument:

$ scandeval --model <model-id> --task sentiment-classification

You can also narrow down the languages to benchmark on by setting the --language argument. Here the model is benchmarked on the Danish sentiment classification task:

$ scandeval --model <model-id> --task sentiment-classification --language da

Multiple models, datasets and/or languages can be specified by repeating the corresponding argument. Here is an example with two models:

$ scandeval --model <model-id1> --model <model-id2>
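When the list of models or languages is generated programmatically, it can help to assemble the command in a script. The following is a minimal sketch, not part of ScandEval itself: build_scandeval_command is a hypothetical helper that only uses the --model, --task and --language flags shown above, repeating --model and --language once per value.

```python
import shlex
from typing import List, Optional, Sequence


def build_scandeval_command(
    models: Sequence[str],
    task: Optional[str] = None,
    languages: Sequence[str] = (),
) -> List[str]:
    """Build an argument list for the scandeval CLI (hypothetical helper).

    The --model and --language flags are repeated once per value, mirroring
    the "multiple arguments" convention described above.
    """
    command = ["scandeval"]
    for model in models:
        command += ["--model", model]
    if task is not None:
        command += ["--task", task]
    for language in languages:
        command += ["--language", language]
    return command


command = build_scandeval_command(
    models=["<model-id1>", "<model-id2>"],
    task="sentiment-classification",
    languages=["da"],
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run(command, check=True) once the placeholders are replaced with real model IDs.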

A specific model version/revision can be selected by appending '@' and the revision to the model ID:

$ scandeval --model <model-id>@<commit>

The revision can be a branch name, a tag name, or a commit ID. It defaults to 'main', which is the latest version.
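To illustrate the <model-id>@<revision> convention, here is a small sketch of how such a string could be split into its two parts, falling back to 'main' when no revision is given. The function split_model_revision is hypothetical and is not claimed to be how ScandEval parses the argument internally.

```python
from typing import Tuple


def split_model_revision(
    model_spec: str, default_revision: str = "main"
) -> Tuple[str, str]:
    """Split '<model-id>@<revision>' into (model_id, revision).

    Hypothetical helper for illustration: if there is no '@', or nothing
    follows it, the default revision is returned instead.
    """
    model_id, separator, revision = model_spec.partition("@")
    if not separator or not revision:
        return model_id, default_revision
    return model_id, revision


for spec in ["org/model", "org/model@v1.0", "org/model@abc123"]:
    print(spec, "->", split_model_revision(spec))
```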

To see all the arguments and options available for the scandeval command, type:

$ scandeval --help

Benchmarking from a Script

In a script, the syntax is similar to the command line interface. You simply initialise an object of the Benchmarker class, and call this benchmark object with your favorite model:

>>> from scandeval import Benchmarker
>>> benchmark = Benchmarker()
>>> benchmark(model="<model>")

To benchmark on a specific task and/or language, specify the task or language arguments, shown here with the same example as above:

>>> benchmark(model="<model>", task="sentiment-classification", language="da")

If you want to benchmark a subset of all the models on the Hugging Face Hub, you can simply leave out the model argument. In this example, we're benchmarking all Danish models on the Danish sentiment classification task:

>>> benchmark(task="sentiment-classification", language="da")
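To benchmark an explicit list of models from a script, one option is to loop over the model IDs and call the benchmark object once per model. This is a minimal sketch under stated assumptions: benchmark_models is a hypothetical helper, it only uses the model, task and language keyword arguments shown above, and it stores whatever each call returns without assuming anything about its structure.

```python
from typing import Any, Callable, Dict, Iterable


def benchmark_models(
    benchmark: Callable[..., Any],
    model_ids: Iterable[str],
    task: str,
    language: str,
) -> Dict[str, Any]:
    """Call `benchmark` once per model ID and collect the return values.

    `benchmark` is expected to be a callable such as a Benchmarker instance
    that accepts the `model`, `task` and `language` keyword arguments.
    """
    results: Dict[str, Any] = {}
    for model_id in model_ids:
        results[model_id] = benchmark(
            model=model_id, task=task, language=language
        )
    return results


# Example usage (requires scandeval to be installed; model IDs are placeholders):
#
#     from scandeval import Benchmarker
#     results = benchmark_models(
#         Benchmarker(),
#         model_ids=["<model-id1>", "<model-id2>"],
#         task="sentiment-classification",
#         language="da",
#     )
```

Passing the benchmark object in as an argument keeps the helper easy to test with a stand-in callable before running the real, slower benchmarks.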

Citing ScandEval

If you want to cite the framework then feel free to use this:

@inproceedings{nielsen2023scandeval,
  title={ScandEval: A Benchmark for Scandinavian Natural Language Processing},
  author={Nielsen, Dan Saattrup},
  booktitle={The 24th Nordic Conference on Computational Linguistics},
  year={2023}
}

Remarks

The image used in the logo has been created by the amazing Scandinavia and the World team. Go check them out!
