ScandEval

Evaluation of pretrained language models on mono- or multilingual language tasks.

Maintainers

Dan Saattrup Nielsen (@saattrupdan, dan.nielsen@alexandra.dk)
Kenneth Enevoldsen (@KennethEnevoldsen, kenneth.enevoldsen@cas.au.dk)

Installation

To install the package simply write the following command in your favorite terminal:

$ pip install scandeval[all]

This will install the ScandEval package with all extras. You can also install the minimal version by leaving out the [all], in which case the package will let you know when an evaluation requires a certain extra dependency and how to install it.
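The "let you know" behaviour for missing extras can be pictured with the minimal sketch below. It is not ScandEval's actual code: the helper name `require_extra` is hypothetical, and the install hint assumes the extras follow the same `scandeval[<extra>]` pattern as `scandeval[all]` above.

```python
import importlib.util


def require_extra(module_name: str, extra: str) -> None:
    """Raise an informative error if an optional dependency is missing.

    Hypothetical helper: `module_name` is the top-level module an evaluation
    needs, and `extra` is the (assumed) name of the pip extra providing it.
    """
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"This evaluation requires the '{module_name}' package. "
            f"Install it with: pip install scandeval[{extra}]"
        )


# A standard-library module is always available, so this call passes silently
require_extra("json", extra="all")
```

Checking for the module up front, rather than failing deep inside an evaluation, is what lets the error message name the exact install command.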

Quickstart

Benchmarking from the Command Line

The easiest way to benchmark pretrained models is via the command line interface. Once the package is installed, you can benchmark your favorite model like so:

$ scandeval --model <model-id>

Here <model-id> is the Hugging Face model ID, which can be found on the Hugging Face Hub. By default, this will benchmark the model on all available tasks. If you want to benchmark on a particular task, use the --task argument:

$ scandeval --model <model-id> --task sentiment-classification

We can also narrow down which languages we would like to benchmark on by setting the --language argument. Here, we benchmark the model on the Danish sentiment classification task:

$ scandeval --model <model-id> --task sentiment-classification --language da

Multiple models, datasets and/or languages can be specified by repeating the corresponding argument. Here is an example with two models:

$ scandeval --model <model-id1> --model <model-id2>
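When the scandeval command is launched from another program, the repeated flags can be assembled as a list and passed to `subprocess.run`. The helper `build_scandeval_command` below is hypothetical and only uses the `--model`, `--task` and `--language` flags shown above; the text only states that models, datasets and languages can be repeated, so repeating `--task` is an assumption.

```python
from typing import Sequence


def build_scandeval_command(
    models: Sequence[str],
    tasks: Sequence[str] = (),
    languages: Sequence[str] = (),
) -> list[str]:
    """Build a scandeval CLI invocation, repeating each flag once per value.

    Pass lists of values; a bare string would be iterated character by character.
    """
    command = ["scandeval"]
    for flag, values in (
        ("--model", models),
        ("--task", tasks),
        ("--language", languages),
    ):
        for value in values:
            command += [flag, value]
    return command


command = build_scandeval_command(["<model-id1>", "<model-id2>"], languages=["da"])
print(command)
# ['scandeval', '--model', '<model-id1>', '--model', '<model-id2>', '--language', 'da']
```

The resulting list can be handed to `subprocess.run(command, check=True)` without any shell quoting, since each argument is already a separate list element.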

A specific model version (revision) can be selected by appending it to the model ID after an '@':

$ scandeval --model <model-id>@<revision>

The revision can be a branch name, a tag name or a commit ID, and defaults to 'main' (the latest version).
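The '@' suffix convention described above can be illustrated with a small parser that splits off the revision and falls back to 'main'. This is a hypothetical helper for illustration, not ScandEval's own parsing code; it assumes that Hugging Face model IDs never contain an '@', so the string is split on the first one.

```python
def split_model_revision(model: str, default_revision: str = "main") -> tuple[str, str]:
    """Split '<model-id>@<revision>' into (model_id, revision).

    The revision may be a branch name, a tag name or a commit ID. If no '@'
    is present (or nothing follows it), the default revision is used.
    """
    # partition splits on the first '@'; `sep` is empty when there is none
    model_id, sep, revision = model.partition("@")
    if not sep or not revision:
        return model_id, default_revision
    return model_id, revision


print(split_model_revision("<model-id>@v1.0"))  # ('<model-id>', 'v1.0')
print(split_model_revision("<model-id>"))       # ('<model-id>', 'main')
```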

To see all the arguments and options available for the scandeval command, type:

$ scandeval --help

Benchmarking from a Script

In a script, the syntax is similar to the command line interface. You simply initialise an object of the Benchmarker class and call it with your favorite model:

>>> from scandeval import Benchmarker
>>> benchmark = Benchmarker()
>>> benchmark(model="<model>")

To benchmark on a specific task and/or language, you simply specify the task and/or language arguments, shown here with the same example as above:

>>> benchmark(model="<model>", task="sentiment-classification", language="da")

If you want to benchmark a subset of all the models on the Hugging Face Hub, you can simply leave out the model argument. In this example, we're benchmarking all Danish models on the Danish sentiment classification task:

>>> benchmark(task="sentiment-classification", language="da")

Benchmarking from Docker

A Dockerfile is provided in the repo, which can be downloaded and run without needing to clone the repo and install from source. It can be fetched programmatically by running the following:

$ wget https://raw.githubusercontent.com/ScandEval/ScandEval/main/Dockerfile.cuda

To build the Docker image, first ensure that the NVIDIA Container Toolkit is installed and configured. Also ensure that the CUDA version stated at the top of the Dockerfile matches the installed CUDA version (which you can check using nvidia-smi). Then build the image as follows:

$ docker build --pull -t scandeval -f Dockerfile.cuda .
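The CUDA version check before building can be scripted. The sketch below reads the "CUDA Version: X.Y" field from the header that nvidia-smi prints and compares it with the version in the Dockerfile's base image. It assumes that Dockerfile.cuda contains a line of the form `FROM nvidia/cuda:<version>-...`; the file's exact contents are not shown here, so adjust `DOCKERFILE_PATTERN` if it differs. The helper names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path

# nvidia-smi's header contains e.g. "Driver Version: 535.104.05   CUDA Version: 12.2"
NVIDIA_SMI_PATTERN = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")
# Assumed base image line, e.g. "FROM nvidia/cuda:12.2.0-base-ubuntu22.04"
DOCKERFILE_PATTERN = re.compile(r"^FROM\s+nvidia/cuda:(\d+)\.(\d+)", re.MULTILINE)


def parse_cuda_version(text: str, pattern: re.Pattern[str]) -> tuple[int, int] | None:
    """Return the (major, minor) CUDA version matched in `text`, or None."""
    match = pattern.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_versions_match(dockerfile: Path = Path("Dockerfile.cuda")) -> bool:
    """Compare the installed CUDA version (via nvidia-smi) with the Dockerfile's."""
    smi_output = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    ).stdout
    installed = parse_cuda_version(smi_output, NVIDIA_SMI_PATTERN)
    required = parse_cuda_version(dockerfile.read_text(), DOCKERFILE_PATTERN)
    return installed is not None and installed == required
```

Only major and minor versions are compared, since the patch level reported by nvidia-smi and the one in the image tag are usually written differently.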

With the Docker image built, we can now evaluate any model as follows:

$ docker run -e args="<scandeval-arguments>" --gpus 1 --name scandeval --rm scandeval

Here <scandeval-arguments> are the arguments you would otherwise pass to the scandeval CLI, for instance --model <model-id> --task sentiment-classification.
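Because all ScandEval arguments travel in the single `args` environment variable, it can be convenient to build the `docker run` call programmatically. The sketch below reproduces the command shown above; `docker_run_command` is a hypothetical helper and `my-org/my-model` a placeholder. How the container splits `args` is not documented here, so arguments containing whitespace are rejected rather than quoted.

```python
import shlex


def docker_run_command(scandeval_args: list[str], image: str = "scandeval") -> list[str]:
    """Build the `docker run` invocation for the given scandeval arguments."""
    # The container receives one space-separated string, so whitespace inside an
    # argument could be split unpredictably; reject it instead of guessing.
    if any(not arg or any(ch.isspace() for ch in arg) for arg in scandeval_args):
        raise ValueError("scandeval arguments must be non-empty and contain no whitespace")
    return [
        "docker", "run",
        "-e", "args=" + " ".join(scandeval_args),
        "--gpus", "1",
        "--name", "scandeval",
        "--rm", image,
    ]


command = docker_run_command(["--model", "my-org/my-model", "--task", "sentiment-classification"])
# shlex.join renders the list as a copy-pasteable shell command
print(shlex.join(command))
# docker run -e 'args=--model my-org/my-model --task sentiment-classification' --gpus 1 --name scandeval --rm scandeval
```

The list itself can be passed to `subprocess.run` directly; `shlex.join` is only used here to display it.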

Special Thanks

Thanks to UWV and KU Leuven for sponsoring the Azure OpenAI credits used to evaluate GPT-4 in Dutch.
Thanks to Miðeind for sponsoring the OpenAI credits used to evaluate GPT-4 in Icelandic and Faroese.

Citing ScandEval

If you want to cite the framework, feel free to use the following BibTeX entry:

@inproceedings{nielsen2023scandeval,
  author = {Nielsen, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}

Remarks

The image used in the logo has been created by the amazing Scandinavia and the World team. Go check them out!

Release history

13.1.0

Oct 31, 2024

13.0.0

Jul 31, 2024

12.11.0

Jul 3, 2024

12.10.8

Jun 21, 2024

12.10.7

Jun 19, 2024

12.10.6

Jun 19, 2024

12.10.5

Jun 12, 2024

12.10.4

Jun 3, 2024

12.10.3

Jun 3, 2024

12.10.2

May 30, 2024

12.10.1

May 28, 2024

12.10.0

May 8, 2024

12.9.1

Apr 30, 2024

12.9.0

Apr 26, 2024

12.8.0

Apr 23, 2024

12.7.0

Apr 19, 2024

12.6.1

Apr 11, 2024

12.6.0

Apr 10, 2024

12.5.3

Apr 5, 2024

12.5.2

Apr 4, 2024

12.5.1

Apr 3, 2024

12.5.0

Apr 2, 2024

12.4.0

Mar 27, 2024

12.3.2

Mar 19, 2024

12.3.1

Mar 13, 2024

12.3.0

Mar 13, 2024

12.2.1

Mar 12, 2024

12.2.0

Mar 11, 2024

12.1.0

Feb 29, 2024

12.0.0

Feb 26, 2024

11.0.0

Feb 16, 2024

10.0.1

Feb 12, 2024

10.0.0

Feb 12, 2024

9.3.2

Feb 5, 2024

9.3.1

Jan 31, 2024

9.3.0

Jan 29, 2024

9.2.0

Jan 24, 2024

9.1.2

Jan 16, 2024

9.1.1

Jan 15, 2024

9.1.0

Jan 14, 2024

9.0.0

Jan 12, 2024

8.2.1

Dec 20, 2023

8.2.0

Dec 20, 2023

8.1.0

Dec 4, 2023

8.0.0

Nov 29, 2023

7.1.1

Jul 1, 2023

7.1.0

May 15, 2023

7.0.0

May 13, 2023

6.3.0

Apr 12, 2023

6.2.4

Mar 10, 2023

6.2.3

Feb 27, 2023

6.2.2

Feb 25, 2023

6.2.1

Feb 22, 2023

6.2.0

Jan 9, 2023

6.1.1

Jan 2, 2023

6.1.0

Dec 29, 2022

6.0.1

Dec 28, 2022

6.0.0

Dec 24, 2022

5.0.0

Nov 3, 2022

4.0.2

Jul 22, 2022

4.0.1

Jul 14, 2022

4.0.0

Jul 14, 2022

3.0.0

Apr 19, 2022

2.3.2

Feb 11, 2022

2.3.1

Feb 11, 2022

2.3.0

Jan 20, 2022

2.2.0

Jan 18, 2022

2.1.0

Jan 17, 2022

2.0.0

Jan 7, 2022

1.5.9

Dec 14, 2021

1.5.8

Dec 13, 2021

1.5.7

Dec 10, 2021

1.5.6

Dec 10, 2021

1.5.5

Dec 8, 2021

1.5.4

Dec 8, 2021

1.5.3

Dec 8, 2021

1.5.2

Dec 8, 2021

1.5.1

Nov 27, 2021

1.5.0

Nov 26, 2021

1.4.0

Nov 25, 2021

1.3.8

Nov 25, 2021

1.3.7

Nov 25, 2021

1.3.6

Nov 25, 2021

1.3.5

Nov 23, 2021

1.3.4

Nov 11, 2021

1.3.3

Nov 11, 2021

1.3.2

Nov 11, 2021

1.3.1

Nov 11, 2021

1.3.0

Nov 11, 2021

1.2.1

Nov 11, 2021

1.2.0

Oct 15, 2021

1.1.3

Oct 4, 2021

1.1.2

Sep 26, 2021

1.1.1

Sep 26, 2021

1.1.0

Sep 13, 2021

1.0.2

Sep 9, 2021

1.0.1

Sep 9, 2021

1.0.0

Sep 9, 2021

0.17.0

Sep 9, 2021

0.16.0

Sep 7, 2021

0.15.1

Sep 3, 2021

0.15.0

Sep 2, 2021

0.14.1

Sep 2, 2021

0.14.0

Aug 31, 2021

0.13.0

Aug 30, 2021

0.12.0

Aug 26, 2021

0.11.2

Aug 25, 2021

0.11.1

Aug 24, 2021

0.11.0

Aug 23, 2021

0.10.1

Aug 20, 2021

0.10.0

Aug 20, 2021

0.9.0

Aug 19, 2021

0.8.0

Aug 18, 2021

0.7.0

Aug 17, 2021

0.6.0

Aug 15, 2021

0.5.2

Aug 13, 2021

0.5.1

Aug 13, 2021

0.5.0

Aug 12, 2021

0.4.3

Aug 12, 2021

0.4.2

Aug 12, 2021

0.4.1

Aug 12, 2021

0.4.0

Aug 11, 2021

0.3.1

Aug 10, 2021

0.3.0

Aug 10, 2021

0.2.0

Aug 9, 2021

0.1.0

Aug 5, 2021

Download files

Source Distribution

scandeval-12.6.1.tar.gz (99.8 kB)

Uploaded Apr 11, 2024 Source

Built Distribution

scandeval-12.6.1-py3-none-any.whl (116.5 kB)

Uploaded Apr 11, 2024 Python 3

Hashes for scandeval-12.6.1.tar.gz
Algorithm	Hash digest
SHA256	`36eae95182d45ae247104d3d6cf4db500f53177a87b66cdd6ac21c59df08d524`
MD5	`bd5d963cac2007c3b0bcbe381de2a096`
BLAKE2b-256	`2a34c5ac89b993f33db9c2de88dfe00839438d62e08a6eb54ba7b832c783dae4`

Hashes for scandeval-12.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4cba93346221d251b8607439e84c324ff30644e7583e652c66d6979bbb79023d`
MD5	`32d6180372d4f8a7aca1c84bc190c1f7`
BLAKE2b-256	`af69b1b66500bce578a21d05af48bc92ca229fe48afdbfe534e0caac2d1303a4`
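A downloaded distribution file can be checked against the SHA256 digests listed above. This is a minimal standard-library sketch; the file path in the usage comment assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.lower()


# Usage, assuming the wheel was downloaded to the current directory:
# verify(Path("scandeval-12.6.1-py3-none-any.whl"),
#        "4cba93346221d251b8607439e84c324ff30644e7583e652c66d6979bbb79023d")
```

Reading in chunks keeps memory use constant regardless of file size. pip can also enforce such digests itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).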