EuroEval

The robust European language model benchmark.

(formerly known as ScandEval)

Maintainers

Dan Saattrup Nielsen (@saattrupdan, dan.nielsen@alexandra.dk)
Kenneth Enevoldsen (@KennethEnevoldsen, kenneth.enevoldsen@cas.au.dk)

Installation

To install the package, run the following command in your terminal:

$ pip install euroeval[all]

This installs the EuroEval package with all extras. You can also install the minimal version by leaving out the [all], in which case the package will let you know when an evaluation requires a certain extra dependency, and how to install it.

Quickstart

Benchmarking from the Command Line

The easiest way to benchmark pretrained models is via the command line interface. After having installed the package, you can benchmark your favorite model like so:

$ euroeval --model <model-id>

Here <model-id> is the Hugging Face model ID, which can be found on the Hugging Face Hub. By default, this benchmarks the model on all available tasks. To benchmark on a particular task, use the --task argument:

$ euroeval --model <model-id> --task sentiment-classification

We can also narrow down which languages to benchmark on by setting the --language argument. Here we benchmark the model on the Danish sentiment classification task:

$ euroeval --model <model-id> --task sentiment-classification --language da

Multiple models, datasets and/or languages can be specified by repeating the corresponding argument. Here is an example with two models:

$ euroeval --model <model-id1> --model <model-id2>
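When the list of models or languages is produced by a script, the repeated-argument form above can be assembled programmatically. The following is a minimal sketch that uses only the --model, --task and --language arguments shown in this section; the helper name build_euroeval_command is hypothetical and not part of EuroEval.

```python
import shlex


def build_euroeval_command(models, task=None, languages=()):
    """Build a `euroeval` command line as a list of arguments.

    The --model and --language arguments are repeated once per value,
    mirroring the "attach multiple arguments" form described above.
    """
    command = ["euroeval"]
    for model in models:
        command += ["--model", model]
    if task is not None:
        command += ["--task", task]
    for language in languages:
        command += ["--language", language]
    return command


command = build_euroeval_command(
    models=["<model-id1>", "<model-id2>"],
    task="sentiment-classification",
    languages=["da"],
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Keeping the command as a list means it can also be handed directly to subprocess.run without any shell quoting.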

The specific model version/revision to use can also be added as a suffix after '@':

$ euroeval --model <model-id>@<commit>

This can be a branch name, a tag name, or a commit ID. It defaults to 'main', i.e. the latest version.
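To make the '@' convention concrete, here is a small sketch of how such a specification splits into a model ID and a revision, with 'main' as the fallback. It only illustrates the convention described above; the function name split_model_revision is hypothetical, and this is not EuroEval's own parsing code.

```python
def split_model_revision(model_spec, default_revision="main"):
    """Split '<model-id>@<revision>' into (model_id, revision).

    If no '@' suffix is present (or it is empty), the default revision
    is returned instead.
    """
    model_id, separator, revision = model_spec.partition("@")
    if separator and revision:
        return model_id, revision
    return model_id, default_revision


# Hypothetical model identifiers, used purely for illustration.
print(split_model_revision("some-org/some-model@v1.0"))
print(split_model_revision("some-org/some-model"))
```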

To see all the arguments and options available for the euroeval command, run:

$ euroeval --help

Benchmarking from a Script

In a script, the syntax is similar to the command line interface. You simply initialise an object of the Benchmarker class, and call this benchmark object with your favorite model:

>>> from euroeval import Benchmarker
>>> benchmark = Benchmarker()
>>> benchmark(model="<model>")

To benchmark on a specific task and/or language, specify the task or language arguments, shown here with the same example as above:

>>> benchmark(model="<model>", task="sentiment-classification", language="da")

If you want to benchmark a subset of all the models on the Hugging Face Hub, you can simply leave out the model argument. In this example, we're benchmarking all Danish models on the Danish sentiment classification task:

>>> benchmark(task="sentiment-classification", language="da")
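When several specific models should be benchmarked from a script, the calls above can be wrapped in a loop. The sketch below assumes only the call signature shown in this section (model, task and language keyword arguments); the helper benchmark_models is hypothetical, and since the return value of the benchmark call is not described here, the sketch simply stores whatever comes back.

```python
def benchmark_models(benchmark, models, task, language):
    """Call `benchmark` once per model and collect the results by model ID.

    `benchmark` is any callable accepting the keyword arguments used in
    the examples above, e.g. an instance of euroeval.Benchmarker.
    """
    results = {}
    for model in models:
        # Same keyword arguments as in the single-model example above.
        results[model] = benchmark(model=model, task=task, language=language)
    return results
```

With EuroEval installed, this could be called as benchmark_models(Benchmarker(), ["<model-id1>", "<model-id2>"], task="sentiment-classification", language="da"). Passing the benchmark object in as an argument also makes the loop easy to try out with a stand-in function before starting a long evaluation.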

Benchmarking from Docker

A Dockerfile is provided in the repo, which can be downloaded and run without needing to clone the repo and install from source. It can be fetched programmatically by running the following:

$ wget https://raw.githubusercontent.com/EuroEval/EuroEval/main/Dockerfile.cuda

Next, to be able to build the Docker image, first ensure that the NVIDIA Container Toolkit is installed and configured. Also ensure that the CUDA version stated at the top of the Dockerfile matches the CUDA version installed (which you can check using nvidia-smi). After that, we build the image as follows:

$ docker build --pull -t euroeval -f Dockerfile.cuda .

With the Docker image built, we can now evaluate any model as follows:

$ docker run -e args="<euroeval-arguments>" --gpus 1 --name euroeval --rm euroeval

Here <euroeval-arguments> consists of the arguments passed to the euroeval CLI. This could for instance be --model <model-id> --task sentiment-classification.
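The docker run invocation above can also be assembled from a script. The following sketch reproduces exactly the options shown (-e args=..., --gpus 1, --name euroeval, --rm) and packs the euroeval arguments into the single args environment variable; the helper name build_docker_run_command is hypothetical, and how the container interprets quoting inside args is an assumption that should be checked against the Dockerfile.

```python
import shlex


def build_docker_run_command(euroeval_arguments, image="euroeval",
                             container_name="euroeval", gpus="1"):
    """Build the `docker run` command from the section above as a list.

    `euroeval_arguments` is a list such as ["--model", "<model-id>"];
    it is joined into one string and passed via the `args` variable.
    """
    args_value = shlex.join(euroeval_arguments)
    return [
        "docker", "run",
        "-e", f"args={args_value}",
        "--gpus", gpus,
        "--name", container_name,
        "--rm", image,
    ]


command = build_docker_run_command(
    ["--model", "<model-id>", "--task", "sentiment-classification"]
)
print(command)
```

Passing the resulting list to subprocess.run avoids a second layer of shell quoting around the args value.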

Reproducing the datasets

All datasets used in this project are generated using the scripts located in the src/scripts folder. To reproduce a dataset, run the corresponding script with the following command:

$ uv run src/scripts/<name-of-script>.py

Replace <name-of-script> with the specific script you wish to execute, e.g.,

$ uv run src/scripts/create_allocine.py

Special Thanks :pray:

Thanks @Mikeriess for evaluating many of the larger models on the leaderboards.
Thanks to OpenAI for sponsoring OpenAI credits as part of their Researcher Access Program.
Thanks to UWV and KU Leuven for sponsoring the Azure OpenAI credits used to evaluate GPT-4-turbo in Dutch.
Thanks to Miðeind for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in Icelandic and Faroese.
Thanks to CHC for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in German.

Citing EuroEval

If you want to cite the framework then feel free to use this:

@article{nielsen2024encoder,
  title={Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks},
  author={Nielsen, Dan Saattrup and Enevoldsen, Kenneth and Schneider-Kamp, Peter},
  journal={arXiv preprint arXiv:2406.13469},
  year={2024}
}
@inproceedings{nielsen2023scandeval,
  author = {Nielsen, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}

Release history

17.2.0

Apr 17, 2026

17.1.0

Mar 24, 2026

17.0.0

Mar 16, 2026

16.17.0

Mar 9, 2026

16.16.1

Feb 25, 2026

16.16.0

Feb 25, 2026

16.15.0

Feb 18, 2026

16.14.0

Feb 13, 2026

16.13.0

Feb 6, 2026

16.12.0

Feb 2, 2026

16.11.0

Jan 21, 2026

16.10.1

Jan 2, 2026

16.10.0

Dec 30, 2025

16.9.0

Dec 16, 2025

16.8.0

Nov 25, 2025

16.7.1

Nov 18, 2025

16.7.0

Nov 10, 2025

16.6.0

Nov 4, 2025

16.5.0

Oct 28, 2025

16.4.0

Oct 21, 2025

16.3.0

Sep 23, 2025

16.2.2

Sep 15, 2025

16.2.1

Sep 15, 2025

16.2.0

Sep 15, 2025

16.1.1

Sep 12, 2025

16.1.0

Sep 11, 2025

16.0.1

Sep 7, 2025

16.0.0

Sep 5, 2025

15.16.0

Aug 12, 2025

15.15.0

Aug 6, 2025

15.14.0

Jul 30, 2025

15.13.0

Jul 21, 2025

15.12.0

Jul 19, 2025

15.11.0

Jul 15, 2025

15.10.1

Jun 20, 2025

15.10.0

Jun 17, 2025

15.9.2

Jun 4, 2025

15.9.1

Jun 1, 2025

15.9.0

May 31, 2025

15.8.2

May 12, 2025

15.8.1

May 8, 2025

15.8.0

May 7, 2025

15.7.2

May 2, 2025

15.7.1

Apr 29, 2025

15.7.0

Apr 28, 2025

15.6.1

Apr 14, 2025

15.6.0

Apr 13, 2025

15.5.0 (this version)

Apr 7, 2025

15.4.2

Mar 31, 2025

15.4.1

Mar 25, 2025

15.4.0

Mar 24, 2025

15.3.1

Mar 13, 2025

15.3.0

Mar 12, 2025

15.2.0

Feb 28, 2025

Download files

Download the file for your platform.

Source Distribution

euroeval-15.5.0.tar.gz (1.3 MB)

Uploaded Apr 7, 2025 Source

Built Distribution

euroeval-15.5.0-py3-none-any.whl (128.0 kB)

Uploaded Apr 7, 2025 Python 3

File details

Details for the file euroeval-15.5.0.tar.gz.

File metadata

Download URL: euroeval-15.5.0.tar.gz
Upload date: Apr 7, 2025
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.12

File hashes

Hashes for euroeval-15.5.0.tar.gz
Algorithm	Hash digest
SHA256	`00d9715b67439636cca14eb0f7a9090c2b0de0f341e840efff2bbdd12179df0f`
MD5	`b650160181472fcfa85d9c1fd691db65`
BLAKE2b-256	`4eca8d4943c1e3aa7cc261ea3d0fdab5cf6c1020f3f2aae2579dfbf2cab03f88`
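A downloaded file can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch; the function names are hypothetical, and it assumes the file has been saved locally under the name given in the file metadata.

```python
import hashlib

# SHA256 digest for euroeval-15.5.0.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = (
    "00d9715b67439636cca14eb0f7a9090c2b0de0f341e840efff2bbdd12179df0f"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, matches_published_hash("euroeval-15.5.0.tar.gz", EXPECTED_SDIST_SHA256) should return True for an intact download of the source distribution.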

File details

Details for the file euroeval-15.5.0-py3-none-any.whl.

File metadata

Download URL: euroeval-15.5.0-py3-none-any.whl
Upload date: Apr 7, 2025
Size: 128.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.12

File hashes

Hashes for euroeval-15.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26cfe7bc0761158dfbe2c3620443d6e2ebf39e1dc86794615ae8d03f3aa54e23`
MD5	`89f73ab0b0bb5e3592bd8417b6d8f300`
BLAKE2b-256	`8b69924313a135349c07da804ee6d0400055497febbce5e7645082680b3a80b4`
