Project description
Evaluation of pretrained language models on mono- or multilingual language tasks.
Maintainers
- Dan Saattrup Nielsen (@saattrupdan, dan.nielsen@alexandra.dk)
- Kenneth Enevoldsen (@KennethEnevoldsen, kenneth.enevoldsen@cas.au.dk)
Installation
To install the package, simply run the following command in your favorite terminal:
$ pip install scandeval
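If you want to verify the installation, a quick sanity check is to query the installed version from Python using only the standard library. This is just an illustrative sketch; the printed version will match whichever release you installed (8.2.1 at the time of writing):
>>> from importlib.metadata import version
>>> version("scandeval")
'8.2.1'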
Quickstart
Benchmarking from the Command Line
The easiest way to benchmark pretrained models is via the command line interface. After installing the package, you can benchmark your favorite model like so:
$ scandeval --model-id <model-id>
Here model-id is the Hugging Face model ID, which can be found on the Hugging Face Hub. By default this will benchmark the model on all eligible datasets. If you want to benchmark on a specific dataset, this can be done via the --dataset flag. For instance, the following will evaluate the model on the AngryTweets dataset:
$ scandeval --model-id <model-id> --dataset angry-tweets
We can also filter by language. To benchmark all Danish models on all Danish datasets, say, this can be done using the --language flag, like so:
$ scandeval --language da
Multiple models, datasets and/or languages can be specified by simply passing the corresponding arguments multiple times. Here is an example with two models:
$ scandeval --model-id <model-id1> --model-id <model-id2> --dataset angry-tweets
The specific model revision to use can also be specified by appending '@' and the revision to the model ID:
$ scandeval --model-id <model-id>@<commit>
The revision can be a branch name, a tag name, or a commit ID, and it defaults to 'main', which is the latest version.
See all the arguments and options available for the scandeval command by typing:
$ scandeval --help
Benchmarking from a Script
In a script, the syntax is similar to the command line interface. You simply initialise an object of the Benchmarker class and call this benchmark object with your favorite models and/or datasets:
>>> from scandeval import Benchmarker
>>> benchmark = Benchmarker()
>>> benchmark('<model-id>')
To benchmark on a specific dataset, you simply specify it as the second argument, shown here with the AngryTweets dataset again:
>>> benchmark('<model-id>', 'angry-tweets')
If you want to benchmark a subset of all the models on the Hugging Face Hub, you can specify several parameters in the Benchmarker initializer to narrow down the list of models to the ones you care about. As a simple example, the following would benchmark all the Nynorsk models on Nynorsk datasets:
>>> benchmark = Benchmarker(language='nn')
>>> benchmark()
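To give a rough idea of how these pieces fit together in a script, here is a minimal sketch that combines the language filter with a specific dataset and then reads the scores back in. The model ID is a placeholder, and the results-file name scandeval_benchmark_results.jsonl is an assumption about where the benchmark scores are written; adjust the path if your version stores results elsewhere.

import json
from pathlib import Path

from scandeval import Benchmarker

# Benchmark a (placeholder) model on the Danish AngryTweets dataset.
benchmark = Benchmarker(language="da")
benchmark("<model-id>", "angry-tweets")

# Assumption: scores are appended as JSON lines to a file in the current
# working directory; the exact file name may differ between versions.
results_file = Path("scandeval_benchmark_results.jsonl")
if results_file.exists():
    for line in results_file.read_text().splitlines():
        print(json.loads(line))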
Citing ScandEval
If you want to cite the framework, feel free to use this:
@inproceedings{nielsen2023scandeval,
  title={ScandEval: A Benchmark for Scandinavian Natural Language Processing},
  author={Nielsen, Dan Saattrup},
  booktitle={The 24th Nordic Conference on Computational Linguistics},
  year={2023}
}
Remarks
The image used in the logo has been created by the amazing Scandinavia and the World team. Go check them out!
Project structure
.
├── .github
│ └── workflows
│ └── ci.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── LICENSE
├── README.md
├── docs
├── gfx
│ └── scandeval.png
├── makefile
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── src
│ ├── scandeval
│ │ ├── __init__.py
│ │ ├── benchmark_config_factory.py
│ │ ├── benchmark_dataset.py
│ │ ├── benchmarker.py
│ │ ├── callbacks.py
│ │ ├── cli.py
│ │ ├── config.py
│ │ ├── dataset_configs.py
│ │ ├── dataset_factory.py
│ │ ├── dataset_tasks.py
│ │ ├── enums.py
│ │ ├── exceptions.py
│ │ ├── finetuning.py
│ │ ├── generation.py
│ │ ├── languages.py
│ │ ├── model_config.py
│ │ ├── model_loading.py
│ │ ├── model_setups
│ │ │ ├── __init__.py
│ │ │ ├── fresh.py
│ │ │ ├── hf.py
│ │ │ ├── local.py
│ │ │ ├── openai.py
│ │ │ └── utils.py
│ │ ├── named_entity_recognition.py
│ │ ├── openai_models.py
│ │ ├── protocols.py
│ │ ├── question_answering.py
│ │ ├── question_answering_trainer.py
│ │ ├── scores.py
│ │ ├── sequence_classification.py
│ │ ├── speed_benchmark.py
│ │ ├── text_to_text.py
│ │ ├── types.py
│ │ └── utils.py
│ └── scripts
│ ├── create_angry_tweets.py
│ ├── create_dane.py
│ ├── create_mim_gold_ner.py
│ ├── create_mlsum.py
│ ├── create_no_sammendrag.py
│ ├── create_nordjylland_news.py
│ ├── create_norec.py
│ ├── create_norne.py
│ ├── create_rrn.py
│ ├── create_scala.py
│ ├── create_scandiqa.py
│ ├── create_suc3.py
│ ├── create_swedn.py
│ ├── create_swerec.py
│ ├── create_wiki_lingua_nl.py
│ ├── create_wikiann_fo.py
│ ├── fill_in_missing_model_metadata.py
│ ├── fix_dot_env_file.py
│ ├── load_ud_pos.py
│ └── versioning.py
└── tests
├── __init__.py
├── conftest.py
├── test_benchmark_config_factory.py
├── test_benchmark_dataset.py
├── test_benchmarker.py
├── test_callbacks.py
├── test_cli.py
├── test_config.py
├── test_dataset_configs.py
├── test_dataset_factory.py
├── test_dataset_tasks.py
├── test_enums.py
├── test_exceptions.py
├── test_languages.py
├── test_model_config.py
├── test_model_loading.py
├── test_named_entity_recognition.py
├── test_openai_models.py
├── test_question_answering.py
├── test_question_answering_trainer.py
├── test_scores.py
├── test_sequence_classification.py
├── test_speed_benchmark.py
├── test_types.py
└── test_utils.py
Hashes for scandeval-8.2.1-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 84f800fac7e74c19460a94e4043faef6e3c3507ec9a02297a5f9ae22c00c2b8a
MD5 | 3a2fd80114cdbfe57d5d0122d2b292ab
BLAKE2b-256 | 27ea0ad6424d59b5bd201ad8a3273fb976beee2bb1c394aa7f7d35ac5efb669a