
GEM Challenge metrics

Project description

Automatic metrics for GEM benchmark tasks. They can also be used standalone to evaluate various natural language generation tasks.

Installation

GEM-metrics requires a recent version of Python 3; using virtualenv or a similar environment manager is recommended. To install, simply run:

```
git clone https://github.com/GEM-benchmark/GEM-metrics
cd GEM-metrics
pip install -r requirements.txt -r requirements-heavy.txt
```

If you just want to run the metrics from the console (and don't need direct access to the source code), you can instead run:

```
pip install 'gem-metrics[heavy] @ git+https://github.com/GEM-benchmark/GEM-metrics.git'
```

Note that some NLTK data may be downloaded on first run into a subdirectory of the code location, so make sure you have write access there. Also note that the required Python libraries take up around 3 GB of disk space when installed.

If you don't need the trained metrics (BLEURT, BERTScore, NUBIA, QuestEval), you can skip the "heavy" part: install only the dependencies from requirements.txt, or use gem-metrics instead of gem-metrics[heavy] when installing without a checkout. The installed libraries will then take up only ~300 MB.

Script Usage

To compute all default metrics for a file, run:

```
<script> [-r references.json] outputs.json
```

where `<script>` is either `./run_metrics.py` (if you created a checkout) or `gem_metrics` (if you installed directly via pip).

See [test_data](test_data/) for example JSON file formats.

For calculating basic metrics with the unit test data, run:

```
./run_metrics.py -s test_data/unit_tests/sources.json -r test_data/unit_tests/references.json test_data/unit_tests/predictions.json
```

Use `./run_metrics.py -h` to see all available options.

By default, the "heavy" metrics (BERTScore, BLEURT, NUBIA, and QuestEval) are not computed. Use `--heavy-metrics` to compute them.

Library Usage

You can compute metrics for the same JSON format as shown in [test_data](test_data/), or you can work with plain lists of texts (or lists of lists of texts in the case of multi-reference data).
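The authoritative file formats are the examples in [test_data](test_data/). As a rough, hypothetical sketch of what such files might contain (the `values`/`target` keys here are assumptions for illustration, not the definitive schema), you could produce prediction and reference files like this:

```python
import json

# Hypothetical sketch of the file layout -- the files in test_data/ are authoritative.
predictions = {
    "language": "en",
    "values": ["the cat sat on the mat", "hello world"],
}
references = {
    "language": "en",
    # Each entry may carry several references for multi-reference data.
    "values": [
        {"target": ["the cat sat on the mat", "a cat is sitting on the mat"]},
        {"target": ["hello world"]},
    ],
}

with open("preds.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)
with open("refs.json", "w", encoding="utf-8") as f:
    json.dump(references, f, indent=2)
```

Check the written files against the shipped examples before relying on this layout.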

Import GEM-metrics as a library:

```python
import gem_metrics
```

To load data from JSON files:

```python
preds = gem_metrics.texts.Predictions('path/to/pred-file.json')
refs = gem_metrics.texts.References('path/to/ref-file.json')
```

To prepare plain lists (assuming the same order):

```python
# Input may be a list of lists for multi-reference data.
preds = gem_metrics.texts.Predictions(list_of_predictions)
refs = gem_metrics.texts.References(list_of_references)
```

Then compute the desired metrics:

```python
result = gem_metrics.compute(preds, refs, metrics_list=['bleu', 'rouge'])  # list the desired metrics here
```

List of supported metrics

Referenceless:

  • local_recall – LocalRecall

  • msttr – MSTTR

  • ngrams – n-gram statistics

  • ttr – TTR
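For orientation, the type-token ratio (TTR) is simply the number of distinct tokens divided by the total number of tokens, and MSTTR averages the TTR over fixed-size segments to reduce sensitivity to text length. A minimal sketch of both (not the package's implementation, which handles tokenization and normalization more carefully; the segment size here is an arbitrary choice):

```python
def type_token_ratio(tokens):
    """Distinct tokens (types) divided by total tokens: a crude lexical-diversity measure."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def msttr(tokens, segment_size=50):
    """Mean segmental TTR: average TTR over non-overlapping full segments."""
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens) - segment_size + 1, segment_size)]
    if not segments:
        return 0.0
    return sum(type_token_ratio(s) for s in segments) / len(segments)

print(type_token_ratio("the cat sat on the mat".split()))  # 5 types / 6 tokens
```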

Reference-based:

  • bertscore – BERTScore (heavy)

  • bleu – BLEU

  • bleurt – BLEURT (heavy)

  • chrf – chrF

  • cider – CIDEr

  • meteor – Meteor (heavy)

  • moverscore – MoverScore (heavy)

  • nist – NIST

  • nubia – NUBIA (heavy)

  • prism – Prism

  • questeval – QuestEval (heavy)

  • rouge – ROUGE

  • ter – TER

  • wer – WER

  • yules_i – Yule's I
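As an illustration of the simplest reference-based metric above, word error rate (WER) is the Levenshtein edit distance between the hypothesis and reference token sequences, divided by the reference length. A minimal sketch (the package may tokenize and normalize differently):

```python
def word_error_rate(hypothesis, reference):
    """WER: word-level edit distance between hypothesis and reference,
    normalized by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # Dynamic-programming Levenshtein distance over word tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)

print(word_error_rate("the cat", "the cat sat"))  # 1 edit / 3 reference words
```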

Source + reference based:

  • sari – SARI

License

Licensed under [the MIT license](LICENSE).



Download files

Download the file for your platform.

Source Distribution

gem_metrics_fork-0.1.dev0.tar.gz (46.3 kB)

Uploaded Source

Built Distribution

gem_metrics_fork-0.1.dev0-py3-none-any.whl (49.7 kB)

Uploaded Python 3

File details

Details for the file gem_metrics_fork-0.1.dev0.tar.gz.

File metadata

  • Download URL: gem_metrics_fork-0.1.dev0.tar.gz
  • Size: 46.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for gem_metrics_fork-0.1.dev0.tar.gz:

  • SHA256: d6095896a17add80d7452eb2b58f77857a08a123b5a6f7b8b11e0fbfa8d982a8
  • MD5: 047a7e6f1e215a7663b6ffb0ccbf78fb
  • BLAKE2b-256: 5c7430a150df386cd977f4956e99a67dc7ccd1b458d85b4bee689a756bd025e8


File details

Details for the file gem_metrics_fork-0.1.dev0-py3-none-any.whl.

File metadata

File hashes

Hashes for gem_metrics_fork-0.1.dev0-py3-none-any.whl:

  • SHA256: c279fa24ab25025b3b7564d4bf9b96b2357267cb906ca0f8ad9f46b31ac29e02
  • MD5: 4131d582a379c07d90c7f43167f8be0c
  • BLAKE2b-256: f3f306efd762e45036abb8b07ada568bbfced1a1f497a8a3f5e02e6937ac8cb5

