GEM Challenge metrics
Automatic metrics for GEM benchmark tasks. The metrics can also be used standalone to evaluate a variety of natural language generation tasks.
Installation
GEM-metrics requires a recent version of Python 3; using virtualenv or a similar environment manager is recommended. To install, run:

```
git clone https://github.com/GEM-benchmark/GEM-metrics
cd GEM-metrics
pip install -r requirements.txt -r requirements-heavy.txt
```
If you only want to run the metrics from the console (and don’t need direct access to the source code), you can install directly with pip:

```
pip install 'gem-metrics[heavy] @ git+https://github.com/GEM-benchmark/GEM-metrics.git'
```
Note that some NLTK data may be downloaded on the first run into a subdirectory of the directory where the code is located, so make sure you have write access there. Also note that all the required Python libraries take up around 3 GB when installed.
If you don’t need the trained metrics (BLEURT, BERTScore, NUBIA, QuestEval), you can skip the “heavy” part: install only the dependencies from `requirements.txt`, or install `gem-metrics` instead of `gem-metrics[heavy]` when installing without a checkout. The installed libraries then take up only about 300 MB.
Script Usage
To compute all default metrics for a file, run:

```
<script> [-r references.json] outputs.json
```

where `<script>` is either `./run_metrics.py` (if you created a checkout) or `gem_metrics` (if you installed directly via pip).
See [test_data](test_data/) for example JSON file formats.
To compute the basic metrics on the unit test data, run:

```
./run_metrics.py -s test_data/unit_tests/sources.json -r test_data/unit_tests/references.json test_data/unit_tests/predictions.json
```
Use `./run_metrics.py -h` to see all available options.
By default, the “heavy” metrics (BERTScore, BLEURT, NUBIA and QuestEval) aren’t computed. Use `--heavy-metrics` to compute them.
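You may want to drive the script from Python, for example inside a larger experiment pipeline. A minimal sketch for that is to assemble the same command line and pass it to `subprocess.run`. The helper names `build_metrics_command` and `run_command` are hypothetical and not part of GEM-metrics. The sketch uses only the `-s`, `-r`, and `--heavy-metrics` options and the positional outputs file described above. It assumes that the `gem_metrics` entry point from the pip install is on your `PATH`; otherwise, pass `script="./run_metrics.py"`.

```python
import subprocess
from typing import List, Optional


def build_metrics_command(
    outputs: str,
    references: Optional[str] = None,
    sources: Optional[str] = None,
    heavy: bool = False,
    script: str = "gem_metrics",
) -> List[str]:
    """Assemble an argument list for the GEM-metrics script (hypothetical helper)."""
    cmd = [script]
    if sources is not None:
        cmd += ["-s", sources]         # source texts (source-based metrics such as SARI need them)
    if references is not None:
        cmd += ["-r", references]      # reference texts
    if heavy:
        cmd.append("--heavy-metrics")  # also compute the "heavy" trained metrics
    cmd.append(outputs)                # system outputs are the positional argument
    return cmd


def run_command(cmd: List[str]) -> str:
    """Run the assembled command, raise on a non-zero exit code, and return its stdout."""
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout
```

For example, `build_metrics_command('test_data/unit_tests/predictions.json', references='test_data/unit_tests/references.json', sources='test_data/unit_tests/sources.json', script='./run_metrics.py')` yields the same argument list as the unit-test command above. Passing a list rather than a single string avoids shell quoting issues with file paths.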
Library Usage
You can compute metrics on data in the same JSON format as shown in [test_data](test_data/), or you can work with plain lists of texts (or lists of lists of texts for multi-reference data).
Import GEM-metrics as a library:

```python
import gem_metrics
```
To load data from JSON files:

```python
preds = gem_metrics.texts.Predictions('path/to/pred-file.json')
refs = gem_metrics.texts.References('path/to/ref-file.json')
```
To prepare plain lists (predictions and references must be in the same order):

```python
preds = gem_metrics.texts.Predictions(list_of_predictions)
refs = gem_metrics.texts.References(list_of_references)  # input may be a list of lists for multi-ref
```
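References may be a mix of single references and multiple references. A small pure-Python sketch can bring plain lists into one shape and check that predictions and references line up by position before you wrap them in `References`. The helpers `as_multi_ref` and `check_alignment` are hypothetical and not part of GEM-metrics. The sketch assumes that a bare string means one reference and a list of strings means several references for the same prediction.

```python
from typing import List, Sequence, Union

RefItem = Union[str, Sequence[str]]


def as_multi_ref(references: Sequence[RefItem]) -> List[List[str]]:
    """Turn a mix of single references and reference lists into a list of lists (hypothetical helper)."""
    normalized: List[List[str]] = []
    for item in references:
        if isinstance(item, str):
            normalized.append([item])      # exactly one reference for this prediction
        else:
            normalized.append(list(item))  # already several references
    return normalized


def check_alignment(predictions: Sequence[str], references: Sequence[RefItem]) -> None:
    """Fail early if the lists differ in length, since items are matched by position."""
    if len(predictions) != len(references):
        raise ValueError(
            f"{len(predictions)} predictions but {len(references)} reference entries"
        )
```

For example, `as_multi_ref(["a cat sat down", ["dogs are barking", "the dogs bark"]])` returns `[["a cat sat down"], ["dogs are barking", "the dogs bark"]]`. You could then pass that result to `gem_metrics.texts.References` as shown above.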
Then compute the desired metrics:

```python
result = gem_metrics.compute(preds, refs, metrics_list=['bleu', 'rouge'])  # add the desired metrics here
```
List of supported metrics
Referenceless:

- `local_recall` – LocalRecall
- `msttr` – MSTTR
- `ngrams` – n-gram statistics
- `ttr` – TTR
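The sketch below gives a rough illustration of what the lexical-diversity metrics measure. It computes a type-token ratio (distinct tokens divided by total tokens) and a mean segmental TTR (the average TTR over consecutive, non-overlapping windows of fixed size, discarding an incomplete final window). This is a textbook formulation, not GEM-metrics' own code. Lowercased whitespace tokenization and the default window size of 100 are assumptions, so the numbers will not necessarily match the library's output.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split; GEM-metrics may tokenize differently.
    return text.lower().split()


def ttr(tokens: List[str]) -> float:
    """Type-token ratio: number of distinct tokens over number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def msttr(tokens: List[str], window: int = 100) -> float:
    """Mean TTR over non-overlapping windows of `window` tokens (the remainder is dropped)."""
    segments = [
        tokens[i : i + window]
        for i in range(0, len(tokens) - window + 1, window)
    ]
    if not segments:
        return 0.0  # text shorter than one window
    return sum(ttr(seg) for seg in segments) / len(segments)
```

Plain TTR tends to fall as texts get longer, because common words repeat. Averaging over fixed-size windows makes MSTTR easier to compare across systems whose outputs differ in length.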
Reference-based:

- `bertscore` – BERTScore (heavy)
- `bleu` – BLEU
- `bleurt` – BLEURT (heavy)
- `chrf` – CHRF
- `cider` – CIDER
- `meteor` – Meteor (heavy)
- `moverscore` – MoverScore (heavy)
- `nist` – NIST
- `nubia` – NUBIA (heavy)
- `prism` – Prism
- `questeval` – QuestEval (heavy)
- `rouge` – ROUGE
- `ter` – TER
- `wer` – WER
- `yules_i` – Yules_I
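To make one of the simpler reference-based metrics concrete, here is a sketch of word error rate. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) between a prediction and its reference, divided by the number of reference words. Whitespace tokenization and a single reference per prediction are assumptions for illustration. This is not the implementation GEM-metrics uses, and the library's handling of corpus-level aggregation and multiple references may differ.

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(ref) + 1))  # cost of turning an empty hypothesis into each ref prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra hypothesis word (insertion error)
                curr[j - 1] + 1,         # missing reference word (deletion error)
                prev[j - 1] + (h != r),  # substitution, free when the words match
            )
        prev = curr
    return prev[-1]


def wer(prediction: str, reference: str) -> float:
    """Word error rate of one prediction against one reference."""
    hyp, ref = prediction.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(hyp, ref) / len(ref)
```

Because the denominator is the reference length, WER can exceed 1.0 when the prediction contains many extra words.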
Source + reference based:

- `sari` – SARI
License
Licensed under [the MIT license](LICENSE).
File details

Details for the file gem_metrics_fork-0.1.dev0.tar.gz.

File metadata

- Download URL: gem_metrics_fork-0.1.dev0.tar.gz
- Upload date:
- Size: 46.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | d6095896a17add80d7452eb2b58f77857a08a123b5a6f7b8b11e0fbfa8d982a8 |
| MD5 | 047a7e6f1e215a7663b6ffb0ccbf78fb |
| BLAKE2b-256 | 5c7430a150df386cd977f4956e99a67dc7ccd1b458d85b4bee689a756bd025e8 |
File details

Details for the file gem_metrics_fork-0.1.dev0-py3-none-any.whl.

File metadata

- Download URL: gem_metrics_fork-0.1.dev0-py3-none-any.whl
- Upload date:
- Size: 49.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | c279fa24ab25025b3b7564d4bf9b96b2357267cb906ca0f8ad9f46b31ac29e02 |
| MD5 | 4131d582a379c07d90c7f43167f8be0c |
| BLAKE2b-256 | f3f306efd762e45036abb8b07ada568bbfced1a1f497a8a3f5e02e6937ac8cb5 |