A fast multithreaded C++ implementation of NLTK BLEU.
fast-bleu Package
This is a fast, multithreaded C++ implementation of NLTK BLEU that computes BLEU and SelfBLEU scores against a fixed reference set. It can efficiently return (Self)BLEU for several maximum n-gram orders at once (e.g. BLEU-2 and BLEU-3).
Installation
PyPI latest stable release
pip install --user fast-bleu
Sample Usage
Here is an example to compute BLEU-2, BLEU-3, SelfBLEU-2 and SelfBLEU-3:
>>> from fast_bleu import BLEU, SelfBLEU
>>> ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
... 'ensures', 'that', 'the', 'military', 'will', 'forever',
... 'heed', 'Party', 'commands']
>>> ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
... 'guarantees', 'the', 'military', 'forces', 'always',
... 'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
... 'army', 'always', 'to', 'heed', 'the', 'directions',
... 'of', 'the', 'party']
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
... 'ensures', 'that', 'the', 'military', 'always',
... 'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
... 'interested', 'in', 'world', 'history']
>>> list_of_references = [ref1, ref2, ref3]
>>> hypotheses = [hyp1, hyp2]
>>> weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}
>>> bleu = BLEU(list_of_references, weights)
>>> bleu.get_score(hypotheses)
{'bigram': [0.7453559924999299, 0.0191380231127159], 'trigram': [0.6240726901657495, 0.013720869575946234]}
which means:
- BLEU-2 for hyp1 is 0.7453559924999299
- BLEU-2 for hyp2 is 0.0191380231127159
- BLEU-3 for hyp1 is 0.6240726901657495
- BLEU-3 for hyp2 is 0.013720869575946234
>>> self_bleu = SelfBLEU(list_of_references, weights)
>>> self_bleu.get_score()
{'bigram': [0.25819888974716115, 0.3615507630310936, 0.37080992435478316],
'trigram': [0.07808966062765045, 0.20140620205719248, 0.21415334758254043]}
which means:
- SelfBLEU-2 for ref1 is 0.25819888974716115
- SelfBLEU-2 for ref2 is 0.3615507630310936
- SelfBLEU-2 for ref3 is 0.37080992435478316
- SelfBLEU-3 for ref1 is 0.07808966062765045
- SelfBLEU-3 for ref2 is 0.20140620205719248
- SelfBLEU-3 for ref3 is 0.21415334758254043
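The SelfBLEU values above are consistent with a leave-one-out scheme: each sentence in the set is treated as a hypothesis and scored against all the other sentences. The following pure-Python sketch illustrates that idea under the same assumptions as NLTK-style BLEU with an `epsilon=0.1` smoothing for zero matches. The names `ngram_counts`, `bleu`, and `self_bleu` are hypothetical and are not part of the `fast_bleu` API.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) of a token sequence."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def bleu(references, hypothesis, weights, epsilon=0.1):
    """Compact sentence-level BLEU sketch (needs at least one reference)."""
    if not hypothesis:
        return 0.0
    hyp_len = len(hypothesis)
    # Reference length closest to the hypothesis length (ties: shorter one).
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    brevity_penalty = min(1.0, math.exp(1 - ref_len / hyp_len))

    log_sum = 0.0
    for n, weight in enumerate(weights, start=1):
        hyp = ngram_counts(hypothesis, n)
        refs = [ngram_counts(r, n) for r in references]
        # Clip each hypothesis count by its maximum count in any reference.
        matched = sum(min(c, max(r[g] for r in refs)) for g, c in hyp.items())
        total = max(1, sum(hyp.values()))
        if matched == 0:
            if n == 1:
                return 0.0
            matched = epsilon  # assumed smoothing for zero matches
        log_sum += weight * math.log(matched / total)
    return brevity_penalty * math.exp(log_sum)


def self_bleu(sentences, weights):
    """Score each sentence against all the others (needs >= 2 sentences)."""
    return [bleu(sentences[:i] + sentences[i + 1:], sentence, weights)
            for i, sentence in enumerate(sentences)]


ref1 = "It is a guide to action that ensures that the military will forever heed Party commands".split()
ref2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party".split()
ref3 = "It is the practical guide for the army always to heed the directions of the party".split()

scores = self_bleu([ref1, ref2, ref3], (1 / 2., 1 / 2.))
print(scores)
```

For ref1, 8 of 16 unigrams and 2 of 15 bigrams also occur in ref2 or ref3, which gives sqrt(1/2 * 2/15) and agrees with the SelfBLEU-2 value listed for ref1 up to floating-point rounding.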
Caution: Each token of the reference set is converted to a string during computation.
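One practical consequence is that reference tokens such as `1` and `'1'` become indistinguishable. Whether hypothesis tokens are converted in the same way is not stated here, so a cautious option is to normalize both sides yourself before scoring. The sketch below uses a hypothetical helper, `to_str_tokens`, which is not part of `fast_bleu`.

```python
def to_str_tokens(sentences):
    """Convert every token of every sentence to its string form."""
    return [[str(token) for token in sentence] for sentence in sentences]


# Mixed token types, e.g. vocabulary ids next to plain words.
references = [[1, 2, 3, 'four'], ['1', 2.0, None]]
normalized = to_str_tokens(references)
print(normalized)
```

Passing the normalized lists for both references and hypotheses keeps the comparison independent of the original token types.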
For further details, refer to the documentation provided in the source code.
Citation
Please cite our paper if it helps with your research.
- ACL Anthology: https://www.aclweb.org/anthology/W19-2311
- arXiv: https://arxiv.org/abs/1904.03971
@inproceedings{alihosseini-etal-2019-jointly,
title = {Jointly Measuring Diversity and Quality in Text Generation Models},
author = {Alihosseini, Danial and
Montahaei, Ehsan and
Soleymani Baghshah, Mahdieh},
booktitle = {Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation},
month = {jun},
year = {2019},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/W19-2311},
doi = {10.18653/v1/W19-2311},
pages = {90--98},
}
Source Distribution
File details
Details for the file fast-bleu-0.0.6.tar.gz.
File metadata
- Download URL: fast-bleu-0.0.6.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2dc2b4196759b49ac65c689a32c7ef78443453c82b0d1c9715951d85ab6c32a7
MD5 | cd7e6d0236038852919d5fdcb1682e51
BLAKE2b-256 | cc36a8871a003621c5d6685368a39b00ebd7ce1405627b93b707fe86eef7d2c0