A fast multithreaded C++ implementation of NLTK BLEU.

fast-bleu Package

This is a fast multithreaded C++ implementation of NLTK BLEU that computes BLEU and SelfBLEU scores against a fixed reference set. It can return (Self)BLEU for several maximum n-gram orders (e.g. BLEU-2, BLEU-3, etc.) in a single efficient call.

Installation

PyPI latest stable release

pip install --user fast-bleu

Sample Usage

Here is an example to compute BLEU-2, BLEU-3, SelfBLEU-2 and SelfBLEU-3:

>>> from fast_bleu import BLEU, SelfBLEU
>>> ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']

>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']

>>> list_of_references = [ref1, ref2, ref3]
>>> hypotheses = [hyp1, hyp2]
>>> weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}

>>> bleu = BLEU(list_of_references, weights)
>>> bleu.get_score(hypotheses)
{'bigram': [0.7453559924999299, 0.0191380231127159], 'trigram': [0.6240726901657495, 0.013720869575946234]}

which means:

  • BLEU-2 for hyp1 is 0.7453559924999299

  • BLEU-2 for hyp2 is 0.0191380231127159

  • BLEU-3 for hyp1 is 0.6240726901657495

  • BLEU-3 for hyp2 is 0.013720869575946234

>>> self_bleu = SelfBLEU(list_of_references, weights)
>>> self_bleu.get_score()
{'bigram': [0.25819888974716115, 0.3615507630310936, 0.37080992435478316],
        'trigram': [0.07808966062765045, 0.20140620205719248, 0.21415334758254043]}

which means:

  • SelfBLEU-2 for ref1 is 0.25819888974716115

  • SelfBLEU-2 for ref2 is 0.3615507630310936

  • SelfBLEU-2 for ref3 is 0.37080992435478316

  • SelfBLEU-3 for ref1 is 0.07808966062765045

  • SelfBLEU-3 for ref2 is 0.20140620205719248

  • SelfBLEU-3 for ref3 is 0.21415334758254043
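
The SelfBLEU-2 values above are consistent with scoring each sentence as a hypothesis against all the other sentences in the set. The sketch below illustrates that leave-one-out idea in pure Python. It is an illustration under that assumption, not the package's implementation; `ngram_counts`, `unsmoothed_bleu` and `self_bleu` are hypothetical names, and no smoothing is applied, so any sentence with a zero n-gram precision scores 0 (which is why only the bigram case is shown).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def unsmoothed_bleu(references, hypothesis, weights):
    """Plain BLEU: clipped precisions, geometric mean, brevity penalty."""
    log_mean = 0.0
    for n, weight in enumerate(weights, start=1):
        hyp = ngram_counts(hypothesis, n)
        refs = [ngram_counts(ref, n) for ref in references]
        clipped = sum(min(c, max(r[g] for r in refs)) for g, c in hyp.items())
        if clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_mean += weight * math.log(clipped / sum(hyp.values()))
    hyp_len = len(hypothesis)
    closest = min((len(r) for r in references),
                  key=lambda ref_len: (abs(ref_len - hyp_len), ref_len))
    penalty = 1.0 if hyp_len >= closest else math.exp(1 - closest / hyp_len)
    return penalty * math.exp(log_mean)


def self_bleu(sentences, weights):
    """Score every sentence against the remaining sentences of the set."""
    return [unsmoothed_bleu(sentences[:i] + sentences[i + 1:], sentence, weights)
            for i, sentence in enumerate(sentences)]


ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that',
        'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', 'guarantees',
        'the', 'military', 'forces', 'always', 'being', 'under', 'the',
        'command', 'of', 'the', 'Party']
ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army',
        'always', 'to', 'heed', 'the', 'directions', 'of', 'the', 'party']

scores = self_bleu([ref1, ref2, ref3], (1 / 2., 1 / 2.))
print([round(score, 4) for score in scores])
```

A low SelfBLEU means a sentence shares few n-grams with the rest of the set, which is why the metric is commonly read as a (inverse) diversity measure.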

Caution: Each token of the reference set is converted to a string during computation.
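
One practical consequence, assuming the conversion behaves like Python's `str()` (an assumption, not something this page states), is that an integer token id and its string form end up as the same token. If your corpus mixes token types, a small hypothetical helper such as `as_string_tokens` below lets you normalise references and hypotheses yourself before scoring, so the comparison is explicit rather than implicit.

```python
def as_string_tokens(sentence):
    """Convert every token to str (assumed to mirror the package's conversion)."""
    return [str(token) for token in sentence]


int_sentence = [101, 2009, 2003, 102]          # e.g. vocabulary ids
str_sentence = ['101', '2009', '2003', '102']  # the same ids stored as text

# After normalisation the two sentences are indistinguishable.
same = as_string_tokens(int_sentence) == str_sentence
print(same)
```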

For further details, refer to the documentation in the source code.

Citation

Please cite our paper if it helps with your research.

@inproceedings{alihosseini-etal-2019-jointly,
    title = {Jointly Measuring Diversity and Quality in Text Generation Models},
    author = {Alihosseini, Danial  and
      Montahaei, Ehsan  and
      Soleymani Baghshah, Mahdieh},
    booktitle = {Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation},
    month = {jun},
    year = {2019},
    address = {Minneapolis, Minnesota},
    publisher = {Association for Computational Linguistics},
    url = {https://www.aclweb.org/anthology/W19-2311},
    doi = {10.18653/v1/W19-2311},
    pages = {90--98},
}
