
A fast multithreaded C++ implementation of nltk BLEU.


CAUTION

This package has been renamed to "fast-bleu".

Please install the new package to get updated versions.

FastBLEU Package (DEPRECATED)

This is a fast, multithreaded C++ implementation of NLTK BLEU that computes BLEU and SelfBLEU scores against a fixed reference set. It can compute (Self)BLEU for several maximum n-gram orders (e.g. BLEU-2, BLEU-3, etc.) simultaneously and efficiently.

Installation

PyPI latest stable release

pip install --user FastBLEU

Sample Usage

Here is an example to compute BLEU-2, BLEU-3, SelfBLEU-2 and SelfBLEU-3:

>>> from fast_bleu import BLEU, SelfBLEU
>>> ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']

>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']

>>> list_of_references = [ref1, ref2, ref3]
>>> hypotheses = [hyp1, hyp2]
>>> weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}

>>> bleu = BLEU(list_of_references, weights)
>>> bleu.get_score(hypotheses)
{'bigram': [0.7453559924999299, 0.0191380231127159], 'trigram': [0.6240726901657495, 0.013720869575946234]}

which means:

  • BLEU-2 for hyp1 is 0.7453559924999299

  • BLEU-2 for hyp2 is 0.0191380231127159

  • BLEU-3 for hyp1 is 0.6240726901657495

  • BLEU-3 for hyp2 is 0.013720869575946234

>>> self_bleu = SelfBLEU(list_of_references, weights)
>>> self_bleu.get_score()
{'bigram': [0.25819888974716115, 0.3615507630310936, 0.37080992435478316],
        'trigram': [0.07808966062765045, 0.20140620205719248, 0.21415334758254043]}

which means:

  • SelfBLEU-2 for ref1 is 0.25819888974716115

  • SelfBLEU-2 for ref2 is 0.3615507630310936

  • SelfBLEU-2 for ref3 is 0.37080992435478316

  • SelfBLEU-3 for ref1 is 0.07808966062765045

  • SelfBLEU-3 for ref2 is 0.20140620205719248

  • SelfBLEU-3 for ref3 is 0.21415334758254043

Caution: Each token of the reference set is converted to a string during computation.
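One practical consequence of this conversion is that non-string tokens, such as integer vocabulary ids, become indistinguishable from their string forms. The snippet below illustrates the idea only. It assumes the conversion behaves like Python's str(), which the README does not specify, and as_strings is a hypothetical helper, not part of the package.

```python
def as_strings(tokens):
    """Convert every token to str (assumed to mirror the package's conversion)."""
    return [str(token) for token in tokens]


id_tokens = [101, 7, 42]          # e.g. integer vocabulary ids
text_tokens = ['101', '7', '42']  # the same ids, already stored as strings

# The two lists differ as Python objects but are equal after conversion.
same_after_conversion = as_strings(id_tokens) == as_strings(text_tokens)
print(same_after_conversion)
```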

For further details, refer to the documentation in the source code.

Citation

Please cite our paper if it helps with your research.

@inproceedings{alihosseini-etal-2019-jointly,
    title = {Jointly Measuring Diversity and Quality in Text Generation Models},
    author = {Alihosseini, Danial  and
      Montahaei, Ehsan  and
      Soleymani Baghshah, Mahdieh},
    booktitle = {Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation},
    month = {jun},
    year = {2019},
    address = {Minneapolis, Minnesota},
    publisher = {Association for Computational Linguistics},
    url = {https://www.aclweb.org/anthology/W19-2311},
    doi = {10.18653/v1/W19-2311},
    pages = {90--98},
}


Download files

Source Distribution

FastBLEU-0.0.41.tar.gz (3.1 kB)

Uploaded Source

File details

Details for the file FastBLEU-0.0.41.tar.gz.

File metadata

  • Download URL: FastBLEU-0.0.41.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.7.7

File hashes

Hashes for FastBLEU-0.0.41.tar.gz
Algorithm Hash digest
SHA256 4f6d5414ef819c2fae1a907e13140d68f53e24353f704100a338c1f781730a9c
MD5 dad731c17864fa42aa984f080274e97e
BLAKE2b-256 76afaf1e6e9004801bee3f4657341846592cf369b3af3c703f09767c90bfd797
