A Python Wrapper to calculate standard BLEU scores for NLP
bleu (Python Package)
A Python Wrapper for the standard BLEU evaluation for Natural Language Generation (NLG).
- GitHub project: https://github.com/zhijing-jin/bleu.
- PyPI package: pip install bleu
Installation
Requirement: Python 3
Option 1: Install pip package
pip install --upgrade bleu
Option 2: Build from source
pip install --upgrade git+https://github.com/zhijing-jin/bleu.git
How to Run
The most widely used way to calculate BLEU is the Moses script for detokenized BLEU (multi-bleu-detok.perl). This package provides simple Python calls to it.
Function 1: Calculate the BLEU for lists
If you want to check only one hypothesis (a list of sentences):
>>> from bleu import list_bleu
>>> ref = ['it is a white cat .',
...        'wow , this dog is huge .']
>>> ref1 = ['This cat is white .',
...         'wow , this is a huge dog .']
>>> hyp = ['it is a white kitten .',
...        'wowww , the dog is huge !']
>>> hyp1 = ["it 's a white kitten .",
...         'wow , this dog is huge !']
>>> list_bleu([ref], hyp)
34.99
>>> list_bleu([ref, ref1], hyp1)
57.91
If you want to check multiple hypotheses (several lists of sentences):
>>> from bleu import multi_list_bleu
>>> multi_list_bleu([ref, ref1], [hyp, hyp1])
[34.99, 57.91]
detok=False: It is not advisable to use tokenized BLEU (by multi-bleu.perl), but if you want to call it, just use detok=False:
>>> list_bleu([ref], hyp, detok=False)
39.76
# or if you want to test multiple hypotheses
>>> multi_list_bleu([ref, ref1], [hyp, hyp1], detok=False)
[39.76, 47.47]
verbose=True: If there are unexpected errors, you might want to check the intermediate steps by setting verbose=True.
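The calls above pass the references as a list of lists: one inner list per reference set, each aligned sentence by sentence with the hypothesis. The following is a minimal sketch of a pre-flight check for that layout. `check_parallel` is a hypothetical helper written for this illustration and is not part of the `bleu` package.

```python
from typing import Sequence


def check_parallel(refs: Sequence[Sequence[str]], hyp: Sequence[str]) -> int:
    """Return the shared sentence count, or raise ValueError on a mismatch.

    Hypothetical helper (not part of the bleu package): `refs` is a list of
    reference sets and `hyp` is a single list of hypothesis sentences.
    """
    if not refs:
        raise ValueError("at least one reference list is required")
    for i, ref in enumerate(refs):
        if isinstance(ref, str):
            # A bare string here usually means the caller passed `ref`
            # where `[ref]` was intended.
            raise ValueError(f"reference {i} is a string, expected a list of sentences")
        if len(ref) != len(hyp):
            raise ValueError(
                f"reference {i} has {len(ref)} sentences, hypothesis has {len(hyp)}"
            )
    return len(hyp)


ref = ['it is a white cat .', 'wow , this dog is huge .']
ref1 = ['This cat is white .', 'wow , this is a huge dog .']
hyp = ['it is a white kitten .', 'wowww , the dog is huge !']

print(check_parallel([ref, ref1], hyp))  # 2
```

Running a check like this before scoring can catch the common slip of passing `ref` instead of `[ref]`.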
Function 2: Calculate the BLEU for files
If you want to check only one hypothesis file:
# if you already have the following files
>>> from bleu import file_bleu
>>> hyp_file = 'data/hyp0.txt'
>>> ref_files = ['data/ref0.txt', 'data/ref1.txt']
>>> file_bleu(ref_files, hyp_file)
34.99
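If the sentences only exist as Python lists, they have to be written to disk before the file-based functions can be used. Here is a minimal sketch using only the standard library. `write_lines` is a hypothetical helper for illustration, and the one-sentence-per-line layout is an assumption inferred from the line-count message in the verbose output, not something this README states explicitly.

```python
import tempfile
from pathlib import Path


def write_lines(path, sentences):
    """Write one sentence per line (hypothetical helper, not part of bleu)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(s + "\n" for s in sentences), encoding="utf-8")
    return str(path)


ref = ['it is a white cat .', 'wow , this dog is huge .']
ref1 = ['This cat is white .', 'wow , this is a huge dog .']
hyp = ['it is a white kitten .', 'wowww , the dog is huge !']

# A temporary directory stands in for the data/ folder used above.
data_dir = Path(tempfile.mkdtemp()) / "data"
ref_files = [write_lines(data_dir / "ref0.txt", ref),
             write_lines(data_dir / "ref1.txt", ref1)]
hyp_file = write_lines(data_dir / "hyp0.txt", hyp)
```

The resulting `ref_files` and `hyp_file` can then be passed to `file_bleu` in the same way as the paths in the example above.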
If you want to check multiple hypothesis files:
>>> from bleu import multi_file_bleu
>>> hyp_file1 = 'data/hyp1.txt'
>>> bleus = multi_file_bleu(ref_files, [hyp_file, hyp_file1])
>>> bleus
[34.99, 57.91]
detok=False: Set it if you want to calculate the (not recommended) tokenized BLEU.
verbose=True: Set it if you want to inspect how the BLEU calculations are made:
>>> bleu = file_bleu(ref_files, hyp_file, verbose=True)
[Info] Valid Reference Files: ['data/ref0.txt', 'data/ref1.txt']
[Info] Valid Hypothesis Files: ['data/hyp0.txt']
[Info] #lines in each file: 2
[cmd] perl detokenizer.perl -l en < data/ref0.txt > data/ref0.detok.txt 2>/dev/null
[cmd] perl detokenizer.perl -l en < data/ref1.txt > data/ref1.detok.txt 2>/dev/null
[cmd] perl detokenizer.perl -l en < data/hyp0.txt > data/hyp0.detok.txt 2>/dev/null
[cmd] perl multi-bleu-detok.perl data/ref0.detok.txt data/ref1.detok.txt < data/hyp0.detok.txt
2-ref bleu for data/hyp0.detok.txt: 34.99
>>> bleu
34.99
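The `[cmd]` lines in the verbose log above suggest a two-stage pipeline: detokenize every file, then score the detokenized hypothesis against the detokenized references. The sketch below only rebuilds those command strings from file names so the flow is easier to follow. `detok_path`, `detok_cmd`, and `bleu_cmd` are hypothetical names, and the package's real internals may differ.

```python
from pathlib import Path


def detok_path(path):
    """data/ref0.txt -> data/ref0.detok.txt (naming copied from the log above)."""
    p = Path(path)
    return p.with_name(p.stem + ".detok" + p.suffix).as_posix()


def detok_cmd(path, lang="en"):
    """Build the detokenizer command string for one file."""
    return f"perl detokenizer.perl -l {lang} < {path} > {detok_path(path)} 2>/dev/null"


def bleu_cmd(ref_files, hyp_file):
    """Build the scoring command string for one hypothesis file."""
    refs = " ".join(detok_path(r) for r in ref_files)
    return f"perl multi-bleu-detok.perl {refs} < {detok_path(hyp_file)}"


ref_files = ['data/ref0.txt', 'data/ref1.txt']
hyp_file = 'data/hyp0.txt'

# Stage 1: one detokenizer call per input file.
for f in ref_files + [hyp_file]:
    print("[cmd]", detok_cmd(f))
# Stage 2: a single scoring call over the detokenized files.
print("[cmd]", bleu_cmd(ref_files, hyp_file))
```

Nothing is executed here; the functions only format strings, which makes it easy to compare them against the log when debugging.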
Function 3: Detokenize files
>>> from bleu import detok_files
>>> detok_ref_files = detok_files(ref_files, tmp_dir='./data', file_prefix='ref_dtk', verbose=True)
[cmd] perl ./TMP_DIR/detokenizer.perl -l en < data/ref0.txt > data/ref_dtk0.txt 2>/dev/null
[cmd] perl ./TMP_DIR/detokenizer.perl -l en < data/ref1.txt > data/ref_dtk1.txt 2>/dev/null
>>> detok_ref_files
['data/ref_dtk0.txt', 'data/ref_dtk1.txt']
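To give a sense of what detokenization changes in the text, here is a deliberately simplified sketch in plain Python. It is not the Moses `detokenizer.perl` that the package calls, and it covers far fewer cases. `naive_detok` is a hypothetical name used only for this illustration.

```python
import re


def naive_detok(sentence):
    """Rough English detokenization (illustration only, not Moses)."""
    # Attach sentence punctuation to the preceding token: "cat ." -> "cat."
    sentence = re.sub(r"\s+([.,!?;:])", r"\1", sentence)
    # Attach clitics split off by tokenization: "it 's" -> "it's"
    sentence = re.sub(r"\s+('[A-Za-z]+)\b", r"\1", sentence)
    return sentence


for s in ["it 's a white kitten .", "wow , this dog is huge !"]:
    print(naive_detok(s))
# it's a white kitten.
# wow, this dog is huge!
```

Because detokenized BLEU is computed on text like this rather than on the space-separated tokens, its scores are not directly comparable with tokenized BLEU.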
In Case of Unexpected Outputs
Check the python file bleu.py and adapt it.
Contact
If you have more questions, feel free to check out the common Q&A, or raise a new GitHub issue.
In case of really urgent needs, contact the author Zhijing Jin (Miss).