Multi-process BLEU evaluation.
bleu-mp
Multi-process BLEU evaluation tool.
Adapted from the BLEU metric in Hugging Face evaluate:
https://github.com/huggingface/evaluate/blob/main/metrics/bleu/bleu.py
Install
pip
pip install -U bleu-mp
dev
git clone https://github.com/One-sixth/bleu-mp
cd bleu-mp
pip install -e .
New Features
- Faster: roughly 4x to 5x with 10 processes in the speed test below.
- Does not use Python's built-in multiprocessing. It uses a custom multi-process implementation that runs on both Windows and Linux and keeps the memory footprint of each worker subprocess low.
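Corpus-level BLEU parallelizes well because its intermediate statistics are plain sums. The following is a minimal sketch of that idea, not the package's actual implementation: the function names (`shard_stats`, `merge_stats`, `bleu_from_stats`) are hypothetical, the shards are processed sequentially here, and the formula follows the unsmoothed BLEU used by the upstream Hugging Face script.

```python
import math
from collections import Counter

MAX_ORDER = 4  # assumption: 4-gram BLEU, matching the four precisions in the benchmark output


def ngram_counts(tokens, max_order=MAX_ORDER):
    """Count all n-grams of length 1..max_order in a string or a list of ints."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def shard_stats(preds, refs_list, max_order=MAX_ORDER):
    """Compute additive statistics for one shard of the corpus."""
    matches = [0] * max_order
    possible = [0] * max_order
    pred_len = ref_len = 0
    for pred, refs in zip(preds, refs_list):
        pred_len += len(pred)
        ref_len += min(len(r) for r in refs)
        merged_ref = Counter()
        for r in refs:
            merged_ref |= ngram_counts(r, max_order)  # max count over all references
        overlap = ngram_counts(pred, max_order) & merged_ref  # clipped match counts
        for ngram, count in overlap.items():
            matches[len(ngram) - 1] += count
        for n in range(1, max_order + 1):
            possible[n - 1] += max(len(pred) - n + 1, 0)
    return matches, possible, pred_len, ref_len


def merge_stats(a, b):
    """Merge the statistics of two shards by element-wise addition."""
    return ([x + y for x, y in zip(a[0], b[0])],
            [x + y for x, y in zip(a[1], b[1])],
            a[2] + b[2],
            a[3] + b[3])


def bleu_from_stats(stats):
    """Turn merged statistics into an unsmoothed corpus BLEU score."""
    matches, possible, pred_len, ref_len = stats
    if pred_len == 0:
        return 0.0
    precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    if min(precisions) > 0:
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    else:
        geo_mean = 0.0
    ratio = pred_len / ref_len
    brevity_penalty = 1.0 if ratio > 1.0 else math.exp(1 - 1.0 / ratio)
    return geo_mean * brevity_penalty


preds = [[1, 2, 3, 4], [2, 3, 4, 5]] * 3
refs = [[[1, 2, 3, 4]], [[2, 3, 4, 5], [4, 5, 6]]] * 3
merged = merge_stats(shard_stats(preds[:3], refs[:3]), shard_stats(preds[3:], refs[3:]))
print(bleu_from_stats(merged))
```

In a real multi-process setup each worker would run something like `shard_stats` on its own slice, and only the small tuples of counts would travel back to the parent process, which is one plausible reason per-worker memory can stay low.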
Features
BLEU can be computed over both strings and integer sequences.
Speed test
The test code is in unittest/test.py.
CPU: i7-8750H
# short str
score (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 2200000, 2200000) (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 2200000, 2200000)
1 process cost time 16.979528665542603
10 process cost time 3.5354034900665283
# long str
score (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 22000000, 22000000) (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 22000000, 22000000)
1 process cost time 103.8217351436615
10 process cost time 22.66322374343872
# short int list
score (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 800000, 800000) (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 800000, 800000)
1 process cost time 4.874496936798096
10 process cost time 1.1751139163970947
# long int list
score (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 16000000, 16000000) (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 16000000, 16000000)
1 process cost time 47.34107685089111
10 process cost time 10.046519994735718
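To put the timings above in perspective, the short script below computes the speedup and parallel efficiency for each case. The numbers are copied from the benchmark output and rounded to three decimals; the `speedup` helper is only for illustration.

```python
# (1 process, 10 processes) wall-clock times in seconds, from the benchmark output
timings = {
    "short str": (16.980, 3.535),
    "long str": (103.822, 22.663),
    "short int list": (4.874, 1.175),
    "long int list": (47.341, 10.047),
}


def speedup(single, multi):
    """Ratio of single-process time to multi-process time."""
    return single / multi


for name, (single, multi) in timings.items():
    s = speedup(single, multi)
    # Efficiency is the speedup divided by the number of worker processes (10).
    print(f"{name}: {s:.2f}x speedup, {s / 10:.0%} parallel efficiency")
```

All four cases land between 4x and 5x, so 10 processes on this CPU give a bit less than half of the ideal linear speedup.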
Warning
Do not pass PyTorch tensors as input. Doing so causes unnecessary memory consumption and a significant performance loss.
Convert them to NumPy arrays or lists first.
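One way to follow this advice is a small conversion step before calling the scorer. The helper below is a sketch, not part of bleu-mp: `to_plain_sequences` is a hypothetical name, and it relies only on the `.tolist()` method that both `torch.Tensor` and `numpy.ndarray` provide.

```python
def to_plain_sequences(batch):
    """Convert tensors or arrays (possibly nested in lists) to plain Python lists.

    Objects exposing .tolist() are converted with it; lists and tuples are
    walked recursively; everything else (ints, strings) is returned unchanged.
    """
    if hasattr(batch, "tolist"):
        return batch.tolist()
    if isinstance(batch, (list, tuple)):
        return [to_plain_sequences(item) for item in batch]
    return batch


# Plain inputs pass through unchanged, so the helper is safe to apply everywhere.
print(to_plain_sequences([[1, 2, 3, 4], (2, 3, 4, 5)]))

# Hypothetical usage with PyTorch (pred_ids and tgt_ids are assumed tensors):
# pred_data = to_plain_sequences(pred_ids)
# tgt_data = to_plain_sequences(tgt_ids)
```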
Demo
from bleu_mp import compute_bleu
# str: each prediction is one string; each target is a list of one or more reference strings
pred_data = ['床前明月光,疑是地上霜', '举头望明月,低头思故乡'] * 1000
tgt_data = [['床前明月光,疑是地上霜'], ['举头望明月,低头思故乡', '静夜思']] * 1000
result = compute_bleu(pred_data, tgt_data)
print('bleu score', result[0])
# int list: each prediction is a list of ints; each target is a list of one or more reference lists
pred_data = [[1, 2, 3, 4], [2, 3, 4, 5]] * 1000
tgt_data = [[[1, 2, 3, 4]], [[2, 3, 4, 5], [4, 5, 6]]] * 1000
result = compute_bleu(pred_data, tgt_data)
print('bleu score', result[0])
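The demo only reads `result[0]`, but the benchmark output shows a six-element tuple. The sketch below gives those positions names. The field names are an assumption taken from the upstream Hugging Face BLEU script that this tool is adapted from; bleu-mp itself does not document them, and `BleuResult` is a hypothetical wrapper.

```python
from typing import List, NamedTuple


class BleuResult(NamedTuple):
    """Assumed layout of the tuple returned by compute_bleu (names from upstream)."""
    bleu: float
    precisions: List[float]
    brevity_penalty: float
    length_ratio: float
    translation_length: int
    reference_length: int


# Tuple copied from the "short int list" benchmark output.
raw = (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 800000, 800000)
named = BleuResult(*raw)
print("bleu score", named.bleu)
print("n-gram precisions", named.precisions)
print("lengths", named.translation_length, named.reference_length)
```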