ROUGE Metric

A fast Python implementation of the full ROUGE metrics for automatic summarization evaluation. A Python wrapper around the official ROUGE-1.5.5.pl Perl script is also available.

Features

For the Perl script wrapper:

  • Easy to install: No need to manually download and configure the Perl scripts. It works as long as Perl is installed.
  • Cross-platform: Supports Linux, macOS, and Windows machines.
  • Elegant CLI and API: A user-friendly API and a command line tool are available.

For the Python implementation:

  • Full ROUGE support: Implements ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU metrics, with multi-reference evaluation support.
  • High speed: Pure Python implementation that does not invoke another process.
  • Correctness: Produces the same results as ROUGE-1.5.5.pl for all ROUGE scores in single-document scenarios. Multi-document results may differ slightly, since we do not apply bootstrap resampling.
  • Flexible and multi-lingual: The implementation is language-agnostic and treats a sentence as a list of tokens. Language-aware pre-processing and tokenization are left to the user, so you may tokenize different languages with different tools, such as nltk for English and jieba for Chinese.

Installation

Install a stable version from PyPI.

pip install rouge-metric

Or install the latest version from GitHub.

pip install git+https://github.com/li-plus/rouge-metric.git@master

For Windows users who want to use the ROUGE-1.5.5.pl script, please install Strawberry Perl and add its binary folder to PATH.

Quick Start

With Command Line Tool

Basic usage:

rouge-metric [options] hypothesis reference [reference ...]

where the options are almost the same as those of the ROUGE-1.5.5.pl script. In the examples below, -n 2 computes ROUGE-1 and ROUGE-2, -w 1.2 enables ROUGE-W with weight 1.2, -2 4 enables ROUGE-S with a maximum skip gap of 4, and -U adds the unigram-including variant ROUGE-SU. Run rouge-metric -h for more details.

For a single document with a single reference, specify two files.

rouge-metric sample/hypotheses/summary1.txt sample/references/summary1.1.txt -n 2 -w 1.2 -U -2 4

For a single document with multiple references, specify a hypothesis file followed by several reference files.

rouge-metric sample/hypotheses/summary1.txt sample/references/summary1.1.txt sample/references/summary1.2.txt -n 2 -w 1.2 -U -2 4

For multiple documents with multiple references, specify a hypothesis folder and a reference folder.

rouge-metric sample/hypotheses/ sample/references/ -n 2 -w 1.2 -U -2 4
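
Judging from the sample file names above, each hypothesis file is paired with the reference files that share its name stem. A layout consistent with the examples looks like this (the exact matching rule is inferred from the file names, so treat it as an assumption):

sample/
├── hypotheses/
│   └── summary1.txt
└── references/
    ├── summary1.1.txt
    └── summary1.2.txt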

It directly calls the ROUGE-1.5.5.pl script, so you get the original output.

---------------------------------------------
A ROUGE-1 Average_R: 0.51822 (95%-conf.int. 0.42105 - 0.61538)
A ROUGE-1 Average_P: 0.55556 (95%-conf.int. 0.44444 - 0.66667)
A ROUGE-1 Average_F: 0.53622 (95%-conf.int. 0.43243 - 0.64000)
---------------------------------------------
A ROUGE-2 Average_R: 0.19519 (95%-conf.int. 0.11765 - 0.27273)
A ROUGE-2 Average_P: 0.21250 (95%-conf.int. 0.12500 - 0.30000)
A ROUGE-2 Average_F: 0.20346 (95%-conf.int. 0.12121 - 0.28572)
---------------------------------------------
A ROUGE-L Average_R: 0.51822 (95%-conf.int. 0.42105 - 0.61538)
A ROUGE-L Average_P: 0.55556 (95%-conf.int. 0.44444 - 0.66667)
A ROUGE-L Average_F: 0.53622 (95%-conf.int. 0.43243 - 0.64000)
---------------------------------------------
A ROUGE-W-1.2 Average_R: 0.33608 (95%-conf.int. 0.26618 - 0.40599)
A ROUGE-W-1.2 Average_P: 0.47348 (95%-conf.int. 0.38525 - 0.56172)
A ROUGE-W-1.2 Average_F: 0.39308 (95%-conf.int. 0.31483 - 0.47132)
---------------------------------------------
A ROUGE-S4 Average_R: 0.25495 (95%-conf.int. 0.13846 - 0.37143)
A ROUGE-S4 Average_P: 0.29167 (95%-conf.int. 0.15000 - 0.43333)
A ROUGE-S4 Average_F: 0.27200 (95%-conf.int. 0.14400 - 0.40000)
---------------------------------------------
A ROUGE-SU4 Average_R: 0.31495 (95%-conf.int. 0.19512 - 0.43478)
A ROUGE-SU4 Average_P: 0.35527 (95%-conf.int. 0.21053 - 0.50000)
A ROUGE-SU4 Average_F: 0.33382 (95%-conf.int. 0.20253 - 0.46511)

With Perl Script API

Besides the command line tool, you may also use ROUGE-1.5.5.pl programmatically. Note that it only works for English corpora. For non-English summaries, use the Python implementation instead, or convert the tokens to space-separated integers before evaluation (see the sketch after the sample output below).

from rouge_metric import PerlRouge

rouge = PerlRouge(rouge_n_max=3, rouge_l=True, rouge_w=True,
    rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)

# Load summary results and evaluate
hypotheses = [
    'how are you\ni am fine',                       # document 1: hypothesis
    'it is fine today\nwe won the football game',   # document 2: hypothesis
]
references = [[
    'how do you do\nfine thanks',   # document 1: reference 1
    'how old are you\ni am three',  # document 1: reference 2
], [
    'it is sunny today\nlet us go for a walk',  # document 2: reference 1
    'it is a terrible day\nwe lost the game',   # document 2: reference 2
]]

scores = rouge.evaluate(hypotheses, references)
print(scores)

The output looks like

{
    'rouge-1': {
        'r': 0.51822, 'r_conf_int': (0.42105, 0.61538),
        'p': 0.55556, 'p_conf_int': (0.44444, 0.66667),
        'f': 0.53622, 'f_conf_int': (0.43243, 0.64)
    },
    'rouge-2': {...}, 'rouge-3': {...}, 'rouge-l': {...},
    'rouge-w-1.2': {...}, 'rouge-s4': {...}, 'rouge-su4': {...}
}
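
As noted above, a workaround for non-English corpora with the Perl script is to replace each token with an integer ID before evaluation. A minimal sketch, assuming whitespace tokenization (the encode helper below is illustrative, not part of the library):

from rouge_metric import PerlRouge

def encode(documents, vocab):
    # Replace each whitespace-separated token with a stable integer ID,
    # keeping the '\n' sentence separators intact.
    encoded = []
    for doc in documents:
        sentences = []
        for sentence in doc.split('\n'):
            ids = [str(vocab.setdefault(token, len(vocab))) for token in sentence.split()]
            sentences.append(' '.join(ids))
        encoded.append('\n'.join(sentences))
    return encoded

vocab = {}
hypotheses = encode(['how are you\ni am fine'], vocab)
references = [encode(['how do you do\nfine thanks',
                      'how old are you\ni am three'], vocab)]
print(PerlRouge(rouge_n_max=2).evaluate(hypotheses, references))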

You may also evaluate summaries from existing files.

from rouge_metric import PerlRouge

hypothesis_dir = 'sample/hypotheses'
reference_dir = 'sample/references'
scores = PerlRouge().evaluate_from_files(hypothesis_dir, reference_dir)
print(scores)
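
The directory layout is the same as for the command line tool above: the hypothesis folder and the reference folder pair files by name stem.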

With Python Implementation

A fast Python implementation is also available. It has a similar API and supports multiple languages.

from rouge_metric import PyRouge

# Load summary results
hypotheses = [
    'how are you\ni am fine',  # document 1: hypothesis
    'it is fine today\nwe won the football game',  # document 2: hypothesis
]
references = [[
    'how do you do\nfine thanks',  # document 1: reference 1
    'how old are you\ni am three',  # document 1: reference 2
], [
    'it is sunny today\nlet us go for a walk',  # document 2: reference 1
    'it is a terrible day\nwe lost the game',  # document 2: reference 2
]]

# Evaluate document-wise ROUGE scores
rouge = PyRouge(rouge_n=(1, 2, 4), rouge_l=True, rouge_w=True,
                rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)
scores = rouge.evaluate(hypotheses, references)
print(scores)

The output looks like

{
    'rouge-1': {
        'r': 0.5182186234817814,
        'p': 0.5555555555555556,
        'f': 0.5362379555927943
    },
    'rouge-2': {...}, 'rouge-4': {...}, 'rouge-l': {...},
    'rouge-w-1.2': {...}, 'rouge-s4': {...}, 'rouge-su4': {...}
}
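
Individual values can be read straight from this dict, e.g. scores['rouge-l']['f'] for the ROUGE-L F-measure.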

By default, sentences in a document are separated by '\n' and tokens by whitespace. This tokenization can be further customized. For example,

from rouge_metric import PyRouge

# Pre-process and tokenize the summaries as you like
hypotheses = [
    ['how are you'.split(), 'i am fine'.split()],                       # document 1: hypothesis
    ['it is fine today'.split(), 'we won the football game'.split()],   # document 2: hypothesis
]
references = [[
    ['how do you do'.split(), 'fine thanks'.split()],   # document 1: reference 1
    ['how old are you'.split(), 'i am three'.split()],  # document 1: reference 2
], [
    ['it is sunny today'.split(), 'let us go for a walk'.split()],  # document 2: reference 1
    ['it is a terrible day'.split(), 'we lost the game'.split()],   # document 2: reference 2
]]

# Evaluate on tokenized documents
rouge = PyRouge(rouge_n=(1, 2, 4), rouge_l=True, rouge_w=True,
                rouge_w_weight=1.2, rouge_s=True, rouge_su=True, skip_gap=4)
scores = rouge.evaluate_tokenized(hypotheses, references)
print(scores)
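
For non-English text, plug in any tokenizer before calling evaluate_tokenized. A minimal sketch using jieba for Chinese (jieba is a separate third-party package, not a dependency of rouge-metric, and the sample sentences are illustrative):

import jieba

from rouge_metric import PyRouge

# One document with a single-sentence hypothesis and a single-sentence
# reference, tokenized by jieba instead of whitespace splitting
hypotheses = [[jieba.lcut('我们赢得了足球比赛')]]
references = [[[jieba.lcut('我们在足球比赛中获胜')]]]

rouge = PyRouge(rouge_n=(1, 2))
print(rouge.evaluate_tokenized(hypotheses, references))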

License

This project is under the MIT License.
