Pure Python implementation of ROUGE-1.5.5.

Python ROUGE Implementation

Overview

This is a native Python implementation of ROUGE, designed to replicate results from the original Perl package.

ROUGE was originally introduced in the paper:

Lin, Chin-Yew. ROUGE: a Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 - 26, 2004.

ROUGE for Python

Other ROUGE implementations are available for Python. However, some are not native Python because they depend on the Perl script, and others produce results that differ from the original implementation. This makes it difficult to compare directly with known results.

This package is designed to replicate the Perl results. It implements:

  • ROUGE-N (N-gram) scoring
  • ROUGE-L (Longest Common Subsequence) scoring
  • Text normalization
  • Bootstrap resampling for confidence interval calculation
  • Optional Porter stemming to strip plurals and word suffixes (e.g. -ing, -ion, -ment)

Not all options provided by the original Perl ROUGE script are supported, but the subset that is implemented should replicate the original functionality.
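To make the ROUGE-N item concrete, here is a minimal sketch of clipped n-gram overlap scoring. It is an illustration only, not this package's code: the helper names `ngrams` and `rouge_n` are hypothetical, and normalization is reduced to lowercasing and whitespace splitting, so results will not match the package on punctuated or stemmed text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(target, prediction, n=1):
    """Return (precision, recall, f1) for n-gram overlap (illustrative only)."""
    # Assumption: simplified normalization (lowercase + whitespace split).
    target_ngrams = ngrams(target.lower().split(), n)
    prediction_ngrams = ngrams(prediction.lower().split(), n)

    # Clipped overlap: an n-gram counts at most as often as it occurs in both texts.
    overlap = sum((target_ngrams & prediction_ngrams).values())

    precision = overlap / max(sum(prediction_ngrams.values()), 1)
    recall = overlap / max(sum(target_ngrams.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1))
    print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2))
```

Precision divides the overlap by the number of n-grams in the prediction, while recall divides it by the number in the target.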

Stopword removal

The original ROUGE Perl script implemented optional stopword removal (using the -s parameter). However, ROUGE used a list of ~600 stopwords borrowed from another, now-defunct package. This list contains many words that may not suit some tasks, such as day names, month names, and numbers. It also has no clear license for redistribution. Since we are unable to replicate this functionality precisely, we do not include stopword removal.

Two flavors of ROUGE-L

In the ROUGE paper, two flavors of ROUGE-L are described:

  1. sentence-level: Compute the longest common subsequence (LCS) between two pieces of text. Newlines are ignored. This is called rougeL in this package.
  2. summary-level: Newlines in the text are interpreted as sentence boundaries. The LCS is computed between each pair of reference and candidate sentences, and the per-sentence results are combined into a union-LCS score. This is called rougeLsum in this package. It is the ROUGE-L reported in, for example, Get To The Point: Summarization with Pointer-Generator Networks. If your references/candidates do not have newline delimiters, you can use the --split_summaries flag (or the corresponding optional argument in RougeScorer).
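The sentence-level flavor can be sketched with a standard dynamic-programming LCS. This is an illustrative sketch under simplifying assumptions, not the package's implementation: `lcs_length` and `rouge_l_sentence` are hypothetical names, and tokenization is plain lowercasing plus whitespace splitting. The summary-level union-LCS variant is more involved and is not shown.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sentence(target, prediction):
    """Return (precision, recall, f1) from the LCS of two texts (illustrative only)."""
    # str.split() treats newlines like any other whitespace, so they are ignored.
    target_tokens = target.lower().split()
    prediction_tokens = prediction.lower().split()

    lcs = lcs_length(target_tokens, prediction_tokens)
    precision = lcs / max(len(prediction_tokens), 1)
    recall = lcs / max(len(target_tokens), 1)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(rouge_l_sentence("the quick brown fox jumps over the lazy dog",
                           "the quick brown dog jumps on the log"))
```

Unlike n-gram matching, the LCS rewards tokens that appear in the same order in both texts, even when they are not adjacent.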

How to run

This package compares target files (containing one example per line) with prediction files in the same format. It can be launched as follows (from google-research/):

python -m rouge.rouge \
    --target_filepattern=*.targets \
    --prediction_filepattern=*.decodes \
    --output_filename=scores.csv \
    --use_stemmer=true \
    --split_summaries=true

Using pip

pip install -r rouge/requirements.txt
pip install rouge-score

Then in Python:

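from rouge_score import rouge_scorer

# Score ROUGE-1 and sentence-level ROUGE-L, with Porter stemming enabled.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
# The first argument is the target (reference), the second is the prediction.
scores = scorer.score('The quick brown fox jumps over the lazy dog',
                      'The quick brown dog jumps on the log.')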
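When many examples are scored, the bootstrap resampling listed among the features gives a confidence interval around the average score. The following is a generic percentile-bootstrap sketch, not the package's aggregator: the function name `bootstrap_interval`, its parameters, and the per-example F-measures are all hypothetical.

```python
import random


def bootstrap_interval(values, n_samples=1000, confidence=0.95, seed=0):
    """Return (low, mid, high) for the mean of `values` via bootstrap resampling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_samples):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()

    # Read the interval bounds and the median off the sorted resampled means.
    alpha = (1.0 - confidence) / 2.0
    low = means[int(alpha * (n_samples - 1))]
    mid = means[(n_samples - 1) // 2]
    high = means[int((1.0 - alpha) * (n_samples - 1))]
    return low, mid, high


if __name__ == "__main__":
    # Hypothetical per-example F-measures, for illustration only.
    f_measures = [0.42, 0.55, 0.61, 0.38, 0.70]
    print(bootstrap_interval(f_measures))
```

A wide interval suggests the test set is too small for the average score to be trusted to the reported precision.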

License

Licensed under the Apache 2.0 License.

Disclaimer

This is not an official Google product.
