A fork of Multilingual ROUGE (a fork of XL-Sum)

Multilingual ROUGE Scoring

This is a port of the XL-Sum repository, packaged as a simple utility that provides the multilingual ROUGE-L scorer for MIRAGE-Bench, with the following change:

Added multilingual_rouge/bengali_stemmer from the bengali-stemmer repository, as it was not available on PyPI.

I do not own the codebase, so please direct any questions to the XL-Sum repository.

Installation

To install this library, please use the following:

python -m unidic download # for japanese segmentation
pip install -U multilingual-rouge

Alternatively, to install from source:

python -m unidic download # for japanese segmentation
pip install -e .

Overview

ROUGE is the de facto evaluation metric for text summarization. However, it was designed specifically for evaluating English text. By the nature of the metric, scores depend heavily on tokenization, stemming, removal of unnecessary characters, and similar preprocessing. This repo tries to address these issues by adding the following main features on top of an adaptation of rouge-score, Google's ROUGE implementation:

Enables multilingual ROUGE scoring by making use of popular word segmentation / stemming algorithms for various languages.
Removes only punctuation characters, as defined by the Unicode data tables, as part of text normalization. This enables basic ROUGE scoring even in the absence of a segmenter / stemmer for a given language.
Provides an easy-to-use interface for plugging in custom tokenization / stemming implementations.
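
To illustrate the punctuation-only normalization described above, here is a minimal sketch built on Python's standard unicodedata module. It is not the library's actual code: the function names strip_punctuation and simple_tokens are hypothetical, and the lowercasing and whitespace splitting are assumptions made for the example.

```python
import unicodedata
from typing import List


def strip_punctuation(text: str) -> str:
    """Replace each character in a Unicode punctuation category (P*) with a space."""
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )


def simple_tokens(text: str) -> List[str]:
    """Lowercase, strip punctuation, then split on whitespace (no stemming, no segmentation)."""
    return strip_punctuation(text.lower()).split()


print(simple_tokens("Hello, world!"))  # ['hello', 'world']
# The Bengali danda (U+0964) is also in a punctuation category, so it is dropped too.
print(simple_tokens("ভালো লাগলো।"))
```

Because the check relies on Unicode categories rather than an ASCII list, the same normalization applies to scripts that have no dedicated stemmer or segmenter.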

Supported language names for stemming

bengali, hindi, turkish, arabic, danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, romanian, russian, spanish, swedish

Supported language names for word segmentation

chinese, thai, japanese, burmese

Setup

pip3 install -r requirements.txt
python3 -m unidic download # for japanese segmentation
pip3 install --upgrade ./

Example Usage

Using CLI

# --use_stemmer and --lang are optional
python -m rouge_score.rouge \
    --target_filepattern=*.targets \
    --prediction_filepattern=*.decodes \
    --output_filename=scores.csv \
    --use_stemmer=true \
    --lang="bengali"

Using python

Default usage

from multilingual_rouge import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
scores = scorer.score('The quick brown fox jumps over the lazy dog',
                      'The quick brown dog jumps on the log.')
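
For intuition about what the rougeL entry measures, the sketch below computes precision, recall and F-measure from the longest common subsequence (LCS) of two token lists. It is a from-scratch illustration, not the library's implementation: lcs_length and rouge_l are hypothetical names, and the inputs are assumed to be already lowercased and stripped of punctuation, with no stemming applied.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(target_tokens, prediction_tokens):
    """Return (precision, recall, F1) based on the LCS of the two token lists."""
    lcs = lcs_length(target_tokens, prediction_tokens)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(prediction_tokens)
    recall = lcs / len(target_tokens)
    return precision, recall, 2 * precision * recall / (precision + recall)


target = "the quick brown fox jumps over the lazy dog".split()
prediction = "the quick brown dog jumps on the log".split()
precision, recall, f1 = rouge_l(target, prediction)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")  # P=0.625 R=0.556 F=0.588
```

The LCS here is "the quick brown jumps the" (5 tokens), which is why a change in tokenization or stemming that alters the token lists directly changes the score.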

With provided language

from multilingual_rouge import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="bengali")
scores = scorer.score('তোমার সাথে দেখা হয়ে ভালো লাগলো।',
                      'আপনার সাথে দেখা হয়ে ভালো লাগলো।')

With your own stemming / word segmentation implementation

Custom stemmer / tokenizer implementations must be callable objects, i.e. functions or instances of classes that implement the __call__ method. If lang is also given, user-provided implementations take precedence over the library-provided ones.

from multilingual_rouge import rouge_scorer

# Example custom stemmer: takes a token and returns its stem.
class DummyStemmer:
    def __call__(self, token):
        stem = token  # placeholder: replace with your stemmer implementation
        return stem

# Example custom segmenter/tokenizer: takes raw text and returns a list of tokens.
def dummy_tokenize(text):
    tokens = text.split()  # placeholder: replace with your tokenizer implementation
    return tokens

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True,
                                  callable_stemmer=DummyStemmer(),
                                  callable_tokenizer=dummy_tokenize)
scores = scorer.score('The quick brown fox jumps over the lazy dog',
                      'The quick brown dog jumps on the log.')

For the full list of available keyword arguments, and for reference stemmer and segmenter implementations, see rouge_scorer.py, stemmers.py and tokenizers.py.
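
As a more concrete illustration of the callable interface, the sketch below defines a toy suffix-stripping stemmer and a character-level tokenizer for text written without spaces. Both SuffixStemmer and char_tokenize are hypothetical names invented for this example, and their rules are deliberately simplistic. They only follow the shape shown in the usage example (a stemmer taking one token, a tokenizer taking raw text) and could presumably be passed as callable_stemmer=SuffixStemmer() and callable_tokenizer=char_tokenize.

```python
class SuffixStemmer:
    """Toy stemmer: strips the first matching suffix from a token (illustrative only)."""

    def __init__(self, suffixes=("ing", "ed", "s")):
        self.suffixes = suffixes

    def __call__(self, token):
        for suffix in self.suffixes:
            # Keep at least three characters so short words are left untouched.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
        return token


def char_tokenize(text):
    """Toy tokenizer: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


stemmer = SuffixStemmer()
print(stemmer("jumps"), stemmer("jumped"), stemmer("dog"))  # jump jump dog
print(char_tokenize("你好 世界"))  # ['你', '好', '世', '界']
```

A class is convenient when the stemmer needs configuration or loaded resources, while a plain function is enough for a stateless tokenizer.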

License

Originally licensed under the Apache 2.0 License.

Release history

0.0.1 (this version), Mar 31, 2025

Download files

Source Distribution

multilingual_rouge-0.0.1.tar.gz (26.3 kB)

Uploaded Mar 31, 2025 Source

Built Distribution

multilingual_rouge-0.0.1-py3-none-any.whl (28.5 kB)

Uploaded Mar 31, 2025 Python 3

File details

Details for the file multilingual_rouge-0.0.1.tar.gz.

File metadata

Download URL: multilingual_rouge-0.0.1.tar.gz
Upload date: Mar 31, 2025
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.4

File hashes

Hashes for multilingual_rouge-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`cd18a8153137b35d27fec51f92411cfbc3c5599eebc504dd7a3d44dcea2fb06f`
MD5	`943a3548836c361f881c415a52ee2b57`
BLAKE2b-256	`29da3ea92272a41a6d8516bb4fb27f5f7f3d32fa0f52a8cc11cd5d97d5344e64`

File details

Details for the file multilingual_rouge-0.0.1-py3-none-any.whl.

File metadata

Download URL: multilingual_rouge-0.0.1-py3-none-any.whl
Upload date: Mar 31, 2025
Size: 28.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.4

File hashes

Hashes for multilingual_rouge-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4c8d9c2439aad5ff4e675387dbfa99e424d6e9c1d4d6cbd3a13e3fa6c7d8ece3`
MD5	`4d4403b9228b3dbe4c9103c6643a35a4`
BLAKE2b-256	`6b6a718089b05758bf1e65c4c7b43aaff2fbc8fdadd8eebf8f8f99e11e32e98d`
