Compute Word Error Rate for Tibetan language text.

Project description

Tibetan-WER

This module computes the Word Error Rate (WER) and the Syllable Error Rate (SER) for Tibetan language text.
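For context: written Tibetan delimits syllables with the tsek mark (་, U+0F0B), which is what makes a syllable-level error rate natural for the language. The sketch below shows a rough syllable split on that delimiter; it is illustrative only, not the library's internal tokenizer.

```python
TSEK = "\u0F0B"  # ་ , the Tibetan intersyllabic mark (tsek)

def segment_syllables(text: str) -> list[str]:
    # Split on the tsek and drop empty pieces (e.g. from a trailing tsek).
    return [s for s in text.split(TSEK) if s]

print(segment_syllables("ཀ་ཁ་ག"))
```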

Install

Install the library to get started:

pip install --upgrade tibetan_wer

Usage

Basic Usage

The wer function expects a list of predictions and a list of references, and returns a dictionary containing the micro- and macro-averaged WER along with the total numbers of substitutions, insertions, and deletions.

from tibetan_wer.metrics import wer

prediction = ['གཞོན་ནུར་གྱུར་པ་ལ་ཕྱག་འཚལ་ལོ༔']
reference = ['འཇམ་དཔལ་གཞོན་ནུར་གྱུར་པ་ལ་ཕྱག་འཚལ་ལོ༔']

result = wer(prediction, reference)

print(f"Micro-Average WER Score: {result['micro_wer']}")
print(f"Macro-Average WER Score: {result['macro_wer']}")
print(f"Substitutions: {result['substitutions']}")
print(f"Insertions: {result['insertions']}")
print(f"Deletions: {result['deletions']}")
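For intuition on the two averages: by the standard definitions, the micro-average pools edit counts across the whole corpus before dividing by the total reference length, while the macro-average computes a WER per sentence and then takes the mean. The library's exact formulas are not shown here; this is a self-contained sketch under those standard definitions, using a plain Levenshtein distance over token lists.

```python
def edit_distance(pred: list, ref: list) -> int:
    # Classic Levenshtein distance over token lists (subs + ins + dels).
    m, n = len(pred), len(ref)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

def micro_macro_wer(preds: list[list], refs: list[list]) -> tuple[float, float]:
    dists = [edit_distance(p, r) for p, r in zip(preds, refs)]
    micro = sum(dists) / sum(len(r) for r in refs)           # pooled errors / pooled words
    macro = sum(d / len(r) for d, r in zip(dists, refs)) / len(refs)  # mean of per-sentence WER
    return micro, macro

micro, macro = micro_macro_wer([["a", "x", "c"], ["a", "b"]],
                               [["a", "b", "c"], ["a", "b"]])
print(micro, macro)  # one substitution out of five words pooled vs. mean of (1/3, 0)
```

Short sentences therefore weigh more heavily in the macro-average than in the micro-average, which is why the library reports both.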

The ser function works the same way, but at the syllable level.

from tibetan_wer.metrics import ser

prediction = ['གཞོན་ནུར་གྱུར་པ་ལ་ཕྱག་འཚལ་ལོ༔']
reference = ['འཇམ་དཔལ་གཞོན་ནུར་གྱུར་པ་ལ་ཕྱག་འཚལ་ལོ༔']

result = ser(prediction, reference)

print(f"Micro-Average SER Score: {result['micro_ser']:.3f}")
print(f"Macro-Average SER Score: {result['macro_ser']:.3f}")
print(f"Substitutions: {result['substitutions']}")
print(f"Insertions: {result['insertions']}")
print(f"Deletions: {result['deletions']}")

Usage for Model Evaluation

The intended use case is assessing a model during training. To use tibetan_wer for this, define custom metrics for a Hugging Face trainer like so:

import evaluate
from tibetan_wer.metrics import wer as tib_wer, ser as tib_ser

cer_metric = evaluate.load("cer")

def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids

    # replace -100 with the pad_token_id
    label_ids[label_ids == -100] = tokenizer.pad_token_id

    # we do not want to group tokens when computing the metrics
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    cer = cer_metric.compute(predictions=pred_str, references=label_str)
    tib_wer_res = tib_wer(predictions=pred_str, references=label_str)
    tib_ser_res = tib_ser(predictions=pred_str, references=label_str)

    macro_wer = tib_wer_res['macro_wer']
    micro_wer = tib_wer_res['micro_wer']
    word_subs = tib_wer_res['substitutions']
    word_ins = tib_wer_res['insertions']
    word_dels = tib_wer_res['deletions']

    macro_ser = tib_ser_res['macro_ser']
    micro_ser = tib_ser_res['micro_ser']
    syl_subs = tib_ser_res['substitutions']
    syl_ins = tib_ser_res['insertions']
    syl_dels = tib_ser_res['deletions']

    return {"cer": cer,
            "tib_macro_wer": macro_wer,
            "tib_micro_wer": micro_wer,
            "word_substitutions": word_subs,
            "word_insertions": word_ins,
            "word_deletions": word_dels,
            "tib_macro_ser": macro_ser,
            "tib_micro_ser": micro_ser,
            "syllable_substitutions": syl_subs,
            "syllable_insertions": syl_ins,
            "syllable_deletions": syl_dels,
            }
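The -100 replacement near the top of compute_metrics matters because Hugging Face trainers mark label positions that should be ignored by the loss with -100, and the tokenizer cannot decode that sentinel. Before decoding, those positions must be mapped back to a real token id. A pure-Python equivalent of that one step (the pad id of 0 is hypothetical; the real value comes from your tokenizer):

```python
PAD_TOKEN_ID = 0  # hypothetical; use tokenizer.pad_token_id in practice

def unmask_labels(label_ids: list[list[int]], pad_token_id: int = PAD_TOKEN_ID) -> list[list[int]]:
    # -100 marks positions ignored by the loss; replace it with a decodable token id.
    return [[pad_token_id if t == -100 else t for t in seq] for seq in label_ids]

print(unmask_labels([[5, -100, 7, -100]]))
```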

You can then configure the Transformers Seq2SeqTrainer to use these metrics like so:

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics, # use custom metrics
    tokenizer=processor.feature_extractor,
)

trainer.train()
