
Automatic lyrics transcription evaluation toolkit


alt-eval

An automatic lyrics transcription (ALT) evaluation toolkit, released with the Jam-ALT benchmark.

The package implements metrics designed to work well with lyrics formatted according to music industry standards (see the Jam-ALT annotation guide), namely:

  • A word error rate (WER) computed on text tokenized in a way that accounts for non-standard spellings common in song lyrics.
  • A case error rate, measuring the rate of incorrectly predicted letter case.
  • Precision, recall and F-score for symbols important for written lyrics:
    • Punctuation
    • Parentheses (used to delimit background vocals)
    • Line breaks
    • Section breaks (i.e. double line breaks)
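
As a rough illustration of the quantities listed above, the following sketch computes a word error rate from two token lists and a precision/recall/F-score triple from symbol counts. It uses only the textbook definitions. The function names (`edit_distance`, `word_error_rate`, `precision_recall_f1`) are hypothetical and not part of alt-eval, and the package's own tokenization, alignment and counting may differ.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Edit distance divided by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def precision_recall_f1(correct: int, predicted: int, expected: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F-score from counts.

    correct:   symbols present in both the hypothesis and the reference
    predicted: all symbols of this type in the hypothesis
    expected:  all symbols of this type in the reference
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For instance, a hypothesis that gets one of four reference words wrong has a word error rate of 0.25 under this definition.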

Under the hood, the text is pre-processed using the sacremoses tokenizer and punctuation normalizer. Note that apostrophes and single quotes are never treated as quotation marks, but as part of a word, marking an elision or a contraction. For writing systems that do not use spaces to separate words (Chinese, Japanese, Thai, Lao, Burmese, …), each character is considered as a separate word, as per Radford et al. (2022). See the test cases for examples of how different languages are tokenized.
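
The two tokenization rules described above can be illustrated with a small standalone sketch. It does not use sacremoses and is not the package's tokenizer: `sketch_tokenize` and its `per_character` flag are hypothetical names, and the regular expression is only an approximation that keeps apostrophes inside words and splits off other punctuation.

```python
import re
from typing import List

# A run of word characters and apostrophes is one token;
# any other non-space symbol is a token of its own.
_TOKEN_PATTERN = re.compile(r"[\w']+|[^\w\s]")


def sketch_tokenize(text: str, per_character: bool = False) -> List[str]:
    """Illustrative tokenizer (an assumption, not alt-eval's implementation).

    per_character=True mimics the handling of writing systems that do not
    separate words with spaces: every non-space character is one token.
    """
    if per_character:
        return [ch for ch in text if not ch.isspace()]
    return _TOKEN_PATTERN.findall(text)


print(sketch_tokenize("I'm goin' home, y'all"))
print(sketch_tokenize("你好 世界", per_character=True))
```

The first call keeps `I'm`, `goin'` and `y'all` as single tokens, while the second yields one token per character.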

Usage

Install the package with pip install alt-eval.

To compute the metrics:

from alt_eval import compute_metrics
compute_metrics(references, hypotheses)

where references and hypotheses are lists of strings. To specify the language (English by default), use the languages parameter, passing either a single language code, or a list of language codes corresponding to individual examples.
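
To make the two accepted shapes of the languages parameter concrete, the sketch below expands either form into one language code per example. `expand_languages` is a hypothetical helper written for illustration; it is not part of alt-eval, and how the package validates this argument internally is an assumption.

```python
from typing import List, Sequence, Union


def expand_languages(languages: Union[str, Sequence[str]], num_examples: int) -> List[str]:
    """Return one language code per example (illustrative helper only)."""
    if isinstance(languages, str):
        # A single code applies to every example.
        return [languages] * num_examples
    languages = list(languages)
    if len(languages) != num_examples:
        raise ValueError(
            f"expected {num_examples} language codes, got {len(languages)}"
        )
    return languages


print(expand_languages("en", 3))
print(expand_languages(["en", "fr"], 2))
```

In other words, passing a single code is equivalent to repeating that code once per example.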

To evaluate on Jam-ALT, use:

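from alt_eval import compute_metrics
from datasets import load_dataset
dataset = load_dataset("audioshake/jam-alt")["test"]
# transcriptions: your system's output strings, one per example, in dataset order
compute_metrics(dataset["text"], transcriptions, languages=dataset["language"])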

Pass visualize_errors=True to compute_metrics to also get a list of HTML snippets that can be used to visualize the errors in each transcript.
