
A Python package implementing Paraboth with some improvements: https://aclanthology.org/2023.swisstext-1.3.pdf.


Diarization Evaluation

This project provides tools for evaluating ASR (Automatic Speech Recognition) predictions against ground truth, using both standard metrics and paraphrase-based metrics. The following steps are performed:

  1. Text normalization.
  2. Sentence embedding via an embedding model.
  3. Sentence alignment via dynamic time warping (DTW).
  4. Paraphrasing of the sentences via an LLM.
  5. Selection of the best paraphrases via the word error rate.
  6. Calculation of the paraphrase-based metrics.
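Step 3 can be illustrated with a minimal, hypothetical sketch of DTW over sentence embeddings. It assumes the embeddings have already been computed (the embedding-model call is not shown) and uses 1 - cosine similarity as the cost of pairing two sentences. The names cosine_similarity and dtw_align are illustrative and are not part of the paraboth package, which may use a different cost, step pattern, or constraint.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dtw_align(gt: List[Vector], pred: List[Vector]) -> List[Tuple[int, int]]:
    """Hypothetical sketch: align two sentence-embedding sequences with classic DTW.

    The cost of pairing gt sentence i with pred sentence j is 1 - cosine similarity.
    Returns the optimal warping path as (gt_index, pred_index) pairs.
    """
    n, m = len(gt), len(pred)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning gt[:i] with pred[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - cosine_similarity(gt[i - 1], pred[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    # Backtrack from the bottom-right cell; ties prefer the diagonal step.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): acc[i - 1][j - 1],
            (i - 1, j): acc[i - 1][j],
            (i, j - 1): acc[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    return path[::-1]
```

With toy 2-D "embeddings", a ground-truth sentence that the prediction split in two is aligned to both halves: dtw_align([[1, 0], [0, 1]], [[1, 0], [0.9, 0.1], [0, 1]]) pairs gt sentence 0 with pred sentences 0 and 1, and gt sentence 1 with pred sentence 2.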

Installation

  1. Clone this repository
  2. Install dependencies:
pip install -r requirements.txt

Usage

Normalizer

Please have a look at normalizer.py to see how the text normalization is done.
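For orientation only, a typical ASR-evaluation normalizer looks something like the hypothetical sketch below (Unicode NFKC, lowercasing, punctuation removal, whitespace collapsing). This is an assumption about common practice, not a copy of normalizer.py, which may apply different or additional rules.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFKC, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace every character that is neither a word character nor whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace into a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

Because Python's \w is Unicode-aware, letters such as "ü" survive: normalize("Grüezi, mitenand!") gives "grüezi mitenand".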

Standard Metrics (metrics.py)

To calculate standard corpus-level metrics (WER and BLEU):

python metrics.py --gt <path_to_ground_truth_file> --pred <path_to_predictions_file>

Parameters:

  • --gt: Path to the ground truth text file (required)
  • --pred: Path to the predictions text file (required)

Example:

python metrics.py --gt 01_eval_gt.txt --pred 01_eval_gladia.txt
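Corpus-level WER follows the standard definition: the total number of word-level edits (substitutions, deletions, insertions) across all segments, divided by the total number of reference words. The sketch below shows that definition, assuming the reference and prediction segments are already paired; it is not the code in metrics.py, which may rely on a library and handle pairing differently. BLEU is not shown.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, free if the words match
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words
```

Pooling edits over the whole corpus (rather than averaging per-sentence WERs) keeps short sentences from dominating the score.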

Paraphrase-based Metrics (paraboth.py)

To calculate paraphrase-based metrics:

python paraboth.py --gt <path_to_ground_truth_file> --pred <path_to_predictions_file> [optional_parameters]

Parameters:

  • --gt: Path to ground truth text file (required)
  • --pred: Path to predictions text file (required)
  • --n_paraphrases: Number of paraphrases to generate (default: 3)
  • --paraphrase_gt: Whether to paraphrase ground truth (default: True)
  • --paraphrase_pred: Whether to paraphrase predictions (default: True)
  • --window_size: Window size for sentence combinations (default: 2)
  • --min_matching_value: Minimum matching value for sentence alignment (default: 0.5)
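One plausible reading of --window_size and --min_matching_value is sketched below. This interpretation is an assumption and has not been confirmed against paraboth.py. Under this reading, the window size controls how many consecutive sentences may be merged into one alignment candidate (so a sentence split differently by the ASR system can still be matched), and the threshold discards aligned pairs whose similarity is too low. Both function names are hypothetical.

```python
from typing import List, Tuple


def sentence_windows(sentences: List[str], window_size: int = 2) -> List[Tuple[int, int, str]]:
    """Hypothetical: every run of 1..window_size consecutive sentences, joined by spaces.

    Returns (start, end_exclusive, text) so each candidate can be traced back.
    """
    windows = []
    for start in range(len(sentences)):
        for size in range(1, window_size + 1):
            end = start + size
            if end > len(sentences):
                break
            windows.append((start, end, " ".join(sentences[start:end])))
    return windows


def keep_confident(
    pairs: List[Tuple[str, str, float]], min_matching_value: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Hypothetical: drop (gt, pred, similarity) pairs below the threshold."""
    return [(g, p, s) for g, p, s in pairs if s >= min_matching_value]
```

With the default window size of 2, three sentences yield five candidates: each sentence on its own plus the two adjacent pairs.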

Example:

python paraboth.py --gt 01_eval_gt.txt --pred 01_eval_gladia.txt --n_paraphrases 5

This command will:

  1. Generate 5 paraphrases for each sentence
  2. Paraphrase both the ground truth and the predictions (default behavior)
  3. Calculate various metrics including ParaBLEU and ParaWER

Results are saved to a TSV file whose name combines the prediction file name and a timestamp.
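Steps 5 and 6 (choosing the best paraphrase by WER, then computing ParaWER) can be sketched as follows. Several assumptions apply. The LLM paraphrasing call is not shown. Each aligned segment is given as a list of ground-truth variants and a list of prediction variants, with the original sentence included among them. The variant pair with the lowest WER is kept, and its edits are pooled over its reference word count. The function names are hypothetical, and the package may normalize or aggregate differently.

```python
from itertools import product
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def best_pair(gt_variants: List[str], pred_variants: List[str]) -> Tuple[str, str, int, int]:
    """Pick the (reference, hypothesis) variant pair with the lowest WER.

    Returns (reference, hypothesis, word_edits, reference_word_count).
    """
    best = None
    for ref, hyp in product(gt_variants, pred_variants):
        r, h = ref.split(), hyp.split()
        edits = edit_distance(r, h)
        wer = edits / max(len(r), 1)
        if best is None or wer < best[0]:
            best = (wer, ref, hyp, edits, len(r))
    _, ref, hyp, edits, n_words = best
    return ref, hyp, edits, n_words


def para_wer(segments: List[Tuple[List[str], List[str]]]) -> float:
    """Hypothetical ParaWER: pooled edits of each segment's best pair over its reference words."""
    total_edits = total_words = 0
    for gt_variants, pred_variants in segments:
        _, _, edits, n_words = best_pair(gt_variants, pred_variants)
        total_edits += edits
        total_words += n_words
    return total_edits / total_words
```

For example, if the ground truth "the meeting starts at nine" has the paraphrase "the meeting begins at nine" and the prediction is exactly that paraphrase, the plain WER is 1/5, but this sketch's ParaWER is 0.0. This is the kind of meaning-preserving wording difference the paraphrase-based metrics are meant to forgive.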

Output

Both scripts will print metrics to the console. paraboth.py will also save detailed results to a TSV file.
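To inspect the saved results in Python, the standard-library csv module can read a tab-separated file. The minimal sketch below assumes the TSV's first row is a header. The actual column layout of paraboth's output is not documented here, so no column names are assumed, and read_results is a hypothetical helper.

```python
import csv
from typing import IO, Dict, List


def read_results(stream: IO[str]) -> List[Dict[str, str]]:
    """Read a tab-separated file into one dict per row, keyed by the header row."""
    return list(csv.DictReader(stream, delimiter="\t"))
```

Usage: open the results file with open(path, newline="", encoding="utf-8") and pass the file object to read_results.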


Download files


Source Distribution

paraboth-0.1.2.tar.gz (15.2 kB)

Uploaded Source

Built Distribution


paraboth-0.1.2-py3-none-any.whl (17.4 kB)

Uploaded Python 3

File details

Details for the file paraboth-0.1.2.tar.gz.

File metadata

  • Download URL: paraboth-0.1.2.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for paraboth-0.1.2.tar.gz
Algorithm Hash digest
SHA256 2f93eaafd74a96aa48b954722dfb25f98c5b5dbec54aef96069f87d777a8150f
MD5 010d86e6d7b447b064d953187f17561a
BLAKE2b-256 1dcda1c1f7206c951b10fc0bcbabfc423ce565d66ee9701ae962b52496a2b826


File details

Details for the file paraboth-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: paraboth-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for paraboth-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9468aae270340addea503a7077df96e2e3c4e7adbc4cfbc050ddf8b582980871
MD5 066366a0d858019f7cf4a4afe0e52e27
BLAKE2b-256 410267732aa9788688e1c0a165f4f7b681dbbf6150d9bdbbc671faffafd3ea4e

