A Python package implementing Paraboth with some improvements: https://aclanthology.org/2023.swisstext-1.3.pdf.
Project description
Diarization Evaluation
This project provides tools for evaluating ASR (Automatic Speech Recognition) predictions against ground truth, using both standard metrics and paraphrase-based metrics. The following steps are performed:
- Text normalization.
- Sentence embedding via an embedding model.
- Sentence alignment via dynamic time warping (DTW).
- Paraphrasing of the sentences via an LLM.
- Selection of the best paraphrases via the word error rate.
- Calculation of the paraphrase-based metrics.
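The alignment step can be illustrated with a minimal DTW sketch. It assumes a precomputed cost matrix, for example 1 minus the cosine similarity between ground-truth and prediction sentence embeddings. The function name `dtw_align` is hypothetical, and the package's own alignment (which also exposes a window size and a minimum matching value) may differ.

```python
from typing import List, Tuple


def dtw_align(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Return the minimum-cost monotonic path through a cost matrix.

    cost[i][j] is the dissimilarity between ground-truth sentence i and
    predicted sentence j (hypothetical input, e.g. 1 - cosine similarity).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")

    # acc[i][j] holds the cheapest accumulated cost of any path
    # from (0, 0) to (i, j).
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best_prev

    # Backtrack from the bottom-right corner to recover the path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


if __name__ == "__main__":
    # Two ground-truth sentences, three predicted sentences.
    print(dtw_align([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]))
```

Each pair `(i, j)` in the returned path means that ground-truth sentence `i` is matched with predicted sentence `j`; one sentence may be matched with several neighbours when the two sides are segmented differently.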
Installation
- Clone this repository
- Install dependencies:
pip install -r requirements.txt
Usage
Normalizer
Please have a look at normalizer.py to see how the text normalization is done.
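For orientation only, a typical ASR text normalizer looks roughly like the sketch below. It assumes lowercasing, punctuation removal, and whitespace collapsing; these rules and the function name `normalize_text` are assumptions and are not taken from normalizer.py, which remains the reference.

```python
import re


def normalize_text(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace (assumed rules)."""
    lowered = text.lower()
    # Drop every character that is neither a word character nor whitespace.
    # \w is Unicode-aware in Python 3, so letters such as "ü" are kept.
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punct.split())


if __name__ == "__main__":
    print(normalize_text("Grüezi,  mitenand!"))
```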
Standard Metrics (metrics.py)
To calculate standard metrics (WER and BLEU) on a corpus:
python metrics.py --gt <path_to_ground_truth_file> --pred <path_to_predictions_file>
Parameters:
- --gt: Path to the ground truth text file (required)
- --pred: Path to the predictions text file (required)
Example:
python metrics.py --gt 01_eval_gt.txt --pred 01_eval_gladia.txt
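As a reminder of what WER measures, here is a minimal sentence-level sketch: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` is hypothetical; metrics.py may rely on a library and aggregate over the corpus differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```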
Paraphrase-based Metrics (paraboth.py)
To calculate paraphrase-based metrics:
python paraboth.py --gt <path_to_ground_truth_file> --pred <path_to_predictions_file> [optional_parameters]
Parameters:
- --gt: Path to ground truth text file (required)
- --pred: Path to predictions text file (required)
- --n_paraphrases: Number of paraphrases to generate (default: 3)
- --paraphrase_gt: Whether to paraphrase ground truth (default: True)
- --paraphrase_pred: Whether to paraphrase predictions (default: True)
- --window_size: Window size for sentence combinations (default: 2)
- --min_matching_value: Minimum matching value for sentence alignment (default: 0.5)
Example:
python paraboth.py --gt 01_eval_gt.txt --pred 01_eval_gladia.txt --n_paraphrases 5
This command will:
- Generate 5 paraphrases for each sentence
- Paraphrase both the ground truth and the predictions (default behavior)
- Calculate various metrics including ParaBLEU and ParaWER
Results will be saved in a TSV file whose name contains the prediction file name and a timestamp.
Output
Both scripts will print metrics to the console. paraboth.py will also save detailed results to a TSV file.
File details
Details for the file paraboth-0.1.2.tar.gz.
File metadata
- Download URL: paraboth-0.1.2.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f93eaafd74a96aa48b954722dfb25f98c5b5dbec54aef96069f87d777a8150f |
| MD5 | 010d86e6d7b447b064d953187f17561a |
| BLAKE2b-256 | 1dcda1c1f7206c951b10fc0bcbabfc423ce565d66ee9701ae962b52496a2b826 |
File details
Details for the file paraboth-0.1.2-py3-none-any.whl.
File metadata
- Download URL: paraboth-0.1.2-py3-none-any.whl
- Upload date:
- Size: 17.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9468aae270340addea503a7077df96e2e3c4e7adbc4cfbc050ddf8b582980871 |
| MD5 | 066366a0d858019f7cf4a4afe0e52e27 |
| BLAKE2b-256 | 410267732aa9788688e1c0a165f4f7b681dbbf6150d9bdbbc671faffafd3ea4e |