Effective evaluations for Text-to-Speech (TTS) systems

AudioEvals

A comprehensive tool for evaluating generated TTS (Text-to-Speech) audio datasets with multiple evaluation metrics.

Evaluation Types

WER (Word Error Rate)

Measures how intelligible the generated audio is: each audio file is transcribed with a speech-to-text (STT) model, and the resulting transcript is compared against the ground truth transcript.
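As a rough illustration of the metric itself, the sketch below computes WER as a word-level edit distance divided by the number of reference words. It is not taken from audioevals: the `normalize` and `word_error_rate` helpers are hypothetical names, and the normalization (lowercasing, dropping punctuation) is an assumption that may differ from what the library does.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into words.

    This normalization is an assumption made for the sketch.
    """
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("world" -> "word") and one deletion ("a") over 6 words.
score = word_error_rate("Hello world, this is a test.", "hello word this is test")
print(f"WER: {score:.2%}")  # prints "WER: 33.33%"
```

The function returns a fraction rather than a percentage, and it can exceed 1.0 when the hypothesis contains many inserted words.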

AudioBox Aesthetics

Evaluates audio quality using AudioBox's aesthetic scoring system, providing metrics for:

  • CE (Content Enjoyment)
  • CU (Content Usefulness)
  • PC (Production Complexity)
  • PQ (Production Quality)

VAD (Voice Activity Detection) Silence

Detects unnaturally long silences in generated audio using Silero VAD with RMS analysis. Provides:

  • Maximum silence duration per file
  • Total duration analysis
  • Silence-to-speech ratio calculations
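To make these numbers concrete, here is a minimal sketch of how they could be derived once a VAD has produced speech intervals. It does not call Silero VAD or perform RMS analysis, and `silence_metrics` is a hypothetical helper, not part of audioevals. Counting leading and trailing silence is also an assumption of this sketch.

```python
def silence_metrics(speech_segments, total_duration):
    """Summarize silence from speech intervals given as (start, end) in seconds."""
    silences = []
    cursor = 0.0  # end of the speech covered so far
    for start, end in sorted(speech_segments):
        if start > cursor:
            silences.append(start - cursor)  # gap before this speech segment
        cursor = max(cursor, end)  # handles overlapping segments
    if total_duration > cursor:
        silences.append(total_duration - cursor)  # trailing silence

    total_silence = sum(silences)
    total_speech = total_duration - total_silence
    return {
        "max_silence_duration": max(silences, default=0.0),
        "total_duration": total_duration,
        "silence_to_speech_ratio": (
            total_silence / total_speech if total_speech > 0 else float("inf")
        ),
    }


# Speech from 0.5s to 2.0s and from 3.5s to 5.0s in a 6-second clip.
metrics = silence_metrics([(0.5, 2.0), (3.5, 5.0)], total_duration=6.0)
print(metrics["max_silence_duration"])     # 1.5
print(metrics["silence_to_speech_ratio"])  # 1.0
```

In this example the longest gap is the 1.5 seconds between the two speech segments, and silence and speech each account for 3 seconds of the clip.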

Using as a Library

Installation

pip install audioevals

Basic Usage

import asyncio
from audioevals.evals import wer_eval, audiobox_eval, vad_eval
from audioevals.utils.audio import AudioData

# Load audio data
audio_data = AudioData.from_wav_file("/path/to/audio.wav")
transcript = "Hello world, this is a test."

WER Evaluation

# Using file path (await needs an async context, e.g. inside an async function)
wer_result = await wer_eval.run_single_file("/path/to/audio.wav", transcript)
print(f"WER: {wer_result['wer_score']:.2f}%")
print(f"STT: {wer_result['stt_transcript']}")
print(f"Words Per Second: {wer_result['words_per_second']}")

# Using AudioData instance
wer_result = await wer_eval.run_audio_data(audio_data, transcript)
print(f"WER: {wer_result['wer_score']:.2f}%")

AudioBox Aesthetics Evaluation

# Using file path
audiobox_result = audiobox_eval.run_single_file("/path/to/audio.wav")
print(f"Content Enjoyment: {audiobox_result['CE']:.2f}")
print(f"Production Quality: {audiobox_result['PQ']:.2f}")

# Using AudioData instance
audiobox_result = audiobox_eval.run_audio_data(audio_data)
print(f"Content Enjoyment: {audiobox_result['CE']:.2f}")

VAD Silence Evaluation

# Using file path
vad_result = vad_eval.run_single_file("/path/to/audio.wav")
print(f"Max silence duration: {vad_result['max_silence_duration']:.2f}s")
print(f"Silence/Speech ratio: {vad_result['silence_to_speech_ratio']:.2f}")

# Using AudioData instance
vad_result = vad_eval.run_audio_data(audio_data)
print(f"Max silence duration: {vad_result['max_silence_duration']:.2f}s")

Complete Example

import asyncio
from audioevals.evals import wer_eval, audiobox_eval, vad_eval
from audioevals.utils.audio import AudioData

async def evaluate_audio_file(file_path, transcript):
    """Complete evaluation of an audio file"""
    
    # Load audio data once
    audio_data = AudioData.from_wav_file(file_path)
    
    # Run all evaluations
    wer_result = await wer_eval.run_audio_data(audio_data, transcript)
    audiobox_result = audiobox_eval.run_audio_data(audio_data)
    vad_result = vad_eval.run_audio_data(audio_data)
    
    return {
        'wer': wer_result,
        'audiobox': audiobox_result,
        'vad': vad_result
    }

# Usage
results = asyncio.run(evaluate_audio_file(
    "/path/to/audio.wav", 
    "Hello world, this is a test."
))

print(f"WER: {results['wer']['wer_score']:.2f}%")
print(f"AudioBox PQ: {results['audiobox']['PQ']:.2f}")
print(f"Max silence: {results['vad']['max_silence_duration']:.2f}s")

Dataset Structure (CLI usage)

The audioevals CLI expects each dataset to be a folder with the following layout:

{folder_name}/
├── audios/
│   ├── 001.wav
│   ├── 002.wav
│   └── ...
└── transcripts.json

transcripts.json maps each audio file name (as it appears in audios/) to its ground truth transcript, for example:

{
  "001.wav": "He shouted, 'Everyone, please gather 'round! Here's the plan: 1) Set-up at 9:15 a.m.; 2) Lunch at 12:00 p.m. (please RSVP!); 3) Playing — e.g., games, music, etc. — from 1:15 to 4:45; and 4) Clean-up at 5 p.m.'",
  "002.wav": "Hey! What's up? Don't be shy, what can I do for you, cutie?",
  "003.wav": "I'm so excited to see you! I've been waiting for this moment for so long!",
  "004.wav": "What is the difference between weather and climate, and how do scientists study and predict both? Please explain the factors that influence weather patterns and how climate change affects long-term weather trends.",
  "005.wav": "I'm so sad to hear that. I'm here for you. What can I do to help?",
  "006.wav": "She let out a sudden (laughs) at the joke.",
  "007.wav": "He breathed a long (sighs) of relief when the test ended.",
  "008.wav": "Uhh, I'm not sure what to say. hmm... I'm just, ugh, a little bit confused."
}
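Because the transcript keys have to line up with the files in audios/, a quick consistency check before running the CLI can save a failed run. The sketch below is one way to do that with the standard library. `check_dataset` is a hypothetical helper, not part of audioevals, and it assumes every audio file uses the .wav extension.

```python
import json
from pathlib import Path


def check_dataset(folder):
    """Return (missing_audio, missing_transcripts) for a dataset folder.

    missing_audio: keys in transcripts.json with no matching file in audios/
    missing_transcripts: .wav files in audios/ with no entry in transcripts.json
    """
    folder = Path(folder)
    transcripts = json.loads(
        (folder / "transcripts.json").read_text(encoding="utf-8")
    )
    audio_names = {path.name for path in (folder / "audios").glob("*.wav")}

    missing_audio = sorted(set(transcripts) - audio_names)
    missing_transcripts = sorted(audio_names - set(transcripts))
    return missing_audio, missing_transcripts
```

Calling check_dataset on the dataset folder returns two empty lists when every transcript has an audio file and vice versa.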

CLI Usage

You can run evaluations on the dataset by running:

audioevals --dataset {folder_name}

Results are printed to the console and saved to {folder_name}/results.json, which you can inspect later, for example in a Jupyter notebook.

Running Specific Evaluations

By default, the tool runs all available evaluations (WER, AudioBox Aesthetics, and VAD Silence). To run only a subset, pass the --evals flag:

audioevals --dataset {folder_name} --evals wer vad

Available options are: wer, audiobox, vad

Output

Results are saved to {folder_name}/results.json and include:

  • Metadata about the evaluation run
  • Individual file results for each evaluation type
  • Summary statistics and averages
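The exact layout of results.json is not described here, so the sketch below only lists the top-level entries of the file without assuming any key names. `summarize_results` is a hypothetical helper, and the one assumption it makes is that the file holds a JSON object at the top level.

```python
import json
from pathlib import Path


def summarize_results(path):
    """Describe each top-level entry of a results file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {}
    for key, value in data.items():  # assumes the top level is a JSON object
        if isinstance(value, (dict, list)):
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = repr(value)
    return summary
```

Printing the returned dictionary in a notebook gives a quick overview of which sections the file contains before digging into individual results.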
