TTSDS - Text-to-Speech Distribution Score

TTSDS is a comprehensive benchmark for evaluating the quality of synthetic speech in Text-to-Speech (TTS) systems. It assesses multiple aspects of speech quality including prosody, speaker identity, and intelligibility by comparing synthetic speech with both real speech and noise datasets.

Version 2.1.0

We are excited to release TTSDS 2.1.0! TTSDS2 is multilingual and updated quarterly, with a new dataset for each update. You can view the results at https://ttsdsbenchmark.com#leaderboard.

Features

  • Multi-dimensional Evaluation: Assess speech quality across different categories:

    • Prosody (e.g., pitch, speaking rate)
    • Speaker Identity (e.g., speaker verification)
    • Intelligibility (e.g., speech recognition)
    • Generic Features (e.g., embeddings)
    • Environment (e.g., noise robustness)
  • Weighted Scoring: Customizable weights for different evaluation categories

  • Progress Tracking: Real-time progress display with detailed statistics

  • Caching: Efficient caching of intermediate results

  • Error Handling: Robust error handling with optional skipping of failed benchmarks

Installation

System Requirements

# Required system packages
sudo apt-get install ffmpeg automake autoconf unzip sox gfortran subversion libtool
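
Before installing the Python package, it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper (not part of ttsds), and the default tool list is an assumption covering only those packages above that are known to install an executable of the same name.

```python
import shutil


def missing_tools(tools=("ffmpeg", "sox", "unzip", "gfortran")):
    """Return the names from `tools` that cannot be found on PATH.

    Hypothetical helper, not part of ttsds. The default list is an
    assumption: it only includes packages whose executable shares the
    package name.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All checked system tools were found.")
```

An empty list means every checked tool was found; anything else names the packages still to install.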

Python Installation

# Basic installation
pip install ttsds

Optional: Fairseq Installation

If you encounter dependency conflicts with fairseq, use this fork:

pip install git+https://github.com/MiniXC/fairseq-noconf

Usage

Basic Example

from ttsds import BenchmarkSuite
from ttsds.util.dataset import Dataset

# Initialize datasets
datasets = [
    Dataset("path/to/your/dataset", name="your_dataset")
]
reference_datasets = [
    Dataset("path/to/reference/dataset", name="reference")
]

# Create benchmark suite
suite = BenchmarkSuite(
    datasets=datasets,
    reference_datasets=reference_datasets,
    write_to_file="results.csv",  # Optional: save results to CSV
    skip_errors=True,  # Optional: skip failed benchmarks
    include_environment=False,  # Optional: exclude environment benchmarks
)

# Run benchmarks
results = suite.run()

# Get aggregated results with weighted scores
aggregated = suite.get_aggregated_results()
print(aggregated)

The datasets should be directories containing wav files. Since this is a distributional score, the wav files do not need to contain the same utterances, and the number of files can vary between datasets. However, results are best when the speaker identities are the same across datasets.
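
A quick sanity check on a dataset directory before running the suite could look like the sketch below. `count_wav_files` is a hypothetical helper, not part of ttsds, and searching subdirectories recursively is an assumption; the README only states that datasets are directories containing wav files.

```python
from pathlib import Path


def count_wav_files(directory):
    """Count .wav files under `directory` (case-insensitive extension).

    Hypothetical helper, not part of ttsds. Recursing into
    subdirectories is an assumption made for this sketch.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".wav"
    )


if __name__ == "__main__":
    # Placeholder paths taken from the example above.
    for name in ("path/to/your/dataset", "path/to/reference/dataset"):
        try:
            print(name, count_wav_files(name))
        except NotADirectoryError as err:
            print(err)
```

Because the score is distributional, the two counts do not need to match; the check is only meant to catch empty or mistyped paths early.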

Custom Category Weights

from ttsds import BenchmarkSuite
from ttsds.benchmarks.benchmark import BenchmarkCategory

suite = BenchmarkSuite(
    datasets=datasets,
    reference_datasets=reference_datasets,
    category_weights={
        BenchmarkCategory.SPEAKER: 0.25,
        BenchmarkCategory.INTELLIGIBILITY: 0.25,
        BenchmarkCategory.PROSODY: 0.25,
        BenchmarkCategory.GENERIC: 0.25,
        BenchmarkCategory.ENVIRONMENT: 0.0,
    },
)
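
To make the effect of the weights concrete, here is a minimal sketch of a weighted average over per-category scores. This illustrates the general idea only; it is an assumption and not necessarily how ttsds aggregates internally. The function name and the scores are made up for illustration, and plain strings stand in for the `BenchmarkCategory` members.

```python
def weighted_overall_score(category_scores, category_weights):
    """Weighted average of category scores.

    Illustrative sketch only; the aggregation used by ttsds itself
    may differ. Categories without a weight are treated as weight 0.
    """
    total_weight = sum(category_weights.get(cat, 0.0) for cat in category_scores)
    if total_weight == 0:
        raise ValueError("At least one category needs a non-zero weight")
    weighted_sum = sum(
        score * category_weights.get(cat, 0.0)
        for cat, score in category_scores.items()
    )
    return weighted_sum / total_weight


# Made-up scores, purely for illustration.
scores = {
    "SPEAKER": 80.0,
    "INTELLIGIBILITY": 90.0,
    "PROSODY": 70.0,
    "GENERIC": 60.0,
    "ENVIRONMENT": 50.0,
}
weights = {
    "SPEAKER": 0.25,
    "INTELLIGIBILITY": 0.25,
    "PROSODY": 0.25,
    "GENERIC": 0.25,
    "ENVIRONMENT": 0.0,
}
print(weighted_overall_score(scores, weights))  # 75.0
```

With the environment weight at 0.0, that category's score has no influence on the overall result, mirroring the weights shown above.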

Multilingual

suite = BenchmarkSuite(
    datasets=datasets,
    reference_datasets=reference_datasets,
    multilingual=True,
)

Progress Display

The benchmark suite provides a real-time progress display showing:

  • Overall progress
  • Per-benchmark completion status
  • Estimated time remaining
  • Error messages (if any)

Configuration

Environment Variables

# Set cache directory (default: ~/.cache/ttsds)
export TTSDS_CACHE_DIR=/path/to/cache
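
The same variable can also be set or inspected from Python. The sketch below shows how the documented default might be resolved; `resolve_cache_dir` is a hypothetical helper, and the assumption that ttsds reads the variable in exactly this way (and when it reads it) is not confirmed by this README.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return the cache directory implied by TTSDS_CACHE_DIR.

    Hypothetical helper, not part of ttsds. Falls back to the
    documented default (~/.cache/ttsds) when the variable is unset.
    """
    env = os.environ if env is None else env
    return Path(env.get("TTSDS_CACHE_DIR", "~/.cache/ttsds")).expanduser()


if __name__ == "__main__":
    # Assumption: setting the variable before importing ttsds is the
    # safest order, in case the library reads it at import time.
    os.environ["TTSDS_CACHE_DIR"] = "/path/to/cache"
    print(resolve_cache_dir())
```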

Benchmark Categories

  • Speaker: Evaluates speaker identity preservation
  • Intelligibility: Measures speech recognition performance
  • Prosody: Assesses speech rhythm and intonation
  • Generic: General speech quality metrics
  • Environment: Noise robustness evaluation. This category is excluded by default; set include_environment=True to include it.

Results

The benchmark results include:

  • Individual benchmark scores
  • Category-wise aggregated scores
  • Overall weighted score
  • Time taken for each benchmark
  • Reference and noise dataset information

Results can be saved to a CSV file for further analysis.
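
For further analysis, the saved file can be read back with the standard library. This is a minimal sketch assuming the file was written via write_to_file="results.csv" as in the basic example; `load_results` is a hypothetical helper, and no column names are assumed because the CSV layout is not documented here.

```python
import csv


def load_results(path):
    """Read a results CSV into a list of dicts, one per row.

    Hypothetical helper, not part of ttsds. Column names come from
    the file's header row; none are assumed here.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    rows = load_results("results.csv")
    if rows:
        print("Columns:", list(rows[0].keys()))
    print("Rows:", len(rows))
```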

Citation

@misc{minixhofer2024ttsdstexttospeechdistribution,
      title={TTSDS -- Text-to-Speech Distribution Score}, 
      author={Christoph Minixhofer and Ondřej Klejch and Peter Bell},
      year={2024},
      eprint={2407.12707},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2407.12707}, 
}

License

ttsds is distributed under the terms of the MIT license.
