A tool for visualizing static and dynamic vowel spaces during text-to-speech model training

TTSVowelViz

TTSVowelViz is a tool for visualizing static and dynamic vowel spaces during the training of text-to-speech (TTS) models. It helps researchers and developers monitor how vowel quality progresses over training steps.


✨ Features

  • 📊 Visualize static and dynamic vowel spaces across training steps.
  • 📈 Track vowel space evolution during model training.
  • 🔍 Examine the shape of the learned vowel space.
  • 🆚 Compare learned vowel spaces at training steps against the ground truth.
  • 🛠️ Customize visualizations with various user-defined inputs and configurations.
  • 🧠 Analyze, evaluate, and interpret TTS systems.
  • 🧩 Easily integrate into TTS training pipelines with minimal effort.
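
The difference between a static and a dynamic vowel space can be illustrated with a small sketch. It assumes, and this is an assumption rather than documented behaviour, that the time points used in the usage example (`[50]` for static vowels, `[20, 50, 80]` for dynamic vowels) are percentages of each vowel's duration. The helper `sample_times` and the interval boundaries are hypothetical and not part of TTSVowelViz.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def sample_times(start: float, end: float, percents: Sequence[Number]) -> List[float]:
    """Map percentages of a vowel's duration to absolute times in seconds.

    Hypothetical helper: assumes time points are percentages of the
    vowel interval found by the forced aligner.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    duration = end - start
    return [start + duration * p / 100.0 for p in percents]


# Illustrative vowel interval (made-up boundaries, in seconds).
vowel_start, vowel_end = 1.0, 1.5

# Static vowel: one measurement, so one point in the F1/F2 plane.
static_times = sample_times(vowel_start, vowel_end, [50])

# Dynamic vowel: several measurements, so a trajectory in the F1/F2 plane.
dynamic_times = sample_times(vowel_start, vowel_end, [20, 50, 80])

print("static:", static_times)
print("dynamic:", dynamic_times)
```

Under this reading, a monophthong is summarized by its midpoint, while a diphthong is described by how its formants move between the early, middle, and late measurements.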

📦 Installation

Install using pip:

pip install ttsvowelviz

Or install the latest version from source:

git clone https://github.com/pasindu-ud/ttsvowelviz.git
cd ttsvowelviz
pip install .
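
To confirm that the installation succeeded, one option is to query the installed version through the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and only `importlib.metadata` from the Python standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("ttsvowelviz"))
```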

🔧 Usage

Basic Example

from typing import List, Union

from ttsvowelviz import Synthesizer, TTSVowelViz
from ttsvowelviz.forced_aligner import ForcedAligner, WebMAUSBasicAligner
from ttsvowelviz.formant_extractor import FormantExtractor, PraatFormantExtractor


class ExampleSynthesizer(Synthesizer):
    def synthesize(self, step: int, text: str) -> str:
        # Code to generate speech from text at a given step
        return "Path to the synthesized audio file"


# Vowel labels are written in SAMPA-style notation.
static_vowels: List[str] = ["3:", "6", "6:", "I", "O", "U", "e", "i:", "o:", "{", "}:"]
static_time_points: List[Union[int, float]] = [50]
point_vowels: List[str] = ["i:", "o:", "6:"]
dynamic_vowels: List[str] = ["@}", "Ae", "e:", "oI", "{I", "{O"]
dynamic_time_points: List[Union[int, float]] = [20, 50, 80]
intermediate_steps: List[int] = [0, 1000, 3000]
synthesizer: Synthesizer = ExampleSynthesizer()
forced_aligner: ForcedAligner = WebMAUSBasicAligner(language="eng-NZ")
formant_extractor: FormantExtractor = PraatFormantExtractor()
# Sentences to synthesize; the words are chosen to contain the vowels listed above.
text_list: List[str] = ["Heard foot hud heed head had hard hod thought goose hid heard.",
                        "How'd hear oat hide lloyd hare aid how'd."]
ground_truth_src_dir_path: str = "Path to the ground truth directory"
vowel_space_dst_dir_path: str = "Path to the directory where vowel spaces should be saved"

tool: TTSVowelViz = TTSVowelViz(static_vowels=static_vowels, static_time_points=static_time_points,
                                point_vowels=point_vowels, dynamic_vowels=dynamic_vowels,
                                dynamic_time_points=dynamic_time_points, intermediate_steps=intermediate_steps,
                                synthesizer=synthesizer, forced_aligner=forced_aligner,
                                formant_extractor=formant_extractor, text_list=text_list,
                                ground_truth_src_dir_path=ground_truth_src_dir_path,
                                vowel_space_dst_dir_path=vowel_space_dst_dir_path)
for s in intermediate_steps:
    tool.execute(step=s)
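
The `ExampleSynthesizer` above only returns a placeholder string. As a rough sketch of the contract it implies (take a training step and a text, write audio to disk, return the file path), the following stand-in writes a short sine tone as a WAV file. Everything here is an assumption for illustration: `ToneSynthesizer`, its constructor arguments, and the file naming are hypothetical, and a real implementation would load the model checkpoint for `step` and synthesize actual speech. The fallback base class exists only so the sketch runs without the package installed.

```python
import math
import struct
import wave
import zlib
from pathlib import Path

try:
    from ttsvowelviz import Synthesizer  # base class used in the example above
except ImportError:
    # Stand-in base class (assumption) so the sketch runs without the package.
    class Synthesizer:
        def synthesize(self, step: int, text: str) -> str:
            raise NotImplementedError


class ToneSynthesizer(Synthesizer):
    """Hypothetical placeholder that writes a sine tone instead of speech."""

    def __init__(self, out_dir: str, sample_rate: int = 16000) -> None:
        self.out_dir = Path(out_dir)
        self.sample_rate = sample_rate

    def synthesize(self, step: int, text: str) -> str:
        self.out_dir.mkdir(parents=True, exist_ok=True)
        # One file per (step, text) pair so outputs from different steps do not collide.
        text_id = zlib.crc32(text.encode("utf-8"))
        wav_path = self.out_dir / f"step_{step}_{text_id:08x}.wav"

        # A real synthesizer would run the TTS model saved at `step` here.
        n_samples = self.sample_rate // 2  # 0.5 seconds of audio
        frames = b"".join(
            struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * 220.0 * i / self.sample_rate)))
            for i in range(n_samples)
        )

        # Write 16-bit mono PCM audio.
        with wave.open(str(wav_path), "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(self.sample_rate)
            wav_file.writeframes(frames)
        return str(wav_path)
```

An instance such as `ToneSynthesizer("synth_out")` could then be passed as the `synthesizer` argument, although a tone contains no vowels, so it is only useful for checking that files are written where expected.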

📚 Citation

If you use this tool in your research, please cite:

@misc{ttsvowelviz2025,
  author = {Pasindu Udawatta and Jesin James and B.T. Balamurali and Catherine I. Watson and Ake Nicholas and Binu Abeysinghe},
  title = {TTSVowelViz},
  year = {2025},
  url = {https://github.com/pasindu-ud/ttsvowelviz}
}

📄 License

MIT License. See the LICENSE file for details.

