A tool for visualizing static and dynamic vowel spaces during text-to-speech model training

TTSVowelViz

TTSVowelViz is a tool for visualizing static and dynamic vowel spaces during the training of text-to-speech (TTS) models. It helps researchers and developers monitor how vowel quality progresses over training steps.


✨ Features

  • 📊 Visualize static and dynamic vowel spaces across training steps.
  • 📈 Track vowel space evolution during model training.
  • 🔍 Examine the shape of the learned vowel space.
  • 🆚 Compare learned vowel spaces at training steps against the ground truth.
  • 🛠️ Customize visualizations with various user-defined inputs and configurations.
  • 🧠 Analyze, evaluate, and interpret TTS systems.
  • 🧩 Easily integrate into TTS training pipelines with minimal effort.
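
To make the static/dynamic distinction concrete, here is a minimal sketch of sampling a formant track at relative time points. It is not the library's implementation. It assumes, judging from the values used in the Basic Example (`[50]` and `[20, 50, 80]`), that time points are percentages of a vowel's duration, which this README does not state explicitly. The helper `sample_at_percentages` and the formant values are hypothetical.

```python
from typing import List, Sequence, Tuple


def sample_at_percentages(track: Sequence[Tuple[float, float]],
                          percentages: Sequence[float]) -> List[Tuple[float, float]]:
    """Linearly interpolate an (F1, F2) track at percentages of its length.

    `track` holds equally spaced (F1, F2) measurements in Hz spanning one
    vowel; 0 % is the first frame and 100 % is the last.
    """
    if not track:
        raise ValueError("track must contain at least one frame")
    last = len(track) - 1
    samples = []
    for p in percentages:
        if not 0 <= p <= 100:
            raise ValueError(f"percentage out of range: {p}")
        pos = p / 100 * last          # fractional frame index
        lo = int(pos)                 # frame at or before the position
        hi = min(lo + 1, last)        # next frame, clamped to the end
        w = pos - lo                  # interpolation weight toward `hi`
        f1 = track[lo][0] * (1 - w) + track[hi][0] * w
        f2 = track[lo][1] * (1 - w) + track[hi][1] * w
        samples.append((f1, f2))
    return samples


# Made-up formant values for a diphthong-like glide, for illustration only.
track = [(700.0, 1200.0), (650.0, 1400.0), (600.0, 1600.0),
         (500.0, 1900.0), (400.0, 2200.0)]

static_point = sample_at_percentages(track, [50])            # one point per vowel
dynamic_points = sample_at_percentages(track, [20, 50, 80])  # a trajectory
```

Under this reading, a static vowel space plots one (F1, F2) point per vowel, while a dynamic vowel space plots a short trajectory per vowel.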

📦 Installation

Install using pip:

pip install ttsvowelviz

Or install the latest version from source:

git clone https://github.com/pasindu-ud/ttsvowelviz.git
cd ttsvowelviz
pip install .
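
After either installation route, a quick way to confirm that the package is visible to your Python environment is to query its installed version. The sketch below uses only the standard library (`importlib.metadata`); the helper name `installed_version` is hypothetical and not part of TTSVowelViz.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("ttsvowelviz")
    if found:
        print(f"ttsvowelviz version: {found}")
    else:
        print("ttsvowelviz is not installed in this environment")
```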

🔧 Usage

Basic Example

from typing import List, Union

from ttsvowelviz import Synthesizer, TTSVowelViz
from ttsvowelviz.forced_aligner import ForcedAligner, WebMAUSBasicAligner
from ttsvowelviz.formant_extractor import FormantExtractor, PraatFormantExtractor


class ExampleSynthesizer(Synthesizer):
    def synthesize(self, step: int, text: str) -> str:
        # Code to generate speech from text at a given step
        return "Path to the synthesized audio file"


# Vowel labels (SAMPA-style symbols) and time points for the static vowel space
static_vowels: List[str] = ["3:", "6", "6:", "I", "O", "U", "e", "i:", "o:", "{", "}:"]
static_time_points: List[Union[int, float]] = [50]
# Subset of the static vowels treated as point vowels
point_vowels: List[str] = ["i:", "o:", "6:"]
# Vowel labels and time points for the dynamic vowel space
dynamic_vowels: List[str] = ["@}", "Ae", "e:", "oI", "{I", "{O"]
dynamic_time_points: List[Union[int, float]] = [20, 50, 80]
# Training steps at which vowel spaces are produced
intermediate_steps: List[int] = [0, 1000, 3000]
# Pluggable components: synthesizer, forced aligner, and formant extractor
synthesizer: Synthesizer = ExampleSynthesizer()
forced_aligner: ForcedAligner = WebMAUSBasicAligner(language="eng-NZ")
formant_extractor: FormantExtractor = PraatFormantExtractor()
# Sentences passed to the synthesizer
text_list: List[str] = ["Heard foot hud heed head had hard hod thought goose hid heard.",
                        "How'd hear oat hide lloyd hare aid how'd."]
ground_truth_src_dir_path: str = "Path to the ground truth directory"
vowel_space_dst_dir_path: str = "Path to the directory where vowel spaces should be saved"

tool: TTSVowelViz = TTSVowelViz(static_vowels=static_vowels, static_time_points=static_time_points,
                                point_vowels=point_vowels, dynamic_vowels=dynamic_vowels,
                                dynamic_time_points=dynamic_time_points, intermediate_steps=intermediate_steps,
                                synthesizer=synthesizer, forced_aligner=forced_aligner,
                                formant_extractor=formant_extractor, text_list=text_list,
                                ground_truth_src_dir_path=ground_truth_src_dir_path,
                                vowel_space_dst_dir_path=vowel_space_dst_dir_path)
for s in intermediate_steps:
    tool.execute(step=s)
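
The example above calls `tool.execute` in a standalone loop. When the tool is wired into a live training run instead, one possible pattern is to trigger the visualization only at the chosen steps. The sketch below is an assumption about how such a hook could look, not an API provided by TTSVowelViz: `run_training`, `train_one_step`, and `visualize` are hypothetical names, and the callbacks are stand-ins so the snippet runs without a model.

```python
from typing import Callable, Iterable, List


def run_training(total_steps: int,
                 intermediate_steps: Iterable[int],
                 train_one_step: Callable[[int], None],
                 visualize: Callable[[int], None]) -> List[int]:
    """Run a toy training loop and call `visualize(step)` at the chosen steps.

    Step `s` is visualized after exactly `s` training updates, so step 0
    shows the untrained model. Returns the visualized steps in order.
    """
    checkpoints = set(intermediate_steps)
    visualized = []
    for step in range(total_steps + 1):
        if step in checkpoints:
            # In a real pipeline this would be `tool.execute(step=step)`.
            visualize(step)
            visualized.append(step)
        if step < total_steps:
            train_one_step(step)  # placeholder for one optimizer update
    return visualized


# Stand-in callbacks: no real training or plotting happens here.
calls: List[int] = []
ran = run_training(total_steps=3000,
                   intermediate_steps=[0, 1000, 3000],
                   train_one_step=lambda step: None,
                   visualize=calls.append)
```

With this layout, a `Synthesizer.synthesize(step, text)` implementation can generate audio from the model as it exists at that point in training, rather than reloading a saved checkpoint for each step.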

📚 Citation

If you use this tool in your research, please cite:

@inproceedings{ttsvowelviz-2026,
  author    = {Pasindu Udawatta and Jesin James and B.T. Balamurali and Catherine I. Watson and Ake Nicholas and Binu Abeysinghe},
  title     = {{TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces}},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2026},
  address   = {Palma, Spain},
  publisher = {European Language Resources Association},
  url       = {https://pypi.org/project/ttsvowelviz/}}

📄 License

MIT License. See LICENSE file for details.

