A tool for visualizing static and dynamic vowel spaces during text-to-speech model training
TTSVowelViz
TTSVowelViz is a tool for visualizing static and dynamic vowel spaces during the training of text-to-speech (TTS) models. It helps researchers and developers monitor how vowel quality progresses over training steps.
✨ Features
- 📊 Visualize static and dynamic vowel spaces across training steps.
- 📈 Track vowel space evolution during model training.
- 🔍 Examine the shape of the learned vowel space.
- 🆚 Compare learned vowel spaces at training steps against the ground truth.
- 🛠️ Customize visualizations with various user-defined inputs and configurations.
- 🧠 Analyze, evaluate, and interpret TTS systems.
- 🧩 Easily integrate into TTS training pipelines with minimal effort.
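To make the static/dynamic distinction concrete, here is a minimal sketch of how time points could map onto a vowel interval. It assumes that values such as `50` and `20, 50, 80` (as passed in the Usage example) are percentages of the vowel's duration; the README does not state this explicitly. `sample_times` and the interval boundaries are hypothetical and not part of the ttsvowelviz API.

```python
from typing import List, Union

Number = Union[int, float]


def sample_times(start: float, end: float, percents: List[Number]) -> List[float]:
    """Map percentage positions within a vowel interval to absolute times in seconds."""
    if end <= start:
        raise ValueError("end must be greater than start")
    duration = end - start
    return [start + duration * p / 100.0 for p in percents]


# Hypothetical vowel interval from a forced alignment: 1.00 s to 1.20 s.
# Static vowel (assumed): a single measurement at the midpoint.
static_times = sample_times(1.00, 1.20, [50])
# Dynamic vowel (assumed): several measurements along the formant trajectory.
dynamic_times = sample_times(1.00, 1.20, [20, 50, 80])
```

Under this reading, a static vowel contributes one formant measurement per token, while a dynamic vowel contributes a short trajectory.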
📦 Installation
Install using pip:
pip install ttsvowelviz
Or install the latest version from source:
git clone https://github.com/pasindu-ud/ttsvowelviz.git
cd ttsvowelviz
pip install .
🔧 Usage
Basic Example
from typing import List, Union
from ttsvowelviz import Synthesizer, TTSVowelViz
from ttsvowelviz.forced_aligner import ForcedAligner, WebMAUSBasicAligner
from ttsvowelviz.formant_extractor import FormantExtractor, PraatFormantExtractor
class ExampleSynthesizer(Synthesizer):
    def synthesize(self, step: int, text: str) -> str:
        # Code to generate speech from text at a given step
        return "Path to the synthesized audio file"
static_vowels: List[str] = ["3:", "6", "6:", "I", "O", "U", "e", "i:", "o:", "{", "}:"]
static_time_points: List[Union[int, float]] = [50]
point_vowels: List[str] = ["i:", "o:", "6:"]
dynamic_vowels: List[str] = ["@}", "Ae", "e:", "oI", "{I", "{O"]
dynamic_time_points: List[Union[int, float]] = [20, 50, 80]
intermediate_steps: List[int] = [0, 1000, 3000]
synthesizer: Synthesizer = ExampleSynthesizer()
forced_aligner: ForcedAligner = WebMAUSBasicAligner(language="eng-NZ")
formant_extractor: FormantExtractor = PraatFormantExtractor()
text_list: List[str] = ["Heard foot hud heed head had hard hod thought goose hid heard.",
                        "How'd hear oat hide lloyd hare aid how'd."]
ground_truth_src_dir_path: str = "Path to the ground truth directory"
vowel_space_dst_dir_path: str = "Path to the directory where vowel spaces should be saved"
tool: TTSVowelViz = TTSVowelViz(static_vowels=static_vowels, static_time_points=static_time_points,
                                point_vowels=point_vowels, dynamic_vowels=dynamic_vowels,
                                dynamic_time_points=dynamic_time_points, intermediate_steps=intermediate_steps,
                                synthesizer=synthesizer, forced_aligner=forced_aligner,
                                formant_extractor=formant_extractor, text_list=text_list,
                                ground_truth_src_dir_path=ground_truth_src_dir_path,
                                vowel_space_dst_dir_path=vowel_space_dst_dir_path)
for s in intermediate_steps:
    tool.execute(step=s)
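The loop above processes every intermediate step in one pass. Inside a live training run, you may prefer to trigger the visualization only when training reaches one of those steps. The following is a rough sketch of that idea, not part of the ttsvowelviz API: `make_step_hook` is a hypothetical helper, and a list's `append` stands in for `lambda s: tool.execute(step=s)` so the snippet runs without the package installed. It also assumes your `Synthesizer` can produce audio from the model state at the current step.

```python
from typing import Callable, Iterable, List


def make_step_hook(intermediate_steps: Iterable[int],
                   execute: Callable[[int], None]) -> Callable[[int], bool]:
    """Return a hook that calls `execute(step)` only at the chosen steps."""
    pending = set(intermediate_steps)

    def hook(step: int) -> bool:
        if step in pending:
            execute(step)
            pending.discard(step)  # do not run the same step twice
            return True
        return False

    return hook


# Stand-in for `lambda s: tool.execute(step=s)`; it records the steps it was called with.
visited: List[int] = []
hook = make_step_hook([0, 1000, 3000], visited.append)

for step in range(0, 3001, 500):  # placeholder for a real training loop
    # ... optimizer updates for this step would go here ...
    hook(step)
```

In a real pipeline you would pass `lambda s: tool.execute(step=s)` as `execute` and call `hook(step)` wherever your trainer already handles checkpointing or logging.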
📚 Citation
If you use this tool in your research, please cite:
@inproceedings{ttsvowelviz-2026,
  author    = {Pasindu Udawatta and Jesin James and B.T. Balamurali and Catherine I. Watson and Ake Nicholas and Binu Abeysinghe},
  title     = {{TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel Spaces}},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2026},
  address   = {Palma, Spain},
  publisher = {European Language Resources Association},
  url       = {https://pypi.org/project/ttsvowelviz/}
}
📄 License
MIT License. See the LICENSE file for details.