High-fidelity speech synthesis for Ukrainian using modern neural networks.
Text-to-Speech for Ukrainian
Demo
Check out our demo in a Hugging Face Space, or just listen to the samples here.
Features
- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.
Installation
# Install from PyPI
pip install tts-uk
# OR, for the latest development version:
pip install git+https://github.com/egorsmkv/tts_uk
# OR, use git and local setup
git clone https://github.com/egorsmkv/tts_uk
cd tts_uk
uv sync # uv will handle the virtual environment
If you do not have uv yet, read the installation section of its documentation.
Also, you can download the repository as a ZIP archive.
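After running one of the install commands above, it can help to confirm that the interpreter meets the stated Python 3.9 requirement and that the `tts-uk` distribution is visible to it. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of this package.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in the feature list.
MIN_PYTHON = (3, 9)


def check_environment(package="tts-uk"):
    """Return (python_ok, installed_version or None) for the given distribution.

    Hypothetical helper for illustration; it is not provided by tts-uk.
    """
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None
    return python_ok, version


python_ok, version = check_environment()
print("Python version OK:", python_ok)
print("tts-uk version:", version if version else "not installed")
```

If the second line reports "not installed", the package was probably installed into a different virtual environment than the one running the script.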
Getting started
Code example:
import torchaudio

from tts_uk.inference import synthesis

sampling_rate = 44_100

# Perform the synthesis. The `synthesis` function returns:
# - mels: Mel spectrograms of the generated audio.
# - wave: The waveform synthesized by the vocoder, as a PyTorch tensor.
# - stats: A dictionary containing synthesis statistics (processing time, duration, speech rate, etc.).
mels, wave, stats = synthesis(
    text="Ви можете протестувати синтез мовлення українською мовою. Просто введіть текст, який ви хочете прослухати.",
    voice="tetiana",  # tetiana, mykyta, lada
    n_takes=1,
    use_latest_take=False,
    token_dur_scaling=1,
    f0_mean=0,
    f0_std=0,
    energy_mean=0,
    energy_std=0,
    sigma_decoder=0.8,
    sigma_token_duration=0.666,
    sigma_f0=1,
    sigma_energy=1,
)

print(stats)

# Save the generated audio to a WAV file.
torchaudio.save("audio.wav", wave.cpu(), sampling_rate, encoding="PCM_S")
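Since the model ships three voices, a common next step is to render the same text with each of them. The sketch below loops over the voice names and reuses the parameter values from the example above. The helper `synthesize_all_voices` and the stand-in `fake_synthesis` are hypothetical and exist only so the snippet runs without downloading any models; the real `synthesis` function is assumed to accept the same keyword arguments as shown in the example.

```python
from typing import Any, Callable, Dict, Iterable

# Voice names listed in the feature list.
VOICES = ("tetiana", "mykyta", "lada")

# Parameter values copied from the example above.
SYNTHESIS_PARAMS: Dict[str, Any] = {
    "n_takes": 1,
    "use_latest_take": False,
    "token_dur_scaling": 1,
    "f0_mean": 0,
    "f0_std": 0,
    "energy_mean": 0,
    "energy_std": 0,
    "sigma_decoder": 0.8,
    "sigma_token_duration": 0.666,
    "sigma_f0": 1,
    "sigma_energy": 1,
}


def synthesize_all_voices(
    text: str,
    synthesize: Callable[..., Any],
    voices: Iterable[str] = VOICES,
) -> Dict[str, Any]:
    """Call `synthesize` once per voice and collect the results by voice name."""
    results = {}
    for voice in voices:
        results[voice] = synthesize(text=text, voice=voice, **SYNTHESIS_PARAMS)
    return results


def fake_synthesis(**kwargs):
    """Stand-in for tts_uk's `synthesis`; returns a (mels, wave, stats)-shaped tuple."""
    return None, None, {"voice": kwargs["voice"], "chars": len(kwargs["text"])}


results = synthesize_all_voices("Привіт!", fake_synthesis)
print(sorted(results))
```

With the package installed, passing the real function instead, as in `synthesize_all_voices(text, synthesis)`, should give one `(mels, wave, stats)` tuple per voice, and each `wave` can then be saved with `torchaudio.save` as in the example above.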
You can also use these Google Colab notebooks:
- CPU inference
- GPU inference on T4 card (long document to synthesize)
Or run synthesis in a terminal:
uv run example.py
If you need to synthesize articles, we recommend considering wtpsplit to split the text into sentences first.
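Once an article has been split into sentences (for example by wtpsplit), the sentences can be grouped into moderately sized chunks and synthesized one chunk at a time. The following is a minimal sketch of such grouping in plain Python. The function name `group_sentences` and the 500-character default are assumptions made for illustration, not values recommended by this project.

```python
from typing import Iterable, List


def group_sentences(sentences: Iterable[str], max_chars: int = 500) -> List[str]:
    """Greedily join consecutive sentences into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` is kept whole as its own chunk.
    The 500-character default is an arbitrary illustrative value.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty or whitespace-only entries
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sentences = ["Перше речення.", "Друге речення.", "Третє речення."]
for chunk in group_sentences(sentences, max_chars=30):
    print(chunk)
```

Each resulting chunk can then be passed as the `text` argument of `synthesis`, and the returned waveforms concatenated in order.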
Get help and support
Please feel free to connect with us using the Issues section.
License
The code is released under the MIT license.
Model authors
Acoustic
Vocoder
Community
- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
Also, follow our Speech-UK initiative on Hugging Face!
Acknowledgements
File details
Details for the file tts_uk-1.3.7.tar.gz.
File metadata
- Download URL: tts_uk-1.3.7.tar.gz
- Upload date:
- Size: 859.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ffe120939f72e2fe3fbe99eca219a250e5ff983c824807cfba70aee0019db75 |
| MD5 | eba4ced47f4c9fa7ed61c61aee447f0c |
| BLAKE2b-256 | a5c30f828ae40ea050358524fc0da56e36134efa90a465ea72b8d88eb3e21bd4 |
File details
Details for the file tts_uk-1.3.7-py3-none-any.whl.
File metadata
- Download URL: tts_uk-1.3.7-py3-none-any.whl
- Upload date:
- Size: 56.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 809b29cc54017e18decf872cac585e2e42d8563ee8bdf0035ce7ab606a928c4f |
| MD5 | 8f09fdc7c06b92e8a190dc9f7fb2b090 |
| BLAKE2b-256 | f6d655c26c9a431ec4ebdb6877fd0d12c30c1672da1801057a2c917bd0c5431d |