High-fidelity speech synthesis for Ukrainian using modern neural networks.

tts_uk

Text-to-Speech for Ukrainian

Demo

Try the demo in our Hugging Face Space, or listen to the samples here.

Features

  • Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
  • Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
  • High-fidelity speech generation using the RAD-TTS++ acoustic model;
  • Fast vocoding using Vocos;
  • Synthesizes long sentences effectively;
  • Supports a sampling rate of 44.1 kHz;
  • Tested on Linux environments and Windows/WSL;
  • Python API (requires Python 3.9 or later);
  • CUDA-enabled for GPU acceleration.

Installation

# Install from PyPI
pip install tts-uk

# OR, for the latest development version:
pip install git+https://github.com/egorsmkv/tts_uk

# OR, use git and local setup
git clone https://github.com/egorsmkv/tts_uk
cd tts_uk
uv sync # uv will handle the virtual environment

If uv is not installed yet, see the installation section of its documentation.

Alternatively, you can download the repository as a ZIP archive.
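After installing, it can help to confirm that the package and interpreter match the requirements listed under Features. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `python_is_supported` are illustrative and not part of tts_uk, and the distribution name `tts-uk` is taken from the `pip install` command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist_name: str = "tts-uk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum: Tuple[int, int] = (3, 9)) -> bool:
    """Check the running interpreter against the minimum version from the README."""
    return sys.version_info[:2] >= minimum


print("tts-uk version:", installed_version() or "not installed")
print("Python 3.9+:", python_is_supported())
```

If the first line prints "not installed", the package was probably installed into a different environment than the one running the script.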

Getting started

Code example:

import torchaudio

from tts_uk.inference import synthesis

sampling_rate = 44_100

# Perform the synthesis. The `synthesis` function returns:
# - mels: Mel spectrograms of the generated audio.
# - wave: The waveform produced by the vocoder, as a PyTorch tensor.
# - stats: A dictionary of synthesis statistics (processing time, duration, speech rate, etc.).
mels, wave, stats = synthesis(
    text="Ви можете протестувати синтез мовлення українською мовою. Просто введіть текст, який ви хочете прослухати.",
    voice="tetiana",  # tetiana, mykyta, lada
    n_takes=1,
    use_latest_take=False,
    token_dur_scaling=1,
    f0_mean=0,
    f0_std=0,
    energy_mean=0,
    energy_std=0,
    sigma_decoder=0.8,
    sigma_token_duration=0.666,
    sigma_f0=1,
    sigma_energy=1,
)

print(stats)

# Save the generated audio to a WAV file.
torchaudio.save("audio.wav", wave.cpu(), sampling_rate, encoding="PCM_S")
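When only one or two parameters change between runs, it can be convenient to keep the values from the example above in one place and override them selectively. The sketch below assumes nothing beyond the keyword names and values shown in that example; `DEFAULTS`, `VOICES`, and `make_params` are hypothetical helpers, not part of the tts_uk API.

```python
from typing import Any, Dict

# Voices and default values copied from the README example above.
VOICES = ("tetiana", "mykyta", "lada")
DEFAULTS: Dict[str, Any] = {
    "voice": "tetiana",
    "n_takes": 1,
    "use_latest_take": False,
    "token_dur_scaling": 1,
    "f0_mean": 0,
    "f0_std": 0,
    "energy_mean": 0,
    "energy_std": 0,
    "sigma_decoder": 0.8,
    "sigma_token_duration": 0.666,
    "sigma_f0": 1,
    "sigma_energy": 1,
}


def make_params(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults, rejecting unknown names and voices."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown synthesis parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["voice"] not in VOICES:
        raise ValueError(f"Unknown voice {params['voice']!r}; expected one of {VOICES}")
    return params


# Hypothetical usage (requires tts_uk to be installed):
# from tts_uk.inference import synthesis
# mels, wave, stats = synthesis(text="Привіт!", **make_params(voice="mykyta"))
```

Rejecting unknown names early catches typos such as `sigma_fo` before the call reaches the model.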

Use these Google colabs:

Or run synthesis in a terminal:

uv run example.py

If you need to synthesize long texts such as articles, we recommend splitting them into sentences first; consider wtpsplit for this.
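One way to handle an article is to split it into sentences and then pack those sentences into chunks short enough to synthesize one at a time. The sketch below is an illustration under stated assumptions: `naive_split` is a regex stand-in for a real segmenter such as wtpsplit, and `group_sentences` and the `max_chars` limit are hypothetical, not values recommended by the project.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Stand-in splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def group_sentences(sentences: List[str], max_chars: int = 300) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


article = "Перше речення. Друге речення! Третє речення?"
chunks = group_sentences(naive_split(article), max_chars=30)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be passed as `text` to `synthesis`, and the resulting waveforms joined in order before saving.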

Get help and support

If you need help, feel free to reach us through the Issues section of the repository.

License

The code is released under the MIT license.

Model authors

Acoustic

Vocoder

Community

Discord

Also, follow our Speech-UK initiative on Hugging Face!

Acknowledgements
