Ominix TTS: A multilingual TTS system
Ominix-TTS: Advanced Multilingual Text-to-Speech with Voice Cloning
Ominix-TTS is a text-to-speech synthesis framework that converts input text into natural-sounding speech using a two-stage pipeline. It produces high-quality audio across multiple languages and supports voice cloning.
Key Features
- Two-Stage Synthesis Pipeline: First converts text to semantic tokens, then transforms these tokens into audio waveforms
- Multilingual Support: Handles Chinese, English, Japanese, Korean, and Cantonese with both pure and mixed-language modes
- Voice Cloning: Replicates voice characteristics from a short reference audio sample
- Voice Fusion: Combines multiple reference voices for custom voice creation
- High-Quality Output: Produces natural-sounding speech with proper prosody and intonation
- Configurable Parameters: Offers control over speed, temperature, and other synthesis qualities
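To make the temperature parameter concrete, here is a minimal sketch of temperature-scaled sampling over token logits, which is the usual meaning of "temperature" in autoregressive token generators. Whether Ominix-TTS applies temperature in exactly this way is an assumption; the function names and the toy logits are hypothetical and not part of the package.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Return probabilities for `logits` after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=None):
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [1.0, 2.0, 3.0]  # hypothetical scores for three tokens
    for t in (0.5, 1.0, 2.0):
        print(t, softmax_with_temperature(toy_logits, t))
```

Lower temperatures concentrate probability on the highest-scoring token, giving more deterministic output; higher temperatures flatten the distribution and increase variation.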
Language Codes in Ominix-TTS
The following table lists the language codes supported by Ominix-TTS:
| Language Code | Description | Recognition Type |
|---|---|---|
| "en" | Pure English | English only processing |
| "zh" | Mixed Chinese-English | Chinese-English hybrid processing |
| "all_zh" | Pure Chinese | Chinese only processing |
| "yue" | Mixed Cantonese-English | Cantonese-English hybrid processing |
| "all_yue" | Pure Cantonese | Cantonese only processing |
| "ja" | Mixed Japanese-English | Japanese-English hybrid processing |
| "all_ja" | Pure Japanese | Japanese only processing |
| "ko" | Mixed Korean-English | Korean-English hybrid processing |
| "all_ko" | Pure Korean | Korean only processing |
| "auto" | Auto-detect language | Multi-language detection and processing |
| "auto_yue" | Auto-detect with Cantonese support | Multi-language detection including Cantonese |
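The language codes listed above can be captured in a small lookup so that a typo is caught before any synthesis call is made. This is a sketch only: the code strings and descriptions come from the table, while `LANGUAGE_CODES`, `describe_language_code`, and `is_mixed_mode` are hypothetical helper names, not part of the Ominix-TTS API.

```python
# Language codes and descriptions copied from the table above.
LANGUAGE_CODES = {
    "en": "Pure English",
    "zh": "Mixed Chinese-English",
    "all_zh": "Pure Chinese",
    "yue": "Mixed Cantonese-English",
    "all_yue": "Pure Cantonese",
    "ja": "Mixed Japanese-English",
    "all_ja": "Pure Japanese",
    "ko": "Mixed Korean-English",
    "all_ko": "Pure Korean",
    "auto": "Auto-detect language",
    "auto_yue": "Auto-detect with Cantonese support",
}

# Codes whose description in the table is a "Mixed ...-English" mode.
MIXED_CODES = frozenset({"zh", "yue", "ja", "ko"})


def describe_language_code(code: str) -> str:
    """Return the table description for `code`, or raise ValueError if unknown."""
    try:
        return LANGUAGE_CODES[code]
    except KeyError:
        valid = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(
            f"Unsupported language code {code!r}; expected one of: {valid}"
        ) from None


def is_mixed_mode(code: str) -> bool:
    """Return True if `code` selects a hybrid mode that also handles English."""
    describe_language_code(code)  # validate first
    return code in MIXED_CODES
```

For example, `describe_language_code("all_ja")` returns the description for pure Japanese, while an unknown code such as `"fr"` raises a `ValueError` that lists the valid codes.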
Technical Architecture
Ominix-TTS operates through coordinated specialized models:
- BERT Models: Extract linguistic features from input text
- CNHuBERT: Processes reference audio to capture voice characteristics
- Text2Semantic Model: Converts text features into semantic tokens
- SoVITS Model: Transforms semantic tokens into audio waveforms
The system supports several model versions (v1, v2, v3) with increasing capabilities and language support, so users can trade off quality, speed, and resource requirements.
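The way the four components fit together can be sketched as plain function composition. This is an illustration under stated assumptions, not the package's implementation: the class `TwoStagePipeline`, its field names, and the toy stand-in functions are all hypothetical, the real models exchange tensors rather than Python lists, and feeding the reference-voice features into both stages is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Type aliases for the illustration only.
Features = List[float]
SemanticTokens = List[int]
Waveform = List[float]


@dataclass
class TwoStagePipeline:
    """Hypothetical wiring of the four components named above."""

    extract_text_features: Callable[[str], Features]  # BERT role
    encode_reference_audio: Callable[[Sequence[float]], Features]  # CNHuBERT role
    text_to_semantic: Callable[[Features, Features], SemanticTokens]  # stage 1
    semantic_to_waveform: Callable[[SemanticTokens, Features], Waveform]  # stage 2

    def synthesize(self, text: str, reference_audio: Sequence[float]) -> Waveform:
        text_features = self.extract_text_features(text)
        voice_features = self.encode_reference_audio(reference_audio)
        # Stage 1: text features -> semantic tokens.
        tokens = self.text_to_semantic(text_features, voice_features)
        # Stage 2: semantic tokens -> audio waveform.
        return self.semantic_to_waveform(tokens, voice_features)


# Toy stand-ins so the sketch runs without any model weights.
toy = TwoStagePipeline(
    extract_text_features=lambda text: [float(ord(c)) for c in text],
    encode_reference_audio=lambda audio: [sum(audio) / max(len(audio), 1)],
    text_to_semantic=lambda feats, voice: [int(f) % 256 for f in feats],
    semantic_to_waveform=lambda tokens, voice: [t / 255.0 + voice[0] for t in tokens],
)

if __name__ == "__main__":
    print(toy.synthesize("hi", [0.0, 0.0]))
```

Keeping each stage behind its own callable mirrors the separation described above: the semantic-token stage and the waveform stage can be swapped independently, for instance when moving between model versions.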
Applications
Ominix-TTS is suited to audiobooks, virtual assistants, accessibility tools, content localization, and any other application that needs high-quality speech synthesis matched to a specific voice.
Usage
- Install ffmpeg, which is used to decode the reference audio file.
  - For macOS: brew install ffmpeg
- Creating a virtual environment to run the tests and examples is recommended:
  conda create -n TTS python=3.9
  conda activate TTS
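Because ffmpeg is an external dependency, it can be useful to confirm it is reachable before running anything else. The following sketch uses only the Python standard library; the helper names `find_ffmpeg` and `ffmpeg_version_line` are hypothetical and not part of Ominix-TTS.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Return the first line printed by `ffmpeg -version` (empty if none)."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found; install it first (on macOS: brew install ffmpeg)")
    else:
        print(ffmpeg_version_line(path))
```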
Download files
File details
Details for the file ominix_tts-0.1.0.tar.gz.
File metadata
- Download URL: ominix_tts-0.1.0.tar.gz
- Upload date:
- Size: 4.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6a5f1f609fd746513c5abdfecb27aca5c11c8912944a7827d17187ed2b03cf1 |
| MD5 | ad96bddf73fae74049f098ed10ef715a |
| BLAKE2b-256 | a143bbd59fc56c472d8f67ed47e3bf68d1bd224118fd4a3024b9f21b7cf097c4 |
File details
Details for the file ominix_tts-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ominix_tts-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97fef31928186e9acd6cb3f18d34c39567d12bd92651f9bff8805b6b52167617 |
| MD5 | 1060af5c034018ee317252a13e093a68 |
| BLAKE2b-256 | c173ddadd01e1d02ddc63fae5393dfa62b29f1bf383782495ba4dab2cdbe32c9 |
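A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch: the digests and file names are copied from the file details, while `EXPECTED_SHA256`, `sha256_of_file`, and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "ominix_tts-0.1.0.tar.gz": (
        "e6a5f1f609fd746513c5abdfecb27aca5c11c8912944a7827d17187ed2b03cf1"
    ),
    "ominix_tts-0.1.0-py3-none-any.whl": (
        "97fef31928186e9acd6cb3f18d34c39567d12bd92651f9bff8805b6b52167617"
    ),
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's digest matches the listed one for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"No known digest for file name {path.name!r}")
    return sha256_of_file(path) == expected
```

Calling `verify_download("ominix_tts-0.1.0.tar.gz")` on a downloaded copy returns `False` if the file was corrupted or altered in transit.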