Ominix TTS: A multilingual TTS system

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Ominix-TTS: Advanced Multilingual Text-to-Speech with Voice Cloning

Ominix-TTS is a text-to-speech synthesis framework that converts input text into natural-sounding speech using a two-stage pipeline. It produces audio in multiple languages and supports voice cloning from a reference sample.

Key Features

- Two-Stage Synthesis Pipeline: First converts text to semantic tokens, then transforms these tokens into audio waveforms
- Multilingual Support: Handles Chinese, English, Japanese, Korean, and Cantonese with both pure and mixed-language modes
- Voice Cloning: Replicates voice characteristics from a short reference audio sample
- Voice Fusion: Combines multiple reference voices for custom voice creation
- High-Quality Output: Produces natural-sounding speech with proper prosody and intonation
- Configurable Parameters: Offers control over speed, temperature, and other synthesis qualities

Language Codes in Ominix-TTS

Here's a comprehensive table of all language codes supported by the Ominix-TTS system:

Language Code	Description	Recognition Type
`"en"`	Pure English	English only processing
`"zh"`	Mixed Chinese-English	Chinese-English hybrid processing
`"all_zh"`	Pure Chinese	Chinese only processing
`"yue"`	Mixed Cantonese-English	Cantonese-English hybrid processing
`"all_yue"`	Pure Cantonese	Cantonese only processing
`"ja"`	Mixed Japanese-English	Japanese-English hybrid processing
`"all_ja"`	Pure Japanese	Japanese only processing
`"ko"`	Mixed Korean-English	Korean-English hybrid processing
`"all_ko"`	Pure Korean	Korean only processing
`"auto"`	Auto-detect language	Multi-language detection and processing
`"auto_yue"`	Auto-detect with Cantonese support	Multi-language detection including Cantonese
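The table above can be carried into code as a simple lookup. The following is a minimal sketch, not part of the Ominix-TTS API: the names `LANGUAGE_CODES`, `validate_language_code`, and `is_pure_mode` are hypothetical, and only the codes and descriptions come from the table.

```python
# Codes and descriptions copied from the language table above.
# The dictionary and helper names are hypothetical, not Ominix-TTS identifiers.
LANGUAGE_CODES = {
    "en": "Pure English",
    "zh": "Mixed Chinese-English",
    "all_zh": "Pure Chinese",
    "yue": "Mixed Cantonese-English",
    "all_yue": "Pure Cantonese",
    "ja": "Mixed Japanese-English",
    "all_ja": "Pure Japanese",
    "ko": "Mixed Korean-English",
    "all_ko": "Pure Korean",
    "auto": "Auto-detect language",
    "auto_yue": "Auto-detect with Cantonese support",
}


def validate_language_code(code: str) -> str:
    """Return the code unchanged if it appears in the table, else raise ValueError."""
    if code not in LANGUAGE_CODES:
        valid = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"Unknown language code {code!r}; expected one of: {valid}")
    return code


def is_pure_mode(code: str) -> bool:
    """True for single-language codes: "en" and every code starting with "all_"."""
    validate_language_code(code)
    return code == "en" or code.startswith("all_")
```

Validating the code before synthesis gives an early, readable error for typos such as `"cn"` instead of `"zh"`.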

Technical Architecture

Ominix-TTS operates through coordinated specialized models:

- BERT Models: Extract linguistic features from input text
- CNHuBERT: Processes reference audio to capture voice characteristics
- Text2Semantic Model: Converts text features into semantic tokens
- SoVITS Model: Transforms semantic tokens into audio waveforms

The system supports several model versions (v1, v2, v3) with increasing capabilities and language support, so users can trade off quality, speed, and resource requirements.
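The data flow between the four components can be sketched as plain function composition. This is an illustration of the two-stage idea only: the class `TwoStagePipeline`, its field names, and the toy stand-in functions are all hypothetical, and the point at which voice features enter the pipeline (here, the second stage) is an assumption rather than something the description states.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical type aliases; the real models work on tensors, not Python lists.
TextFeatures = List[float]
VoiceFeatures = List[float]
SemanticTokens = List[int]
Waveform = List[float]


@dataclass
class TwoStagePipeline:
    """Hypothetical wiring of the component roles; not the ominix_tts API."""

    extract_text_features: Callable[[str], TextFeatures]  # BERT role
    extract_voice_features: Callable[[Sequence[float]], VoiceFeatures]  # CNHuBERT role
    text_to_semantic: Callable[[TextFeatures], SemanticTokens]  # Text2Semantic role
    semantic_to_audio: Callable[[SemanticTokens, VoiceFeatures], Waveform]  # SoVITS role

    def synthesize(self, text: str, reference_audio: Sequence[float]) -> Waveform:
        text_features = self.extract_text_features(text)
        voice_features = self.extract_voice_features(reference_audio)
        # Stage 1: text features -> semantic tokens.
        tokens = self.text_to_semantic(text_features)
        # Stage 2: semantic tokens (+ assumed voice conditioning) -> waveform.
        return self.semantic_to_audio(tokens, voice_features)


# Toy stand-ins so the sketch runs end to end; they do no real speech processing.
pipeline = TwoStagePipeline(
    extract_text_features=lambda text: [float(ord(c)) for c in text],
    extract_voice_features=lambda audio: [sum(audio) / len(audio)],
    text_to_semantic=lambda feats: [int(f) % 1024 for f in feats],
    semantic_to_audio=lambda tokens, voice: [t / 1024 + voice[0] for t in tokens],
)
waveform = pipeline.synthesize("hi", [0.0, 0.5])
```

Keeping each role behind a callable makes it easy to swap one model version for another without touching the rest of the flow.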

Applications

Ominix-TTS is suited to audiobooks, virtual assistants, accessibility tools, content localization, and any other application that needs high-quality speech synthesis matched to a specific voice.

Usage

Install ffmpeg first; it is used to decode the reference audio file.
- For macOS:
```
brew install ffmpeg 
```
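Before loading reference audio, it can help to confirm that ffmpeg is actually reachable. The following is a minimal sketch using only the Python standard library; the helper names `find_ffmpeg` and `ffmpeg_version_line` are hypothetical and not part of Ominix-TTS.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None if absent."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path: str) -> str:
    """Return the first line printed by `ffmpeg -version` (empty if no output)."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    ffmpeg_path = find_ffmpeg()
    if ffmpeg_path is None:
        print("ffmpeg not found on PATH; install it before using reference audio.")
    else:
        print(ffmpeg_version_line(ffmpeg_path))
```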
We recommend creating a virtual environment to run the tests and examples:

```
conda create -n TTS python=3.9
conda activate TTS
```

Release history

This version

0.1.0

May 5, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ominix_tts-0.1.0.tar.gz (4.2 MB view details)

Uploaded May 5, 2025 Source

Built Distribution

ominix_tts-0.1.0-py3-none-any.whl (4.3 MB view details)

Uploaded May 5, 2025 Python 3

File details

Details for the file ominix_tts-0.1.0.tar.gz.

File metadata

Download URL: ominix_tts-0.1.0.tar.gz
Upload date: May 5, 2025
Size: 4.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for ominix_tts-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e6a5f1f609fd746513c5abdfecb27aca5c11c8912944a7827d17187ed2b03cf1`
MD5	`ad96bddf73fae74049f098ed10ef715a`
BLAKE2b-256	`a143bbd59fc56c472d8f67ed47e3bf68d1bd224118fd4a3024b9f21b7cf097c4`

File details

Details for the file ominix_tts-0.1.0-py3-none-any.whl.

File metadata

Download URL: ominix_tts-0.1.0-py3-none-any.whl
Upload date: May 5, 2025
Size: 4.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for ominix_tts-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`97fef31928186e9acd6cb3f18d34c39567d12bd92651f9bff8805b6b52167617`
MD5	`1060af5c034018ee317252a13e093a68`
BLAKE2b-256	`c173ddadd01e1d02ddc63fae5393dfa62b29f1bf383782495ba4dab2cdbe32c9`
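A downloaded file can be checked against the published SHA256 digests with the standard library. The following is a minimal sketch: the digests are copied from the two "File hashes" tables above, while the names `EXPECTED_SHA256`, `sha256_of`, and `verify_download` are hypothetical helpers and assume the file keeps its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "ominix_tts-0.1.0.tar.gz": (
        "e6a5f1f609fd746513c5abdfecb27aca5c11c8912944a7827d17187ed2b03cf1"
    ),
    "ominix_tts-0.1.0-py3-none-any.whl": (
        "97fef31928186e9acd6cb3f18d34c39567d12bd92651f9bff8805b6b52167617"
    ),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Compare the file's SHA256 with the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"No published hash for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify_download(Path("ominix_tts-0.1.0.tar.gz"))` returns `True` only when the local file matches the published digest.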
