Ominix-TTS: A multilingual TTS system

Ominix-TTS: Advanced Multilingual Text-to-Speech with Voice Cloning

Ominix-TTS is a text-to-speech synthesis framework that converts input text into natural-sounding speech using a two-stage pipeline: text is first converted to semantic tokens, which are then rendered as audio waveforms. The system supports multiple languages and can clone a voice from a reference audio sample.

Key Features

  • Two-Stage Synthesis Pipeline: First converts text to semantic tokens, then transforms these tokens into audio waveforms
  • Multilingual Support: Handles Chinese, English, Japanese, Korean, and Cantonese with both pure and mixed-language modes
  • Voice Cloning: Replicates voice characteristics from a short reference audio sample
  • Voice Fusion: Combines multiple reference voices for custom voice creation
  • High-Quality Output: Produces natural-sounding speech with proper prosody and intonation
  • Configurable Parameters: Offers control over speed, temperature, and other synthesis settings

Language Codes in Ominix-TTS

The table below lists the language codes supported by Ominix-TTS:

Language Code   Description                          Recognition Type
"en"            Pure English                         English only processing
"zh"            Mixed Chinese-English                Chinese-English hybrid processing
"all_zh"        Pure Chinese                         Chinese only processing
"yue"           Mixed Cantonese-English              Cantonese-English hybrid processing
"all_yue"       Pure Cantonese                       Cantonese only processing
"ja"            Mixed Japanese-English               Japanese-English hybrid processing
"all_ja"        Pure Japanese                        Japanese only processing
"ko"            Mixed Korean-English                 Korean-English hybrid processing
"all_ko"        Pure Korean                          Korean only processing
"auto"          Auto-detect language                 Multi-language detection and processing
"auto_yue"      Auto-detect with Cantonese support   Multi-language detection including Cantonese

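As a rough illustration of how the language codes listed above could be checked before a synthesis call, here is a minimal sketch. The names `LANGUAGE_CODES`, `validate_language`, and `is_pure` are hypothetical helpers written for this example; they are not part of the Ominix-TTS API.

```python
# Language codes and descriptions copied from the table above.
# NOTE: this dictionary and both helpers are illustrative assumptions,
# not names exported by the ominix_tts package.
LANGUAGE_CODES = {
    "en": "Pure English",
    "zh": "Mixed Chinese-English",
    "all_zh": "Pure Chinese",
    "yue": "Mixed Cantonese-English",
    "all_yue": "Pure Cantonese",
    "ja": "Mixed Japanese-English",
    "all_ja": "Pure Japanese",
    "ko": "Mixed Korean-English",
    "all_ko": "Pure Korean",
    "auto": "Auto-detect language",
    "auto_yue": "Auto-detect with Cantonese support",
}


def validate_language(code: str) -> str:
    """Return the code unchanged if it is listed, otherwise raise ValueError."""
    if code not in LANGUAGE_CODES:
        valid = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {valid}")
    return code


def is_pure(code: str) -> bool:
    """True for the single-language codes: "en" and every "all_*" code."""
    code = validate_language(code)
    return code == "en" or code.startswith("all_")


if __name__ == "__main__":
    for code in ("en", "zh", "all_yue", "auto"):
        print(code, "->", LANGUAGE_CODES[code], "| pure:", is_pure(code))
```

Validating the code up front turns a typo such as "all_jp" into an immediate, readable error instead of a failure deeper in the pipeline.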
Technical Architecture

Ominix-TTS coordinates several specialized models:

  • BERT Models: Extract linguistic features from input text
  • CNHuBERT: Processes reference audio to capture voice characteristics
  • Text2Semantic Model: Converts text features into semantic tokens
  • SoVITS Model: Transforms semantic tokens into audio waveforms

The system supports several model versions (v1, v2, v3) with increasing capabilities and language support, so users can trade off quality, speed, and resource requirements.
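The data flow between the components listed above can be sketched as follows. This is a toy illustration, not the Ominix-TTS implementation: the class `TwoStagePipeline`, its field names, and the stand-in functions are all hypothetical, and the point at which reference-voice features enter the pipeline (here, the second stage) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TwoStagePipeline:
    """Hypothetical wiring of the four roles described above."""
    extract_text_features: Callable[[str], List[float]]           # role of the BERT models
    encode_reference: Callable[[Sequence[float]], float]          # role of CNHuBERT
    text_to_semantic: Callable[[List[float]], List[int]]          # role of Text2Semantic
    semantic_to_audio: Callable[[List[int], float], List[float]]  # role of SoVITS

    def synthesize(self, text: str, reference_audio: Sequence[float]) -> List[float]:
        features = self.extract_text_features(text)
        voice = self.encode_reference(reference_audio)
        tokens = self.text_to_semantic(features)      # stage 1: text -> semantic tokens
        return self.semantic_to_audio(tokens, voice)  # stage 2: tokens -> waveform


# Toy stand-ins so the sketch runs without any model weights.
def toy_text_features(text: str) -> List[float]:
    return [float(ord(ch)) for ch in text]


def toy_reference_encoder(audio: Sequence[float]) -> float:
    return sum(audio) / len(audio) if audio else 0.0


def toy_text_to_semantic(features: List[float]) -> List[int]:
    return [int(f) % 256 for f in features]


def toy_vocoder(tokens: List[int], voice: float) -> List[float]:
    return [t / 255.0 + voice for t in tokens]


pipeline = TwoStagePipeline(
    toy_text_features, toy_reference_encoder, toy_text_to_semantic, toy_vocoder
)

if __name__ == "__main__":
    samples = pipeline.synthesize("hi", [0.0, 0.0])
    print(len(samples), "samples")
```

Keeping each role behind a plain callable makes it straightforward to swap one model version for another without touching the rest of the flow.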

Applications

Ominix-TTS is suited to audiobooks, virtual assistants, accessibility tools, content localization, and other applications that need high-quality speech synthesis matched to specific voice characteristics.

Usage

  1. Install ffmpeg, which is used to decode the reference audio file.

    • For macOS:
    brew install ffmpeg

  2. Create a virtual environment for running the tests and examples (recommended):

    conda create -n TTS python=3.9
    conda activate TTS
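Because ffmpeg is needed to decode the reference audio, it can help to confirm that it is on the PATH before running anything else. The check below is a minimal sketch using only the Python standard library; the helper names `find_ffmpeg` and `ffmpeg_version` are made up for this example and are not part of Ominix-TTS.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `ffmpeg -version`."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found; install it first (for example: brew install ffmpeg)")
    else:
        print(ffmpeg_version(path))
```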

