Chiluka - A lightweight TTS inference package based on StyleTTS2

Chiluka

Chiluka (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.

Features

  • Simple, clean API for TTS synthesis
  • Fully self-contained - model code ships in the package; weights download on first use or come bundled with a Git LFS clone
  • Style transfer from reference audio
  • Multi-language support via phonemizer
  • No external dependencies on other repos
  • Multiple distribution methods - HuggingFace Hub, PyTorch Hub, pip install

Installation

Option 1: pip install (Recommended)

pip install chiluka

Option 2: Install from GitHub

pip install git+https://github.com/Seemanth/chiluka.git

Option 3: From Source

git clone https://github.com/Seemanth/chiluka.git
cd chiluka
pip install -e .

Note: The bundled weights are stored with Git LFS. To clone the repository together with the weights:

# Install Git LFS first
sudo apt-get install git-lfs  # Ubuntu/Debian
# or: brew install git-lfs    # macOS

git lfs install
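# With Git LFS installed, a plain clone also fetches the LFS files
# (the older "git lfs clone" command is deprecated)
git clone https://github.com/Seemanth/chiluka.git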

Install espeak-ng (Required)

# Ubuntu/Debian
sudo apt-get install espeak-ng

# macOS
brew install espeak-ng
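
Before loading the model, it can help to confirm that espeak-ng is actually reachable. The following is a minimal sketch using only the Python standard library. The helper name espeak_ng_available is hypothetical and not part of Chiluka, and the check assumes the executable is named espeak-ng and is on the PATH.

```python
import shutil


def espeak_ng_available(executable: str = "espeak-ng") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if espeak_ng_available():
    print("espeak-ng found")
else:
    print("espeak-ng not found; install it with apt-get or brew as shown above")
```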

Quick Start

Method 1: Auto-download from HuggingFace Hub (Recommended)

There is no need to clone the repo or download weights manually; the weights download automatically on first use.

from chiluka import Chiluka

# Automatically downloads model weights on first use
tts = Chiluka.from_pretrained()

# Synthesize speech
wav = tts.synthesize(
    text="Hello, this is Chiluka speaking!",
    reference_audio="path/to/reference.wav",
    language="en"
)

# Save to file
tts.save_wav(wav, "output.wav")

Method 2: PyTorch Hub

import torch

# Load directly via torch.hub
tts = torch.hub.load('Seemanth/chiluka', 'chiluka')

# Synthesize
wav = tts.synthesize(
    text="Hello from PyTorch Hub!",
    reference_audio="reference.wav",
    language="en"
)

Method 3: From Specific HuggingFace Repository

from chiluka import Chiluka

# Load from a specific HuggingFace repo
tts = Chiluka.from_pretrained("Seemanth/chiluka-tts")

# Or from a custom/fine-tuned model
tts = Chiluka.from_pretrained("someuser/custom-chiluka-model")

Method 4: Local Weights (if you cloned with Git LFS)

from chiluka import Chiluka

# Uses bundled weights from the cloned repo
tts = Chiluka()

wav = tts.synthesize(
    text="Hello world!",
    reference_audio="reference.wav",
    language="en"
)

Multi-Language Examples

Telugu

from chiluka import Chiluka

tts = Chiluka.from_pretrained()

wav = tts.synthesize(
    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
    reference_audio="path/to/telugu_reference.wav",
    language="te"
)

tts.save_wav(wav, "telugu_output.wav")

Hindi

# Reuses the tts instance created in the Telugu example
wav = tts.synthesize(
    text="नमस्ते, मैं चिलुका बोल रहा हूं",
    reference_audio="hindi_reference.wav",
    language="hi"
)
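
Since one loaded instance can serve several languages, the per-language calls above can be folded into a loop. The sketch below relies only on the synthesize() and save_wav() methods documented in this README. The helper name synthesize_all, the shape of the jobs list, and the output file naming are assumptions made for illustration.

```python
from pathlib import Path


def synthesize_all(tts, jobs, out_dir="outputs"):
    """Synthesize each (language, text, reference_audio) job and save a WAV.

    `tts` is any object exposing synthesize() and save_wav() as documented
    above, for example the result of Chiluka.from_pretrained().
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for language, text, reference_audio in jobs:
        wav = tts.synthesize(
            text=text,
            reference_audio=reference_audio,
            language=language,
        )
        target = out_path / f"{language}_output.wav"
        tts.save_wav(wav, str(target))
        written.append(target)
    return written


# Usage (needs chiluka installed and real reference recordings):
# from chiluka import Chiluka
# tts = Chiluka.from_pretrained()
# synthesize_all(tts, [
#     ("te", "నమస్కారం", "telugu_reference.wav"),
#     ("hi", "नमस्ते", "hindi_reference.wav"),
# ])
```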

API Reference

Loading the Model

# Auto-download (recommended for most users)
tts = Chiluka.from_pretrained()

# From specific HuggingFace repo
tts = Chiluka.from_pretrained("username/model-name")

# With options
tts = Chiluka.from_pretrained(
    repo_id="username/model-name",  # HuggingFace repo
    device="cuda",                   # or "cpu"
    force_download=False,            # Set True to re-download even if cached
    token="hf_xxx"                   # For private repos
)

# Local weights (if cloned with Git LFS)
tts = Chiluka(
    config_path=None,       # Uses bundled config
    checkpoint_path=None,   # Uses bundled checkpoint
    pretrained_dir=None,    # Uses bundled pretrained models
    device=None             # Auto-detect cuda/cpu
)
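
The device=None default is described as auto-detecting CUDA. A common PyTorch idiom for this is sketched below. The helper pick_device is hypothetical, and whether Chiluka uses this exact logic internally is an assumption.

```python
def pick_device(requested=None):
    """Return the requested device, or "cuda" if available, else "cpu"."""
    if requested is not None:
        return requested
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can then be passed as the device argument shown above.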

synthesize()

wav = tts.synthesize(
    text="Hello world",           # Text to synthesize
    reference_audio="ref.wav",    # Reference audio for style
    language="en",                # Language code
    alpha=0.3,                    # Acoustic style mixing (0-1)
    beta=0.7,                     # Prosodic style mixing (0-1)
    diffusion_steps=5,            # Quality vs speed tradeoff
    embedding_scale=1.0,          # Classifier-free guidance
    sr=24000                      # Sample rate
)

Other Methods

# Save audio to file
tts.save_wav(wav, "output.wav", sr=24000)

# Play audio (requires pyaudio)
tts.play(wav, sr=24000)

# Get style embedding from audio
style = tts.compute_style("reference.wav", sr=24000)
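
To make the sr=24000 argument concrete, the sketch below writes mono audio as a 16-bit PCM WAV file using only the standard library. It is not how save_wav() is implemented. The helper name write_pcm16 is hypothetical, and it assumes the waveform is a sequence of floats in the range -1.0 to 1.0, which this README does not state explicitly.

```python
import math
import struct
import wave


def write_pcm16(samples, path, sr=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16 bits per sample
        f.setframerate(sr)
        f.writeframes(bytes(frames))


# Example: one second of a 440 Hz tone at the 24 kHz rate used in this README.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
write_pcm16(tone, "tone.wav")
```

At 24000 samples per second, the duration in seconds is the number of samples divided by 24000.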

Synthesis Parameters

Parameter        Default  Description
alpha            0.3      Acoustic style mixing (0 = reference only, 1 = predicted only)
beta             0.7      Prosodic style mixing (0 = reference only, 1 = predicted only)
diffusion_steps  5        Diffusion sampling steps (more = better quality, slower)
embedding_scale  1.0      Classifier-free guidance scale
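
The alpha and beta endpoints in the table (0 = reference only, 1 = predicted only) describe a linear blend between the style taken from the reference audio and the style predicted from the text. The sketch below illustrates that blend on plain Python lists. The function mix_style and the two-element example vectors are hypothetical, and this illustrates the table's semantics rather than Chiluka's internal code.

```python
def mix_style(reference, predicted, weight):
    """Linearly blend two style vectors.

    weight=0 returns the reference style unchanged; weight=1 returns the
    predicted style unchanged, matching the endpoints in the table above.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(reference) != len(predicted):
        raise ValueError("style vectors must have the same length")
    return [weight * p + (1.0 - weight) * r for r, p in zip(reference, predicted)]


reference_style = [1.0, 0.0]   # made-up 2-dimensional example
predicted_style = [0.0, 1.0]

acoustic = mix_style(reference_style, predicted_style, 0.3)  # alpha default
prosodic = mix_style(reference_style, predicted_style, 0.7)  # beta default
```

With the defaults, the acoustic style stays closer to the reference audio while the prosodic style leans toward the text-predicted style.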

Supported Languages

Uses phonemizer with espeak-ng:

Language      Code
English (US)  en-us
English (UK)  en-gb
Telugu        te
Hindi         hi
Tamil         ta
Kannada       kn

See the espeak-ng documentation for the full list.
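
For applications that let users pick a language by name, the table can be mirrored as a small lookup. This is a sketch only: LANGUAGE_CODES and language_code are hypothetical names, the mapping contains just the rows listed above, and espeak-ng accepts many codes that are not in it.

```python
# Codes copied from the table above; espeak-ng supports many more.
LANGUAGE_CODES = {
    "english (us)": "en-us",
    "english (uk)": "en-gb",
    "telugu": "te",
    "hindi": "hi",
    "tamil": "ta",
    "kannada": "kn",
}


def language_code(name):
    """Look up the code for a language name listed in the table."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise KeyError(f"{name!r} is not in the table; check the espeak-ng voice list")
    return LANGUAGE_CODES[key]
```

The returned code is what would be passed as the language argument of synthesize().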

Hub Utilities

Clear Cache

from chiluka import clear_cache

# Clear all cached models
clear_cache()

# Clear specific repo cache
clear_cache("username/model-name")

Push Your Own Model to HuggingFace

from chiluka import push_to_hub

push_to_hub(
    local_dir="./my-trained-model",
    repo_id="myusername/my-chiluka-model",
    token="hf_your_token"
)

Get Cache Directory

from chiluka import get_cache_dir

print(get_cache_dir())  # ~/.cache/chiluka

Environment Variables

Variable       Description
CHILUKA_CACHE  Custom cache directory (default: ~/.cache/chiluka)
HF_TOKEN       HuggingFace API token for private repos
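
The sketch below shows one way such variables are typically resolved, using the default path from the table. The helpers resolve_cache_dir and resolve_token are hypothetical and not part of Chiluka. Treating an empty value as unset and expanding ~ are assumptions about behavior this README does not specify.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return CHILUKA_CACHE if set and non-empty, else ~/.cache/chiluka."""
    env = os.environ if env is None else env
    custom = env.get("CHILUKA_CACHE")
    if custom:
        return Path(custom).expanduser()
    return Path.home() / ".cache" / "chiluka"


def resolve_token(env=None):
    """Return the HF_TOKEN value, or None when it is unset or empty."""
    env = os.environ if env is None else env
    return env.get("HF_TOKEN") or None


print(resolve_cache_dir())
```

Passing a plain dict as env makes the helpers easy to exercise without touching the real environment.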

Requirements

  • Python >= 3.8
  • PyTorch >= 1.13.0
  • CUDA (recommended for faster inference)
  • espeak-ng

Package Structure

chiluka/
├── chiluka/
│   ├── __init__.py
│   ├── inference.py          # Main Chiluka API
│   ├── hub.py                # HuggingFace Hub utilities
│   ├── text_utils.py
│   ├── utils.py
│   ├── configs/
│   ├── checkpoints/
│   ├── pretrained/
│   └── models/
├── hubconf.py                # PyTorch Hub config
├── examples/
├── setup.py
└── README.md

Training Your Own Model

This package is for inference only. To train your own model, use the original StyleTTS2 repository.

After training:

  1. Copy your checkpoint to a directory
  2. Push to HuggingFace Hub using push_to_hub()
  3. Load with Chiluka.from_pretrained("your-repo")

Credits

Based on StyleTTS2 by Yinghao Aaron Li et al.

License

MIT License
