Chiluka - A lightweight TTS inference package based on StyleTTS2
Chiluka
Chiluka (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.
Features
- Simple, clean API for TTS synthesis
- Fully self-contained - all models bundled in the package
- Style transfer from reference audio
- Multi-language support via phonemizer
- No external dependencies on other repos
- Multiple distribution methods - HuggingFace Hub, PyTorch Hub, pip install
Installation
Option 1: pip install (Recommended)
pip install chiluka
Option 2: Install from GitHub
pip install git+https://github.com/Seemanth/chiluka.git
Option 3: From Source
git clone https://github.com/Seemanth/chiluka.git
cd chiluka
pip install -e .
Note: The bundled weights are tracked with Git LFS. To fetch them when cloning:
# Install Git LFS first
sudo apt-get install git-lfs # Ubuntu/Debian
# or: brew install git-lfs # macOS
git lfs install
git clone https://github.com/Seemanth/chiluka.git # "git lfs clone" is deprecated; git clone fetches LFS files once Git LFS is installed
Install espeak-ng (Required)
# Ubuntu/Debian
sudo apt-get install espeak-ng
# macOS
brew install espeak-ng
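Before loading a model, it can help to confirm that espeak-ng is actually installed. The sketch below uses only the standard library; `espeak_available` is a hypothetical helper, not part of Chiluka. A PATH check is only a rough proxy, since phonemizer may load the espeak-ng shared library rather than the executable.

```python
import shutil


def espeak_available(candidates=("espeak-ng", "espeak")):
    """Return the first candidate executable found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


# Example: warn early instead of failing inside the first synthesize() call.
found = espeak_available()
if found is None:
    print("espeak-ng was not found on PATH; install it before using Chiluka")
else:
    print(f"found {found}")
```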
Quick Start
Method 1: Auto-download from HuggingFace Hub (Recommended)
No need to clone the repo or download weights manually - they download automatically!
from chiluka import Chiluka
# Automatically downloads model weights on first use
tts = Chiluka.from_pretrained()
# Synthesize speech
wav = tts.synthesize(
    text="Hello, this is Chiluka speaking!",
    reference_audio="path/to/reference.wav",
    language="en"
)
# Save to file
tts.save_wav(wav, "output.wav")
Method 2: PyTorch Hub
import torch
# Load directly via torch.hub
tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
# Synthesize
wav = tts.synthesize(
    text="Hello from PyTorch Hub!",
    reference_audio="reference.wav",
    language="en"
)
Method 3: From Specific HuggingFace Repository
from chiluka import Chiluka
# Load from a specific HuggingFace repo
tts = Chiluka.from_pretrained("Seemanth/chiluka-tts")
# Or from a custom/fine-tuned model
tts = Chiluka.from_pretrained("someuser/custom-chiluka-model")
Method 4: Local Weights (if you cloned with Git LFS)
from chiluka import Chiluka
# Uses bundled weights from the cloned repo
tts = Chiluka()
wav = tts.synthesize(
    text="Hello world!",
    reference_audio="reference.wav",
    language="en"
)
Multi-Language Examples
Telugu
from chiluka import Chiluka
tts = Chiluka.from_pretrained()
wav = tts.synthesize(
    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
    reference_audio="path/to/telugu_reference.wav",
    language="te"
)
tts.save_wav(wav, "telugu_output.wav")
Hindi
wav = tts.synthesize(
    text="नमस्ते, मैं चिलुका बोल रहा हूं",
    reference_audio="hindi_reference.wav",
    language="hi"
)
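When several languages are synthesized in one run, the per-language calls above can be wrapped in a small loop. This is a minimal sketch that relies only on the `synthesize()` and `save_wav()` methods shown in this README; `synthesize_batch` and the job tuple layout are hypothetical, not part of the package.

```python
import os


def synthesize_batch(tts, jobs, out_dir="."):
    """Synthesize each job and save it as <name>.wav inside out_dir.

    `tts` is any object exposing synthesize() and save_wav() as documented here.
    `jobs` is an iterable of (name, text, reference_audio, language) tuples.
    Returns the list of output paths, in job order.
    """
    paths = []
    for name, text, reference_audio, language in jobs:
        wav = tts.synthesize(
            text=text,
            reference_audio=reference_audio,
            language=language,
        )
        path = os.path.join(out_dir, f"{name}.wav")
        tts.save_wav(wav, path)
        paths.append(path)
    return paths
```

With a loaded model, a call might look like `synthesize_batch(tts, [("telugu", "నమస్కారం", "telugu_reference.wav", "te")])`.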
API Reference
Loading the Model
# Auto-download (recommended for most users)
tts = Chiluka.from_pretrained()
# From specific HuggingFace repo
tts = Chiluka.from_pretrained("username/model-name")
# With options
tts = Chiluka.from_pretrained(
    repo_id="username/model-name",  # HuggingFace repo
    device="cuda",                  # or "cpu"
    force_download=False,           # Set to True to re-download even if cached
    token="hf_xxx"                  # For private repos
)
# Local weights (if cloned with Git LFS)
tts = Chiluka(
    config_path=None,      # Uses bundled config
    checkpoint_path=None,  # Uses bundled checkpoint
    pretrained_dir=None,   # Uses bundled pretrained models
    device=None            # Auto-detect cuda/cpu
)
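The `device=None` default is described as auto-detecting CUDA or CPU. One common way to do this with PyTorch is sketched below; `pick_device` is a hypothetical helper, and whether Chiluka resolves the device exactly this way is an assumption.

```python
def pick_device(requested=None):
    """Return the requested device, or "cuda" if available, else "cpu"."""
    if requested is not None:
        return requested
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Passing the result explicitly, for example `Chiluka.from_pretrained(device=pick_device())`, makes the chosen device visible in logs.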
synthesize()
wav = tts.synthesize(
    text="Hello world",         # Text to synthesize
    reference_audio="ref.wav",  # Reference audio for style
    language="en",              # Language code
    alpha=0.3,                  # Acoustic style mixing (0-1)
    beta=0.7,                   # Prosodic style mixing (0-1)
    diffusion_steps=5,          # Quality vs speed tradeoff
    embedding_scale=1.0,        # Classifier-free guidance
    sr=24000                    # Sample rate
)
Other Methods
# Save audio to file
tts.save_wav(wav, "output.wav", sr=24000)
# Play audio (requires pyaudio)
tts.play(wav, sr=24000)
# Get style embedding from audio
style = tts.compute_style("reference.wav", sr=24000)
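For readers curious what saving at `sr=24000` involves, here is a standard-library sketch that writes mono float samples as 16-bit PCM. It is not Chiluka's `save_wav()` implementation. `write_pcm16_wav` is a hypothetical helper, and it assumes the waveform is a sequence of floats in [-1.0, 1.0], which this README does not state.

```python
import struct
import wave


def write_pcm16_wav(samples, path, sr=24000):
    """Write mono float samples as a 16-bit PCM WAV file at sample rate sr."""
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(x * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit
        f.setframerate(sr)
        f.writeframes(bytes(frames))
```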
Synthesis Parameters
| Parameter | Default | Description |
|---|---|---|
| alpha | 0.3 | Acoustic style mixing (0=reference only, 1=predicted only) |
| beta | 0.7 | Prosodic style mixing (0=reference only, 1=predicted only) |
| diffusion_steps | 5 | Diffusion sampling steps (more = better quality, slower) |
| embedding_scale | 1.0 | Classifier-free guidance scale |
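The "0=reference only, 1=predicted only" wording for alpha and beta describes a linear blend between two style vectors. The sketch below illustrates that blend on plain lists. `mix_style` is a hypothetical helper, and it assumes Chiluka follows the linear interpolation used in StyleTTS2-style inference.

```python
def mix_style(reference, predicted, weight):
    """Blend two equal-length vectors: weight=0 -> reference, weight=1 -> predicted."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * p + (1.0 - weight) * r for r, p in zip(reference, predicted)]


# Toy vectors standing in for the acoustic half of a style embedding.
ref_acoustic = [1.0, 2.0]
pred_acoustic = [3.0, 6.0]
blended = mix_style(ref_acoustic, pred_acoustic, 0.3)  # mostly reference, alpha=0.3
```

Under this reading, lowering alpha or beta pulls the output toward the reference speaker's timbre or prosody, while raising it favors the style predicted from the text.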
Supported Languages
Phonemization uses phonemizer with the espeak-ng backend, so language codes follow espeak-ng's naming:
| Language | Code |
|---|---|
| English (US) | en-us |
| English (UK) | en-gb |
| Telugu | te |
| Hindi | hi |
| Tamil | ta |
| Kannada | kn |
See the espeak-ng documentation for the full list.
Hub Utilities
Clear Cache
from chiluka import clear_cache
# Clear all cached models
clear_cache()
# Clear specific repo cache
clear_cache("username/model-name")
Push Your Own Model to HuggingFace
from chiluka import push_to_hub
push_to_hub(
    local_dir="./my-trained-model",
    repo_id="myusername/my-chiluka-model",
    token="hf_your_token"
)
Get Cache Directory
from chiluka import get_cache_dir
print(get_cache_dir()) # ~/.cache/chiluka
Environment Variables
| Variable | Description |
|---|---|
| CHILUKA_CACHE | Custom cache directory (default: ~/.cache/chiluka) |
| HF_TOKEN | HuggingFace API token for private repos |
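The following sketch shows one plausible way these two variables could be resolved, using only the standard library. `resolve_cache_dir` and `resolve_token` are hypothetical helpers; the package's actual lookup order (for example, whether an explicit `token=` argument overrides `HF_TOKEN`) is an assumption.

```python
import os


def resolve_cache_dir(env=None):
    """Return CHILUKA_CACHE if set and non-empty, else ~/.cache/chiluka."""
    env = os.environ if env is None else env
    custom = env.get("CHILUKA_CACHE")
    if custom:
        return os.path.expanduser(custom)
    return os.path.join(os.path.expanduser("~"), ".cache", "chiluka")


def resolve_token(explicit=None, env=None):
    """Prefer an explicitly passed token, then fall back to HF_TOKEN, else None."""
    env = os.environ if env is None else env
    return explicit if explicit is not None else env.get("HF_TOKEN")
```

Reading the token from the environment keeps it out of source files, which is usually preferable to hard-coding `token="hf_xxx"`.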
Requirements
- Python >= 3.8
- PyTorch >= 1.13.0
- CUDA (recommended for faster inference)
- espeak-ng
Package Structure
chiluka/
├── chiluka/
│ ├── __init__.py
│ ├── inference.py # Main Chiluka API
│ ├── hub.py # HuggingFace Hub utilities
│ ├── text_utils.py
│ ├── utils.py
│ ├── configs/
│ ├── checkpoints/
│ ├── pretrained/
│ └── models/
├── hubconf.py # PyTorch Hub config
├── examples/
├── setup.py
└── README.md
Training Your Own Model
This package is for inference only. To train your own model, use the original StyleTTS2 repository.
After training:
- Copy your checkpoint to a directory
- Push to HuggingFace Hub using push_to_hub()
- Load with Chiluka.from_pretrained("your-repo")
Credits
Based on StyleTTS2 by Yinghao Aaron Li et al.
License
MIT License
File details
Details for the file chiluka-0.1.0.tar.gz.
File metadata
- Download URL: chiluka-0.1.0.tar.gz
- Upload date:
- Size: 44.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0897a284d13a472104c579b20ab03b20106ec83ed364214ed7d636aaaf605838 |
| MD5 | 1d8c4e57b41338a8e130aad3fa30db03 |
| BLAKE2b-256 | af0a9420bb193117813dc3fbc8121960a33a2072dca2b3e5df90156b7efb5154 |
File details
Details for the file chiluka-0.1.0-py3-none-any.whl.
File metadata
- Download URL: chiluka-0.1.0-py3-none-any.whl
- Upload date:
- Size: 47.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a78f4eb2f1edbf42b840df056d75d01491bb940c81a02c25a8d31a86b09ddfec |
| MD5 | 14d3c3e6304591ada29c2100bce1341c |
| BLAKE2b-256 | d8799f887b7be3fd3d2337b0c19303650b73825565f44784aeaedcabbf9e04df |