Text-to-Audio
Text-to-speech generator using Chatterbox TTS with automatic long text handling. Runs locally on CPU, NVIDIA GPU (CUDA), or Apple Silicon (M1-M4).
Installation
```bash
uv sync
```
CLI Usage
```bash
# Basic usage
uv run text-to-audio "Hello world"

# Specify output file
uv run text-to-audio "Hello world" -o hello.wav

# Voice cloning with reference audio
uv run text-to-audio "Hello world" -r voice.wav -o output.wav

# Generate from text file (handles long text automatically)
uv run text-to-audio -i input.txt -o output.wav

# Use different models
uv run text-to-audio "Hello" -m standard   # Higher quality, slower
uv run text-to-audio "Hello" -m turbo      # Faster (default)
uv run text-to-audio "Bonjour" -m multilingual -l fr

# Adjust expressiveness
uv run text-to-audio "Excited text!" -e 0.8 --cfg 0.5

# With emotion tags
uv run text-to-audio "That's funny [laugh] really!"

# Quiet mode (no progress output)
uv run text-to-audio "Hello" -q -o output.wav
```
CLI Options
| Option | Description | Default |
|---|---|---|
| `text` | Text to convert (positional) | - |
| `-i, --input` | Input text file | - |
| `-o, --output` | Output audio file | `output.wav` |
| `-r, --reference` | Reference audio for voice cloning | - |
| `-m, --model` | Model: `turbo`, `standard`, `multilingual` | `turbo` |
| `-l, --language` | Language code (multilingual model only) | `en` |
| `-d, --device` | Device: `auto`, `cuda`, `mps`, `cpu` | `auto` |
| `-e, --exaggeration` | Expressiveness (0.0-1.0) | 0.3 |
| `--cfg` | Classifier-free guidance (CFG) weight (0.0-1.0) | 0.3 |
| `--max-chunk` | Maximum characters per chunk for long text | 250 |
| `-q, --quiet` | Suppress progress output | - |
| `-v, --version` | Show version | - |
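For scripting around the CLI, the table above can be read as an ordinary `argparse` parser. The sketch below is hypothetical: `build_parser` is an illustrative name, and it only mirrors the documented flags and defaults. It is not the package's own parser, which may validate or combine options differently (for example, how the positional `text` and `--input` interact).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented text-to-audio options and defaults."""
    p = argparse.ArgumentParser(prog="text-to-audio")
    p.add_argument("text", nargs="?", help="Text to convert")
    p.add_argument("-i", "--input", help="Input text file")
    p.add_argument("-o", "--output", default="output.wav", help="Output audio file")
    p.add_argument("-r", "--reference", help="Reference audio for voice cloning")
    p.add_argument("-m", "--model", choices=["turbo", "standard", "multilingual"], default="turbo")
    p.add_argument("-l", "--language", default="en", help="Language code (multilingual only)")
    p.add_argument("-d", "--device", choices=["auto", "cuda", "mps", "cpu"], default="auto")
    p.add_argument("-e", "--exaggeration", type=float, default=0.3, help="Expressiveness (0.0-1.0)")
    p.add_argument("--cfg", type=float, default=0.3, help="CFG weight (0.0-1.0)")
    p.add_argument("--max-chunk", type=int, default=250, help="Max chars per chunk")
    p.add_argument("-q", "--quiet", action="store_true", help="Suppress progress output")
    p.add_argument("-v", "--version", action="version", version="%(prog)s 1.0.1")
    return p


args = build_parser().parse_args(["Bonjour", "-m", "multilingual", "-l", "fr"])
print(args.model, args.language, args.output)
```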
Python API
```python
from text_to_audio import TextToAudio

# Initialize
tts = TextToAudio(
    model_type="turbo",   # or "standard", "multilingual"
    device="auto",        # or "cuda", "mps", "cpu"
    max_chunk_chars=250,  # for long text splitting
)

# Generate audio
wav = tts.generate(
    text="Your text here. Can be very long - it will be split automatically.",
    audio_prompt_path="voice.wav",  # optional: voice cloning
    exaggeration=0.3,
    cfg_weight=0.3,
    language="en",  # for multilingual model
)

# Save to file
tts.save(wav, "output.wav")

# Access sample rate
print(f"Sample rate: {tts.sample_rate}")
```
Progress Callback
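For long text, you can track generation progress by passing a callback that receives the current chunk number and the total number of chunks:

```python
# `tts` is the TextToAudio instance created above
with open("input.txt", encoding="utf-8") as f:
    long_text = f.read()

def on_progress(current, total):
    print(f"Generating chunk {current}/{total}")

wav = tts.generate(
    text=long_text,
    progress_callback=on_progress,
)
```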
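A slightly richer callback can report percentage and elapsed time on a single, continuously updated line. This is a sketch under one assumption worth checking: that `current` counts from 1 up to `total`, so the final call has `current == total`. `make_progress_callback` is a hypothetical helper, not part of the package.

```python
import sys
import time
from typing import Callable, TextIO


def make_progress_callback(stream: TextIO = sys.stderr) -> Callable[[int, int], None]:
    """Return an on_progress(current, total) callback that reports percent done and elapsed time."""
    start = time.monotonic()

    def on_progress(current: int, total: int) -> None:
        elapsed = time.monotonic() - start
        percent = 100.0 * current / total if total else 100.0
        # "\r" returns to the start of the line so each update overwrites the previous one.
        stream.write(f"\rChunk {current}/{total} ({percent:.0f}%) {elapsed:.1f}s elapsed")
        if current >= total:
            stream.write("\n")  # assumes the last chunk is reported as current == total
        stream.flush()

    return on_progress
```

It would be passed in the same way as the simple callback, e.g. `tts.generate(text=long_text, progress_callback=make_progress_callback())`.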
Long Text Handling
Text longer than max_chunk_chars (default 250; `--max-chunk` on the command line) is automatically split at sentence boundaries, and the generated audio chunks are concatenated into a single output. This avoids the quality degradation that can occur when very long audio is generated in a single pass.
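To make the splitting rule concrete, here is a minimal sketch of greedy sentence-boundary chunking. It assumes a sentence ends at `.`, `!` or `?` followed by whitespace, and falls back to word boundaries when a single sentence exceeds the limit. `split_into_chunks` is a hypothetical name; the package's actual splitter may handle abbreviations, line breaks, or over-long sentences differently.

```python
import re
from typing import List

# Assumed sentence boundary: ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _split_words(sentence: str, max_chars: int) -> List[str]:
    """Fallback for an over-long sentence: pack whole words into pieces of at most max_chars."""
    pieces: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        # A single word longer than max_chars is kept intact rather than cut mid-word.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_into_chunks(text: str, max_chars: int = 250) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open chunk
            else:
                if current:
                    chunks.append(current)  # close the full chunk
                current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("Hello world. This is a test! Is it working? Yes.", max_chars=30))
# ['Hello world. This is a test!', 'Is it working? Yes.']
```

Each chunk would then be synthesized separately and the resulting audio joined in order.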
Models
| Model | Description |
|---|---|
| `turbo` | Fast generation, good quality (default) |
| `standard` | Higher quality, slower |
| `multilingual` | Supports 23 languages |
Supported Languages (Multilingual Model)
Use `-m multilingual -l <code>`, where `<code>` is a language code such as `en` or `fr`. Supported languages:
Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Turkish
Emotion Tags (Turbo Model)
Paralinguistic tags are native to the Turbo model:
- `[laugh]` - laughter
- `[chuckle]` - light laughter
- `[sigh]` - sighing
- `[cough]` - coughing
- `[gasp]` - gasping
- `[groan]` - groaning
- `[sniff]` - sniffing
- `[shush]` - shushing
- `[clear throat]` - throat clearing
- `[yawn]` - yawning
Example: "That's hilarious [laugh] tell me more!"
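Because a misspelled tag is easy to miss in long input, a small pre-flight check can flag bracketed tags that are not among the ten documented for the Turbo model. This is a hypothetical helper, not part of the package, and it makes no claim about how the model itself treats unknown tags.

```python
import re
from typing import List

# The paralinguistic tags documented in this README for the Turbo model.
TURBO_TAGS = {
    "laugh", "chuckle", "sigh", "cough", "gasp",
    "groan", "sniff", "shush", "clear throat", "yawn",
}

# Matches any "[...]" span that contains no nested brackets.
_TAG = re.compile(r"\[([^\[\]]+)\]")


def find_unknown_tags(text: str) -> List[str]:
    """Return bracketed tags in text that are not in TURBO_TAGS, in order of appearance."""
    return [
        m.group(1)
        for m in _TAG.finditer(text)
        if m.group(1).strip().lower() not in TURBO_TAGS
    ]


print(find_unknown_tags("Hi [laugh] and [giggle] then [clear throat]"))
# ['giggle']
```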
Troubleshooting
Perth Watermarker Error
If you see TypeError: 'NoneType' object is not callable related to PerthImplicitWatermarker, the package automatically applies a workaround. This is a known issue when using uv without setuptools.
Device Selection
- Apple Silicon (M1/M2/M3/M4): Uses `mps` automatically
- NVIDIA GPU: Uses `cuda` automatically
- CPU fallback: Works on any system
Force a specific device with -d cpu or -d mps.
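The `auto` behaviour described above can be approximated with PyTorch's standard availability checks, `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. This is a sketch under assumptions: `resolve_device` is a hypothetical name, the CUDA-then-MPS-then-CPU order is an assumption, and the package's own detection logic may differ.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map a -d/--device value to a concrete PyTorch device string.

    Explicit choices ("cuda", "mps", "cpu") are returned unchanged; "auto" probes
    the available backends and falls back to the CPU.
    """
    if requested != "auto":
        return requested
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch, only the CPU path is meaningful
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(resolve_device())  # "cuda", "mps" or "cpu" depending on the machine
```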
License
MIT License - see LICENSE for details.
File details
Details for the file text_to_audio-1.0.1.tar.gz.
File metadata
- Download URL: text_to_audio-1.0.1.tar.gz
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6686aec3dc0e4867f25d76b89f8dced7d3dd4451aa1986bb203f3c14652756c |
| MD5 | 6be1e3fc64c62e99e4c4d3d0b994b072 |
| BLAKE2b-256 | 558776cc2f1e0f8991ad12c4b50a849fbce2c3568aa0bfa90afe2df65e765d69 |
Provenance
The following attestation bundles were made for text_to_audio-1.0.1.tar.gz:
Publisher: publish.yml on Kotivskyi/text-to-audio
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: text_to_audio-1.0.1.tar.gz
- Subject digest: b6686aec3dc0e4867f25d76b89f8dced7d3dd4451aa1986bb203f3c14652756c
- Sigstore transparency entry: 790910226
- Permalink: Kotivskyi/text-to-audio@a1f2239cc8f774602c97f91c0d62e14fd4967b9c
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/Kotivskyi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1f2239cc8f774602c97f91c0d62e14fd4967b9c
- Trigger Event: release
File details
Details for the file text_to_audio-1.0.1-py3-none-any.whl.
File metadata
- Download URL: text_to_audio-1.0.1-py3-none-any.whl
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a0a98659a7ca4bca8697af8c5f370913e642428f3ade9cc2fac30227d9b1dade |
| MD5 | 953032f5ef7d2027a0d8d315efe6b2ed |
| BLAKE2b-256 | 5fac53f1bc97dd770c9a66f55197c5975ffa3b9d4825b987ea889bcdeb7e3302 |
Provenance
The following attestation bundles were made for text_to_audio-1.0.1-py3-none-any.whl:
Publisher: publish.yml on Kotivskyi/text-to-audio
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: text_to_audio-1.0.1-py3-none-any.whl
- Subject digest: a0a98659a7ca4bca8697af8c5f370913e642428f3ade9cc2fac30227d9b1dade
- Sigstore transparency entry: 790910229
- Permalink: Kotivskyi/text-to-audio@a1f2239cc8f774602c97f91c0d62e14fd4967b9c
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/Kotivskyi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1f2239cc8f774602c97f91c0d62e14fd4967b9c
- Trigger Event: release