Text-to-speech using neural audio codec and causal language models

Kani-TTS

A simple and efficient text-to-speech library using neural audio codecs and causal language models.

Features

Simple, intuitive API
Built on Hugging Face Transformers and NVIDIA NeMo
High-quality audio generation using neural codecs
GPU acceleration support

Installation

From PyPI

pip install kani-tts

From source

git clone https://github.com/yourusername/kani-tts.git
cd kani-tts
pip install -e .

Optional dependencies

For saving audio files:

pip install "kani-tts[audio]"

For development:

pip install "kani-tts[dev]"

Quick Start

from kani_tts import KaniTTS

# Initialize model (replace with your model name)
model = KaniTTS('your-model-name-here')

# Generate audio from text
audio, text = model("Hello, world!")

# Save to file (requires soundfile)
model.save_audio(audio, "output.wav")

Advanced Usage

Custom Configuration

from kani_tts import KaniTTS

model = KaniTTS(
    'your-model-name',
    temperature=0.7,           # Control randomness (default: 0.6)
    top_p=0.9,                 # Nucleus sampling (default: 0.95)
    max_new_tokens=2000,       # Max audio length (default: 1800)
    repetition_penalty=1.2,    # Prevent repetition (default: 1.1)
)

audio, text = model("Your text here")
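
To make the `temperature` and `top_p` settings above more concrete, here is a minimal sketch of temperature scaling followed by nucleus (top-p) filtering on a single step of logits. It illustrates the general technique only and is not taken from the Kani-TTS source; the function name `sample_top_p` is hypothetical, and `repetition_penalty` is left out.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.95, rng=random):
    """Pick one token index using temperature scaling and nucleus filtering."""
    # Temperature below 1.0 sharpens the distribution; above 1.0 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their combined mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the kept tokens, weighted by their original probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_top_p([2.0, 1.0, 0.5, -1.0]))
```

Lower `top_p` values shrink the candidate set, which tends to make output more stable at the cost of variety.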

Working with Audio Output

The generated audio is a one-dimensional NumPy array sampled at 22.05 kHz (22,050 Hz):

import numpy as np
import soundfile as sf

audio, text = model("Generate speech from this text")

# Audio is a numpy array
print(audio.shape)  # (num_samples,)
print(audio.dtype)  # float32/float64

# Save using soundfile
sf.write('output.wav', audio, 22050)

# Or use the built-in method
model.save_audio(audio, 'output.wav', sample_rate=22050)
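
Since the output is a plain array, clip length and format conversion are simple arithmetic. The sketch below uses a synthetic sine wave as a stand-in for model output, and it assumes the samples are floats in the range [-1.0, 1.0], which the text above does not state. The helper names are illustrative and not part of the library.

```python
import numpy as np

SAMPLE_RATE = 22050  # Hz, the rate used in the example above


def duration_seconds(audio, sample_rate=SAMPLE_RATE):
    """Length of a mono clip in seconds."""
    return len(audio) / sample_rate


def to_pcm16(audio):
    """Convert float samples (assumed to lie in [-1, 1]) to 16-bit PCM."""
    clipped = np.clip(audio, -1.0, 1.0)  # guard against out-of-range samples
    return (clipped * 32767).astype(np.int16)


# Stand-in for model output: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

print(duration_seconds(audio))
print(to_pcm16(audio).dtype)
```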

Batch Processing

texts = [
    "First sentence to synthesize.",
    "Second sentence to synthesize.",
    "Third sentence to synthesize."
]

for i, text in enumerate(texts):
    audio, _ = model(text)
    model.save_audio(audio, f"output_{i}.wav")
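
If a single file is preferable to one file per sentence, the clips can be joined with a short pause between them. This is a sketch using only NumPy; `join_clips` and its default pause length are assumptions for illustration, not library features.

```python
import numpy as np


def join_clips(clips, sample_rate=22050, pause_seconds=0.3):
    """Concatenate mono clips, inserting silence between consecutive clips."""
    if not clips:
        return np.zeros(0, dtype=np.float32)
    pause = np.zeros(int(round(sample_rate * pause_seconds)), dtype=np.float32)
    pieces = []
    for i, clip in enumerate(clips):
        if i > 0:
            pieces.append(pause)  # silence only between clips, not at the ends
        pieces.append(np.asarray(clip, dtype=np.float32))
    return np.concatenate(pieces)
```

With the loop above, the per-sentence arrays could be collected in a list and passed to `join_clips` before a single call to `model.save_audio`.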

Architecture

Kani-TTS uses a two-stage architecture:

Text → Audio Tokens: A causal language model generates audio token sequences from text
Audio Tokens → Waveform: NVIDIA NeMo's nano codec decodes tokens into audio waveforms

The system uses special tokens to mark different segments:

Text boundaries (start/end of text)
Speech boundaries (start/end of speech)
Speaker turns (human/AI)
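
As an illustration of how boundary tokens can be used, the sketch below pulls the speech segment out of a generated token sequence. The token ids and the function name are hypothetical placeholders; the real ids depend on the model's tokenizer, and the library's actual parsing logic is not shown in this document.

```python
# Hypothetical placeholder ids; real values come from the model's vocabulary.
START_OF_SPEECH = 9001
END_OF_SPEECH = 9002


def extract_speech_span(token_ids, start_id=START_OF_SPEECH, end_id=END_OF_SPEECH):
    """Return the tokens between the start-of-speech and end-of-speech markers."""
    try:
        start = token_ids.index(start_id) + 1
    except ValueError:
        return []  # no speech segment was generated
    try:
        end = token_ids.index(end_id, start)
    except ValueError:
        end = len(token_ids)  # generation stopped before the closing marker
    return token_ids[start:end]


print(extract_speech_span([1, 2, 9001, 5, 6, 7, 9002, 3]))
```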

Audio tokens are organized in 4-channel codebooks, with each channel representing different aspects of the audio signal.
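
One way to picture the 4-channel layout is to regroup a flat token sequence into one row per codebook. The sketch assumes the tokens are interleaved frame by frame (codebook 0 to 3 for the first frame, then the next frame, and so on). That ordering is an assumption for illustration and is not confirmed by this document.

```python
import numpy as np

NUM_CODEBOOKS = 4


def to_codebook_frames(flat_tokens, num_codebooks=NUM_CODEBOOKS):
    """Reshape a flat, frame-interleaved sequence to (num_codebooks, num_frames)."""
    tokens = np.asarray(flat_tokens)
    # Drop a trailing partial frame, which can occur if generation was cut off.
    usable = (len(tokens) // num_codebooks) * num_codebooks
    frames = tokens[:usable].reshape(-1, num_codebooks)
    return frames.T


codes = to_codebook_frames(list(range(10)))
print(codes.shape)
```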

Requirements

Python 3.10 or higher
CUDA-capable GPU (recommended) or CPU
PyTorch 2.0 or higher
Transformers library
NeMo Toolkit
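
A quick way to check these requirements before installing a model is a small standard-library script. It tests the Python version and whether the packages are importable, without importing them. The import names `torch`, `transformers`, and `nemo` are the usual ones for these packages; the function itself is an illustrative helper, and it does not check the PyTorch version or GPU availability.

```python
import sys
from importlib.util import find_spec


def check_environment(min_python=(3, 10), modules=("torch", "transformers", "nemo")):
    """Report whether the interpreter and required packages look usable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level package is not installed.
        report[name] = find_spec(name) is not None
    return report


print(check_environment())
```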

Model Compatibility

This library works with causal language models trained for TTS with the following characteristics:

Extended vocabulary including audio tokens
Special tokens for speech/text boundaries
Compatible with the NeMo nano codec (22 kHz, 0.6 kbps, 12.5 fps)
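
The codec figures above imply some useful back-of-the-envelope numbers. The sketch assumes a sample rate of 22,050 Hz, 4 codebooks per frame as described under Architecture, and that the language model emits one token per codebook per frame. The last assumption is not confirmed here, so treat the duration estimate as rough.

```python
SAMPLE_RATE_HZ = 22050
FRAME_RATE_HZ = 12.5
BITRATE_BPS = 600        # 0.6 kbps
NUM_CODEBOOKS = 4

samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ    # audio samples per codec frame
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ          # bits spent on each frame
bits_per_codebook = bits_per_frame / NUM_CODEBOOKS    # bits per codebook entry
tokens_per_second = FRAME_RATE_HZ * NUM_CODEBOOKS     # assumed LM tokens per second


def max_audio_seconds(max_new_tokens):
    """Rough upper bound on clip length if every new token is an audio token."""
    return max_new_tokens / tokens_per_second


print(samples_per_frame, bits_per_frame, bits_per_codebook)
print(max_audio_seconds(1800))
```

Under these assumptions, the default `max_new_tokens` of 1800 corresponds to roughly 36 seconds of audio.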

License

MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use Kani-TTS in your research, please cite:

@software{kani_tts,
  title = {Kani-TTS: Text-to-Speech using Neural Audio Codec},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/kani-tts}
}

Acknowledgments

Built on Hugging Face Transformers
Uses NVIDIA NeMo audio codec
Powered by PyTorch

Release history

1.0.1 (Jan 21, 2026)
0.0.4 (Nov 3, 2025)
0.0.3 (Nov 2, 2025), yanked. Reason: old version of nemo
0.0.1 (Nov 1, 2025), yanked. Reason: test version. This is the version described on this page.

Download files

Source Distribution

kani_tts-0.0.1.tar.gz (7.0 kB)

Uploaded Nov 1, 2025 Source

Built Distribution

kani_tts-0.0.1-py3-none-any.whl (7.6 kB)

Uploaded Nov 1, 2025 Python 3

File details

Details for the file kani_tts-0.0.1.tar.gz.

File metadata

Download URL: kani_tts-0.0.1.tar.gz
Upload date: Nov 1, 2025
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for kani_tts-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`629839bce7dbc19ae9c95ca983b646ed2a8d83af9ae7acc902bdc8668c80b23d`
MD5	`604f82647fa712bc13f2f3eddfb848f6`
BLAKE2b-256	`8073e09a87bdf8ab9951a52881459f99e52c621d0d67881800db3021edc5626b`

File details

Details for the file kani_tts-0.0.1-py3-none-any.whl.

File metadata

Download URL: kani_tts-0.0.1-py3-none-any.whl
Upload date: Nov 1, 2025
Size: 7.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for kani_tts-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bcf926365e97ef408a29aa35516665289b1e16c51552733959b8c0a79c0d0e05`
MD5	`b63dcb32f6b8303d210dddddb3db99f1`
BLAKE2b-256	`73875e3c2c108a86958f2a4a5058093485adbf409b59fd7babb8eb2bcb4bbb4f`
