streaming-tts

Lightweight streaming text-to-speech with Kokoro engine.

Features

Streaming TTS: Real-time audio synthesis with callback support
Voice Blending: Mix multiple voices with weighted formulas
Pause Tags: Insert natural pauses with [pause:1.5s] syntax
Text Normalization: Convert URLs, emails, numbers, money to spoken form
Smart Chunking: Token-aware text splitting for optimal quality
Multi-Format Output: Export to WAV, MP3, Opus, FLAC, AAC (with [audio] extra)
54 Voices: American and British English, plus 7 other languages

Installation

pip install streaming-tts

For a development installation, run this from a checkout of the source tree:

pip install -e .

Optional Extras

# Japanese language support
pip install streaming-tts[jp]

# Chinese language support
pip install streaming-tts[zh]

# Korean language support
pip install streaming-tts[ko]

# Multi-format audio export
pip install streaming-tts[audio]

# All features
pip install streaming-tts[all]

Note: Non-English languages require espeak-ng to be installed on your system.

Quick Start

from streaming_tts import TextToAudioStream, KokoroEngine

# Initialize the engine
engine = KokoroEngine(voice="af_heart")

# Create stream and play
stream = TextToAudioStream(engine)
stream.feed("Hello, world! This is a test of streaming text to speech.").play()

Pause Tags

Insert natural pauses in your text:

text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."
stream.feed(text).play()
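
To make the tag syntax concrete, here is a minimal sketch of how such tags could be split out of a string. It is not the library's parser: the regular expression and the `split_pauses` helper are hypothetical, and the sketch assumes only the two unit suffixes shown above (`s` and `ms`).

```python
import re
from typing import List, Union

# Hypothetical pattern: matches tags such as [pause:1.5s] or [pause:500ms].
PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)\]")


def split_pauses(text: str) -> List[Union[str, float]]:
    """Split text into spoken segments (str) and pause lengths in seconds (float)."""
    parts: List[Union[str, float]] = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            parts.append(spoken)
        value, unit = float(match.group(1)), match.group(2)
        # Convert milliseconds to seconds so every pause uses one unit.
        parts.append(value / 1000.0 if unit == "ms" else value)
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(tail)
    return parts


print(split_pauses("Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."))
```

The result alternates text segments with pause durations, so a caller can speak the strings and sleep for the floats.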

Text Normalization

Automatically convert special content to spoken form:

from streaming_tts import normalize_text, NormalizationOptions

options = NormalizationOptions(normalize=True)

# URLs, emails, numbers, money, etc.
text = "Visit https://example.com or email user@test.com. Price: $42.50"
normalized = normalize_text(text, options)
# -> "Visit https example dot com or email user at test dot com. Price: forty-two dollars and fifty cents"
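
As an illustration of the kind of rewriting involved, the following sketch spells out URLs and email addresses only. It is an assumption-laden stand-in, not the package's `normalize_text`: the patterns and the `speak_contacts` helper are hypothetical, and numbers and money are left untouched.

```python
import re

# Hypothetical, deliberately simple patterns; real-world URLs and emails are messier.
URL = re.compile(r"\b(https?)://([\w-]+(?:\.[\w-]+)+)")
EMAIL = re.compile(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)\b")


def _spell_dots(fragment: str) -> str:
    """Replace each '.' with the spoken word 'dot'."""
    return fragment.replace(".", " dot ")


def speak_contacts(text: str) -> str:
    """Rewrite URLs and email addresses into a form a TTS engine can read aloud."""
    text = URL.sub(lambda m: f"{m.group(1)} {_spell_dots(m.group(2))}", text)
    text = EMAIL.sub(lambda m: f"{_spell_dots(m.group(1))} at {_spell_dots(m.group(2))}", text)
    return text


print(speak_contacts("Visit https://example.com or email user@test.com."))
```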

Smart Chunking

Split long text into optimal chunks for TTS:

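from streaming_tts import smart_split, process_text_with_pauses
import time

# Reuses the stream object created in Quick Start
text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."

# Process text with pauses: float items are pauses, anything else is text to speak
for item in process_text_with_pauses(text, normalize=True):
    if isinstance(item, float):
        time.sleep(item)  # Pause
    else:
        stream.feed(item).play()  # Speak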
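
The idea behind chunking can be sketched as greedy sentence packing. This is not the behaviour of `smart_split`, whose signature is not shown here: the `chunk_sentences` helper is hypothetical and uses a word count as a rough stand-in for a token budget.

```python
import re
from typing import Iterator, List


def chunk_sentences(text: str, max_words: int = 40) -> Iterator[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is emitted as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    current: List[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            yield " ".join(current)
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        yield " ".join(current)


for chunk in chunk_sentences("One two three. Four five six. Seven eight nine ten.", max_words=6):
    print(chunk)
```

Splitting only at sentence boundaries keeps each chunk a natural unit of speech, at the cost of occasionally exceeding the budget for very long sentences.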

Multi-Format Audio Export

from streaming_tts import StreamingAudioWriter

# Requires: pip install streaming-tts[audio]
writer = StreamingAudioWriter("mp3", sample_rate=24000)

# audio_chunks is assumed to be an iterable of audio chunks produced elsewhere
for audio_chunk in audio_chunks:
    mp3_data = writer.write_chunk(audio_chunk)
    # Stream or save mp3_data

# Flush any data still buffered in the encoder, then release resources
final_data = writer.write_chunk(finalize=True)
writer.close()
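
For comparison, the same chunk-by-chunk pattern can be shown with Python's standard `wave` module. This sketch does not use `StreamingAudioWriter`; the `write_wav` helper is hypothetical and assumes each chunk is raw 16-bit mono PCM bytes at the 24000 Hz sample rate used above.

```python
import io
import struct
import wave


def write_wav(chunks, sample_rate=24000):
    """Concatenate 16-bit mono PCM chunks (bytes) into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM (assumption)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    # Closing the wave writer patches the header but leaves the buffer open.
    return buffer.getvalue()


# Two tiny chunks of four silent samples each, standing in for real audio.
silence = struct.pack("<4h", 0, 0, 0, 0)
wav_bytes = write_wav([silence, silence])
print(len(wav_bytes), "bytes written")
```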

Usage with Callbacks

from streaming_tts import TextToAudioStream, KokoroEngine

def on_audio_chunk(chunk):
    # Process audio chunk (e.g., send over websocket)
    pass

def on_stream_stop():
    print("Audio stream finished")

engine = KokoroEngine(voice="af_heart")
stream = TextToAudioStream(engine, on_audio_stream_stop=on_stream_stop)
stream.feed("Hello world").play(muted=True, on_audio_chunk=on_audio_chunk)
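
If the chunk handler does slow work such as network I/O, one option is to hand chunks to a worker thread through a queue so the callback returns quickly. The sketch below is a generic pattern, not part of streaming-tts: the `ChunkRelay` class is hypothetical, chunks are treated as opaque objects, and it assumes the stop callback fires after the last chunk.

```python
import queue
import threading


class ChunkRelay:
    """Hand audio chunks from a callback to a worker thread through a queue."""

    def __init__(self):
        self._queue = queue.Queue()
        self.received = []

    def on_audio_chunk(self, chunk):
        # Called by the producer; only enqueues, so it returns immediately.
        self._queue.put(chunk)

    def on_stream_stop(self):
        # None is used as a sentinel meaning "no more chunks".
        self._queue.put(None)

    def drain(self):
        # Runs in the worker thread until the sentinel arrives.
        while True:
            chunk = self._queue.get()
            if chunk is None:
                break
            self.received.append(chunk)  # Replace with a websocket send, file write, etc.


relay = ChunkRelay()
worker = threading.Thread(target=relay.drain)
worker.start()
# Simulated producer; with the library, these methods would be passed as callbacks.
relay.on_audio_chunk(b"\x00\x01")
relay.on_audio_chunk(b"\x02\x03")
relay.on_stream_stop()
worker.join()
print(len(relay.received), "chunks relayed")
```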

Available Voices

American English (lang_code='a')

Female: af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
Male: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa

British English (lang_code='b')

Female: bf_alice, bf_emma, bf_isabella, bf_lily
Male: bm_daniel, bm_fable, bm_george, bm_lewis

Other Languages

Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
Spanish: ef_dora, em_alex, em_santa
French: ff_siwis
Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
Italian: if_sara, im_nicola
Portuguese: pf_dora, pm_alex, pm_santa
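
The voice names above appear to follow a two-letter prefix convention: the first letter identifies the language and the second the voice gender. The sketch below decodes a name under that assumption. Only the 'a' and 'b' language codes are stated explicitly in this document; the other letters and the gender reading for non-English voices are inferred from the lists, and `describe_voice` is a hypothetical helper.

```python
# Prefix letters inferred from the voice lists; only 'a' and 'b' are documented codes.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice: str):
    """Return (language, gender, name) for a voice id such as 'af_heart'."""
    prefix, _, name = voice.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"Unexpected voice name format: {voice!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in LANGUAGES or gender not in GENDERS:
        raise ValueError(f"Unknown voice prefix: {prefix!r}")
    return LANGUAGES[lang], GENDERS[gender], name


print(describe_voice("af_heart"))
print(describe_voice("zm_yunxi"))
```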

Voice Blending

You can blend multiple voices using weighted formulas:

engine = KokoroEngine(voice="0.3*af_sarah + 0.7*am_adam")
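
To show what a weighted formula expresses, here is a minimal sketch that parses one into per-voice weights. It is not the engine's parser: `parse_blend` is hypothetical, it accepts only terms of the form `weight*voice` joined by `+`, and normalizing the weights to sum to 1 is a choice made for this sketch rather than documented behaviour.

```python
import re
from typing import Dict

# One term of the formula, e.g. "0.3*af_sarah" (surrounding whitespace allowed).
TERM = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\*\s*(\w+)\s*$")


def parse_blend(formula: str) -> Dict[str, float]:
    """Parse 'w1*voice1 + w2*voice2' into weights normalized to sum to 1."""
    weights: Dict[str, float] = {}
    for term in formula.split("+"):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"Unrecognised blend term: {term!r}")
        weight, voice = float(match.group(1)), match.group(2)
        # Repeated voices accumulate their weights.
        weights[voice] = weights.get(voice, 0.0) + weight
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("Blend weights must sum to a positive value")
    return {voice: weight / total for voice, weight in weights.items()}


print(parse_blend("0.3*af_sarah + 0.7*am_adam"))
```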

API Reference

KokoroEngine

KokoroEngine(
    voice="af_heart",        # Voice name or blend formula
    default_speed=1.0,       # Speech speed multiplier
    trim_silence=True,       # Remove silence from audio
    debug=False              # Enable debug output
)

TextToAudioStream

TextToAudioStream(
    engine,                      # KokoroEngine instance
    on_audio_stream_start=None,  # Callback when audio starts
    on_audio_stream_stop=None,   # Callback when audio stops
    on_audio_chunk=None,         # Callback for each audio chunk
    on_word=None,                # Callback for word timing
    muted=False                  # Disable speaker output
)

Requirements

Python 3.9-3.12
PyAudio (may require system dependencies)
PyTorch (torch)

Windows Note

PyAudio on Windows may require Visual C++ Build Tools. If you encounter issues:

pip install pipwin
pipwin install pyaudio

License

MIT

Release history

0.3.8: Dec 17, 2025
0.3.6: Dec 17, 2025
0.3.5: Dec 17, 2025
0.3.4: Dec 16, 2025
0.3.3: Dec 16, 2025
0.3.0: Dec 16, 2025 (this version)
0.2.7: Dec 15, 2025
0.2.6: Dec 15, 2025
0.2.4: Dec 15, 2025
0.1.0: Dec 15, 2025

Download files

Source Distribution

streaming_tts-0.3.0.tar.gz (56.7 kB view details)

Uploaded Dec 16, 2025 Source

Built Distribution

streaming_tts-0.3.0-py3-none-any.whl (62.1 kB view details)

Uploaded Dec 16, 2025 Python 3

File details

Details for the file streaming_tts-0.3.0.tar.gz.

File metadata

Download URL: streaming_tts-0.3.0.tar.gz
Upload date: Dec 16, 2025
Size: 56.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for streaming_tts-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`1e4d75d4ec48d63fa85ed67d5e244fedfea464fb80f95aefeb04772b425f8972`
MD5	`160eaac099f7621249bcb5ddbc9ae5ff`
BLAKE2b-256	`f70b9f93d27def97ab5e412d01e432e60868b92251d673b7f0a8861caa7ca26d`

File details

Details for the file streaming_tts-0.3.0-py3-none-any.whl.

File metadata

Download URL: streaming_tts-0.3.0-py3-none-any.whl
Upload date: Dec 16, 2025
Size: 62.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for streaming_tts-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e7c40491d715340b4c92dd47f1bb5ed862a31c48365e2695c876c8d2ab58d6e`
MD5	`b88d259dff3162362a6bd6e87e3429f8`
BLAKE2b-256	`fad8d06df50bf57508e9aa2474a789cd273e59c67ce30fdff64d9624ac6690a1`
