streaming-tts

Lightweight streaming text-to-speech with Kokoro engine.

Features

  • Streaming TTS: Real-time audio synthesis with callback support
  • Voice Blending: Mix multiple voices with weighted formulas
  • Pause Tags: Insert natural pauses with [pause:1.5s] syntax
  • Text Normalization: Convert URLs, emails, numbers, money to spoken form
  • Smart Chunking: Token-aware text splitting for optimal quality
  • Multi-Format Output: Export to WAV, MP3, Opus, FLAC, AAC (with [audio] extra)
  • 54 Voices: American, British English + 7 other languages

Installation

pip install streaming-tts

For a development (editable) install, run this from a checkout of the source tree:

pip install -e .

Optional Extras

# Japanese language support
pip install "streaming-tts[jp]"

# Chinese language support
pip install "streaming-tts[zh]"

# Korean language support
pip install "streaming-tts[ko]"

# Audio export to MP3, Opus, FLAC, AAC
pip install "streaming-tts[audio]"

# All features
pip install "streaming-tts[all]"

Note: Non-English languages require espeak-ng to be installed on your system.

Quick Start

from streaming_tts import TextToAudioStream, KokoroEngine

# Initialize the engine
engine = KokoroEngine(voice="af_heart")

# Create stream and play
stream = TextToAudioStream(engine)
stream.feed("Hello, world! This is a test of streaming text to speech.").play()

Pause Tags

Insert natural pauses in your text:

text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."
stream.feed(text).play()
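
To illustrate what the pause syntax encodes, here is a minimal sketch of a parser that splits tagged text into spoken segments and pause durations. It is not the library's implementation: the function name split_pause_tags and the regular expression are assumptions, and only the two unit suffixes shown above (s and ms) are handled.

```python
import re

# Hypothetical pattern: matches tags such as [pause:1s], [pause:1.5s], [pause:500ms].
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)\]")


def split_pause_tags(text):
    """Return a list of str segments and float pause durations in seconds."""
    items = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        # Text between the previous tag and this one is a spoken segment.
        segment = text[pos:match.start()].strip()
        if segment:
            items.append(segment)
        value, unit = float(match.group(1)), match.group(2)
        items.append(value / 1000.0 if unit == "ms" else value)
        pos = match.end()
    # Anything after the last tag is the final spoken segment.
    tail = text[pos:].strip()
    if tail:
        items.append(tail)
    return items
```

Strings in the result would be fed to the stream and floats treated as silence, which mirrors the str/float distinction used in the Smart Chunking example.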

Text Normalization

Automatically convert special content to spoken form:

from streaming_tts import normalize_text, NormalizationOptions

options = NormalizationOptions(normalize=True)

# URLs, emails, numbers, money, etc.
text = "Visit https://example.com or email user@test.com. Price: $42.50"
normalized = normalize_text(text, options)
# -> "Visit https example dot com or email user at test dot com. Price: forty-two dollars and fifty cents"
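
The kind of rewriting shown in the comment above can be sketched with two regular expressions. This is an illustration under stated assumptions, not the library's code: the names speak_urls_and_emails, URL_RE and EMAIL_RE are hypothetical, URL paths and query strings are ignored, and money and number handling are left out.

```python
import re

# Hypothetical patterns: scheme plus dotted host, and local-part@dotted-domain.
URL_RE = re.compile(r"\b(https?)://([\w-]+(?:\.[\w-]+)+)")
EMAIL_RE = re.compile(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)\b")


def _spell_dots(value):
    # "example.com" -> "example dot com"
    return value.replace(".", " dot ")


def speak_urls_and_emails(text):
    """Rewrite URLs and email addresses into a speakable form."""
    text = URL_RE.sub(lambda m: f"{m.group(1)} {_spell_dots(m.group(2))}", text)
    text = EMAIL_RE.sub(
        lambda m: f"{_spell_dots(m.group(1))} at {_spell_dots(m.group(2))}", text
    )
    return text
```

The dotted-host pattern requires a word character after every dot, so a sentence-ending period after an address is left in place.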

Smart Chunking

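Split long text into token-aware chunks sized for TTS:

from streaming_tts import smart_split, process_text_with_pauses
import time

# `text` and `stream` are the objects created in the examples above.
# The generator yields floats (pause durations passed to time.sleep)
# and strings (text chunks to speak).
for item in process_text_with_pauses(text, normalize=True):
    if isinstance(item, float):
        time.sleep(item)  # Pause
    else:
        stream.feed(item).play()  # Speak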
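
As a rough picture of what chunking by a size budget involves, the sketch below groups whole sentences until a limit is reached. It is a simplified stand-in, not the behavior of smart_split: the function name chunk_sentences is hypothetical, and it counts whitespace-separated words where the library is described as token-aware.

```python
import re


def chunk_sentences(text, max_words=40):
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentences whole avoids cutting audio mid-phrase, at the cost of occasionally exceeding the budget for a very long sentence.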

Multi-Format Audio Export

from streaming_tts import StreamingAudioWriter

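# Requires: pip install streaming-tts[audio]
writer = StreamingAudioWriter("mp3", sample_rate=24000)

# `audio_chunks` is an iterable of audio chunks produced elsewhere,
# for example collected through an on_audio_chunk callback.
for audio_chunk in audio_chunks:
    mp3_data = writer.write_chunk(audio_chunk)
    # Stream or save mp3_data

# Finalize the stream; stream or save final_data as well
final_data = writer.write_chunk(finalize=True)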
writer.close()
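
For plain WAV output without the [audio] extra, the standard library's wave module is enough. The sketch below is independent of StreamingAudioWriter and makes assumptions the text above does not confirm: chunks are raw 16-bit mono PCM bytes, and the helper name chunks_to_wav_bytes is hypothetical. The 24000 Hz default matches the sample rate used in the example above.

```python
import io
import wave


def chunks_to_wav_bytes(pcm_chunks, sample_rate=24000, channels=1, sample_width=2):
    """Pack raw PCM byte chunks into a complete in-memory WAV file.

    Assumes little-endian signed PCM with sample_width bytes per sample.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav_file.writeframes(chunk)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()
```

The header is patched with the final length when the writer closes, so the result is only valid once all chunks have been written.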

Usage with Callbacks

from streaming_tts import TextToAudioStream, KokoroEngine

def on_audio_chunk(chunk):
    # Process audio chunk (e.g., send over websocket)
    pass

def on_stream_stop():
    print("Audio stream finished")

engine = KokoroEngine(voice="af_heart")
stream = TextToAudioStream(engine, on_audio_stream_stop=on_stream_stop)
stream.feed("Hello world").play(muted=True, on_audio_chunk=on_audio_chunk)
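
When chunks need to be consumed somewhere other than the callback itself (a websocket sender, for instance), one common pattern is to hand them over through a queue. The class below is a hypothetical helper, not part of streaming_tts; it assumes only that the chunk callback receives one argument and that the stop callback takes none, as in the example above.

```python
import queue


class ChunkQueue:
    """Bridge callback-delivered audio chunks to an iterator."""

    _DONE = object()  # Sentinel marking the end of the stream.

    def __init__(self):
        self._queue = queue.Queue()

    def on_audio_chunk(self, chunk):
        # Called by the producer for every audio chunk.
        self._queue.put(chunk)

    def on_stream_stop(self):
        # Called by the producer once; unblocks the consumer.
        self._queue.put(self._DONE)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._DONE:
                return
            yield item
```

A ChunkQueue instance could be wired in by passing its on_stream_stop method as on_audio_stream_stop and its on_audio_chunk method to play(); a consumer thread then iterates over the instance. queue.Queue is thread-safe, so the producer and the consumer may run on different threads.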

Available Voices

American English (lang_code='a')

  • Female: af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
  • Male: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa

British English (lang_code='b')

  • Female: bf_alice, bf_emma, bf_isabella, bf_lily
  • Male: bm_daniel, bm_fable, bm_george, bm_lewis

Other Languages

  • Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
  • Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
  • Spanish: ef_dora, em_alex, em_santa
  • French: ff_siwis
  • Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
  • Italian: if_sara, im_nicola
  • Portuguese: pf_dora, pm_alex, pm_santa
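
The voice names above appear to follow a two-letter prefix convention: a language letter followed by f or m. The sketch below decodes that prefix. Only the codes 'a' and 'b' are stated explicitly in this document; the remaining letters and the reading of f/m as female/male for non-English voices are inferred from the lists, and describe_voice is a hypothetical helper.

```python
# Language letters: 'a' and 'b' are documented; the rest are inferred from the voice lists.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice):
    """Decode a voice identifier such as 'af_heart' into its parts."""
    prefix, sep, name = voice.partition("_")
    if (
        not sep
        or not name
        or len(prefix) != 2
        or prefix[0] not in LANGUAGES
        or prefix[1] not in GENDERS
    ):
        raise ValueError(f"unrecognized voice name: {voice!r}")
    return {
        "lang_code": prefix[0],
        "language": LANGUAGES[prefix[0]],
        "gender": GENDERS[prefix[1]],
        "name": name,
    }
```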

Voice Blending

You can blend multiple voices using weighted formulas:

engine = KokoroEngine(voice="0.3*af_sarah + 0.7*am_adam")
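
A blend formula of this shape can be read as a weighted sum of per-voice vectors. The sketch below parses the formula and applies the weights to plain Python lists. It is an illustration only: parse_blend and blend_vectors are hypothetical names, the normalization of weights to sum to 1 is an assumption, and how the engine actually represents and mixes voices is not stated here.

```python
def parse_blend(formula):
    """Parse '0.3*af_sarah + 0.7*am_adam' into normalized weights by voice name."""
    weights = {}
    for term in formula.split("+"):
        term = term.strip()
        if not term:
            continue
        if "*" in term:
            weight_text, name = term.split("*", 1)
            weight = float(weight_text)
        else:
            # A bare voice name counts with weight 1.
            weight, name = 1.0, term
        name = name.strip()
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("blend weights must sum to a positive value")
    # Assumption: weights are rescaled so they sum to 1.
    return {name: weight / total for name, weight in weights.items()}


def blend_vectors(weights, vectors):
    """Weighted element-wise sum of equal-length vectors keyed by voice name."""
    length = len(next(iter(vectors.values())))
    return [
        sum(weights[name] * vectors[name][i] for name in weights)
        for i in range(length)
    ]
```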

API Reference

KokoroEngine

KokoroEngine(
    voice="af_heart",        # Voice name or blend formula
    default_speed=1.0,       # Speech speed multiplier
    trim_silence=True,       # Remove silence from audio
    debug=False              # Enable debug output
)

TextToAudioStream

TextToAudioStream(
    engine,                      # KokoroEngine instance
    on_audio_stream_start=None,  # Callback when audio starts
    on_audio_stream_stop=None,   # Callback when audio stops
    on_audio_chunk=None,         # Callback for each audio chunk
    on_word=None,                # Callback for word timing
    muted=False                  # Disable speaker output
)

Requirements

  • Python 3.9-3.12
  • PyAudio (may require system dependencies)
  • Torch

Windows Note

PyAudio on Windows may require Visual C++ Build Tools. If you encounter issues:

pip install pipwin
pipwin install pyaudio

License

MIT
