
Lightweight streaming text-to-speech with Kokoro engine

streaming-tts

Features

  • Streaming TTS: Real-time audio synthesis with callback support
  • Voice Blending: Mix multiple voices with weighted formulas
  • Pause Tags: Insert natural pauses with [pause:1.5s] syntax
  • Text Normalization: Convert URLs, emails, numbers, money to spoken form
  • Smart Chunking: Token-aware text splitting for optimal quality
  • Multi-Format Output: Export to WAV, MP3, Opus, FLAC, and AAC (requires the [audio] extra)
  • 54 Voices: American and British English, plus 7 other languages

Installation

pip install streaming-tts

For a development (editable) install, run this from the repository root:

pip install -e .

Optional Extras

# Japanese language support
pip install "streaming-tts[jp]"

# Chinese language support
pip install "streaming-tts[zh]"

# Korean language support
pip install "streaming-tts[ko]"

# Multi-format audio export (StreamingAudioWriter)
pip install "streaming-tts[audio]"

# All features
pip install "streaming-tts[all]"

Note: Non-English languages require espeak-ng to be installed on your system.
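
Before using the non-English extras, you can check from Python whether an espeak-ng executable is on your PATH. This is a minimal sketch using only the standard library; the helper name `espeak_ng_available` is hypothetical and not part of streaming-tts, and it assumes the binary is installed under the name `espeak-ng`.

```python
import shutil


def espeak_ng_available(executable: str = "espeak-ng") -> bool:
    """Hypothetical helper: True if `executable` is found on the system PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if espeak_ng_available():
        print("espeak-ng found on PATH.")
    else:
        print("espeak-ng not found; install it with your system package manager.")
```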

Quick Start

from streaming_tts import TextToAudioStream, KokoroEngine

# Initialize the engine
engine = KokoroEngine(voice="af_heart")

# Create stream and play
stream = TextToAudioStream(engine)
stream.feed("Hello, world! This is a test of streaming text to speech.").play()

Pause Tags

Insert pauses with [pause:<duration>] tags, giving the duration in seconds (s) or milliseconds (ms):

text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."
stream.feed(text).play()
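
To make the tag syntax concrete, here is a minimal sketch of how `[pause:...]` tags could be split out of text, accepting seconds (`1.5s`, `1s`) and milliseconds (`500ms`) as in the examples above. The function `split_pause_tags` is hypothetical and is not the library's parser; it only illustrates turning tagged text into text segments and pause durations in seconds.

```python
import re
from typing import List, Union

# Matches [pause:1.5s], [pause:1s] or [pause:500ms]; group 1 = number, group 2 = unit.
PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)\]")


def split_pause_tags(text: str) -> List[Union[str, float]]:
    """Split text into non-empty text segments (str) and pauses in seconds (float)."""
    parts: List[Union[str, float]] = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        segment = text[pos:match.start()].strip()
        if segment:
            parts.append(segment)
        value, unit = float(match.group(1)), match.group(2)
        parts.append(value / 1000.0 if unit == "ms" else value)  # normalize to seconds
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(tail)
    return parts


print(split_pause_tags("Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."))
# -> ['Hello!', 1.0, 'How are you?', 0.5, "I hope you're well."]
```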

Text Normalization

Convert URLs, email addresses, numbers, and currency amounts to their spoken form:

from streaming_tts import normalize_text, NormalizationOptions

options = NormalizationOptions(normalize=True)

# URLs, emails, numbers, money, etc.
text = "Visit https://example.com or email user@test.com. Price: $42.50"
normalized = normalize_text(text, options)
# -> "Visit https example dot com or email user at test dot com. Price: forty-two dollars and fifty cents"
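
As an illustration of the money case only, here is a self-contained sketch that spells out a dollar amount the way the comment above shows (`$42.50` becomes `forty-two dollars and fifty cents`). The function names are hypothetical, the sketch handles amounts below $1,000 only, and it is not the library's normalizer, which also covers URLs, emails, and other numbers.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1000 in English words."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only supports 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def money_to_words(amount: str) -> str:
    """Convert a string like '$42.50' into spoken form."""
    match = re.fullmatch(r"\$(\d+)(?:\.(\d{2}))?", amount)
    if match is None:
        raise ValueError(f"not a dollar amount: {amount!r}")
    dollars = int(match.group(1))
    cents = int(match.group(2) or 0)
    spoken = f"{number_to_words(dollars)} dollar{'s' if dollars != 1 else ''}"
    if cents:
        spoken += f" and {number_to_words(cents)} cent{'s' if cents != 1 else ''}"
    return spoken


print(money_to_words("$42.50"))  # -> forty-two dollars and fifty cents
```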

Smart Chunking

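Split long text into token-aware chunks and honor pause tags during playback:

import time
from streaming_tts import smart_split, process_text_with_pauses

# `stream` is the TextToAudioStream created in Quick Start
text = "Welcome! [pause:1s] This longer passage is split into chunks before synthesis."

# process_text_with_pauses yields text chunks (str) and pause durations (float, in seconds)
for item in process_text_with_pauses(text, normalize=True):
    if isinstance(item, float):
        time.sleep(item)  # Pause for `item` seconds
    else:
        stream.feed(item).play()  # Speak this chunk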
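
The general idea behind token-aware chunking can be sketched as greedy sentence packing: split on sentence boundaries, then add sentences to the current chunk until a token budget would be exceeded. This is an assumption-laden illustration, not `smart_split`'s actual algorithm: it counts whitespace-separated words as a stand-in for the model's tokens, and `chunk_sentences` and `max_tokens` are hypothetical names.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    """Stand-in token count: number of whitespace-separated words."""
    return len(text.split())


def chunk_sentences(text: str, max_tokens: int = 50) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own (oversized) chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))  # budget would overflow: close this chunk
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_sentences("One two three. Four five. Six seven eight nine.", max_tokens=5))
# -> ['One two three. Four five.', 'Six seven eight nine.']
```

Keeping sentences whole trades a little chunk-size uniformity for natural prosody at chunk boundaries.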

Multi-Format Audio Export

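# Requires: pip install "streaming-tts[audio]"
from streaming_tts import StreamingAudioWriter

writer = StreamingAudioWriter("mp3", sample_rate=24000)  # Kokoro produces 24 kHz audio

# `audio_chunks` is any iterable of audio chunks you already have (not defined here)
with open("output.mp3", "wb") as f:
    for audio_chunk in audio_chunks:
        mp3_data = writer.write_chunk(audio_chunk)
        f.write(mp3_data)  # or stream mp3_data to a client instead

    final_data = writer.write_chunk(finalize=True)  # flush any remaining encoded data
    f.write(final_data)

writer.close()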
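
For WAV specifically, the same open, write chunks, finalize pattern can be shown with Python's standard-library `wave` module and no extra dependencies. This sketch is not `StreamingAudioWriter`; it assumes each chunk is a sequence of signed 16-bit integer samples, mono, at 24000 Hz (the rate used above), and `write_wav_chunks` is a hypothetical helper.

```python
import io
import sys
import wave
from array import array
from typing import BinaryIO, Iterable, Sequence


def write_wav_chunks(out: BinaryIO, chunks: Iterable[Sequence[int]],
                     sample_rate: int = 24000) -> None:
    """Write mono 16-bit PCM chunks to a WAV file object as they arrive."""
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        for chunk in chunks:
            samples = array("h", chunk)   # signed 16-bit, native byte order
            if sys.byteorder == "big":
                samples.byteswap()        # WAV sample data is little-endian
            wav.writeframes(samples.tobytes())
    # Leaving the `with` block patches the header with the final data size;
    # `out` itself stays open because wave.open did not open it.


buffer = io.BytesIO()
write_wav_chunks(buffer, [[0, 1000, -1000], [500, -500]])
with wave.open(io.BytesIO(buffer.getvalue()), "rb") as wav:
    print(wav.getnframes(), wav.getframerate())  # -> 5 24000
```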

Usage with Callbacks

from streaming_tts import TextToAudioStream, KokoroEngine

def on_audio_chunk(chunk):
    # Process audio chunk (e.g., send over websocket)
    pass

def on_stream_stop():
    print("Audio stream finished")

engine = KokoroEngine(voice="af_heart")
stream = TextToAudioStream(engine, on_audio_stream_stop=on_stream_stop)
stream.feed("Hello world").play(muted=True, on_audio_chunk=on_audio_chunk)

Available Voices

American English (lang_code='a')

  • Female: af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
  • Male: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa

British English (lang_code='b')

  • Female: bf_alice, bf_emma, bf_isabella, bf_lily
  • Male: bm_daniel, bm_fable, bm_george, bm_lewis

Other Languages

  • Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
  • Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
  • Spanish: ef_dora, em_alex, em_santa
  • French: ff_siwis
  • Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
  • Italian: if_sara, im_nicola
  • Portuguese: pf_dora, pm_alex, pm_santa
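
The voice IDs above follow a visible pattern: the first letter matches the language group (`a` and `b` are the `lang_code` values shown for American and British English) and the second letter is `f` or `m` for female or male. The sketch below decodes an ID under that convention. The mapping for the other languages is inferred from the listing above, not from documented API, and `describe_voice` is a hypothetical helper.

```python
from typing import NamedTuple

# Prefix letters as they appear in the voice listing (inferred, not an official API).
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


class VoiceInfo(NamedTuple):
    language: str
    gender: str
    name: str


def describe_voice(voice_id: str) -> VoiceInfo:
    """Decode a voice ID such as 'bf_emma' into language, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice id: {voice_id!r}")
    try:
        return VoiceInfo(LANGUAGES[prefix[0]], GENDERS[prefix[1]], name)
    except KeyError:
        raise ValueError(f"unknown prefix in voice id: {voice_id!r}") from None


print(describe_voice("bf_emma"))
# -> VoiceInfo(language='British English', gender='female', name='emma')
```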

Voice Blending

You can blend multiple voices using weighted formulas:

engine = KokoroEngine(voice="0.3*af_sarah + 0.7*am_adam")
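
If you assemble blends programmatically, a small helper can produce the `weight*voice + weight*voice` string format shown above. This is a hypothetical convenience function, not part of streaming-tts. It only formats the string; because this page does not say whether the engine normalizes weights, the helper optionally rescales them to sum to 1 itself.

```python
from typing import Mapping


def blend_formula(weights: Mapping[str, float], normalize: bool = True) -> str:
    """Build a blend string like '0.3*af_sarah + 0.7*am_adam' from {voice: weight}."""
    if not weights:
        raise ValueError("need at least one voice")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    total = sum(weights.values()) if normalize else 1.0
    # The :g format drops trailing zeros, so 0.30 is written as 0.3.
    return " + ".join(f"{w / total:g}*{voice}" for voice, w in weights.items())


print(blend_formula({"af_sarah": 3, "am_adam": 7}))  # -> 0.3*af_sarah + 0.7*am_adam
```

The result can be passed straight to the engine, e.g. `KokoroEngine(voice=blend_formula({"af_sarah": 3, "am_adam": 7}))`.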

API Reference

KokoroEngine

KokoroEngine(
    voice="af_heart",        # Voice name or blend formula
    default_speed=1.0,       # Speech speed multiplier
    trim_silence=True,       # Remove silence from audio
    debug=False              # Enable debug output
)
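
To illustrate what an option like `trim_silence` does, here is a simple amplitude-threshold trimmer that drops leading and trailing samples whose absolute value stays below a threshold. The library's actual method and threshold are not documented here; `trim_silence_samples` and its `threshold` parameter are hypothetical, and samples are assumed to be floats in [-1.0, 1.0].

```python
from typing import List, Sequence


def trim_silence_samples(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Remove leading/trailing samples with |amplitude| below threshold.

    Quiet samples between louder ones are kept, so pauses inside speech survive.
    """
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


print(trim_silence_samples([0.0, 0.001, 0.5, -0.2, 0.0, 0.3, 0.002, 0.0]))
# -> [0.5, -0.2, 0.0, 0.3]
```

Production trimmers usually keep a short margin around the detected speech to avoid clipping soft onsets; this sketch keeps none.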

TextToAudioStream

TextToAudioStream(
    engine,                      # KokoroEngine instance
    on_audio_stream_start=None,  # Callback when audio starts
    on_audio_stream_stop=None,   # Callback when audio stops
    on_audio_chunk=None,         # Callback for each audio chunk
    on_word=None,                # Callback for word timing
    muted=False                  # Disable speaker output
)

Requirements

  • Python 3.9-3.12
  • PyAudio (may require system dependencies)
  • PyTorch

Windows Note

PyAudio on Windows may require the Microsoft Visual C++ Build Tools when no prebuilt wheel is available for your Python version. If installation fails, try:

pip install pipwin
pipwin install pyaudio

License

MIT

Download files

Source Distribution

streaming_tts-0.3.0.tar.gz (56.7 kB)

Built Distribution

streaming_tts-0.3.0-py3-none-any.whl (62.1 kB)

File details

Details for the file streaming_tts-0.3.0.tar.gz.

File metadata

  • Download URL: streaming_tts-0.3.0.tar.gz
  • Upload date:
  • Size: 56.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for streaming_tts-0.3.0.tar.gz

  • SHA256: 1e4d75d4ec48d63fa85ed67d5e244fedfea464fb80f95aefeb04772b425f8972
  • MD5: 160eaac099f7621249bcb5ddbc9ae5ff
  • BLAKE2b-256: f70b9f93d27def97ab5e412d01e432e60868b92251d673b7f0a8861caa7ca26d

File details

Details for the file streaming_tts-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: streaming_tts-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 62.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for streaming_tts-0.3.0-py3-none-any.whl

  • SHA256: 1e7c40491d715340b4c92dd47f1bb5ed862a31c48365e2695c876c8d2ab58d6e
  • MD5: b88d259dff3162362a6bd6e87e3429f8
  • BLAKE2b-256: fad8d06df50bf57508e9aa2474a789cd273e59c67ce30fdff64d9624ac6690a1
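
To check a downloaded file against the SHA256 digests listed above, you can hash it with Python's standard `hashlib` module and compare. This is a minimal sketch; `sha256_of` and `verify_download` are hypothetical helper names, and the file is read in blocks so large downloads need not fit in memory. pip can also enforce digests itself through hash-checking mode (`--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes listed for release 0.3.0.
EXPECTED_SHA256 = {
    "streaming_tts-0.3.0.tar.gz":
        "1e4d75d4ec48d63fa85ed67d5e244fedfea464fb80f95aefeb04772b425f8972",
    "streaming_tts-0.3.0-py3-none-any.whl":
        "1e7c40491d715340b4c92dd47f1bb5ed862a31c48365e2695c876c8d2ab58d6e",
}


def sha256_of(path: Path, block_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    import sys

    # Usage: python verify.py streaming_tts-0.3.0-py3-none-any.whl
    for name in sys.argv[1:]:
        print(name, "OK" if verify_download(Path(name)) else "MISMATCH")
```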
