A low latency text to speech streaming tool that efficiently converts streamed input text into audio. Ideal for applications requiring instant and dynamic audio feedback.

Project description

RealtimeTTS

Stream text into audio with an easy-to-use, highly configurable library delivering voice output with minimal latency.

About the project

Quickly transform input streams into immediate auditory output by efficiently detecting sentence fragments.

Ideal for applications requiring on-the-spot audio feedback.

Hint: Looking for a way to convert voice audio input into text? Check out RealtimeSTT, the perfect input counterpart for this library. Together, they form a powerful realtime audio wrapper around large language model outputs.

Features

  • Real-time Streaming: Stream text as you generate or input it, without waiting for the entire content.
  • Dynamic Feedback: Ideal for applications and scenarios where immediate audio response is pivotal.
  • Modular Engine Design: Supports custom TTS engines, with system TTS, Azure, and Elevenlabs engines provided to get you started.
  • Character-by-character Processing: Allows for true real-time feedback as characters are read and synthesized in a stream.
  • Sentence Segmentation: Efficiently detects sentence boundaries and synthesizes content for natural sounding output.

Installation

pip install RealtimeTTS

Quick Start

Here's a basic usage example:

from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine

engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()

Feed Text

You can feed individual strings:

stream.feed("Hello, this is a sentence.")

Or you can feed generators and character iterators for real-time streaming:

import openai  # requires the openai package (pip install openai)

def write(prompt: str):
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk

text_stream = write("A three-sentence relaxing speech.")

stream.feed(text_stream)
char_iterator = iter("Streaming this character by character.")
stream.feed(char_iterator)
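
Any plain Python generator of text chunks works the same way; a minimal sketch that needs no external API (the generator name and chunks are only illustrative):

def slow_text():
    # yield text piece by piece, e.g. as it arrives from any producer
    for word in ["This ", "text ", "arrives ", "in ", "chunks."]:
        yield word

stream.feed(slow_text())
stream.play_async()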

Playback

Asynchronously:

import time

stream.play_async()
while stream.is_playing():
    time.sleep(0.1)

Synchronously:

stream.play()
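
Putting the basics together, a minimal end-to-end sketch with synchronous playback (using the system engine from the Quick Start):

from RealtimeTTS import TextToAudioStream, SystemEngine

engine = SystemEngine()               # default system TTS voice
stream = TextToAudioStream(engine)

stream.feed("This sentence is synthesized and played before the call returns.")
stream.play()                         # blocks until playback has finished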

Testing the Library

The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeTTS library.

  • simple_test.py

    • Description: A "hello world"-style demonstration of the library's simplest usage.
  • complex_test.py

    • Description: A comprehensive demonstration showcasing most of the features provided by the library.
  • translator_cli.py

    • Dependencies: Run pip install openai realtimestt.
    • Description: Real-time translation into six different languages.
  • advanced_talk.py

    • Dependencies: Run pip install openai keyboard realtimestt.
    • Description: Engage in a conversation with an AI. You can choose the TTS engine and voice before starting the conversation.
  • ai_talk_10_lines.py

    • Dependencies: Run pip install openai realtimestt.
    • Description: The world's most concise AI talk program with just 10 lines of code.
  • simple_llm_test.py

    • Dependencies: Run pip install openai.
    • Description: Demonstrates how to integrate the library with large language models (LLMs).
  • simple_talk.py

    • Dependencies: Run pip install openai keyboard realtimestt.
    • Description: Get introduced to a basic voice-based AI companion talkbot.

Pause, Resume & Stop

Pause the audio stream:

stream.pause()

Resume a paused stream:

stream.resume()

Stop the stream immediately:

stream.stop()
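
A short sketch combining these calls with asynchronous playback (the sleep durations are only illustrative):

import time

stream.feed("A longer passage that takes a few seconds to play back.")
stream.play_async()

time.sleep(2)
stream.pause()     # playback halts
time.sleep(1)
stream.resume()    # playback continues where it left off
time.sleep(1)
stream.stop()      # cut playback off immediately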

Requirements Explained

  • Python 3.6+

  • requests (>=2.31.0): to send HTTP requests for API calls and voice list retrieval

  • PyAudio (>=0.2.13): to create an output audio stream

  • stream2sentence (>=0.1.1): to split the incoming text stream into sentences

  • pyttsx3 (>=2.90): System text-to-speech conversion engine

  • azure-cognitiveservices-speech (>=1.31.0): Azure text-to-speech conversion engine

  • elevenlabs (>=0.2.24): Elevenlabs text-to-speech conversion engine

play and play_async Methods

These methods handle the synthesis of text to audio and play the resulting audio stream. play blocks until playback has finished, while play_async starts playback in the background and returns immediately.

  • fast_sentence_fragment (bool):

    • Default: False
    • Determines if sentence fragments should be quickly yielded. Useful when a faster response is desired even if a sentence isn't complete.
  • buffer_threshold_seconds (float):

    • Default: 2.0
    • Time in seconds to determine the buffering threshold. Helps to decide when to generate more audio based on buffered content.
    • Hint: If you experience silence or breaks between sentences, consider raising this value to ensure smoother playback.
  • minimum_sentence_length (int):

    • Default: 3
    • Minimum characters required to treat content as a sentence.
  • log_characters (bool):

    • Default: False
    • If True, logs the characters processed for synthesis.
  • log_synthesized_text (bool):

    • Default: False
    • If True, logs the synthesized text chunks.
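
For illustration, a sketch assuming these options are passed as keyword arguments to play or play_async (the values shown are only examples, not recommended defaults):

stream.feed("One sentence. Another sentence that follows right after.")
stream.play_async(
    fast_sentence_fragment=True,     # yield fragments early for lower latency
    buffer_threshold_seconds=3.0,    # buffer a bit more to avoid gaps between sentences
    minimum_sentence_length=3,       # default
    log_synthesized_text=True        # log each synthesized chunk
)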

Contribution

Contributions are always welcome (e.g. a PR to add a new engine).

License

MIT

Author

Kolja Beigel
Email: kolja.beigel@web.de
GitHub

