A low latency text to speech streaming tool that efficiently converts streamed input text into audio. Ideal for applications requiring instant and dynamic audio feedback.
RealtimeTTS
Stream text into audio with an easy-to-use, highly configurable library delivering voice output with minimal latency.
About the project
Quickly transform input streams into immediate auditory output by efficiently detecting sentence fragments.
Ideal for applications requiring on-the-spot audio feedback.
Hint: Looking for a way to convert voice audio input into text? Check out RealtimeSTT, the perfect input counterpart for this library. Together, they form a powerful realtime audio wrapper around large language model outputs.
Features
- Real-time Streaming: Stream text as you generate or input it, without waiting for the entire content.
- Dynamic Feedback: Ideal for applications and scenarios where immediate audio response is pivotal.
- Modular Engine Design: Supports custom TTS engines, with System, Azure, and Elevenlabs engines provided to get you started.
- Character-by-character Processing: Allows for true real-time feedback as characters are read and synthesized in a stream.
- Sentence Segmentation: Efficiently detects sentence boundaries and synthesizes content for natural sounding output.
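The sentence-segmentation idea can be illustrated in a few lines of plain Python. This is a simplified sketch only, not the library's actual implementation (which uses the stream2sentence package listed under Requirements):

```python
def sentences_from_chars(chars, delimiters=".!?"):
    """Accumulate streamed characters and yield a sentence at each boundary."""
    buffer = ""
    for ch in chars:
        buffer += ch
        if ch in delimiters:
            yield buffer.strip()
            buffer = ""
    if buffer.strip():          # flush any trailing fragment
        yield buffer.strip()

print(list(sentences_from_chars(iter("Hello world! How are you today?"))))
# → ['Hello world!', 'How are you today?']
```

Detecting boundaries on the fly like this is what lets synthesis start as soon as the first sentence is complete, instead of waiting for the whole text.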
Installation
pip install RealtimeTTS
Quick Start
Here's a basic usage example:
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()
Feed Text
You can feed individual strings:
stream.feed("Hello, this is a sentence.")
Or you can feed generators and character iterators for real-time streaming:
import openai

def write(prompt: str):
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk
text_stream = write("A three-sentence relaxing speech.")
stream.feed(text_stream)
char_iterator = iter("Streaming this character by character.")
stream.feed(char_iterator)
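Under the hood, feed accepts any iterable of strings, so a plain generator works just as well as an LLM stream. A minimal sketch (the token_stream name is illustrative, and the feed call is commented out since it needs a configured TextToAudioStream):

```python
def token_stream():
    # Simulate tokens arriving incrementally, e.g. from a network source.
    for token in ["Hello ", "from ", "a ", "plain ", "generator."]:
        yield token

# stream.feed(token_stream())
print("".join(token_stream()))
# → Hello from a plain generator.
```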
Playback
Asynchronously:
import time

stream.play_async()
while stream.is_playing():
    time.sleep(0.1)
Synchronously:
stream.play()
Testing the Library
The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeTTS library.
- simple_test.py
  - Description: A "hello world" styled demonstration of the library's simplest usage.
- complex_test.py
  - Description: A comprehensive demonstration showcasing most of the features provided by the library.
- translator_cli.py
  - Dependencies: pip install openai realtimestt
  - Description: Real-time translations into six different languages.
- advanced_talk.py
  - Dependencies: pip install openai keyboard realtimestt
  - Description: Engage in a conversation with an AI. You can choose the TTS engine and voice before starting the conversation.
- ai_talk_10_lines.py
  - Dependencies: pip install openai realtimestt
  - Description: The world's most concise AI talk program, in just 10 lines of code.
- simple_llm_test.py
  - Dependencies: pip install openai
  - Description: Demonstrates how to integrate the library with large language models (LLMs).
- simple_talk.py
  - Dependencies: pip install openai keyboard realtimestt
  - Description: A basic voice-based AI companion talkbot.
Pause, Resume & Stop
Pause the audio stream:
stream.pause()
Resume a paused stream:
stream.resume()
Stop the stream immediately:
stream.stop()
Requirements Explained
- Python 3.6+
- requests (>=2.31.0): to send HTTP requests for API calls and voice list retrieval
- PyAudio (>=0.2.13): to create an output audio stream
- stream2sentence (>=0.1.1): to split the incoming text stream into sentences
- pyttsx3 (>=2.90): system text-to-speech engine
- azure-cognitiveservices-speech (>=1.31.0): Azure text-to-speech engine
- elevenlabs (>=0.2.24): Elevenlabs text-to-speech engine
play and play_async Methods
These methods synthesize the fed text to audio and play the resulting stream. play blocks until playback is finished, while play_async returns immediately. Both accept the following parameters:
- fast_sentence_fragment (bool):
  - Default: False
  - Determines whether sentence fragments should be yielded quickly. Useful when a faster response is desired even if a sentence isn't complete yet.
- buffer_threshold_seconds (float):
  - Default: 2.0
  - Time in seconds for the buffering threshold; helps decide when to generate more audio based on buffered content.
  - Hint: If you experience silence or breaks between sentences, consider raising this value to ensure smoother playback.
- minimum_sentence_length (int):
  - Default: 3
  - Minimum number of characters required to treat content as a sentence.
- log_characters (bool):
  - Default: False
  - If True, logs the characters processed for synthesis.
- log_synthesized_text (bool):
  - Default: False
  - If True, logs the synthesized text chunks.
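For example, a lower-latency configuration for LLM streaming might look like the sketch below. The values are illustrative assumptions, not recommended settings; only the parameter names come from the documentation above:

```python
# Keyword arguments for play() / play_async(), tuned for responsiveness:
playback_options = {
    "fast_sentence_fragment": True,    # speak fragments before the sentence completes
    "buffer_threshold_seconds": 3.0,   # raise this if you hear gaps between sentences
    "minimum_sentence_length": 3,      # characters required to count as a sentence
    "log_synthesized_text": True,      # log each chunk sent to the engine
}
# stream.play_async(**playback_options)
print(sorted(playback_options))
```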
Contribution
Contributions are always welcome (e.g. PR to add a new engine).
License
MIT
Author
Kolja Beigel
Email: kolja.beigel@web.de
GitHub