
A library for real-time text-to-speech processing using the OpenAI API.

Project description

OpenAI VoiceStream

OpenAI VoiceStream is a Python library that provides real-time text-to-speech functionality using the OpenAI API. It processes both plain text and token streams and generates audio output on the fly, which makes it suitable for integration with language models that generate responses in segments.

Features

  • Real-time text-to-speech conversion
  • Support for processing text and token streams
  • Multiple voice options (alloy, echo, fable, onyx, nova, shimmer)
  • Thread-safe processing for smooth operation
  • Audio fading to avoid abrupt stops
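The README does not say how the fade is applied. As a rough illustration of the idea, here is a minimal sketch of a linear fade-out over a list of float audio samples. `fade_out` is a hypothetical helper, not part of the library, and the real implementation may use a different curve or operate on raw bytes.

```python
from typing import List


def fade_out(samples: List[float], fade_len: int) -> List[float]:
    """Hypothetical helper: linearly ramp the last `fade_len` samples to silence."""
    n = len(samples)
    fade_len = max(0, min(fade_len, n))  # clamp to a valid range
    out = list(samples)  # leave the caller's list untouched
    start = n - fade_len
    for i in range(fade_len):
        # Gain falls from 1.0 on the first faded sample to 0.0 on the last one
        gain = (fade_len - 1 - i) / (fade_len - 1) if fade_len > 1 else 0.0
        out[start + i] *= gain
    return out
```

For example, a 50 ms fade at an assumed 24 kHz sample rate would use `fade_len=1200`. Ending on an exact zero avoids the audible click that an abrupt cut produces.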

Installation

You can install OpenAI VoiceStream using pip:

pip install openai-voicestream

Usage

Prerequisites

Before using OpenAI VoiceStream, make sure you have an OpenAI API key. Set the API key as an environment variable:

export OPENAI_API_KEY='your_openai_api_key'
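The examples below read the key with `os.getenv`, which silently returns `None` if the variable is missing. A small check like the following sketch fails early with a readable message instead. `require_api_key` is a hypothetical helper, not part of the library.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the API key from the environment or fail clearly."""
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run: export {var_name}='your_openai_api_key'"
        )
    return key
```

You could then write `api_key = require_api_key()` in place of `api_key = os.getenv("OPENAI_API_KEY")`.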

Example Code

Here are two examples of using OpenAI VoiceStream: one that passes in a block of text and one that processes a token stream.

Text example:

import os
from openai_voicestream import VoiceProcessor

# Retrieve your OpenAI API key from environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Initialize the VoiceProcessor with the API key and desired voice
processor = VoiceProcessor(api_key, voice="nova", full_quality=True)  # "nova" voice, HD quality

# Example text with paragraphs to be processed
text = """This is an example using the nova voice.

The nova voice provides a different tone and style compared to the default voice.

You can experiment with different voices to find the one that suits your needs."""

# Add the text to the processing queue
processor.add_text_to_queue(text)

# Wait for all processing to complete before exiting
processor.wait_for_completion()

Token stream example:

import os
from openai_voicestream import VoiceProcessor

# Retrieve your OpenAI API key from environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Initialize the VoiceProcessor with the API key and desired voice
processor = VoiceProcessor(api_key, voice="shimmer")  # Using the "shimmer" voice

# Example tokens being added to the processing queue
tokens = [
    "This is an example of processing a stream of tokens.",
    " The tokens are gradually added to the processor,",
    " simulating a real-time scenario where text is generated incrementally.",
    "\n\nThe processor will handle the tokens and generate audio on-the-fly,",
    " providing a seamless text-to-speech experience."
]

# Feed the tokens one at a time (replace this list with a real token stream, e.g. streamed LLM output)
for token in tokens:
    processor.add_token(token)

# Finalize any remaining tokens in the buffer
processor.finalize_tokens()

# Wait for all processing to complete before exiting
processor.wait_for_completion()
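In a real application, the token list above would be replaced by an iterator over streamed model output. The following sketch wraps the three documented calls (`add_token`, `finalize_tokens`, `wait_for_completion`) around any iterable of strings. `speak_stream` and `TokenSink` are hypothetical names used for illustration; only the method names come from this README.

```python
from typing import Iterable, Protocol


class TokenSink(Protocol):
    """The VoiceProcessor methods this helper relies on."""

    def add_token(self, token: str) -> None: ...
    def finalize_tokens(self) -> None: ...
    def wait_for_completion(self) -> None: ...


def speak_stream(processor: TokenSink, tokens: Iterable[str]) -> int:
    """Hypothetical helper: forward a token stream, flush it, and wait.

    Returns the number of non-empty tokens forwarded.
    """
    count = 0
    for token in tokens:
        if token:  # streamed chunks can be empty or None; skip them
            processor.add_token(token)
            count += 1
    processor.finalize_tokens()  # flush whatever is still buffered
    processor.wait_for_completion()  # block until processing has finished
    return count
```

Usage would look like `speak_stream(VoiceProcessor(api_key, voice="shimmer"), llm_tokens)`, where `llm_tokens` is any iterable of text fragments.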

API Reference

VoiceProcessor

The main class for processing text and generating audio.

__init__(self, api_key, voice='alloy')

Initializes the VoiceProcessor with the provided API key and voice.

  • api_key (str): The API key for accessing the OpenAI API.
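  • voice (str or int): The voice to use for text-to-speech. Can be specified by name or index.
  • full_quality (bool, keyword argument): Set to True for HD voice generation. This may add a small delay.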
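The README does not define how a voice index maps to a name. The following sketch assumes a 0-based index into the voice list in the order given under Features (alloy, echo, fable, onyx, nova, shimmer); verify this against the library before relying on indices. `resolve_voice` and `VOICES` are hypothetical.

```python
from typing import Union

# Assumed order, taken from the Features list in this README
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def resolve_voice(voice: Union[str, int] = "alloy") -> str:
    """Hypothetical helper: turn a voice name or 0-based index into a voice name."""
    if isinstance(voice, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("voice must be a name or an integer index")
    if isinstance(voice, int):
        if not 0 <= voice < len(VOICES):
            raise ValueError(f"voice index must be between 0 and {len(VOICES) - 1}")
        return VOICES[voice]
    name = voice.strip().lower()
    if name not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose from {', '.join(VOICES)}")
    return name
```

Under this assumption, `voice=4` and `voice="nova"` select the same voice.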

add_text_to_queue(self, text)

Adds text to the processing queue.

  • text (str): The text to add to the queue.

add_token(self, token)

Adds a token to the buffer and processes it if needed.

  • token (str): The token to add to the buffer.

finalize_tokens(self)

Flushes any tokens still remaining in the buffer so they are processed too. Call it after the last add_token() call.
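The README does not specify when add_token() decides that buffered text is ready for speech. One plausible rule, shown here as a sketch only, is to release text at sentence boundaries (`.`, `!` or `?` followed by whitespace, or a blank line) and to let a final flush emit whatever is left. `SentenceBuffer` is a hypothetical class and not the library's actual logic.

```python
import re
from typing import List


class SentenceBuffer:
    """Hypothetical: accumulate streamed tokens and release complete sentences."""

    # A sentence ends at ., ! or ? followed by whitespace, or at a blank line
    _BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n\s*\n")

    def __init__(self) -> None:
        self._buffer = ""

    def add_token(self, token: str) -> List[str]:
        """Append a token and return any sentences that are now complete."""
        self._buffer += token
        parts = self._BOUNDARY.split(self._buffer)
        # The last part may be an unfinished sentence, so keep it buffered
        self._buffer = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def finalize(self) -> List[str]:
        """Flush the remainder, even if it lacks closing punctuation."""
        rest, self._buffer = self._buffer.strip(), ""
        return [rest] if rest else []
```

A sentence ending in a period is held back until the next token arrives, because the buffer cannot yet tell whether more text (such as an abbreviation) follows. The final flush exists for exactly this case.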

wait_for_completion(self)

Blocks until all queued sentences have been processed. Call it before your program exits so processing is not cut short.
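The internals behind "thread-safe processing" and wait_for_completion() are not documented. A common way to build this behaviour is a producer/consumer queue with a background worker thread, sketched below using the standard library's `queue.Queue`, whose `join()` returns once `task_done()` has been called for every queued item. `QueueWorker`, its `close()` method and its `errors` list are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Optional


class QueueWorker:
    """Hypothetical: run a callback on queued text in a background thread, in FIFO order."""

    def __init__(self, handle: Callable[[str], None]) -> None:
        self._handle = handle
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()
        self.errors: List[Exception] = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add_text_to_queue(self, text: str) -> None:
        self._queue.put(text)  # queue.Queue is thread-safe; any thread may call this

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # sentinel sent by close()
                self._queue.task_done()
                return
            try:
                self._handle(item)
            except Exception as exc:  # keep the worker alive if one item fails
                self.errors.append(exc)
            finally:
                self._queue.task_done()

    def wait_for_completion(self) -> None:
        self._queue.join()  # returns when every queued item is marked done

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
```

Catching exceptions inside the worker matters here. If the thread died on one bad item, `join()` would wait forever for the remaining items.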

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

Acknowledgements

OpenAI VoiceStream is built using the OpenAI API and relies on the following libraries:

Troubleshooting

If you encounter any issues while using OpenAI VoiceStream, here are a few things you can try:

  • Make sure you have set the OPENAI_API_KEY environment variable correctly with your OpenAI API key.
  • Check that you have a stable internet connection to communicate with the OpenAI API.
  • If you encounter any errors or exceptions, please check the error message and consult the documentation or seek support.

FAQ

Can I use OpenAI VoiceStream for commercial purposes?

Yes, you can use OpenAI VoiceStream for commercial purposes, subject to OpenAI's API terms of use. Make sure to review and comply with OpenAI's usage policies.

How can I customize the voice output?

OpenAI VoiceStream provides multiple voice options. Specify the desired voice by passing its name or index to the VoiceProcessor constructor. Available voices are alloy, echo, fable, onyx, nova, and shimmer. For HD voice generation, pass full_quality=True to the VoiceProcessor constructor. This may add a small delay, though it should be barely noticeable.

Can I control the speed or pitch of the generated audio?

Currently, OpenAI VoiceStream does not provide direct control over the speed or pitch of the generated audio. The audio is generated based on the selected voice and the input text. If you require more advanced audio customization, you may need to explore other text-to-speech libraries or APIs.

Is there a limit on the amount of text I can process?

The amount of text you can process depends on the limitations of the OpenAI API. OpenAI VoiceStream processes text in chunks, so it can handle larger text inputs by breaking them down into smaller segments. However, keep in mind that processing large amounts of text may result in longer processing times and higher API usage.
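As an illustration of chunking, the following sketch splits text into pieces no longer than a character limit and prefers sentence boundaries. It hard-splits any single sentence that exceeds the limit. The default of 4096 characters is an assumption based on the per-request input limit of OpenAI's speech endpoint; check the current API documentation. `chunk_text` is hypothetical and is not the library's own splitter.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 4096) -> List[str]:
    """Hypothetical: split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n\s*\n", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Sentences within a chunk are rejoined with single spaces, so paragraph breaks are not preserved in this sketch.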

Support

If you have any questions, issues, or feature requests, please open an issue on the GitHub repository. I appreciate your feedback and will do my best to assist you.

Download files


Source Distribution

openai_voicestream-0.1.2.tar.gz (6.6 kB)

Uploaded Source

Built Distribution

openai_voicestream-0.1.2-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file openai_voicestream-0.1.2.tar.gz.

File metadata

  • Download URL: openai_voicestream-0.1.2.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for openai_voicestream-0.1.2.tar.gz
Algorithm Hash digest
SHA256 6c9534163d6fc4bbe372d6808503807d3a9beb75cb792cc1e7225e0eee5e7b08
MD5 a5308dc358a6a6f7368b7fd50f4f8179
BLAKE2b-256 1f6b2fe9be8d97bf5ca3ad79e07a061b11d580d8122eef8736b27151d6734f42


File details

Details for the file openai_voicestream-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for openai_voicestream-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e2d4305ea0150525c3bcec23715388598bb81aa68da5df3b0bbf02c4b39a85fd
MD5 533e69092911e8e741788fbaa387b74a
BLAKE2b-256 2ae15f88f752eceb0349c2ecd213aef298faefb4b70fedfdf5224ea8e472df4b

