Output audio with playlist.
output-audio
A Python library for streaming audio output with playlist support, featuring real-time text-to-speech (TTS) capabilities using OpenAI's API.
Features
- Real-time streaming audio: Stream audio directly to output devices with minimal latency
- Playlist support: Queue multiple audio items for seamless playback
- Dynamic playlist management: Add audio items to playlists during playback
- OpenAI TTS integration: Convert text to speech using OpenAI's TTS models
- Seamless transitions: Automatic padding between audio segments for smooth playback
- Multi-language support: Works with English, Mandarin, Japanese, and other languages
- Low-latency buffering: Pre-buffering system for smooth playback experience
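The README does not say how long the padding between segments is or how it is produced. The sketch below shows one plausible way to do it with raw 16-bit mono PCM at 24,000 Hz: insert a run of zero-valued samples (silence) between neighbouring segments. The helper names and the 0.1-second gap are hypothetical and are not part of `output_audio`.

```python
SAMPLE_RATE = 24_000  # Hz, matching the library's documented default
BYTES_PER_FRAME = 2   # 16-bit samples, 1 channel (mono)


def silence(seconds: float) -> bytes:
    """Return `seconds` of 16-bit mono silence (all-zero samples)."""
    frames = int(round(seconds * SAMPLE_RATE))
    return b"\x00" * (frames * BYTES_PER_FRAME)


def join_with_padding(segments: list[bytes], gap_seconds: float = 0.1) -> bytes:
    """Concatenate PCM segments, inserting a short silent gap between neighbours.

    The 0.1 s default is an illustrative assumption, not the library's value.
    """
    gap = silence(gap_seconds)
    return gap.join(segments)  # no gap before the first or after the last segment
```

With two 10-byte segments and the default gap, the result is 10 + 4,800 + 10 bytes long, because 0.1 s at 24,000 Hz is 2,400 frames of 2 bytes each.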
Installation
Basic Installation
pip install output-audio
With OpenAI TTS Support
pip install "output-audio[all]"
Requirements
- Python 3.11+
- Audio output device (speakers/headphones)
- OpenAI API key (for TTS functionality)
Quick Start
Basic TTS Example
from output_audio import OpenAITTSAudioItem, output_audio
# Create audio items
audio_items = [
    OpenAITTSAudioItem(content="Hello, this is the first segment."),
    OpenAITTSAudioItem(content="And this is the second segment."),
]
# Play audio
output_audio(audio_items)
Dynamic Playlist Example
import time
import threading
from output_audio import Playlist, OpenAITTSAudioItem, output_playlist_audio
# Create empty playlist
playlist = Playlist()
stop_event = threading.Event()
# Start playback in background
playback_thread = threading.Thread(
    target=output_playlist_audio,
    args=(playlist,),
    kwargs={"playback_stop_event": stop_event},
)
playback_thread.start()
# Add items dynamically
playlist.add_item(OpenAITTSAudioItem(content="First dynamic item"))
time.sleep(2)
playlist.add_item(OpenAITTSAudioItem(content="Second dynamic item"))
# Stop playback
time.sleep(5)
stop_event.set()
playback_thread.join()
Configuration
Audio Configuration
The library uses the following default audio settings:
- Sample Rate: 24,000 Hz (matches OpenAI PCM output)
- Channels: 1 (Mono)
- Format: 16-bit PCM
- Buffer Size: 1024 frames
- Pre-buffer Duration: 0.2 seconds
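These defaults translate directly into buffer sizes. The arithmetic below derives them from the listed values, assuming the pre-buffer is measured in frames at the output sample rate. It is a worked calculation, not code taken from the library.

```python
SAMPLE_RATE = 24_000      # frames per second
CHANNELS = 1              # mono
SAMPLE_WIDTH = 2          # bytes per sample (16-bit PCM)
BLOCK_FRAMES = 1024       # buffer size in frames
PREBUFFER_SECONDS = 0.2

bytes_per_frame = CHANNELS * SAMPLE_WIDTH                   # 2 bytes
block_bytes = BLOCK_FRAMES * bytes_per_frame                # bytes per output buffer
block_ms = 1000 * BLOCK_FRAMES / SAMPLE_RATE                # duration of one buffer
prebuffer_frames = int(round(SAMPLE_RATE * PREBUFFER_SECONDS))
prebuffer_bytes = prebuffer_frames * bytes_per_frame
prebuffer_blocks = -(-prebuffer_frames // BLOCK_FRAMES)     # ceiling division

print(f"one buffer: {block_bytes} bytes, {block_ms:.1f} ms")
print(f"pre-buffer: {prebuffer_frames} frames, {prebuffer_bytes} bytes, "
      f"about {prebuffer_blocks} buffers")
```

So each 1,024-frame buffer holds 2,048 bytes (about 42.7 ms of audio), and the 0.2-second pre-buffer is 4,800 frames (9,600 bytes), which spans a little under five full buffers.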
TTS Configuration
Customize OpenAI TTS settings:
from output_audio import TTSAudioConfig, OpenAITTSAudioItem
import openai
config = TTSAudioConfig(
    model="gpt-4o-mini-tts",  # or "tts-1"
    voice="nova",  # e.g. alloy, echo, fable, onyx, nova, shimmer
    speed=1.0,  # 0.25 to 4.0
    openai_client=openai.OpenAI(api_key="your-api-key"),
)
audio_item = OpenAITTSAudioItem(
    content="Hello world!",
    audio_config=config,
)
API Reference
Core Classes
AudioItem
Base class for audio items.
OpenAITTSAudioItem
Audio item that generates speech from text using OpenAI's TTS API.
Parameters:
- content (str): Text to convert to speech
- audio_config (TTSAudioConfig, optional): TTS configuration
Playlist
Container for managing multiple audio items with dynamic insertion support.
Methods:
- add_item(audio_item): Add an audio item to the playlist
- play(playback_queue): Start playlist playback
TTSAudioConfig
Configuration for OpenAI TTS settings.
Parameters:
- model (str): TTS model ("gpt-4o-mini-tts" or "tts-1")
- voice (str): Voice name, e.g. "nova"
- speed (float): Speed of the generated speech (0.25 to 4.0)
- instructions (str): Instructions for the voice's tone and speaking style
- openai_client: OpenAI client instance
Functions
output_audio(audio_items)
Play a sequence of audio items.
Parameters:
audio_items: List of AudioItem instances
output_playlist_audio(playlist, playback_stop_event=None)
Play a playlist with dynamic item insertion support.
Parameters:
- playlist: Playlist instance
- playback_stop_event (threading.Event, optional): Set this event to stop playback
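The Dynamic Playlist Example suggests a producer/consumer design: one thread plays items while other code keeps calling `add_item`, and setting the stop event ends playback. The internals of `Playlist` and `output_playlist_audio` are not documented, so the sketch below is an assumption-based illustration using `queue.Queue`. `SketchPlaylist`, `get_next`, and `sketch_playback` are hypothetical names, and "playing" an item is simulated by appending it to a list.

```python
import queue
import threading


class SketchPlaylist:
    """Hypothetical stand-in for Playlist: items may be added while playback runs."""

    def __init__(self) -> None:
        self._items: queue.Queue = queue.Queue()  # thread-safe FIFO

    def add_item(self, item) -> None:
        # Safe to call from any thread, including while playback is running.
        self._items.put(item)

    def get_next(self, timeout: float):
        """Return the next item, or None if none arrives within `timeout` seconds."""
        try:
            return self._items.get(timeout=timeout)
        except queue.Empty:
            return None


def sketch_playback(playlist: SketchPlaylist, stop_event: threading.Event,
                    played: list) -> None:
    # Poll with a short timeout so a set stop event is noticed promptly,
    # even while the playlist is empty and waiting for new items.
    while not stop_event.is_set():
        item = playlist.get_next(timeout=0.05)
        if item is not None:
            played.append(item)  # a real player would stream the item's audio here
```

Usage follows the same shape as the Dynamic Playlist Example: start `sketch_playback` in a `threading.Thread`, call `add_item` from the main thread, then `set()` the event and `join()` the thread. The timeout keeps the consumer from blocking forever on an empty queue, which is what lets the stop event take effect.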
Examples
See scripts/demo.py for comprehensive examples including:
- English TTS demo
- Mandarin TTS demo
- Dynamic playlist management
Run the demo:
python scripts/demo.py
Environment Setup
Set your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"
Or create a .env file:
OPENAI_API_KEY=your-api-key-here
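The README does not say whether `output_audio` reads the `.env` file itself. A common choice is `load_dotenv()` from the python-dotenv package; the stdlib-only sketch below is an alternative under the assumption that the file contains simple `KEY=VALUE` lines. `load_env_file` and `require_api_key` are hypothetical helpers, not library functions.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy simple KEY=VALUE lines from `path` into os.environ (existing vars win)."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_api_key() -> str:
    """Fail early with a clear message if the OpenAI key is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")
    return key
```

Calling `load_env_file()` and then `require_api_key()` at startup turns a missing key into an immediate, readable error instead of a failure partway through playback.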
Dependencies
- numpy: Numerical operations for audio data
- sounddevice: Audio device interface
- pydantic: Data validation and settings
- openai: OpenAI API client (optional)
License
MIT License - see LICENSE file for details.
Author
Allen Chou (f1470891079@gmail.com)
Download files
File details
Details for the file output_audio-0.3.0.tar.gz.
File metadata
- Download URL: output_audio-0.3.0.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.11 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e25e1a7a96d21fbcfcecf45492c299cc9b40a295882ea424b22c096593fecc34 |
| MD5 | eecc0d4e048c4f1eb4fbd11e3f93c6c8 |
| BLAKE2b-256 | 0c3b5f77e181f53d7e511592de4a45df105de5553ce5bd31cf87ad40826d3b35 |
File details
Details for the file output_audio-0.3.0-py3-none-any.whl.
File metadata
- Download URL: output_audio-0.3.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.11 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c733d4a9feb6ec7d099a48e39e106cb37e77c55a3e59ce591945173c789068c |
| MD5 | d4318f5a300249392d784e5c29a20cf6 |
| BLAKE2b-256 | 5f11d0cba8f0628a4b67dbfa4fe5b8ff3ac6718b2d5c2684c27e8c8a50c28666 |
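The published SHA256 digests can be used to check a downloaded file before installing it. The sketch below hashes a local file in chunks with `hashlib` and compares the result against the 0.3.0 digests listed above. The helper names are hypothetical. pip can also enforce this itself in `--require-hashes` mode, using `--hash=sha256:...` entries in a requirements file.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for output-audio 0.3.0.
EXPECTED_SHA256 = {
    "output_audio-0.3.0.tar.gz":
        "e25e1a7a96d21fbcfcecf45492c299cc9b40a295882ea424b22c096593fecc34",
    "output_audio-0.3.0-py3-none-any.whl":
        "6c733d4a9feb6ec7d099a48e39e106cb37e77c55a3e59ce591945173c789068c",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches the published one."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("output_audio-0.3.0-py3-none-any.whl"))` returns True only for an untampered copy of that wheel.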