Output audio with playlist.

output-audio

A Python library for streaming audio output with playlist support, featuring real-time text-to-speech (TTS) capabilities using OpenAI's API.

Features

  • Real-time streaming audio: Stream audio directly to output devices with minimal latency
  • Playlist support: Queue multiple audio items for seamless playback
  • Dynamic playlist management: Add audio items to playlists during playback
  • OpenAI TTS integration: Convert text to speech using OpenAI's TTS models
  • Seamless transitions: Automatic padding between audio segments for smooth playback
  • Multi-language support: Works with English, Mandarin, Japanese, and other languages
  • Low-latency buffering: Pre-buffering system for smooth playback experience

Installation

Basic Installation

pip install output-audio

With OpenAI TTS Support

pip install "output-audio[all]"

Requirements

  • Python 3.11+
  • Audio output device (speakers/headphones)
  • OpenAI API key (for TTS functionality)

Quick Start

Basic TTS Example

from output_audio import OpenAITTSAudioItem, output_audio

# Create audio items
audio_items = [
    OpenAITTSAudioItem(content="Hello, this is the first segment."),
    OpenAITTSAudioItem(content="And this is the second segment."),
]

# Play audio
output_audio(audio_items)

Dynamic Playlist Example

import time
import threading
from output_audio import Playlist, OpenAITTSAudioItem, output_playlist_audio

# Create empty playlist
playlist = Playlist()
stop_event = threading.Event()

# Start playback in background
playback_thread = threading.Thread(
    target=output_playlist_audio,
    args=(playlist,),
    kwargs={"playback_stop_event": stop_event}
)
playback_thread.start()

# Add items dynamically
playlist.add_item(OpenAITTSAudioItem(content="First dynamic item"))
time.sleep(2)
playlist.add_item(OpenAITTSAudioItem(content="Second dynamic item"))

# Stop playback
time.sleep(5)
stop_event.set()
playback_thread.join()

Configuration

Audio Configuration

The library uses the following default audio settings:

  • Sample Rate: 24,000 Hz (matches OpenAI PCM output)
  • Channels: 1 (Mono)
  • Format: 16-bit PCM
  • Buffer Size: 1024 frames
  • Pre-buffer Duration: 0.2 seconds
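To get a feel for what these defaults mean in practice, the arithmetic below converts them into frames, bytes, and milliseconds. It is a standalone sketch: the constant and function names are illustrative and are not part of the library's API.

```python
# Default settings as listed above (names are illustrative, not library constants).
SAMPLE_RATE_HZ = 24_000
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # 16-bit PCM
BUFFER_FRAMES = 1024
PREBUFFER_SECONDS = 0.2


def frames_for(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio frames that cover the given duration."""
    return round(seconds * sample_rate)


def bytes_for(frames: int, channels: int = CHANNELS,
              bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Raw PCM size of the given number of frames."""
    return frames * channels * bytes_per_sample


prebuffer_frames = frames_for(PREBUFFER_SECONDS)    # 4800 frames
prebuffer_bytes = bytes_for(prebuffer_frames)       # 9600 bytes
buffer_ms = BUFFER_FRAMES / SAMPLE_RATE_HZ * 1000   # about 42.67 ms per buffer

if __name__ == "__main__":
    print(prebuffer_frames, prebuffer_bytes, round(buffer_ms, 2))
```

With these numbers, playback would start after roughly 9.6 kB of PCM data has arrived, and each 1024-frame buffer covers a little under 43 ms of audio.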

TTS Configuration

Customize OpenAI TTS settings:

from output_audio import TTSAudioConfig, OpenAITTSAudioItem
import openai

config = TTSAudioConfig(
    model="gpt-4o-mini-tts",  # or "tts-1"
    voice="nova",             # alloy, echo, fable, onyx, nova, shimmer
    speed=1.0,                # 0.25 to 4.0
    openai_client=openai.OpenAI(api_key="your-api-key")
)

audio_item = OpenAITTSAudioItem(
    content="Hello world!",
    audio_config=config
)

API Reference

Core Classes

AudioItem

Base class for audio items.

OpenAITTSAudioItem

Audio item that generates speech from text using OpenAI's TTS API.

Parameters:

  • content (str): Text to convert to speech
  • audio_config (TTSAudioConfig, optional): TTS configuration

Playlist

Container for managing multiple audio items with dynamic insertion support.

Methods:

  • add_item(audio_item): Add an audio item to the playlist
  • play(playback_queue): Start playlist playback
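Dynamic insertion can be pictured as a thread-safe queue that a playback thread drains while other threads keep adding items. The sketch below uses only the standard library to illustrate that idea. SketchPlaylist, next_item, and consume are hypothetical names chosen for this example; they are not the library's implementation, whose internals are not documented here.

```python
import queue
import threading
import time


class SketchPlaylist:
    """Hypothetical stand-in for a playlist that accepts items during playback."""

    def __init__(self):
        self._items = queue.Queue()  # thread-safe FIFO

    def add_item(self, item):
        self._items.put(item)

    def next_item(self, timeout=0.05):
        """Return the next item, or None if nothing arrives within the timeout."""
        try:
            return self._items.get(timeout=timeout)
        except queue.Empty:
            return None


def consume(playlist, played, stop_event):
    """Stand-in for a playback loop: record items in order until asked to stop."""
    while not stop_event.is_set():
        item = playlist.next_item()
        if item is not None:
            played.append(item)


def demo():
    playlist = SketchPlaylist()
    played = []
    stop_event = threading.Event()
    worker = threading.Thread(target=consume, args=(playlist, played, stop_event))
    worker.start()

    # Items are added while the consumer is already running.
    playlist.add_item("first")
    playlist.add_item("second")

    # Wait (up to 2 seconds) for both items to be consumed, then stop.
    deadline = time.monotonic() + 2.0
    while len(played) < 2 and time.monotonic() < deadline:
        time.sleep(0.01)
    stop_event.set()
    worker.join()
    return played


if __name__ == "__main__":
    print(demo())
```

A FIFO queue keeps items in the order they were added, which matches the behaviour you would expect from a playlist.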

TTSAudioConfig

Configuration for OpenAI TTS settings.

Parameters:

  • model (str): TTS model ("gpt-4o-mini-tts" or "tts-1")
  • voice (str): Voice selection
  • speed (float): Playback speed (0.25-4.0)
  • instructions (str): Instructions that steer how the voice speaks, such as tone or pacing (supported by "gpt-4o-mini-tts", not by "tts-1")
  • openai_client: OpenAI client instance

Functions

output_audio(audio_items)

Play a sequence of audio items.

Parameters:

  • audio_items: List of AudioItem instances

output_playlist_audio(playlist, playback_stop_event=None)

Play a playlist with dynamic item insertion support.

Parameters:

  • playlist: Playlist instance
  • playback_stop_event (threading.Event, optional): Set this event to stop playback
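When playback runs in a background thread, it helps to guarantee that the stop event is set and the thread is joined even if the calling code raises. One way to do that is a small context manager, sketched below. background_playback and fake_player are hypothetical helpers written for this example. The only assumption about the real function is the signature shown above: a playlist as the positional argument and a playback_stop_event keyword.

```python
import threading
from contextlib import contextmanager


@contextmanager
def background_playback(play_fn, playlist):
    """Run play_fn(playlist, playback_stop_event=...) in a thread and always clean up."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=play_fn,
        args=(playlist,),
        kwargs={"playback_stop_event": stop_event},
    )
    thread.start()
    try:
        yield stop_event
    finally:
        # Runs on normal exit and on exceptions alike.
        stop_event.set()
        thread.join()


def fake_player(playlist, playback_stop_event=None):
    """Stand-in for a playback function so the sketch runs without audio hardware."""
    # Event.wait returns True once the event is set, which ends the loop.
    while not playback_stop_event.wait(0.01):
        pass
    playlist.append("stopped")


if __name__ == "__main__":
    log = []
    with background_playback(fake_player, log):
        pass  # add items to the playlist here
    print(log)
```

Under that assumption, passing output_playlist_audio and a Playlist instead of fake_player and a list should give the same cleanup guarantee.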

Examples

See scripts/demo.py for comprehensive examples including:

  • English TTS demo
  • Mandarin TTS demo
  • Dynamic playlist management

Run the demo:

python scripts/demo.py

Environment Setup

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key-here"

Or create a .env file:

OPENAI_API_KEY=your-api-key-here
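A missing key usually surfaces only when the first TTS request fails, so it can be worth checking for it up front. The helper below is a minimal sketch using the standard library. require_api_key is a hypothetical name, and the check assumes the key should already be present in the process environment when your script starts.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or add it to your .env file"
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found")
    except RuntimeError as exc:
        print(exc)
```

The env parameter exists only so the function can be exercised with a plain dictionary instead of the real environment.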

Dependencies

  • numpy: Numerical operations for audio data
  • sounddevice: Audio device interface
  • pydantic: Data validation and settings
  • openai: OpenAI API client (optional)

License

MIT License - see LICENSE file for details.

Author

Allen Chou (f1470891079@gmail.com)
