
Output audio with playlist.


output-audio

A Python library for streaming audio output with playlist support, featuring real-time text-to-speech (TTS) capabilities using OpenAI's API.

Features

  • Real-time streaming audio: Stream audio directly to output devices with minimal latency
  • Playlist support: Queue multiple audio items for seamless playback
  • Dynamic playlist management: Add audio items to playlists during playback
  • OpenAI TTS integration: Convert text to speech using OpenAI's TTS models
  • Seamless transitions: Automatic padding between audio segments for smooth playback
  • Multi-language support: Works with English, Mandarin, Japanese, and other languages
  • Low-latency buffering: Pre-buffering system for smooth playback experience

Installation

Basic Installation

pip install output-audio

With OpenAI TTS Support

pip install "output-audio[all]"

Requirements

  • Python 3.11+
  • Audio output device (speakers/headphones)
  • OpenAI API key (for TTS functionality)

Quick Start

Basic TTS Example

from output_audio import OpenAITTSAudioItem, output_audio

# Create audio items
audio_items = [
    OpenAITTSAudioItem(content="Hello, this is the first segment."),
    OpenAITTSAudioItem(content="And this is the second segment."),
]

# Play audio
output_audio(audio_items)
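
For longer passages, it can help to break the text into sentence-sized segments before creating one audio item per segment. The helper below is a hypothetical sketch and not part of output-audio; the name split_into_segments and the max_chars limit are assumptions chosen for illustration.

```python
import re


def split_into_segments(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries, merging short sentences up to max_chars."""
    # Split after ., !, ? when followed by whitespace, or directly after the
    # full-width CJK marks, which are usually not followed by a space.
    pattern = r"(?<=[.!?])\s+|(?<=[。！？])"
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]

    segments: list[str] = []
    for sentence in sentences:
        # Merge into the previous segment while it stays within the limit.
        if segments and len(segments[-1]) + 1 + len(sentence) <= max_chars:
            segments[-1] = f"{segments[-1]} {sentence}"
        else:
            segments.append(sentence)
    return segments
```

Each returned string can then be passed as content to OpenAITTSAudioItem, as in the example above.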

Dynamic Playlist Example

import time
import threading
from output_audio import Playlist, OpenAITTSAudioItem, output_playlist_audio

# Create empty playlist
playlist = Playlist()
stop_event = threading.Event()

# Start playback in background
playback_thread = threading.Thread(
    target=output_playlist_audio,
    args=(playlist,),
    kwargs={"playback_stop_event": stop_event}
)
playback_thread.start()

# Add items dynamically
playlist.add_item(OpenAITTSAudioItem(content="First dynamic item"))
time.sleep(2)
playlist.add_item(OpenAITTSAudioItem(content="Second dynamic item"))

# Stop playback
time.sleep(5)
stop_event.set()
playback_thread.join()
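
If code between start() and join() raises, the playback thread above keeps running. One way to guard against that is a small context manager that always sets the stop event and joins the thread. This is a sketch, not part of output-audio: background_playback is a hypothetical helper, and fake_player is a stand-in that only mimics the playback_stop_event keyword of output_playlist_audio.

```python
import threading
from contextlib import contextmanager


@contextmanager
def background_playback(target, *args):
    """Run target(*args, playback_stop_event=event) in a thread.

    The event is always set and the thread joined on exit, even on error.
    """
    stop_event = threading.Event()
    thread = threading.Thread(
        target=target,
        args=args,
        kwargs={"playback_stop_event": stop_event},
    )
    thread.start()
    try:
        yield stop_event
    finally:
        stop_event.set()
        thread.join()


def fake_player(log, playback_stop_event=None):
    # Stand-in for output_playlist_audio: idles until asked to stop.
    playback_stop_event.wait()
    log.append("stopped")
```

With the real library, the call would presumably look like `with background_playback(output_playlist_audio, playlist):`, with playlist.add_item(...) calls inside the block.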

Configuration

Audio Configuration

The library uses the following default audio settings:

  • Sample Rate: 24,000 Hz (matches OpenAI PCM output)
  • Channels: 1 (Mono)
  • Format: 16-bit PCM
  • Buffer Size: 1024 frames
  • Pre-buffer Duration: 0.2 seconds
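
To make these defaults concrete, the sketch below works out what they mean in frames and bytes. The constant and function names are illustrative, not the library's own, and the silence helper assumes that padding between segments is plain zero-valued PCM, which the project does not state.

```python
SAMPLE_RATE = 24_000       # Hz
CHANNELS = 1               # mono
BYTES_PER_SAMPLE = 2       # 16-bit PCM
BUFFER_FRAMES = 1024
PRE_BUFFER_SECONDS = 0.2


def frames_for(seconds: float) -> int:
    """Number of audio frames covering the given duration."""
    return round(seconds * SAMPLE_RATE)


def bytes_for(frames: int) -> int:
    """Size in bytes of the given number of frames."""
    return frames * CHANNELS * BYTES_PER_SAMPLE


def silence(seconds: float) -> bytes:
    """Zero-valued 16-bit PCM of the given duration."""
    return bytes(bytes_for(frames_for(seconds)))


pre_buffer_frames = frames_for(PRE_BUFFER_SECONDS)   # 4800 frames
pre_buffer_bytes = bytes_for(pre_buffer_frames)      # 9600 bytes
buffer_ms = BUFFER_FRAMES / SAMPLE_RATE * 1000       # about 42.7 ms per buffer
```

So one second of audio in this format is 48,000 bytes, and the 0.2-second pre-buffer holds roughly five 1024-frame buffers.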

TTS Configuration

Customize OpenAI TTS settings:

from output_audio import TTSAudioConfig, OpenAITTSAudioItem
import openai

config = TTSAudioConfig(
    model="gpt-4o-mini-tts",  # or "tts-1"
    voice="nova",             # alloy, echo, fable, onyx, nova, shimmer
    speed=1.0,                # 0.25 to 4.0
    openai_client=openai.OpenAI(api_key="your-api-key")
)

audio_item = OpenAITTSAudioItem(
    content="Hello world!",
    audio_config=config
)
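
The library may validate these values itself (it depends on pydantic), but checking user-supplied settings up front gives clearer error messages. The helper below is a hypothetical sketch that uses only the voices and speed range listed in the comments above. OpenAI may offer voices beyond that list, so treat KNOWN_VOICES as an assumption.

```python
KNOWN_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def check_tts_settings(voice: str, speed: float) -> None:
    """Raise ValueError if voice or speed falls outside the documented values."""
    if voice not in KNOWN_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of {sorted(KNOWN_VOICES)}"
        )
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
```

Calling check_tts_settings("nova", 1.0) before building TTSAudioConfig returns silently, while an out-of-range speed such as 9.0 raises ValueError.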

API Reference

Core Classes

AudioItem

Base class for audio items.

OpenAITTSAudioItem

Audio item that generates speech from text using OpenAI's TTS API.

Parameters:

  • content (str): Text to convert to speech
  • audio_config (TTSAudioConfig, optional): TTS configuration

Playlist

Container for managing multiple audio items with dynamic insertion support.

Methods:

  • add_item(audio_item): Add an audio item to the playlist
  • play(playback_queue): Start playlist playback

TTSAudioConfig

Configuration for OpenAI TTS settings.

Parameters:

  • model (str): TTS model ("gpt-4o-mini-tts" or "tts-1")
  • voice (str): Voice selection
  • speed (float): Playback speed (0.25-4.0)
  • instructions (str): Instructions that steer the speaking style, such as tone or pacing
  • openai_client: OpenAI client instance

Functions

output_audio(audio_items)

Play a sequence of audio items.

Parameters:

  • audio_items: List of AudioItem instances

output_playlist_audio(playlist, playback_stop_event=None)

Play a playlist with dynamic item insertion support.

Parameters:

  • playlist: Playlist instance
  • playback_stop_event (threading.Event, optional): Set this event to stop playback; defaults to None
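
Because the stop signal is an ordinary threading.Event, it can be set from a timer instead of a blocking time.sleep call. The sketch below uses only the standard library; stop_after is a hypothetical helper, not a function of output-audio.

```python
import threading


def stop_after(stop_event: threading.Event, seconds: float) -> threading.Timer:
    """Set stop_event after the given delay and return the running timer."""
    timer = threading.Timer(seconds, stop_event.set)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    return timer
```

The returned timer can be cancelled with timer.cancel() if playback should continue past the original deadline.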

Examples

See scripts/demo.py for comprehensive examples including:

  • English TTS demo
  • Mandarin TTS demo
  • Dynamic playlist management

Run the demo:

python scripts/demo.py

Environment Setup

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key-here"

Or create a .env file:

OPENAI_API_KEY=your-api-key-here
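
A missing key otherwise only shows up when the first TTS request is made, so a check at startup can fail earlier with a clearer message. The helper below is a hypothetical sketch using only the standard library. Whether output-audio reads a .env file on its own is not stated here; if it does not, a tool such as python-dotenv's load_dotenv() can load the file into the environment first.

```python
import os
from collections.abc import Mapping


def get_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment, or raise if missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or add it to a .env file"
        )
    return key
```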

Dependencies

  • numpy: Numerical operations for audio data
  • sounddevice: Audio device interface
  • pydantic: Data validation and settings
  • openai: OpenAI API client (optional)

License

MIT License - see LICENSE file for details.

Author

Allen Chou (f1470891079@gmail.com)
