

Streamlit Real-time Audio Component

A Streamlit custom component that enables real-time voice conversations with OpenAI's Realtime API, using WebRTC for low-latency audio streaming.

Features

  • 🎤 Real-time Audio: Low-latency audio streaming using WebRTC
  • 🤖 OpenAI Integration: Direct integration with OpenAI's Realtime API
  • 💬 Live Transcription: Real-time transcription of both user and AI speech
  • ⏸️ Conversation Control: Start, pause, resume, and stop conversations
  • 🎛️ Configurable: Customizable voice, instructions, and AI parameters
  • 📝 Full Transcript: Complete conversation history with timestamps

Installation

Prerequisites

  • Python 3.9+
  • Node.js and npm (for development)
  • OpenAI API key

From Source

  1. Clone or download this component

  2. Install Python dependencies:

    cd realtime_audio
    pip install -e .
    
  3. Install and build frontend dependencies:

    cd frontend
    npm install
    npm run build
    

Usage

Basic Example

import streamlit as st
from realtime_audio import realtime_audio_conversation

st.title("AI Voice Assistant")

# Get API key from user
api_key = st.text_input("OpenAI API Key", type="password")

if api_key:
    # Create the real-time audio conversation
    result = realtime_audio_conversation(
        api_key=api_key,
        instructions="You are a helpful AI assistant. Keep responses concise.",
        voice="alloy",
        temperature=0.8
    )
    
    # Display conversation status
    st.write(f"Status: {result['status']}")
    
    # Show any errors
    if result['error']:
        st.error(result['error'])
    
    # Display transcript
    for message in result['transcript']:
        if message['type'] == 'user':
            st.chat_message("user").write(message['content'])
        else:
            st.chat_message("assistant").write(message['content'])

Advanced Example

import streamlit as st
from realtime_audio import realtime_audio_conversation

st.set_page_config(page_title="Advanced AI Assistant", layout="wide")

col1, col2 = st.columns([3, 1])

with col2:
    st.header("Settings")
    api_key = st.text_input("API Key", type="password")
    voice = st.selectbox("Voice", ["alloy", "echo", "fable", "onyx", "nova", "shimmer"])
    temperature = st.slider("Temperature", 0.0, 2.0, 0.8)
    instructions = st.text_area(
        "Instructions", 
        "You are a helpful AI assistant.",
        height=100
    )

with col1:
    st.header("Conversation")
    
    if api_key:
        conversation = realtime_audio_conversation(
            api_key=api_key,
            voice=voice,
            instructions=instructions,
            temperature=temperature,
            turn_detection_threshold=0.5,
            key="advanced_conversation"
        )
        
        # Handle the conversation result
        if conversation['error']:
            st.error(conversation['error'])
        
        # Display metrics
        col_a, col_b, col_c = st.columns(3)
        with col_a:
            st.metric("Status", conversation['status'])
        with col_b:
            st.metric("Messages", len(conversation['transcript']))
        with col_c:
            recording = "🔴" if conversation['is_recording'] else "⚪"
            st.metric("Recording", recording)

API Reference

realtime_audio_conversation()

Creates a real-time audio conversation component.

Parameters

  • api_key (str): OpenAI API key for authentication
  • voice (str, default="alloy"): Voice for TTS. Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
  • instructions (str, default="You are a helpful AI assistant..."): System instructions for the AI
  • prompt (str, default=""): Initial prompt to start the conversation
  • auto_start (bool, default=False): Whether to automatically start the conversation
  • temperature (float, default=0.8): AI response randomness (0.0-2.0)
  • turn_detection_threshold (float, default=0.5): Voice activity detection sensitivity (0.0-1.0)
  • key (str, optional): Unique component key
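The numeric parameters have fixed valid ranges, so it can be useful to check them before invoking the component rather than surfacing an API error mid-conversation. A minimal, Streamlit-independent sketch (the `validate_params` helper and `VOICES` set are hypothetical illustrations, not part of the package):

```python
# Hypothetical pre-flight check mirroring the documented parameter ranges.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

def validate_params(voice="alloy", temperature=0.8, turn_detection_threshold=0.5):
    """Raise ValueError for parameters outside the documented ranges."""
    if voice not in VOICES:
        raise ValueError(f"voice must be one of {sorted(VOICES)}")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if not 0.0 <= turn_detection_threshold <= 1.0:
        raise ValueError("turn_detection_threshold must be within 0.0-1.0")
```

Calling `validate_params(...)` with the same keyword arguments you intend to pass to `realtime_audio_conversation()` lets you show a friendly `st.error` instead of a failed session.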

Returns

Dictionary with the following structure:

{
    "transcript": [
        {
            "id": "unique_message_id",
            "type": "user" | "assistant", 
            "content": "Message content",
            "timestamp": 1640995200000,
            "status": "completed" | "in_progress"
        }
    ],
    "status": "idle" | "connecting" | "connected" | "recording" | "speaking" | "error",
    "error": "Error message" | None,
    "session_id": "unique_session_id" | None,
    "connection_state": "new" | "connecting" | "connected" | "disconnected" | "failed" | "closed",
    "is_recording": bool,
    "is_paused": bool
}
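Because the component returns a plain dictionary, the transcript can be post-processed without Streamlit at all, e.g. to save or export the conversation. A small sketch against the structure above (the `transcript_to_text` helper name is hypothetical):

```python
def transcript_to_text(result):
    """Render the component's return dict as a plain-text transcript."""
    lines = []
    for message in result["transcript"]:
        speaker = "You" if message["type"] == "user" else "AI"
        lines.append(f"{speaker}: {message['content']}")
    return "\n".join(lines)

# Example return value matching the documented structure
sample = {
    "transcript": [
        {"id": "m1", "type": "user", "content": "Hello",
         "timestamp": 1640995200000, "status": "completed"},
        {"id": "m2", "type": "assistant", "content": "Hi there!",
         "timestamp": 1640995201000, "status": "completed"},
    ],
    "status": "connected", "error": None, "session_id": "s1",
    "connection_state": "connected", "is_recording": False, "is_paused": False,
}
print(transcript_to_text(sample))
# → You: Hello
#   AI: Hi there!
```

The resulting string can be handed to `st.download_button` to let users export their conversation.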

Development

Setting up Development Environment

  1. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  2. Install development dependencies:

    pip install streamlit
    pip install -e .
    
  3. Set up frontend development:

    cd frontend
    npm install
    npm run start  # Starts development server on port 3001
    
  4. Run the example:

    streamlit run example.py
    

Project Structure

realtime_audio/
├── __init__.py              # Python API
├── example.py               # Usage example
├── setup.py                 # Package setup
├── README.md               # Documentation
└── frontend/
    ├── package.json
    ├── tsconfig.json
    ├── vite.config.ts
    └── src/
        ├── index.tsx        # Entry point
        ├── RealtimeAudio.tsx # Main component
        ├── types/           # TypeScript definitions
        └── utils/           # Utility functions

Browser Requirements

  • HTTPS required: WebRTC requires a secure connection
  • Microphone access: Component will request microphone permissions
  • Supported browsers: Chrome, Firefox, Safari, Edge (latest versions)

Troubleshooting

Common Issues

Microphone Access Denied

  • Ensure you're using HTTPS
  • Check browser permissions
  • Try refreshing the page

Connection Failed

  • Verify your OpenAI API key
  • Check internet connection
  • Ensure firewall isn't blocking WebRTC

No Audio Playback

  • Check system audio settings
  • Try different browsers
  • Verify speakers/headphones are working

Debug Mode

Enable debug logging by opening browser developer tools (F12) and checking the console for detailed connection information.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review browser console for errors
  3. Create an issue with detailed error information
