Streamlit component for real-time audio conversation with OpenAI using WebRTC

Streamlit Real-time Audio Component

A Streamlit custom component that enables real-time voice conversations with OpenAI's GPT using WebRTC for low-latency audio streaming.

Features

  • 🎤 Real-time Audio: Low-latency audio streaming using WebRTC
  • 🤖 OpenAI Integration: Direct integration with OpenAI's Realtime API
  • 💬 Live Transcription: Real-time transcription of both user and AI speech
  • ⏸️ Conversation Control: Start, pause, resume, and stop conversations
  • 🎛️ Configurable: Customizable voice, instructions, and AI parameters
  • 📝 Full Transcript: Complete conversation history with timestamps

Installation

Prerequisites

  • Python 3.9+
  • Node.js and npm (for development)
  • OpenAI API key

From Source

  1. Clone or download this component

  2. Install Python dependencies:

    cd realtime_audio
    pip install -e .
    
  3. Install and build frontend dependencies:

    cd frontend
    npm install
    npm run build
    

Usage

Basic Example

import streamlit as st
from realtime_audio import realtime_audio_conversation

st.title("AI Voice Assistant")

# Get API key from user
api_key = st.text_input("OpenAI API Key", type="password")

if api_key:
    # Create the real-time audio conversation
    result = realtime_audio_conversation(
        api_key=api_key,
        instructions="You are a helpful AI assistant. Keep responses concise.",
        voice="alloy",
        temperature=0.8
    )
    
    # Display conversation status
    st.write(f"Status: {result['status']}")
    
    # Show any errors
    if result['error']:
        st.error(result['error'])
    
    # Display transcript
    for message in result['transcript']:
        if message['type'] == 'user':
            st.chat_message("user").write(message['content'])
        else:
            st.chat_message("assistant").write(message['content'])

Advanced Example

import streamlit as st
from realtime_audio import realtime_audio_conversation

st.set_page_config(page_title="Advanced AI Assistant", layout="wide")

col1, col2 = st.columns([3, 1])

with col2:
    st.header("Settings")
    api_key = st.text_input("API Key", type="password")
    voice = st.selectbox("Voice", ["alloy", "echo", "fable", "onyx", "nova", "shimmer"])
    temperature = st.slider("Temperature", 0.0, 2.0, 0.8)
    instructions = st.text_area(
        "Instructions", 
        "You are a helpful AI assistant.",
        height=100
    )

with col1:
    st.header("Conversation")
    
    if api_key:
        conversation = realtime_audio_conversation(
            api_key=api_key,
            voice=voice,
            instructions=instructions,
            temperature=temperature,
            turn_detection_threshold=0.5,
            key="advanced_conversation"
        )
        
        # Handle the conversation result
        if conversation['error']:
            st.error(conversation['error'])
        
        # Display metrics
        col_a, col_b, col_c = st.columns(3)
        with col_a:
            st.metric("Status", conversation['status'])
        with col_b:
            st.metric("Messages", len(conversation['transcript']))
        with col_c:
            recording = "🔴" if conversation['is_recording'] else "⚪"
            st.metric("Recording", recording)

API Reference

realtime_audio_conversation()

Creates a real-time audio conversation component.

Parameters

  • api_key (str): OpenAI API key for authentication
  • voice (str, default="alloy"): Voice for TTS. Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
  • instructions (str, default="You are a helpful AI assistant..."): System instructions for the AI
  • prompt (str, default=""): Initial prompt to start the conversation
  • auto_start (bool, default=False): Whether to automatically start the conversation
  • temperature (float, default=0.8): AI response randomness (0.0-2.0)
  • turn_detection_threshold (float, default=0.5): Voice activity detection sensitivity (0.0-1.0)
  • key (str, optional): Unique component key

Returns

Dictionary with the following structure:

{
    "transcript": [
        {
            "id": "unique_message_id",
            "type": "user" | "assistant", 
            "content": "Message content",
            "timestamp": 1640995200000,
            "status": "completed" | "in_progress"
        }
    ],
    "status": "idle" | "connecting" | "connected" | "recording" | "speaking" | "error",
    "error": "Error message" | None,
    "session_id": "unique_session_id" | None,
    "connection_state": "new" | "connecting" | "connected" | "disconnected" | "failed" | "closed",
    "is_recording": bool,
    "is_paused": bool
}
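Because the component returns a plain dictionary, it can be post-processed with ordinary Python. The helper below is a hedged sketch (not part of the package) that pulls the most recent completed assistant message out of the transcript structure documented above:

```python
from typing import Any, Dict, Optional

def last_assistant_message(result: Dict[str, Any]) -> Optional[str]:
    """Return the content of the newest completed assistant message, if any."""
    for message in reversed(result.get("transcript", [])):
        if message["type"] == "assistant" and message["status"] == "completed":
            return message["content"]
    return None

# Example against the documented return structure:
sample = {
    "transcript": [
        {"id": "m1", "type": "user", "content": "Hi",
         "timestamp": 1640995200000, "status": "completed"},
        {"id": "m2", "type": "assistant", "content": "Hello!",
         "timestamp": 1640995201000, "status": "completed"},
    ],
    "status": "connected",
    "error": None,
}
print(last_assistant_message(sample))  # -> Hello!
```

A helper like this is handy for driving other Streamlit widgets (e.g. showing the latest reply in a status box) without re-rendering the whole transcript.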

Development

Setting up Development Environment

  1. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  2. Install development dependencies:

    pip install streamlit
    pip install -e .
    
  3. Set up frontend development:

    cd frontend
    npm install
    npm run start  # Starts development server on port 3001
    
  4. Run the example:

    streamlit run example.py
    

Project Structure

realtime_audio/
├── __init__.py              # Python API
├── example.py               # Usage example
├── setup.py                 # Package setup
├── README.md               # Documentation
└── frontend/
    ├── package.json
    ├── tsconfig.json
    ├── vite.config.ts
    └── src/
        ├── index.tsx        # Entry point
        ├── RealtimeAudio.tsx # Main component
        ├── types/           # TypeScript definitions
        └── utils/           # Utility functions

Browser Requirements

  • HTTPS required: WebRTC requires a secure connection
  • Microphone access: Component will request microphone permissions
  • Supported browsers: Chrome, Firefox, Safari, Edge (latest versions)

Troubleshooting

Common Issues

Microphone Access Denied

  • Ensure you're using HTTPS
  • Check browser permissions
  • Try refreshing the page

Connection Failed

  • Verify your OpenAI API key
  • Check internet connection
  • Ensure firewall isn't blocking WebRTC

No Audio Playback

  • Check system audio settings
  • Try different browsers
  • Verify speakers/headphones are working

Debug Mode

Enable debug logging by opening browser developer tools (F12) and checking the console for detailed connection information.

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review browser console for errors
  3. Create an issue with detailed error information

