
A simple voice chat interface using configurable LLM, STT, and TTS providers.


Simple Voice Chat

This project provides a flexible voice chat interface that connects to various Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) services.


Acknowledgement: This project heavily relies on the fantastic fastrtc library, which simplifies real-time audio streaming over WebRTC and provided crucial examples for setting up the various supported backends, making this application possible.

Motivation

This project aims to provide a versatile and cost-effective voice chat interface. While initially driven by the desire for alternatives to OpenAI's real-time voice API, it has evolved to offer multiple backend options, including direct integration with OpenAI's real-time services. This allows users to choose the best STT, LLM, and TTS combination for their needs, whether prioritizing cost, performance, self-hosting, or specific provider features.

Features

  • 🚀 Multiple Backends: The application supports three primary backend types for voice processing:
    • Classic Backend: This is the most flexible option, offering a modular approach where you connect separate services for:
      • 🗣️ STT (Speech-to-Text): Supports API-based services like OpenAI Whisper or self-hosted engines such as Speaches (which utilizes Faster Whisper).
      • 🧠 LLM (Large Language Model): Integrates with LiteLLM, providing access to a vast array of models including OpenAI, Anthropic, Google, Mistral, Cohere, Azure, and local models run via services like Ollama, LiteLLM proxy, vLLM, and more.
      • 🔊 TTS (Text-to-Speech): Supports API-based services like OpenAI TTS or alternatives such as Kokoro-FastAPI (which can use KokoroTTS).
      • This backend allows for a fully local setup if desired, using local STT, LLM (e.g., via Ollama), and TTS engines.
    • OpenAI Backend: Utilizes OpenAI's real-time voice API for a streamlined, all-in-one voice interaction experience, requiring an OpenAI API key.
    • Gemini Backend: Leverages Google's Gemini Live Connect API for real-time voice interactions, requiring a Google Gemini API key.
  • ⚙️ Highly Configurable: Adjust backend type, STT/LLM/TTS hosts, ports, models, API keys, STT confidence thresholds (classic backend), TTS voice/speed (classic backend), system messages, and more via CLI arguments or .env file.
  • 🌐 Web Interface: Simple and responsive UI built with HTML, CSS, and JavaScript.
  • 📊 Cost Tracking:
    • Classic Backend: Real-time cost estimation for OpenAI LLM and TTS usage.
    • OpenAI Backend: Real-time cost estimation based on token usage for the selected OpenAI real-time model (currently not functional, see Known Issues).
  • Real-time Interaction: Low-latency voice communication powered by fastrtc (WebRTC).
  • 👂 STT Confidence Filtering (Classic Backend): Automatically reject low-confidence transcriptions based on configurable thresholds (no speech probability, average log probability, minimum word count).
  • 🎤 Dynamic Settings Adjustment:
    • Classic Backend: Change LLM model, TTS voice, TTS speed, and STT language on-the-fly.
    • OpenAI Backend: Change STT language and output voice (if supported by the model/API) on-the-fly.
  • 🔍 Fuzzy Search: Quickly find models and voices using fuzzy search in the UI dropdowns.
  • 💬 System Message Support: Define a custom system message to guide the LLM's behavior.
  • 📝 Chat History Logging: Automatically saves conversation history to timestamped JSON files.
  • 🔄 TTS Audio Replay (Classic Backend): Replay the audio for any assistant message directly from the chat interface.
  • ⌨️ Keyboard Shortcuts: Control mute (M), clear chat (Ctrl+R), and toggle options (Shift+S) using keyboard shortcuts.
  • 💓 Connection Monitoring: Uses a heartbeat mechanism to detect disconnected clients and potentially shut down the server.
  • 🖥️ Cross-Platform GUI: Runs as a standalone desktop application using pywebview (default) or in a standard web browser (--browser flag). The application explicitly uses the Qt backend for pywebview because the GTK backend lacks the necessary WebRTC support.
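
The STT confidence filtering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Thresholds class, its field names, its default values, and the accept_transcription function are all hypothetical, and it assumes the STT service reports a no-speech probability and an average log probability for each transcription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical names and default values, chosen only for illustration.
    max_no_speech_prob: float = 0.6
    min_avg_logprob: float = -1.0
    min_words: int = 1


def accept_transcription(
    text: str,
    no_speech_prob: float,
    avg_logprob: float,
    thresholds: Thresholds = Thresholds(),
) -> bool:
    """Return True only when the transcription passes all three checks."""
    if no_speech_prob > thresholds.max_no_speech_prob:
        return False  # the audio is probably silence or background noise
    if avg_logprob < thresholds.min_avg_logprob:
        return False  # the decoder was not confident in its output
    if len(text.split()) < thresholds.min_words:
        return False  # too few words to count as an utterance
    return True
```

A rejected transcription would simply never be forwarded to the LLM, which avoids spending tokens on noise or half-heard fragments.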

Known Issues

  • ⚠️ Cost Calculation: The cost calculation for the OpenAI real-time API and Gemini API is currently not functional.

Installation

  1. Clone the repository:

    git clone https://github.com/thiswillbeyourgithub/simple_voice_chat
    
    cd simple_voice_chat
    
  2. Install the package and its dependencies in editable mode (requires uv):

    uv pip install -e .
    
  3. (Optional) Configure services using environment variables. You can create a .env file based on the available options (see --help or utils/env.py).
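
To illustrate how a .env file can feed configuration into the process, here is a minimal sketch using only the standard library. It is not the project's loader (which may well rely on a dedicated library), and the variable names EXAMPLE_BACKEND and EXAMPLE_TTS_SPEED are placeholders invented for this example; the real names are the ones reported by --help or utils/env.py.

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict[str, str]) -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)


# Hypothetical variable names, used only to show the file format.
example = """
# backend selection
EXAMPLE_BACKEND=classic
EXAMPLE_TTS_SPEED="1.2"
"""

settings = parse_env_text(example)
apply_env(settings)
```

Using setdefault means a variable already exported in the shell takes precedence over the file, which is a common convention but an assumption here.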

Usage

Run the installed command-line entry point (add --help to list all options):

simple-voice-chat

The application will start a web server and attempt to open the interface in a dedicated window (or browser tab if --browser is specified).

You can choose the backend using the --backend option:

  • --backend classic (default): Uses separate STT, LLM, and TTS services.
  • --backend openai: Uses OpenAI's real-time voice API. Requires --openai-api-key.
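
The backend selection rules listed above can be sketched with argparse. This is only an illustration of the documented behavior, assuming nothing about how the project actually parses its arguments: the functions build_parser and parse_args are hypothetical, only the flags named in this README are included, and the real application also accepts values from environment variables.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering only the flags mentioned in this README."""
    parser = argparse.ArgumentParser(prog="simple-voice-chat")
    parser.add_argument(
        "--backend", choices=["classic", "openai"], default="classic"
    )
    parser.add_argument("--openai-api-key", default=None)
    parser.add_argument("--browser", action="store_true")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce that the openai backend has an API key."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.backend == "openai" and not args.openai_api_key:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--backend openai requires --openai-api-key")
    return args
```

With this sketch, parse_args([]) selects the classic backend, while parse_args(["--backend", "openai"]) exits with an error because no key was supplied.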

For a detailed list of all configuration options, please use the --help flag:

simple-voice-chat --help

This will provide the most up-to-date information on available arguments and their corresponding environment variables, including options specific to each backend.


This README was generated with assistance from aider.chat.
