A simple voice chat interface using configurable LLM, STT, and TTS providers.
Project description
Simple Voice Chat
This project provides a flexible voice chat interface that connects to various Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) services.
Acknowledgement: This project heavily relies on the fantastic fastrtc library, which simplifies real-time audio streaming over WebRTC and provided crucial examples for setting up the various supported backends, making this application possible.
Motivation
This project aims to provide a versatile and cost-effective voice chat interface. While initially driven by the desire for alternatives to OpenAI's real-time voice API, it has evolved to offer multiple backend options, including direct integration with OpenAI's real-time services. This allows users to choose the best STT, LLM, and TTS combination for their needs, whether prioritizing cost, performance, self-hosting, or specific provider features.
Features
- 🚀 Multiple Backends: The application supports three primary backend types for voice processing:
- Classic Backend: This is the most flexible option, offering a modular approach where you connect separate services for:
- 🗣️ STT (Speech-to-Text): Supports API-based services like OpenAI Whisper or self-hosted engines such as Speaches (which utilizes Faster Whisper).
- 🧠 LLM (Large Language Model): Integrates with LiteLLM, providing access to a vast array of models including OpenAI, Anthropic, Google, Mistral, Cohere, Azure, and local models run via services like Ollama, LiteLLM proxy, vLLM, and more.
- 🔊 TTS (Text-to-Speech): Supports API-based services like OpenAI TTS or alternatives such as Kokoro-FastAPI (which can use KokoroTTS).
- This backend allows for a fully local setup if desired, using local STT, LLM (e.g., via Ollama), and TTS engines.
- OpenAI Backend: Utilizes OpenAI's real-time voice API for a streamlined, all-in-one voice interaction experience, requiring an OpenAI API key.
- Gemini Backend: Leverages Google's Gemini Live Connect API for real-time voice interactions, requiring a Google Gemini API key.
- Classic Backend: This is the most flexible option, offering a modular approach where you connect separate services for:
- ⚙️ Highly Configurable: Adjust backend type, STT/LLM/TTS hosts, ports, models, API keys, STT confidence thresholds (classic backend), TTS voice/speed (classic backend), system messages, and more via CLI arguments or
.envfile. - 🌐 Web Interface: Simple and responsive UI built with HTML, CSS, and JavaScript.
- 📊 Cost Tracking:
- Classic Backend: Real-time cost estimation for OpenAI LLM and TTS usage.
- OpenAI Backend: Real-time cost estimation based on token usage for the selected OpenAI real-time model.
- ⚡ Real-time Interaction: Low-latency voice communication powered by fastrtc (WebRTC).
- 👂 STT Confidence Filtering (Classic Backend): Automatically reject low-confidence transcriptions based on configurable thresholds (no speech probability, average log probability, minimum word count).
- 🎤 Dynamic Settings Adjustment:
- Classic Backend: Change LLM model, TTS voice, TTS speed, and STT language on-the-fly.
- OpenAI Backend: Change STT language and output voice (if supported by the model/API) on-the-fly.
- 🔍 Fuzzy Search: Quickly find models and voices using fuzzy search in the UI dropdowns.
- 💬 System Message Support: Define a custom system message to guide the LLM's behavior.
- 📝 Chat History Logging: Automatically saves conversation history to timestamped JSON files.
- 🔄 TTS Audio Replay (Classic Backend): Replay the audio for any assistant message directly from the chat interface.
- ⌨️ Keyboard Shortcuts: Control mute (M), clear chat (Ctrl+R), and toggle options (Shift+S) using keyboard shortcuts.
- 💓 Connection Monitoring: Uses a heartbeat mechanism to detect disconnected clients and potentially shut down the server.
- 🖥️ Cross-Platform GUI: Runs as a standalone desktop application using
pywebview(default) or in a standard web browser (--browserflag). The application explicitly uses the QT backend forpywebviewas the GTK backend lacks necessary WebRTC support.
Known Issues
- ⚠️ Cost Calculation: The cost calculation for the OpenAI real-time API and Gemini API is currently not functional.
Installation
-
Clone the repository:
git clone https://github.com/thiswillbeyourgithub/simple_voice_chat cd simple_voice_chat
-
Install the Python packages:
uv pip install -e .
-
(Optional) Configure services using environment variables. You can create a
.envfile based on the available options (see--helporutils/env.py).
Usage
Run the main script using Python:
simple-voice-chat --help
The application will start a web server and attempt to open the interface in a dedicated window (or browser tab if --browser is specified).
You can choose the backend using the --backend option:
--backend classic(default): Uses separate STT, LLM, and TTS services.--backend openai: Uses OpenAI's real-time voice API. Requires--openai-api-key.
For a detailed list of all configuration options, please use the --help flag:
simple-voice-chat --help
This will provide the most up-to-date information on available arguments and their corresponding environment variables, including options specific to each backend.
This README was generated with assistance from aider.chat.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file simple_voice_chat-4.0.1.tar.gz.
File metadata
- Download URL: simple_voice_chat-4.0.1.tar.gz
- Upload date:
- Size: 73.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c3fdf1e31fa78d5e24a440fe9a21cab1ad92580be38a9f37181ee15602b8b3b6
|
|
| MD5 |
2ae70f6d8ff1cd8ad9604d870c17a42f
|
|
| BLAKE2b-256 |
bbbfbe5f27cb22ede97abebaadad0fdde0116de178a68f74da920597a5b583ba
|
File details
Details for the file simple_voice_chat-4.0.1-py3-none-any.whl.
File metadata
- Download URL: simple_voice_chat-4.0.1-py3-none-any.whl
- Upload date:
- Size: 74.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9386030fd44fe9a666ae63605dde60035cadbb4e5dd7fbc31298755add4bd7cc
|
|
| MD5 |
3058219d37580d55b9cebb89e5b7c884
|
|
| BLAKE2b-256 |
1182fb740491e919ed2920fb688770a9c910bee80f39bcb43d90dcb1df3f07ae
|