A simple voice chat interface using configurable LLM, STT, and TTS providers.

Project description

Simple Voice Chat

This project provides a flexible voice chat interface that connects to various Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) services.

Acknowledgement: This project relies heavily on the fantastic fastrtc library, which simplifies real-time audio streaming over WebRTC. Its examples for setting up the various supported backends were crucial in making this application possible.

Motivation

This project aims to provide a versatile and cost-effective voice chat interface. While initially driven by the desire for alternatives to OpenAI's real-time voice API, it has evolved to offer multiple backend options, including direct integration with OpenAI's real-time services. This allows users to choose the best STT, LLM, and TTS combination for their needs, whether prioritizing cost, performance, self-hosting, or specific provider features.

Features

🚀 Multiple Backends: The application supports three primary backend types for voice processing:
- Classic Backend: This is the most flexible option, offering a modular approach where you connect separate services for:
  - 🗣️ STT (Speech-to-Text): Supports API-based services like OpenAI Whisper or self-hosted engines such as Speaches (which utilizes Faster Whisper).
  - 🧠 LLM (Large Language Model): Integrates with LiteLLM, providing access to a vast array of models including OpenAI, Anthropic, Google, Mistral, Cohere, Azure, and local models run via services like Ollama, LiteLLM proxy, vLLM, and more.
  - 🔊 TTS (Text-to-Speech): Supports API-based services like OpenAI TTS or alternatives such as Kokoro-FastAPI (which can use KokoroTTS).
  - This backend allows for a fully local setup if desired, using local STT, LLM (e.g., via Ollama), and TTS engines.
- OpenAI Backend: Utilizes OpenAI's real-time voice API for a streamlined, all-in-one voice interaction experience, requiring an OpenAI API key.
- Gemini Backend: Leverages Google's Gemini Live Connect API for real-time voice interactions, requiring a Google Gemini API key.
⚙️ Highly Configurable: Adjust backend type, STT/LLM/TTS hosts, ports, models, API keys, STT confidence thresholds (classic backend), TTS voice/speed (classic backend), system messages, and more via CLI arguments or .env file.
🌐 Web Interface: Simple and responsive UI built with HTML, CSS, and JavaScript.
📊 Cost Tracking:
- Classic Backend: Real-time cost estimation for OpenAI LLM and TTS usage.
- OpenAI Backend: Cost estimation based on token usage for the selected OpenAI real-time model (currently not functional, see Known Issues).
⚡ Real-time Interaction: Low-latency voice communication powered by fastrtc (WebRTC).
👂 STT Confidence Filtering (Classic Backend): Automatically reject low-confidence transcriptions based on configurable thresholds (no speech probability, average log probability, minimum word count).
🎤 Dynamic Settings Adjustment:
- Classic Backend: Change LLM model, TTS voice, TTS speed, and STT language on-the-fly.
- OpenAI Backend: Change STT language and output voice (if supported by the model/API) on-the-fly.
🔍 Fuzzy Search: Quickly find models and voices using fuzzy search in the UI dropdowns.
💬 System Message Support: Define a custom system message to guide the LLM's behavior.
📝 Chat History Logging: Automatically saves conversation history to timestamped JSON files.
🔄 TTS Audio Replay (Classic Backend): Replay the audio for any assistant message directly from the chat interface.
⌨️ Keyboard Shortcuts: Control mute (M), clear chat (Ctrl+R), and toggle options (Shift+S) using keyboard shortcuts.
💓 Connection Monitoring: Uses a heartbeat mechanism to detect disconnected clients and potentially shut down the server.
🖥️ Cross-Platform GUI: Runs as a standalone desktop application using pywebview (default) or in a standard web browser (--browser flag). The application explicitly uses the Qt backend for pywebview because the GTK backend lacks the necessary WebRTC support.
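
The STT confidence filtering listed above can be sketched as follows. This is a minimal illustration, not the project's actual code. The `no_speech_prob` and `avg_logprob` fields mirror the per-segment values that Whisper-style engines report. The `Segment` class, the function name, the default thresholds, and the way the three checks are combined are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed segment (hypothetical container for this sketch)."""
    text: str
    no_speech_prob: float  # probability that the segment contains no speech
    avg_logprob: float     # average log probability of the decoded tokens


def accept_transcription(
    segments: list[Segment],
    max_no_speech_prob: float = 0.6,  # assumed default, not the project's
    min_avg_logprob: float = -1.0,    # assumed default, not the project's
    min_words: int = 1,               # assumed default, not the project's
) -> bool:
    """Return True only if the transcription passes all three checks."""
    if not segments:
        return False
    # Check 1: the whole transcription must contain enough words.
    words = " ".join(s.text for s in segments).split()
    if len(words) < min_words:
        return False
    for s in segments:
        # Check 2: reject segments that are probably silence or noise.
        if s.no_speech_prob > max_no_speech_prob:
            return False
        # Check 3: reject segments the decoder was unsure about.
        if s.avg_logprob < min_avg_logprob:
            return False
    return True
```

With these assumed defaults, a clear utterance such as `Segment("turn on the lights", 0.02, -0.25)` is accepted, while a noisy one such as `Segment("uh", 0.91, -1.8)` is rejected before it ever reaches the LLM.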

Known Issues

⚠️ Cost Calculation: The cost calculation for the OpenAI real-time API and Gemini API is currently not functional.

Installation

Clone the repository:
```
git clone https://github.com/thiswillbeyourgithub/simple_voice_chat
cd simple_voice_chat
```

Install the Python packages:
```
uv pip install -e .
```
(Optional) Configure services using environment variables. You can create a .env file based on the available options (see --help or utils/env.py).
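
A .env file is a plain list of KEY=VALUE lines. The sketch below shows one way such a file could be parsed and merged with the process environment, using only the standard library. It is an illustration under stated assumptions: the variable names are hypothetical placeholders, the rule that real environment variables take precedence is an assumption, and the project may well rely on a dedicated library instead. Use --help or utils/env.py for the real variable names.

```python
import os
from typing import Mapping, Optional


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(
    env_text: str, environ: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Merge .env values with the environment; the environment wins (assumed)."""
    environ = os.environ if environ is None else environ
    settings = parse_env_text(env_text)
    for key in settings:
        if key in environ:
            settings[key] = environ[key]
    return settings


# Hypothetical variable names, for illustration only.
EXAMPLE_ENV = """
# backend selection
EXAMPLE_BACKEND=classic
EXAMPLE_TTS_SPEED="1.2"
"""
```

Calling `load_settings(EXAMPLE_ENV, {"EXAMPLE_BACKEND": "openai"})` keeps the TTS speed from the file but takes the backend from the environment mapping.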

Usage

Run the installed command-line entry point:

```
simple-voice-chat --help
```

The application will start a web server and attempt to open the interface in a dedicated window (or browser tab if --browser is specified).

You can choose the backend using the --backend option:

- --backend classic (default): Uses separate STT, LLM, and TTS services.
- --backend openai: Uses OpenAI's real-time voice API. Requires --openai-api-key.
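
The relationship between these options can be sketched with argparse. This is a hedged illustration rather than the application's real CLI code: the flag names `--backend`, `--openai-api-key`, and `--browser` come from this README, while the parser structure, function names, help texts, and error message are assumptions. Only the two backend values documented in this section are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser covering the flags named in this README."""
    parser = argparse.ArgumentParser(prog="simple-voice-chat")
    parser.add_argument(
        "--backend",
        choices=["classic", "openai"],
        default="classic",
        help="voice processing backend (default: classic)",
    )
    parser.add_argument(
        "--openai-api-key",
        default=None,
        help="API key, needed when --backend openai is selected",
    )
    parser.add_argument(
        "--browser",
        action="store_true",
        help="open the UI in a browser tab instead of a dedicated window",
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce the conditional API-key requirement."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.backend == "openai" and not args.openai_api_key:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--backend openai requires --openai-api-key")
    return args
```

In this sketch, `parse_args([])` selects the classic backend, and `parse_args(["--backend", "openai"])` exits with a usage error because no key was supplied.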

For a detailed list of all configuration options, please use the --help flag:

```
simple-voice-chat --help
```

This will provide the most up-to-date information on available arguments and their corresponding environment variables, including options specific to each backend.

This README was generated with assistance from aider.chat.

Project details

Release history

- 4.2.3: Dec 16, 2025
- 4.2.2: Oct 4, 2025
- 4.2.1: May 20, 2025
- 4.2.0: May 10, 2025
- 4.1.0: May 10, 2025
- 4.0.1 (this version): May 10, 2025
- 3.4.0: May 10, 2025
- 3.3.0: May 8, 2025
- 3.2.0: May 3, 2025
- 3.1.0: Apr 29, 2025
- 3.0.0: Apr 26, 2025
- 2.1.0: Apr 26, 2025
- 2.0.0: Apr 25, 2025

Download files

Download the file for your platform.

Source Distribution

simple_voice_chat-4.0.1.tar.gz (73.6 kB), uploaded May 10, 2025 (source)

Built Distribution

simple_voice_chat-4.0.1-py3-none-any.whl (74.1 kB), uploaded May 10, 2025 (Python 3)

File details

Details for the file simple_voice_chat-4.0.1.tar.gz.

File metadata

Download URL: simple_voice_chat-4.0.1.tar.gz
Upload date: May 10, 2025
Size: 73.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for simple_voice_chat-4.0.1.tar.gz
Algorithm	Hash digest
SHA256	`c3fdf1e31fa78d5e24a440fe9a21cab1ad92580be38a9f37181ee15602b8b3b6`
MD5	`2ae70f6d8ff1cd8ad9604d870c17a42f`
BLAKE2b-256	`bbbfbe5f27cb22ede97abebaadad0fdde0116de178a68f74da920597a5b583ba`
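
A downloaded archive can be checked against the digests above. The following is a minimal sketch using Python's standard hashlib module. The function names and the chunk size are assumptions chosen for the example, and the expected value is the SHA256 digest listed above for simple_voice_chat-4.0.1.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for simple_voice_chat-4.0.1.tar.gz.
EXPECTED_SHA256 = "c3fdf1e31fa78d5e24a440fe9a21cab1ad92580be38a9f37181ee15602b8b3b6"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, assuming the archive sits in the current directory:
#   matches(Path("simple_voice_chat-4.0.1.tar.gz"), EXPECTED_SHA256)
```

If `matches` returns False, the file differs from the one that was uploaded and should not be installed.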

File details

Details for the file simple_voice_chat-4.0.1-py3-none-any.whl.

File metadata

Download URL: simple_voice_chat-4.0.1-py3-none-any.whl
Upload date: May 10, 2025
Size: 74.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for simple_voice_chat-4.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9386030fd44fe9a666ae63605dde60035cadbb4e5dd7fbc31298755add4bd7cc`
MD5	`3058219d37580d55b9cebb89e5b7c884`
BLAKE2b-256	`1182fb740491e919ed2920fb688770a9c910bee80f39bcb43d90dcb1df3f07ae`
