A simple voice chat interface using configurable LLM, STT, and TTS providers.

Simple Voice Chat

This project provides a flexible voice chat interface that connects to various Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) services.

Acknowledgement: This project heavily relies on the fantastic fastrtc library, which simplifies real-time audio streaming over WebRTC, making this application possible.

Motivation

The primary motivation for creating this project was the high cost associated with OpenAI's real-time voice API. This application allows you to leverage potentially more cost-effective or self-hosted alternatives for STT, LLM, and TTS, while still providing a near real-time voice interaction experience.

Features

- 🔌 Modular Architecture: Easily connect to various STT, LLM, and TTS services.
  - 🗣️ STT: Supports OpenAI Whisper API or self-hosted engines like Speaches (Faster Whisper).
  - 🧠 LLM: Integrates with LiteLLM, enabling connections to OpenAI, Anthropic, Google, Mistral, Cohere, Azure, local models (via LiteLLM proxy, vLLM, Ollama), and more.
  - 🔊 TTS: Supports OpenAI TTS API or alternatives like Kokoro-FastAPI.
- ⚙️ Highly Configurable: Adjust STT/LLM/TTS hosts, ports, models, API keys, STT confidence thresholds, TTS voice/speed, system messages, and more via CLI arguments or .env file.
- 🌐 Web Interface: Simple and responsive UI built with HTML, CSS, and JavaScript.
- 📊 Cost Tracking: Real-time cost estimation for OpenAI LLM and TTS usage.
- ⚡ Real-time Interaction: Low-latency voice communication powered by fastrtc (WebRTC).
- 👂 STT Confidence Filtering: Automatically reject low-confidence transcriptions based on configurable thresholds (no speech probability, average log probability, minimum word count).
- 🎤 Dynamic Voice/Model Selection: Change LLM model, TTS voice, TTS speed, and STT language on-the-fly through the UI without restarting.
- 🔍 Fuzzy Search: Quickly find models and voices using fuzzy search in the UI dropdowns.
- 💬 System Message Support: Define a custom system message to guide the LLM's behavior.
- 📝 Chat History Logging: Automatically saves conversation history to timestamped JSON files.
- 🔄 TTS Audio Replay: Replay the audio for any assistant message directly from the chat interface.
- ⌨️ Keyboard Shortcuts: Control mute (M), clear chat (Ctrl+R), and toggle options (Shift+S) using keyboard shortcuts.
- 💓 Connection Monitoring: Uses a heartbeat mechanism to detect disconnected clients and potentially shut down the server.
- 🖥️ Cross-Platform GUI: Runs as a standalone desktop application using pywebview (default) or in a standard web browser (--browser flag).
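
To make the STT confidence filtering feature concrete, here is a minimal sketch of how the three thresholds could be combined. It is not the project's actual code: the names `SttThresholds` and `should_reject` are hypothetical, and the default values are the ones listed in the `--help` output (0.6, -0.7, and 5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttThresholds:
    """Hypothetical container; defaults copied from the documented CLI defaults."""
    no_speech_prob: float = 0.6   # reject if the reported value is higher
    avg_logprob: float = -0.7     # reject if the reported value is lower
    min_words: int = 5            # reject if the transcript has fewer words


def should_reject(
    text: str,
    no_speech_prob: float,
    avg_logprob: float,
    thresholds: SttThresholds = SttThresholds(),
) -> bool:
    """Return True when a transcription fails any of the three checks."""
    if no_speech_prob > thresholds.no_speech_prob:
        return True
    if avg_logprob < thresholds.avg_logprob:
        return True
    if len(text.split()) < thresholds.min_words:
        return True
    return False
```

With these defaults, a short utterance such as "hello" would be rejected on word count alone, so lowering `--stt-min-words-threshold` is probably worth considering if you expect short replies.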

Installation

Clone the repository:
```
git clone https://github.com/thiswillbeyourgithub/simple_voice_chat
cd simple_voice_chat
```
Install the Python packages:
```
uv pip install -e .
```
(Optional) Configure services using environment variables. You can create a .env file based on the available options (see --help or utils/env.py).
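
As an illustration of how an environment variable can act as the default for a CLI option, here is a small sketch using `argparse`. It is not taken from the project: `build_parser` is a hypothetical helper, only three of the documented options are shown, and the precedence (CLI argument, then environment variable, then built-in default) is an assumption rather than confirmed behavior.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose defaults come from the environment (assumed precedence)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="simple-voice-chat")
    # Option names, env names, and fallback defaults are taken from the --help text.
    parser.add_argument(
        "--llm-model",
        default=env.get("LLM_MODEL", "openrouter/google/gemini-2.5-pro-preview-03-25"),
    )
    parser.add_argument("--stt-host", default=env.get("STT_HOST", "api.openai.com"))
    parser.add_argument(
        "--tts-speed", type=float, default=float(env.get("TTS_SPEED", "1.0"))
    )
    return parser


# An explicit CLI value wins over the environment in this sketch.
args = build_parser({"STT_HOST": "localhost"}).parse_args(["--tts-speed", "1.2"])
print(args.stt_host, args.tts_speed)
```

Passing the environment as a mapping keeps the sketch easy to test; in practice the values would come from the process environment, for example after a .env file has been loaded.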

Usage

Run the application with the simple-voice-chat command installed in the previous step:

```
simple-voice-chat
```

The application will start a web server and attempt to open the interface in a dedicated window (or browser tab if --browser is specified).

For a detailed list of all configuration options (STT/LLM/TTS hosts, ports, models, API keys, etc.), please use the --help flag:

```
simple-voice-chat --help
```

This will provide the most up-to-date information on available arguments and their corresponding environment variables.

Command-Line Options (--help)


```
usage: simple-voice-chat [-h] [--host HOST] [--port PORT] [-v] [--browser] [--system-message SYSTEM_MESSAGE] [--llm-host LLM_HOST]
                         [--llm-port LLM_PORT] [--llm-model LLM_MODEL] [--llm-api-key LLM_API_KEY] [--stt-host STT_HOST]
                         [--stt-port STT_PORT] [--stt-model STT_MODEL] [--stt-language STT_LANGUAGE] [--stt-api-key STT_API_KEY]
                         [--stt-no-speech-prob-threshold STT_NO_SPEECH_PROB_THRESHOLD]
                         [--stt-avg-logprob-threshold STT_AVG_LOGPROB_THRESHOLD] [--stt-min-words-threshold STT_MIN_WORDS_THRESHOLD]
                         [--tts-host TTS_HOST] [--tts-port TTS_PORT] [--tts-model TTS_MODEL] [--tts-voice TTS_VOICE]
                         [--tts-api-key TTS_API_KEY] [--tts-speed TTS_SPEED] [--tts-acronym-preserve-list TTS_ACRONYM_PRESERVE_LIST]

Run a simple voice chat interface using a configurable LLM provider, STT server, and TTS.

options:
  -h, --help            show this help message and exit
  --host HOST           Host address to bind the FastAPI server to. Default: 127.0.0.1
  --port PORT           Preferred port to run the FastAPI server on. Default: 7860. (Env: APP_PORT)
  -v, --verbose         Enable verbose logging (DEBUG level)
  --browser             Launch the application in the default web browser instead of a dedicated GUI window. Default: False
  --system-message SYSTEM_MESSAGE
                        System message to prepend to the chat history. Default: (from SYSTEM_MESSAGE env var, empty if unset).
  --llm-host LLM_HOST   Host address of the LLM proxy server (optional). Default: None. (Env: LLM_HOST)
  --llm-port LLM_PORT   Port of the LLM proxy server (optional). Default: None. (Env: LLM_PORT)
  --llm-model LLM_MODEL
                        Default LLM model to use (e.g., 'gpt-4o', 'litellm_proxy/claude-3-opus'). Default:
                        'openrouter/google/gemini-2.5-pro-preview-03-25'. (Env: LLM_MODEL)
  --llm-api-key LLM_API_KEY
                        API key for the LLM provider/proxy (optional, depends on setup). Default: None. (Env: LLM_API_KEY)
  --stt-host STT_HOST   Host address of the STT server (e.g., 'api.openai.com' or 'localhost'). Default: 'api.openai.com'. (Env:
                        STT_HOST)
  --stt-port STT_PORT   Port of the STT server (e.g., 443 for OpenAI, 8002 for local). Default: '443'. (Env: STT_PORT)
  --stt-model STT_MODEL
                        STT model to use (e.g., 'whisper-1' for OpenAI, 'deepdml/faster-whisper-large-v3-turbo-ct2' for local).
                        Default: 'whisper-1'. (Env: STT_MODEL)
  --stt-language STT_LANGUAGE
                        Language code for STT (e.g., 'en', 'fr'). If unset (empty string or not provided), Whisper usually
                        auto-detects. Default: None. (Env: STT_LANGUAGE)
  --stt-api-key STT_API_KEY
                        API key for the STT server (REQUIRED for OpenAI STT). Default: None. (Env: STT_API_KEY)
  --stt-no-speech-prob-threshold STT_NO_SPEECH_PROB_THRESHOLD
                        STT confidence threshold: Reject if no_speech_prob is higher than this. Default: 0.6. (Env:
                        STT_NO_SPEECH_PROB_THRESHOLD)
  --stt-avg-logprob-threshold STT_AVG_LOGPROB_THRESHOLD
                        STT confidence threshold: Reject if avg_logprob is lower than this. Default: -0.7. (Env:
                        STT_AVG_LOGPROB_THRESHOLD)
  --stt-min-words-threshold STT_MIN_WORDS_THRESHOLD
                        STT confidence threshold: Reject if the number of words is less than this. Default: 5. (Env:
                        STT_MIN_WORDS_THRESHOLD)
  --tts-host TTS_HOST   Host address of the TTS server (e.g., 'api.openai.com' or 'localhost'). Default: 'api.openai.com'. (Env:
                        TTS_HOST)
  --tts-port TTS_PORT   Port of the TTS server (e.g., 443 for OpenAI, 8880 for local). Default: '443'. (Env: TTS_PORT)
  --tts-model TTS_MODEL
                        TTS model to use (e.g., 'tts-1', 'tts-1-hd' for OpenAI, 'kokoro' for local). Default: 'tts-1'. (Env:
                        TTS_MODEL)
  --tts-voice TTS_VOICE
                        Default TTS voice to use (e.g., 'alloy', 'ash', 'echo' for OpenAI, 'ff_siwis' for local). Default: 'ash'.
                        (Env: TTS_VOICE)
  --tts-api-key TTS_API_KEY
                        API key for the TTS server (REQUIRED for OpenAI TTS). Default: None. (Env: TTS_API_KEY)
  --tts-speed TTS_SPEED
                        Default TTS speed multiplier. Default: 1.0. (Env: TTS_SPEED)
  --tts-acronym-preserve-list TTS_ACRONYM_PRESERVE_LIST
                        Comma-separated list of acronyms to preserve during TTS (currently only used for Kokoro TTS). Default: ''.
                        (Env: TTS_ACRONYM_PRESERVE_LIST)
```
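
As a usage illustration, the sketch below assembles a command line for a self-hosted STT and TTS setup. The flag names come from the help text above, and the values are the "local" examples it mentions (ports 8002 and 8880, the Faster Whisper model, `kokoro`, `ff_siwis`). The helper `build_command` and the `local_args` mapping are hypothetical, and your own hosts, ports, and models may differ.

```python
import shlex

# Hypothetical local setup, using the "local" example values from the help text.
local_args = {
    "--stt-host": "localhost",
    "--stt-port": "8002",
    "--stt-model": "deepdml/faster-whisper-large-v3-turbo-ct2",
    "--tts-host": "localhost",
    "--tts-port": "8880",
    "--tts-model": "kokoro",
    "--tts-voice": "ff_siwis",
}


def build_command(options: dict[str, str], browser: bool = False) -> list[str]:
    """Turn a flag-to-value mapping into an argv list for simple-voice-chat."""
    argv = ["simple-voice-chat"]
    for flag, value in options.items():
        argv += [flag, value]
    if browser:
        # Open the UI in the default browser instead of the pywebview window.
        argv.append("--browser")
    return argv


command = build_command(local_args, browser=True)
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command)` to launch the application from a script, or the printed line can be pasted into a terminal.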

This README was generated with assistance from aider.chat.
