
A simple voice chat interface using configurable LLM, STT, and TTS providers.


Simple Voice Chat

This project provides a flexible voice chat interface that connects to various Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) services.


Acknowledgement: This project heavily relies on the fantastic fastrtc library, which simplifies real-time audio streaming over WebRTC, making this application possible.

Motivation

The primary motivation for creating this project was the high cost associated with OpenAI's real-time voice API. This application allows you to leverage potentially more cost-effective or self-hosted alternatives for STT, LLM, and TTS, while still providing a near real-time voice interaction experience.

Features

  • Modular: Connect to different STT, LLM (including local models via proxies like LiteLLM), and TTS providers.

    • STT: Defaults to using Speaches (which allows self-hosting Faster Whisper), but can also connect directly to OpenAI's Whisper API.
    • TTS: Defaults to OpenAI TTS, but also supports alternatives like Kokoro-FastAPI.
    • LLM: Supports virtually any LLM provider (OpenAI, Anthropic, Google, Mistral, Cohere, Azure, local models, etc.) through its LiteLLM integration. It can also connect to any OpenAI-compatible API endpoint, including local models served through a LiteLLM proxy, vLLM, or Ollama.
  • Configurable: Fine-tune various parameters for STT confidence, TTS voice/speed, LLM model selection, and more via command-line arguments or environment variables.

  • Web Interface: Provides a simple web-based UI for interaction.

  • Cost Tracking: Includes basic cost estimation for supported LLM and TTS providers (like OpenAI).
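The cost tracking mentioned above comes down to per-unit arithmetic. The sketch below is illustrative only. The function and class names are hypothetical, and the prices are placeholder parameters rather than real rates. It also assumes that LLM calls are billed per token and TTS per character (as with OpenAI's tts-1). The project's own estimation code may work differently.

```python
def estimate_llm_cost(prompt_tokens: int, completion_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Hypothetical helper: dollar cost of one LLM call, assuming per-token billing."""
    return (prompt_tokens * input_price_per_million
            + completion_tokens * output_price_per_million) / 1_000_000


def estimate_tts_cost(text: str, price_per_million_chars: float) -> float:
    """Hypothetical helper: dollar cost of synthesizing `text`, assuming per-character billing."""
    return len(text) * price_per_million_chars / 1_000_000


class SessionCost:
    """Hypothetical running total for one voice-chat session."""

    def __init__(self) -> None:
        self.total = 0.0

    def add(self, amount: float) -> float:
        # Accumulate one call's estimated cost and return the new total.
        self.total += amount
        return self.total


if __name__ == "__main__":
    # Placeholder prices in dollars per million units; these are NOT real rates.
    session = SessionCost()
    session.add(estimate_llm_cost(1200, 300, 2.50, 10.00))
    session.add(estimate_tts_cost("Hello! How can I help you today?", 15.0))
    print(f"Estimated session cost: ${session.total:.6f}")
```

Keeping the prices as parameters means the same arithmetic works for any provider. You only supply that provider's current per-million rates.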

Installation

  1. Clone the repository:

    git clone https://github.com/thiswillbeyourgithub/simple_voice_chat
    
    cd simple_voice_chat
    
  2. Install the package in editable mode (with uv as shown, or with plain pip install -e . if you do not use uv):

    uv pip install -e .
    
  3. (Optional) Configure services using environment variables. You can create a .env file based on the available options (see --help or utils/env.py).
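Most options listed by --help name an environment variable that supplies the value when the flag is omitted. The argparse sketch below illustrates this under two assumptions. First, it assumes the usual precedence: explicit flag, then environment variable, then built-in default. Second, it assumes the contents of your .env file are already loaded into the process environment. The option names, variables and defaults come from the --help output. The parser itself is a hypothetical illustration, not the project's code, and it shows only three options.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: each flag falls back to an env var, then a default.

    Defaults are read from os.environ when the parser is built, so a .env file
    must already be loaded into the environment by then (an assumption here).
    """
    parser = argparse.ArgumentParser(description="Env-var fallback sketch")
    parser.add_argument(
        "--port", type=int,
        default=int(os.environ.get("APP_PORT", "7860")),
        help="Preferred server port. (Env: APP_PORT)",
    )
    parser.add_argument(
        "--tts-voice",
        default=os.environ.get("TTS_VOICE", "nova"),
        help="Default TTS voice. (Env: TTS_VOICE)",
    )
    parser.add_argument(
        "--tts-speed", type=float,
        default=float(os.environ.get("TTS_SPEED", "1.0")),
        help="TTS speed multiplier. (Env: TTS_SPEED)",
    )
    return parser


if __name__ == "__main__":
    os.environ["TTS_VOICE"] = "echo"  # simulate a value coming from .env
    args = build_parser().parse_args(["--port", "8000"])
    print(args.port, args.tts_voice, args.tts_speed)
```

In this run, the explicit --port flag wins for the port. The voice comes from the environment variable. The speed falls back to its default unless TTS_SPEED is set.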

Usage

Once installed, launch the application with the simple-voice-chat command:

simple-voice-chat

The application starts a web server and attempts to open the interface in a dedicated window (or in a browser tab if --browser is specified).

For a detailed list of all configuration options (STT/LLM/TTS hosts, ports, models, API keys, etc.), please use the --help flag:

simple-voice-chat --help

This will provide the most up-to-date information on available arguments and their corresponding environment variables.
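Each service is addressed by a host/port pair. The --help output lists the defaults as STT on localhost:8002 and TTS on api.openai.com:443. The sketch below shows one plausible way such a pair could map to an OpenAI-compatible base URL. The helper name, the /v1 suffix and the rule that only port 443 uses HTTPS are assumptions for illustration, not the project's actual logic.

```python
from typing import Union


def service_base_url(host: str, port: Union[int, str], path: str = "/v1") -> str:
    """Hypothetical helper: turn a --*-host/--*-port pair into a base URL.

    Assumption: port 443 means HTTPS (e.g. api.openai.com) and any other
    port means a plain-HTTP local server (e.g. Speaches or Kokoro-FastAPI).
    """
    port_num = int(port)  # the --help defaults show ports as strings, e.g. '8002'
    if port_num == 443:
        # 443 is the HTTPS default port, so it is left out of the URL.
        return f"https://{host}{path}"
    return f"http://{host}:{port_num}{path}"


if __name__ == "__main__":
    # Defaults as listed by --help: STT on localhost:8002, TTS on api.openai.com:443.
    print(service_base_url("localhost", "8002"))
    print(service_base_url("api.openai.com", 443))
```

With those defaults, the helper yields http://localhost:8002/v1 for STT and https://api.openai.com/v1 for TTS.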

Command-Line Options (--help)

usage: simple_voice_chat.py [-h] [--host HOST] [--port PORT] [-v]
                               [--auto-start | --no-auto-start] [--browser]
                               [--system-message SYSTEM_MESSAGE]
                               [--llm-host LLM_HOST] [--llm-port LLM_PORT]
                               [--llm-model LLM_MODEL]
                               [--llm-api-key LLM_API_KEY]
                               [--stt-host STT_HOST] [--stt-port STT_PORT]
                               [--stt-model STT_MODEL]
                               [--stt-language STT_LANGUAGE]
                               [--stt-api-key STT_API_KEY]
                               [--stt-no-speech-prob-threshold STT_NO_SPEECH_PROB_THRESHOLD]
                               [--stt-avg-logprob-threshold STT_AVG_LOGPROB_THRESHOLD]
                               [--stt-min-words-threshold STT_MIN_WORDS_THRESHOLD]
                               [--tts-host TTS_HOST] [--tts-port TTS_PORT]
                               [--tts-model TTS_MODEL] [--tts-voice TTS_VOICE]
                               [--tts-api-key TTS_API_KEY]
                               [--tts-speed TTS_SPEED]
                               [--tts-acronym-preserve-list TTS_ACRONYM_PRESERVE_LIST]

Run a simple voice chat interface using a configurable LLM provider, STT server, and TTS.

options:
  -h, --help            show this help message and exit
  --host HOST           Host address to bind the FastAPI server to. Default: 127.0.0.1
  --port PORT           Preferred port to run the FastAPI server on. Default: 7860. (Env: APP_PORT)
  -v, --verbose         Enable verbose logging (DEBUG level)
  --auto-start, --no-auto-start
                        Automatically start the connection when the application loads. Default: True
  --browser             Launch the application in the default web browser instead of a dedicated GUI window. Default: False
  --system-message SYSTEM_MESSAGE
                        System message to prepend to the chat history. Default: (from SYSTEM_MESSAGE env var, empty if unset).
  --llm-host LLM_HOST   Host address of the LLM proxy server (optional). Default: None. (Env: LLM_HOST)
  --llm-port LLM_PORT   Port of the LLM proxy server (optional). Default: None. (Env: LLM_PORT)
  --llm-model LLM_MODEL
                        Default LLM model to use (e.g., 'gpt-4o', 'litellm_proxy/claude-3-opus'). Default: 'litellm_proxy/litellm_best'. (Env: LLM_MODEL)
  --llm-api-key LLM_API_KEY
                        API key for the LLM provider/proxy (optional, depends on setup). Default: None. (Env: LLM_API_KEY)
  --stt-host STT_HOST   Host address of the STT server (e.g., 'api.openai.com' or 'localhost'). Default: 'localhost'. (Env: STT_HOST)
  --stt-port STT_PORT   Port of the STT server (e.g., 443 for OpenAI, 8002 for local). Default: '8002'. (Env: STT_PORT)
  --stt-model STT_MODEL
                        STT model to use (e.g., 'whisper-1' for OpenAI, 'deepdml/faster-whisper-large-v3-turbo-ct2' for local). Default: 'deepdml/faster-whisper-large-v3-turbo-ct2'. (Env: STT_MODEL)
  --stt-language STT_LANGUAGE
                        Language code for STT (e.g., 'en', 'fr'). If unset, Whisper usually auto-detects. Default: None. (Env: STT_LANGUAGE)
  --stt-api-key STT_API_KEY
                        API key for the STT server (REQUIRED for OpenAI STT). Default: None. (Env: STT_API_KEY)
  --stt-no-speech-prob-threshold STT_NO_SPEECH_PROB_THRESHOLD
                        STT confidence threshold: Reject if no_speech_prob is higher than this. Default: 0.6. (Env: STT_NO_SPEECH_PROB_THRESHOLD)
  --stt-avg-logprob-threshold STT_AVG_LOGPROB_THRESHOLD
                        STT confidence threshold: Reject if avg_logprob is lower than this. Default: -0.7. (Env: STT_AVG_LOGPROB_THRESHOLD)
  --stt-min-words-threshold STT_MIN_WORDS_THRESHOLD
                        STT confidence threshold: Reject if the number of words is less than this. Default: 5. (Env: STT_MIN_WORDS_THRESHOLD)
  --tts-host TTS_HOST   Host address of the TTS server (e.g., 'api.openai.com' or 'localhost'). Default: 'api.openai.com'. (Env: TTS_HOST)
  --tts-port TTS_PORT   Port of the TTS server (e.g., 443 for OpenAI, 8880 for local). Default: '443'. (Env: TTS_PORT)
  --tts-model TTS_MODEL
                        TTS model to use (e.g., 'tts-1', 'tts-1-hd' for OpenAI, 'kokoro' for local). Default: 'tts-1'. (Env: TTS_MODEL)
  --tts-voice TTS_VOICE
                        Default TTS voice to use (e.g., 'alloy', 'ash', 'echo' for OpenAI, 'ff_siwis' for local). Default: 'nova'. (Env: TTS_VOICE)
  --tts-api-key TTS_API_KEY
                        API key for the TTS server (REQUIRED for OpenAI TTS). Default: None. (Env: TTS_API_KEY)
  --tts-speed TTS_SPEED
                        Default TTS speed multiplier. Default: 1.00. (Env: TTS_SPEED)
  --tts-acronym-preserve-list TTS_ACRONYM_PRESERVE_LIST
                        Comma-separated list of acronyms to preserve during TTS (currently only used for Kokoro TTS). Default: ''. (Env: TTS_ACRONYM_PRESERVE_LIST)
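The three --stt-*-threshold options act as a filter that decides whether a transcription is accepted. The sketch below models that filter using the documented defaults and comparison directions. A transcription is rejected if no_speech_prob is higher than its threshold, if avg_logprob is lower than its threshold, or if the word count is less than its threshold. The function name, the Whisper-style segment format and the way per-segment scores are combined are assumptions, not taken from the project.

```python
from typing import Any, Dict, List, Optional


def reject_reason(segments: List[Dict[str, Any]],
                  no_speech_prob_threshold: float = 0.6,
                  avg_logprob_threshold: float = -0.7,
                  min_words_threshold: int = 5) -> Optional[str]:
    """Return why a transcript would be discarded, or None if it is kept.

    `segments` mimics Whisper-style verbose output: one dict per segment with
    "text", "no_speech_prob" and "avg_logprob" keys. Aggregating with the
    worst (highest) no_speech_prob and the mean avg_logprob is an assumption.
    """
    if not segments:
        return "empty transcript"
    no_speech_prob = max(s["no_speech_prob"] for s in segments)
    avg_logprob = sum(s["avg_logprob"] for s in segments) / len(segments)
    word_count = sum(len(s["text"].split()) for s in segments)

    if no_speech_prob > no_speech_prob_threshold:  # likely silence or noise
        return "no_speech_prob too high"
    if avg_logprob < avg_logprob_threshold:  # low-confidence decoding
        return "avg_logprob too low"
    if word_count < min_words_threshold:  # too short to act on
        return "too few words"
    return None


if __name__ == "__main__":
    segments = [{"text": " Hello there, how are you today?",
                 "no_speech_prob": 0.05, "avg_logprob": -0.2}]
    print(reject_reason(segments))  # prints None: all three checks pass
```

Note that with the default minimum of 5 words, short replies such as "Yes, thanks." would be rejected. Lowering --stt-min-words-threshold trades robustness against noise for responsiveness to brief answers.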


This README was generated with assistance from aider.chat.



Download files


Source Distribution

simple_voice_chat-3.0.0.tar.gz (61.2 kB)

Uploaded Source

Built Distribution


simple_voice_chat-3.0.0-py3-none-any.whl (58.6 kB)

Uploaded Python 3

File details

Details for the file simple_voice_chat-3.0.0.tar.gz.

File metadata

  • Download URL: simple_voice_chat-3.0.0.tar.gz
  • Upload date:
  • Size: 61.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for simple_voice_chat-3.0.0.tar.gz
Algorithm Hash digest
SHA256 d50f34367f89082e0285e18068f9ec74af0b34fb796a1d88d25decf061f3f28f
MD5 84cc8cfe10cf632a894d171bfdfd8ee8
BLAKE2b-256 cc2238c4f96e5e6ed98ef1c36dd16944b18013baf0373ad81d7c9b46fa48bdbc


File details

Details for the file simple_voice_chat-3.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for simple_voice_chat-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b1ec1cc7ffdb262a8888039d22ac1489e59a6bd5ba2c0802e0c348bd80b5a2d6
MD5 b7df981a6c84d3c606bc0b6fe95858d1
BLAKE2b-256 1ad0f157c437b34a7b334c56da1c59302eccd9eb73f4fa9e63f2b0389de4f3c8

