
Vocalyx

Real-time voice assistant: Speech-to-Text (STT) → GPT → Text-to-Speech (TTS). Install with pip install vocalyx. Run vocalyx-stt, vocalyx-tts, and vocalyx in three terminals.


Install

pip install vocalyx

Requirements: Python 3.10+, system dependencies portaudio and ffmpeg, microphone and audio output.


Architecture and flow

All components live in the vocalyx package and talk over WebSockets:

[Microphone] → vocalyx (client)
                    ↓
              STT server (RealtimeSTT) ← control + data WebSockets
                    ↓
              vocalyx (client) receives transcribed text
                    ↓
              TTS server (GPT + Soprano TTS) ← single WebSocket
                    ↓
              vocalyx (client) plays audio → [Speakers]
  1. vocalyx-stt – STT server. Listens on two ports: control (commands) and data (audio + transcriptions).
  2. vocalyx-tts – TTS server. Listens on one port. Receives text, streams GPT response, synthesizes with Soprano, streams audio back.
  3. vocalyx – Client. Captures mic, sends audio to STT; receives text, sends to TTS; receives and plays TTS audio.
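
The three hops above can be sketched as a simple pipeline of stubbed stages. The stage bodies below are illustrative placeholders to show the data flow (audio → text → reply → audio), not vocalyx's actual internals or wire protocol:

```python
# Illustrative data flow: mic audio -> STT -> GPT -> TTS -> speaker audio.
# Each function stands in for one WebSocket hop; the bodies are stubs.

def stt(audio_chunk: bytes) -> str:
    """STT server hop: raw audio in, transcribed text out (stubbed)."""
    return "hello there"

def llm(user_text: str) -> str:
    """GPT stage inside the TTS server: user text in, assistant reply out (stubbed)."""
    return f"Reply to: {user_text}"

def tts(reply_text: str) -> bytes:
    """TTS hop: reply text in, 32 kHz mono float32 PCM out (stubbed)."""
    return b"\x00\x00\x00\x00" * len(reply_text)

def pipeline(audio_chunk: bytes) -> bytes:
    text = stt(audio_chunk)   # client -> STT server -> client
    reply = llm(text)         # client -> TTS server (GPT stage)
    return tts(reply)         # TTS server -> client -> speakers
```

In the real system each arrow is a WebSocket message rather than a function call, which is why the three processes can run on different hosts.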

Ports (defaults)

Component Role Default URL / Port
STT server Control WebSocket ws://localhost:8011
STT server Data WebSocket ws://localhost:8012
TTS server WebSocket ws://localhost:8013

The client uses these URLs by default. You can override them with --control, --data, and --tts-url when running vocalyx, and with -c/-d when running vocalyx-stt. The TTS server binds to localhost and the port above (no CLI port flag in the current release).
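
The defaults can be collected in one place, and a quick TCP probe (a convenience sketch, not part of vocalyx) tells you whether each server is accepting connections before you start the client:

```python
import socket
from urllib.parse import urlparse

# Default endpoints from the table above.
DEFAULT_URLS = {
    "control": "ws://localhost:8011",  # STT control WebSocket
    "data": "ws://localhost:8012",     # STT data WebSocket
    "tts": "ws://localhost:8013",      # TTS WebSocket
}

def is_listening(ws_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections at the URL's host:port."""
    u = urlparse(ws_url)
    try:
        with socket.create_connection((u.hostname, u.port), timeout=timeout):
            return True
    except OSError:
        return False

for name, url in DEFAULT_URLS.items():
    print(f"{name}: {url} -> {'up' if is_listening(url) else 'down'}")
```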


Models used

Role Model / service Notes
STT RealtimeSTT (Faster Whisper, CTranslate2) Main model default: tiny.en. Real-time model default: tiny. Configurable via -m, -r.
LLM OpenAI GPT (streaming) Default: gpt-3.5-turbo. Used for assistant replies. Requires OPENAI_API_KEY.
TTS Soprano TTS On-device, 80M params. Default: backend=auto, device=auto, cache_size_mb=100, decoder_batch_size=1. Output: 32 kHz mono float32.

Configuration

Environment

Create a .env in the directory from which you run the processes (or set the variable in the shell):

  • OPENAI_API_KEY – Required for the TTS server (GPT). Get it from OpenAI API keys.
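
A fail-fast check at startup saves a confusing error mid-conversation. This sketch uses python-dotenv if it happens to be installed (an assumption — vocalyx's own loading mechanism may differ) and otherwise falls back to the plain shell environment:

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv; optional in this sketch
    load_dotenv()                   # pulls OPENAI_API_KEY from ./.env if present
except ImportError:
    pass

def require_api_key() -> str:
    """Return OPENAI_API_KEY or raise with a pointed message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Put it in a .env next to where you "
            "run vocalyx-tts, or export it in the shell."
        )
    return key
```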

TTS server (vocalyx-tts)

  • Host / port: localhost:8013 (hardcoded in code).
  • GPT: The system prompt and model are sent by the client per request (default prompt: “You are a motivational assistant.”, default model: gpt-3.5-turbo). Temperature is fixed at 0.7.
  • Soprano: Loaded once at startup; backend="auto", device="auto", cache_size_mb=100, decoder_batch_size=1.
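
The per-request GPT parameters above can be pictured as the payload the server assembles before calling OpenAI. The field names mirror the chat-completion parameters; the actual wire format between client and server is internal to vocalyx:

```python
# Illustrative chat-completion parameters, mirroring the defaults listed above.
DEFAULT_SYSTEM_PROMPT = "You are a motivational assistant."

def build_chat_request(user_text: str,
                       system_prompt: str = DEFAULT_SYSTEM_PROMPT,
                       model: str = "gpt-3.5-turbo") -> dict:
    """Assemble the chat-completion parameters the TTS server forwards to GPT."""
    return {
        "model": model,
        "temperature": 0.7,   # fixed in the current release
        "stream": True,       # the GPT reply is streamed token-by-token
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }
```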

STT server (vocalyx-stt)

Key options (run vocalyx-stt --help for full list):

Option Default Description
-c, --control 8011 Control WebSocket port.
-d, --data 8012 Data WebSocket port.
-m, --model tiny.en Main Whisper model size or path (e.g. base, small, large-v2).
-r, --rt-model tiny Real-time transcription model size.
-l, --lang en Language code.
-i, --input-device 1 Audio input device index.
--device cuda cuda or cpu.
--compute_type default CTranslate2 compute type.
--silero_sensitivity 0.05 VAD sensitivity (0–1).
--webrtc_sensitivity 3 WebRTC VAD (0–3).
--end_of_sentence_detection_pause 0.45 Silence (seconds) to treat as end of sentence.
-D, --debug off Debug logging.
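
To make --end_of_sentence_detection_pause concrete: the server treats a run of silence at least that long as a sentence boundary. A toy model of that rule (not RealtimeSTT's actual VAD code):

```python
def sentence_ended(last_speech_time: float,
                   now: float,
                   pause: float = 0.45) -> bool:
    """True once `pause` seconds of silence have elapsed since the last speech."""
    return (now - last_speech_time) >= pause

# 0.30 s of silence: still mid-sentence; 0.50 s: treated as a sentence boundary.
```

Raising the value tolerates longer mid-sentence pauses at the cost of slower responses; lowering it makes the assistant reply sooner but may split sentences.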

Client (vocalyx)

Option Default Description
--control ws://localhost:8011 STT control WebSocket URL.
--data ws://localhost:8012 STT data WebSocket URL.
--tts-url ws://localhost:8013 TTS WebSocket URL.
--voice af_heart Voice name (Soprano may ignore).
--speed 1.0 Playback speed.
--system-prompt, --prompt You are a motivational assistant. LLM system prompt (persona/instructions).
--model gpt-3.5-turbo OpenAI model (e.g. gpt-4, gpt-4o).
-L, --list - List audio devices and exit.
-c, --continous true Keep running and transcribing.

Usage

  1. Set OPENAI_API_KEY (e.g. in .env).
  2. Start STT server, then TTS server, then client:
# Terminal 1
vocalyx-stt

# Terminal 2
vocalyx-tts

# Terminal 3
vocalyx
  3. Speak; the client shows “You” and “AI” and plays the assistant’s voice.

List microphone devices:

vocalyx --list

Use a different TTS URL or STT URLs if your servers run on other hosts/ports:

vocalyx --tts-url ws://otherhost:8013 --control ws://otherhost:8011 --data ws://otherhost:8012
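
When all three servers share one remote host, the flags only differ by port. A small helper (not part of vocalyx) keeps them consistent:

```python
def urls_for(host: str, control: int = 8011, data: int = 8012,
             tts: int = 8013) -> list[str]:
    """Build the client's --tts-url/--control/--data arguments for one host."""
    return [
        "--tts-url", f"ws://{host}:{tts}",
        "--control", f"ws://{host}:{control}",
        "--data", f"ws://{host}:{data}",
    ]

print("vocalyx", *urls_for("otherhost"))
```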

Audio format (TTS)

Soprano TTS outputs 32 kHz, mono, float32. The client plays this stream directly (e.g. paFloat32, 32000 Hz, 1 channel). Play it back in exactly that format: reinterpreting the float32 samples as μ-law or int16, or resampling to another rate, will distort the audio or shift its pitch/tempo.
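
A quick way to sanity-check received TTS audio is to decode the raw bytes as float32 and compute the clip's duration at 32 kHz (numpy used here for convenience):

```python
import numpy as np

SAMPLE_RATE = 32_000  # Soprano TTS output: 32 kHz, mono, float32

def decode_tts_chunk(raw: bytes) -> tuple:
    """Interpret raw TTS bytes as mono float32 samples; return (samples, seconds)."""
    samples = np.frombuffer(raw, dtype=np.float32)
    return samples, samples.size / SAMPLE_RATE

# One second of silence: 32_000 samples * 4 bytes per float32 sample.
silence = bytes(SAMPLE_RATE * 4)
samples, seconds = decode_tts_chunk(silence)
```

If the computed duration does not match what you hear, the playback format or rate is wrong somewhere in the chain.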


License

MIT

