Skip to main content

Real-time voice assistant with STT, GPT, and TTS. Install with pip install vocalyx. Run vocalyx-stt, vocalyx-tts, and vocalyx.

Project description

Vocalyx

Real-time voice assistant: Speech-to-Text (STT) → GPT → Text-to-Speech (TTS). Install with pip install vocalyx. Run vocalyx-stt, vocalyx-tts, and vocalyx in three terminals.


Install

pip install vocalyx

Requirements: Python 3.10+, system dependencies portaudio and ffmpeg, microphone and audio output.


Architecture and flow

All components live in the vocalyx package and talk over WebSockets:

[Microphone] → vocalyx (client)
                    ↓
              STT server (RealtimeSTT) ← control + data WebSockets
                    ↓
              vocalyx (client) receives transcribed text
                    ↓
              TTS server (GPT + Soprano TTS) ← single WebSocket
                    ↓
              vocalyx (client) plays audio → [Speakers]
  1. vocalyx-stt – STT server. Listens on two ports: control (commands) and data (audio + transcriptions).
  2. vocalyx-tts – TTS server. Listens on one port. Receives text, streams GPT response, synthesizes with Soprano, streams audio back.
  3. vocalyx – Client. Captures mic, sends audio to STT; receives text, sends to TTS; receives and plays TTS audio.

Ports (defaults)

Component Role Default URL / Port
STT server Control WebSocket ws://localhost:8011
STT server Data WebSocket ws://localhost:8012
TTS server WebSocket ws://localhost:8013

The client uses these URLs by default. You can override them with --control, --data, and --tts-url when running vocalyx, and with -c/-d when running vocalyx-stt. The TTS server binds to localhost and the port above (no CLI port flag in the current release).


Models used

Role Model / service Notes
STT RealtimeSTT (Faster Whisper, CTranslate2) Main model default: tiny.en. Real-time model default: tiny. Configurable via -m, -r.
LLM OpenAI GPT (streaming) Default: gpt-3.5-turbo. Used for assistant replies. Requires OPENAI_API_KEY.
TTS Soprano TTS On-device, 80M params. Default: backend=auto, device=auto, cache_size_mb=100, decoder_batch_size=1. Output: 32 kHz mono float32.

Configuration

Environment

Create a .env in the directory from which you run the processes (or set the variable in the shell):

  • OPENAI_API_KEY – Required for the TTS server (GPT). Get it from OpenAI API keys.

TTS server (vocalyx-tts)

  • Host / port: localhost:8013 (hardcoded in code).
  • GPT: System prompt is a fixed “motivational assistant”; model gpt-3.5-turbo, temperature 0.7.
  • Soprano: Loaded once at startup; backend="auto", device="auto", cache_size_mb=100, decoder_batch_size=1.

STT server (vocalyx-stt)

Key options (run vocalyx-stt --help for full list):

Option Default Description
-c, --control 8011 Control WebSocket port.
-d, --data 8012 Data WebSocket port.
-m, --model tiny.en Main Whisper model size or path (e.g. base, small, large-v2).
-r, --rt-model tiny Real-time transcription model size.
-l, --lang en Language code.
-i, --input-device 1 Audio input device index.
--device cuda cuda or cpu.
--compute_type default CTranslate2 compute type.
--silero_sensitivity 0.05 VAD sensitivity (0–1).
--webrtc_sensitivity 3 WebRTC VAD (0–3).
--end_of_sentence_detection_pause 0.45 Silence (seconds) to treat as end of sentence.
-D, --debug off Debug logging.

Client (vocalyx)

Option Default Description
--control ws://localhost:8011 STT control WebSocket URL.
--data ws://localhost:8012 STT data WebSocket URL.
--tts-url ws://localhost:8013 TTS WebSocket URL.
--voice af_heart Voice name (Soprano may ignore).
--speed 1.0 Playback speed.
-L, --list - List audio devices and exit.
-c, --continous true Keep running and transcribing.

Usage

  1. Set OPENAI_API_KEY (e.g. in .env).
  2. Start STT server, then TTS server, then client:
# Terminal 1
vocalyx-stt

# Terminal 2
vocalyx-tts

# Terminal 3
vocalyx
  1. Speak; the client shows “You” and “AI” and plays the assistant’s voice.

List microphone devices:

vocalyx --list

Use a different TTS URL or STT URLs if your servers run on other hosts/ports:

vocalyx --tts-url ws://otherhost:8013 --control ws://otherhost:8011 --data ws://otherhost:8012

Audio format (TTS)

Soprano TTS outputs 32 kHz, mono, float32. The client plays this directly (e.g. paFloat32, 32000 Hz, 1 channel). Do not convert to μ-law or int16 for playback or you may change pitch/tempo.


License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vocalyx-0.3.0.tar.gz (22.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vocalyx-0.3.0-py3-none-any.whl (22.4 kB view details)

Uploaded Python 3

File details

Details for the file vocalyx-0.3.0.tar.gz.

File metadata

  • Download URL: vocalyx-0.3.0.tar.gz
  • Upload date:
  • Size: 22.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for vocalyx-0.3.0.tar.gz
Algorithm Hash digest
SHA256 1b6683557e8c396abf2b0da99ffe9216d33ed60bf828798d0d8ec9c63ef2c54f
MD5 6425cb3d42428ff5f101e2015f14ab5f
BLAKE2b-256 233630450f9c665b0972a38449a050613c631a410832f4fdce97970d6e400cfe

See more details on using hashes here.

File details

Details for the file vocalyx-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: vocalyx-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 22.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for vocalyx-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 84dab79aff38a7cc7e7d1079c8a430336c535b57d4a77d2423ce8b9781b52329
MD5 054972ffc43bd599abdd2d5777a54aeb
BLAKE2b-256 a35195317b188cc7973b38e4a4635f53a566bf96c26e76f6524713065cc581cc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page