
Vocalyx

Real-time voice assistant: Speech-to-Text (STT) → GPT → Text-to-Speech (TTS). Install with pip install vocalyx. Run vocalyx-stt, vocalyx-tts, and vocalyx in three terminals.


Install

pip install vocalyx

Requirements: Python 3.10+, system dependencies portaudio and ffmpeg, microphone and audio output.


Architecture and flow

All components live in the vocalyx package and talk over WebSockets:

[Microphone] → vocalyx (client)
                    ↓
              STT server (RealtimeSTT) ← control + data WebSockets
                    ↓
              vocalyx (client) receives transcribed text
                    ↓
              TTS server (GPT + Soprano TTS) ← single WebSocket
                    ↓
              vocalyx (client) plays audio → [Speakers]
  1. vocalyx-stt – STT server. Listens on two ports: control (commands) and data (audio + transcriptions).
  2. vocalyx-tts – TTS server. Listens on one port. Receives text, streams GPT response, synthesizes with Soprano, streams audio back.
  3. vocalyx – Client. Captures mic, sends audio to STT; receives text, sends to TTS; receives and plays TTS audio.
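The loop the client runs can be reduced to a few lines. The sketch below is illustrative only: the real client talks to the servers over WebSockets, and `stt`, `tts`, and `play` here are stand-in callables, not Vocalyx APIs.

```python
from typing import Callable, Iterator, Optional

def run_pipeline(
    mic_chunks: Iterator[bytes],
    stt: Callable[[bytes], Optional[str]],  # stand-in for the STT WebSockets
    tts: Callable[[str], bytes],            # stand-in for the TTS WebSocket (GPT + Soprano)
    play: Callable[[bytes], None],          # stand-in for local audio playback
) -> None:
    """Audio in, transcribed text, spoken reply out."""
    for chunk in mic_chunks:
        text = stt(chunk)     # None until a full utterance has been transcribed
        if text:
            play(tts(text))   # server streams the GPT reply back as synthesized audio
```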

Ports (defaults)

Component    Role                 Default URL
STT server   Control WebSocket    ws://localhost:8011
STT server   Data WebSocket       ws://localhost:8012
TTS server   WebSocket            ws://localhost:8013

The client uses these URLs by default. You can override them with --control, --data, and --tts-url when running vocalyx, and with -c/-d when running vocalyx-stt. The TTS server binds to localhost and the port above (no CLI port flag in the current release).
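The defaults and overrides above amount to the following hypothetical helper (not part of the vocalyx package, just a restatement of the table):

```python
DEFAULT_ENDPOINTS = {
    "control": ("localhost", 8011),  # STT control WebSocket
    "data": ("localhost", 8012),     # STT data WebSocket
    "tts": ("localhost", 8013),      # TTS WebSocket
}

def ws_url(role: str, host: str = None, port: int = None) -> str:
    """Return the WebSocket URL for an endpoint, applying the documented defaults."""
    default_host, default_port = DEFAULT_ENDPOINTS[role]
    return f"ws://{host or default_host}:{port or default_port}"

print(ws_url("tts"))                        # → ws://localhost:8013
print(ws_url("control", host="otherhost"))  # → ws://otherhost:8011
```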


Models used

Role  Model / service                             Notes
STT   RealtimeSTT (Faster Whisper / CTranslate2)  Main model default: tiny.en; real-time model default: tiny. Configurable via -m / -r.
LLM   OpenAI GPT (streaming)                      Default: gpt-3.5-turbo. Used for assistant replies. Requires OPENAI_API_KEY.
TTS   Soprano TTS                                 On-device, 80M parameters. Defaults: backend=auto, device=auto, cache_size_mb=100, decoder_batch_size=1. Output: 32 kHz mono float32.

Configuration

Environment

Create a .env in the directory from which you run the processes (or set the variable in the shell):

  • OPENAI_API_KEY – Required for the TTS server (GPT). Get it from OpenAI API keys.
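A minimal .env looks like this (substitute your own key; the value shown is a placeholder):

```
OPENAI_API_KEY=sk-your-key-here
```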

TTS server (vocalyx-tts)

  • Host / port: localhost:8013 (hardcoded; no CLI flag in the current release).
  • GPT: The system prompt and model are sent by the client per request (default prompt: "You are a motivational assistant.", default model: gpt-3.5-turbo). Temperature is fixed at 0.7.
  • Soprano: Loaded once at startup with backend="auto", device="auto", cache_size_mb=100, decoder_batch_size=1.
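For illustration, a request over the TTS WebSocket carries the text plus the per-request GPT settings described above. The field names below are hypothetical (the wire format is not documented here); only the values are the documented defaults.

```python
import json

# Hypothetical message shape for the single TTS WebSocket; field names are
# illustrative, values are the documented defaults.
request = {
    "text": "Good morning!",
    "system_prompt": "You are a motivational assistant.",
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
}
wire = json.dumps(request)
print(json.loads(wire)["model"])  # → gpt-3.5-turbo
```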

STT server (vocalyx-stt)

Key options (run vocalyx-stt --help for full list):

Option                             Default  Description
-c, --control                      8011     Control WebSocket port.
-d, --data                         8012     Data WebSocket port.
-m, --model                        tiny.en  Main Whisper model size or path (e.g. base, small, large-v2).
-r, --rt-model                     tiny     Real-time transcription model size.
-l, --lang                         en       Language code.
-i, --input-device                 1        Audio input device index.
--device                           cuda     cuda or cpu.
--compute_type                     default  CTranslate2 compute type.
--silero_sensitivity               0.05     Silero VAD sensitivity (0–1).
--webrtc_sensitivity               3        WebRTC VAD aggressiveness (0–3).
--end_of_sentence_detection_pause  0.45     Silence (seconds) treated as end of sentence.
-D, --debug                        off      Enable debug logging.

Client (vocalyx)

Option                     Default                            Description
--control                  ws://localhost:8011                STT control WebSocket URL.
--data                     ws://localhost:8012                STT data WebSocket URL.
--tts-url                  ws://localhost:8013                TTS WebSocket URL.
--voice                    af_heart                           Voice name (Soprano may ignore it).
--speed                    1.0                                Playback speed.
--system-prompt, --prompt  You are a motivational assistant.  LLM system prompt (persona/instructions).
--model                    gpt-3.5-turbo                      OpenAI model (e.g. gpt-4, gpt-4o).
-L, --list                 -                                  List audio devices and exit.
-c, --continous            true                               Keep running and transcribing continuously.
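The table above corresponds roughly to the following argparse setup. This is a reconstruction for illustration, not the actual vocalyx source.

```python
import argparse

def build_client_parser() -> argparse.ArgumentParser:
    """Reconstruction of the documented client flags and their defaults."""
    p = argparse.ArgumentParser(prog="vocalyx")
    p.add_argument("--control", default="ws://localhost:8011")
    p.add_argument("--data", default="ws://localhost:8012")
    p.add_argument("--tts-url", default="ws://localhost:8013")
    p.add_argument("--voice", default="af_heart")
    p.add_argument("--speed", type=float, default=1.0)
    p.add_argument("--system-prompt", "--prompt", dest="system_prompt",
                   default="You are a motivational assistant.")
    p.add_argument("--model", default="gpt-3.5-turbo")
    return p

args = build_client_parser().parse_args([])
print(args.tts_url, args.model)  # → ws://localhost:8013 gpt-3.5-turbo
```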

Usage

  1. Set OPENAI_API_KEY (e.g. in .env).
  2. Start STT server, then TTS server, then client:
# Terminal 1
vocalyx-stt

# Terminal 2
vocalyx-tts

# Terminal 3
vocalyx
  3. Speak; the client shows “You” and “AI” lines and plays the assistant’s voice.

List microphone devices:

vocalyx --list

Use a different TTS URL or STT URLs if your servers run on other hosts/ports:

vocalyx --tts-url ws://otherhost:8013 --control ws://otherhost:8011 --data ws://otherhost:8012

Audio format (TTS)

Soprano TTS outputs 32 kHz, mono, float32. The client plays this directly (e.g. paFloat32, 32000 Hz, 1 channel). Do not convert the stream to μ-law or int16, and do not play it at another sample rate: reinterpreting or resampling the audio distorts it or shifts its pitch and tempo.
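Concretely, a raw chunk's playback duration follows from the format (32 000 samples/s, 4 bytes per float32 sample, one channel). A stdlib-only sketch:

```python
SAMPLE_RATE = 32_000   # Hz, Soprano TTS output
SAMPLE_WIDTH = 4       # bytes per float32 sample
CHANNELS = 1           # mono

def chunk_duration(raw: bytes) -> float:
    """Seconds of audio in a raw float32 mono chunk from the TTS server."""
    frames = len(raw) // (SAMPLE_WIDTH * CHANNELS)
    return frames / SAMPLE_RATE

one_second = b"\x00" * (SAMPLE_RATE * SAMPLE_WIDTH)
print(chunk_duration(one_second))  # → 1.0
```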


License

MIT
