Real-time voice assistant with Speech-to-Text (STT), GPT, and Text-to-Speech (TTS). Run vocalyx-stt, vocalyx-tts, and vocalyx in three terminals.

These details have not been verified by PyPI

Project links

Project description

Vocalyx

Real-time voice assistant: Speech-to-Text (STT) → GPT → Text-to-Speech (TTS). Install with pip install vocalyx. Run vocalyx-stt, vocalyx-tts, and vocalyx in three terminals.

Install

pip install vocalyx

Requirements: Python 3.10+, system dependencies portaudio and ffmpeg, microphone and audio output.

Architecture and flow

All components live in the vocalyx package and talk over WebSockets:

[Microphone] → vocalyx (client)
                    ↓
              STT server (RealtimeSTT) ← control + data WebSockets
                    ↓
              vocalyx (client) receives transcribed text
                    ↓
              TTS server (GPT + Soprano TTS) ← single WebSocket
                    ↓
              vocalyx (client) plays audio → [Speakers]

vocalyx-stt – STT server. Listens on two ports: control (commands) and data (audio + transcriptions).
vocalyx-tts – TTS server. Listens on one port. Receives text, streams GPT response, synthesizes with Soprano, streams audio back.
vocalyx – Client. Captures mic, sends audio to STT; receives text, sends to TTS; receives and plays TTS audio.

Ports (defaults)

Component	Role	Default URL / Port
STT server	Control WebSocket	`ws://localhost:8011`
STT server	Data WebSocket	`ws://localhost:8012`
TTS server	WebSocket	`ws://localhost:8013`

The client uses these URLs by default. You can override them with --control, --data, and --tts-url when running vocalyx, and with -c/-d when running vocalyx-stt. The TTS server binds to localhost and the port above (no CLI port flag in the current release).

Models used

Role	Model / service	Notes
STT	RealtimeSTT (Faster Whisper, CTranslate2)	Main model default: `tiny.en`. Real-time model default: `tiny`. Configurable via `-m`, `-r`.
LLM	OpenAI GPT (streaming)	Default: `gpt-3.5-turbo`. Used for assistant replies. Requires `OPENAI_API_KEY`.
TTS	Soprano TTS	On-device, 80M params. Default: `backend=auto`, `device=auto`, `cache_size_mb=100`, `decoder_batch_size=1`. Output: 32 kHz mono float32.

Configuration

Environment

Create a .env in the directory from which you run the processes (or set the variable in the shell):

OPENAI_API_KEY – Required for the TTS server (GPT). Get it from OpenAI API keys.

TTS server (`vocalyx-tts`)

Host / port: localhost:8013 (hardcoded in code).
GPT: System prompt is a fixed “motivational assistant”; model gpt-3.5-turbo, temperature 0.7.
Soprano: Loaded once at startup; backend="auto", device="auto", cache_size_mb=100, decoder_batch_size=1.

STT server (`vocalyx-stt`)

Key options (run vocalyx-stt --help for full list):

Option	Default	Description
`-c`, `--control`	8011	Control WebSocket port.
`-d`, `--data`	8012	Data WebSocket port.
`-m`, `--model`	`tiny.en`	Main Whisper model size or path (e.g. `base`, `small`, `large-v2`).
`-r`, `--rt-model`	`tiny`	Real-time transcription model size.
`-l`, `--lang`	`en`	Language code.
`-i`, `--input-device`	1	Audio input device index.
`--device`	`cuda`	`cuda` or `cpu`.
`--compute_type`	`default`	CTranslate2 compute type.
`--silero_sensitivity`	0.05	VAD sensitivity (0–1).
`--webrtc_sensitivity`	3	WebRTC VAD (0–3).
`--end_of_sentence_detection_pause`	0.45	Silence (seconds) to treat as end of sentence.
`-D`, `--debug`	off	Debug logging.

Client (`vocalyx`)

Option	Default	Description
`--control`	`ws://localhost:8011`	STT control WebSocket URL.
`--data`	`ws://localhost:8012`	STT data WebSocket URL.
`--tts-url`	`ws://localhost:8013`	TTS WebSocket URL.
`--voice`	`af_heart`	Voice name (Soprano may ignore).
`--speed`	1.0	Playback speed.
`-L`, `--list`	-	List audio devices and exit.
`-c`, `--continous`	true	Keep running and transcribing.

Usage

Set OPENAI_API_KEY (e.g. in .env).
Start STT server, then TTS server, then client:

# Terminal 1
vocalyx-stt

# Terminal 2
vocalyx-tts

# Terminal 3
vocalyx

Speak; the client shows “You” and “AI” and plays the assistant’s voice.

List microphone devices:

vocalyx --list

Use a different TTS URL or STT URLs if your servers run on other hosts/ports:

vocalyx --tts-url ws://otherhost:8013 --control ws://otherhost:8011 --data ws://otherhost:8012

Audio format (TTS)

Soprano TTS outputs 32 kHz, mono, float32. The client plays this directly (e.g. paFloat32, 32000 Hz, 1 channel). Do not convert to μ-law or int16 for playback or you may change pitch/tempo.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.2

Feb 15, 2026

1.0.1

Feb 13, 2026

This version

1.0.0

Feb 11, 2026

0.3.0

Feb 11, 2026

0.2.1

Feb 11, 2026

0.2.0

Feb 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vocalyx-1.0.0.tar.gz (22.9 kB view details)

Uploaded Feb 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

vocalyx-1.0.0-py3-none-any.whl (22.4 kB view details)

Uploaded Feb 11, 2026 Python 3

File details

Details for the file vocalyx-1.0.0.tar.gz.

File metadata

Download URL: vocalyx-1.0.0.tar.gz
Upload date: Feb 11, 2026
Size: 22.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for vocalyx-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`4d98b14acc0de9c8fe584f818fe75f7dec6406fc48f05ce6e4006c6afc20561b`
MD5	`1c93c4fe574677f2d899674cdb6fe58f`
BLAKE2b-256	`0fb7a229b05fb2e981ebe6458ca2a40881d1bea992cb582ce90e936997cb1e63`

See more details on using hashes here.

File details

Details for the file vocalyx-1.0.0-py3-none-any.whl.

File metadata

Download URL: vocalyx-1.0.0-py3-none-any.whl
Upload date: Feb 11, 2026
Size: 22.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for vocalyx-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b402ad0d0b0594ce104d0e0d502daff7b424154abb0cba0a6663e70eba4bcaf6`
MD5	`d0ee3aa3c1710750c664fe28d41b8318`
BLAKE2b-256	`5d4f51428d4192e288b34feeb0f6f053b7b477ac66f53f77522605f953e42dba`

See more details on using hashes here.

vocalyx 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Vocalyx

Install

Architecture and flow

Ports (defaults)

Models used

Configuration

Environment

TTS server (`vocalyx-tts`)

STT server (`vocalyx-stt`)

Client (`vocalyx`)

Usage

Audio format (TTS)

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

vocalyx 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Vocalyx

Install

Architecture and flow

Ports (defaults)

Models used

Configuration

Environment

TTS server (vocalyx-tts)

STT server (vocalyx-stt)

Client (vocalyx)

Usage

Audio format (TTS)

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

TTS server (`vocalyx-tts`)

STT server (`vocalyx-stt`)

Client (`vocalyx`)