Real-time voice assistant with Speech-to-Text, GPT, and Text-to-Speech (Soprano TTS)

🎙️ Real-Time Voice Assistant (STT + GPT + TTS)

This project is a local, streaming voice assistant that listens to your voice, transcribes it in real time (STT), generates an AI response using GPT, and speaks the answer back using SopranoTTS.

It's built from three key components:

stt_server (vocalyx-stt) – handles real-time speech-to-text via WebSockets.
tts_server (vocalyx-tts) – streams GPT responses and converts them to speech using SopranoTTS.
client (vocalyx) – connects everything together: records your mic, shows live transcription, sends it to GPT, and plays back AI-generated voice.

📦 Install

pip install vocalyx

Or install from source:

git clone <repo-url>
cd Voice-to-Voice
pip install -e .

🧩 Requirements

1. Python

Make sure you have Python 3.9 through 3.12 installed (3.13 and later are not supported).
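The version range can be checked before installing anything else. This is a minimal sketch that assumes the bounds stated above; the name `is_supported_python` is illustrative and not part of vocalyx.

```python
import sys

# Bounds taken from the requirement above; adjust if the package metadata differs.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION
```

Calling `is_supported_python()` with no argument checks the running interpreter; passing a tuple such as `(3, 13)` checks a version you plan to install.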

2. System Dependencies

You'll need:

ffmpeg (for audio handling)
A working microphone and audio output
portaudio (for PyAudio)

macOS

brew install portaudio ffmpeg

Ubuntu / Debian

sudo apt update
sudo apt install portaudio19-dev ffmpeg python3-pyaudio

Windows

Install Python (make sure to add it to PATH)
PyAudio binaries can be installed with:

pip install pipwin
pipwin install pyaudio

📦 Install Python Dependencies

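When working from a source checkout, create a virtual environment (Python 3.12 recommended) and install dependencies. On Windows, activate with venv\Scripts\activate instead of the source command: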

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Key Python Packages

RealtimeSTT – real-time speech-to-text
openai – GPT streaming responses
soprano-tts – neural TTS engine
torch, numpy – audio inference backend
pyaudio, sounddevice – audio playback
python-dotenv – environment variable loading

🔑 Environment Variables

Create a .env file in the project root with:

OPENAI_API_KEY=your_openai_api_key_here

Get your key from https://platform.openai.com/api-keys.
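To confirm the key is visible before starting the servers, something like the following could be used. The project lists python-dotenv for loading .env; this sketch swaps in a small standard-library parser so it has no dependencies, and the names `parse_env` and `load_api_key` are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(path=".env"):
    """Return OPENAI_API_KEY from the environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY")
    env_file = Path(path)
    if not key and env_file.is_file():
        key = parse_env(env_file.read_text(encoding="utf-8")).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

A value already present in the process environment takes precedence over the file, which is also how python-dotenv behaves by default.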

⚙️ How It Works

End-to-end flow:

[Microphone Input]
        ↓
 [client.py] → Sends audio to STT server
        ↓
 [stt_server.py] → Transcribes speech in real time
        ↓
 [client.py] → Sends text to TTS server
        ↓
 [tts_server.py] → Streams GPT text + converts to speech (SopranoTTS)
        ↓
 [client.py] → Plays AI voice audio live

Each component communicates over WebSockets:

STT control channel: ws://localhost:8011
STT data channel: ws://localhost:8012
TTS channel: ws://localhost:8013
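If the client cannot connect, a quick check of whether anything is listening on the three ports can narrow things down. The sketch below uses only the standard library and tests plain TCP reachability, not the WebSocket handshake; the endpoint names and helper functions are illustrative.

```python
import socket
from urllib.parse import urlparse

# Default endpoints as listed above.
ENDPOINTS = {
    "stt_control": "ws://localhost:8011",
    "stt_data": "ws://localhost:8012",
    "tts": "ws://localhost:8013",
}


def host_and_port(ws_url):
    """Split a ws:// or wss:// URL into (host, port)."""
    parsed = urlparse(ws_url)
    if parsed.scheme not in ("ws", "wss") or parsed.hostname is None:
        raise ValueError(f"not a WebSocket URL: {ws_url!r}")
    default_port = 443 if parsed.scheme == "wss" else 80
    return parsed.hostname, parsed.port or default_port


def is_listening(ws_url, timeout=0.5):
    """Return True if a TCP connection to the endpoint succeeds."""
    host, port = host_and_port(ws_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report():
    """Print one status line per endpoint."""
    for name, url in ENDPOINTS.items():
        state = "up" if is_listening(url) else "down"
        print(f"{name:12s} {url}  {state}")
```

Running `report()` with both servers started should show all three endpoints as up; a "down" line points at the server that still needs to be launched.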

🔊 Audio Format (Important)

SopranoTTS streams raw float32 mono audio:

Sample rate: 32000 Hz
Channels: 1
Format: float32

The client plays audio directly using paFloat32 without μ-law or int16 conversion. This ensures:

Natural pitch
Correct tempo
No distortion
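The arithmetic behind this can be sketched as follows. One second of audio in this format is 32000 samples of 4 bytes each, and playing the same bytes at a lower rate stretches them in time. The helper names are illustrative, and native byte order is an assumption (the stream's endianness is not stated here).

```python
from array import array

SAMPLE_RATE = 32000      # Hz, as stated above
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 4     # float32


def chunk_duration(num_bytes, playback_rate=SAMPLE_RATE):
    """Seconds of audio in a raw float32 mono chunk at the given playback rate."""
    samples = num_bytes // (BYTES_PER_SAMPLE * CHANNELS)
    return samples / playback_rate


def decode_chunk(raw):
    """Interpret raw bytes as float32 samples (native byte order assumed)."""
    samples = array("f")
    samples.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    return samples
```

For example, `chunk_duration(128000)` is one second at the correct rate, while `chunk_duration(128000, playback_rate=16000)` is two seconds, which is why a mismatched output rate sounds slow and low-pitched.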

🚀 Running the System

You’ll need three terminals.

1️⃣ Start the STT Server

vocalyx-stt

Handles microphone audio and real-time transcription.

2️⃣ Start the TTS Server

vocalyx-tts

Streams GPT responses and converts them to speech using SopranoTTS.

3️⃣ Run the Client

vocalyx

The client:

Captures microphone input
Displays live transcription
Sends prompts to GPT
Plays streamed AI voice output

By default, it runs in continuous mode.

🗣️ Example Interaction

You:

What's a good way to stay focused today?

AI: (spoken + printed)

Try breaking your day into short focus sessions. Take a quick stretch between them.

⚙️ Optional Command-Line Arguments

You can tweak the client's behavior with these flags:

Flag	Description	Default
`--tts-url`	TTS WebSocket server URL	`ws://localhost:8013`
`--post-silence`	Silence after each utterance	`1.0`
`--speech-end-detection`	Adaptive silence detection	off
`--debug`	Print debug logs	off
`--norealtime`	Disable live transcription display	off
`--list`	List microphone devices	off
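For reference, the flags in the table could be declared with argparse roughly as below. This is reconstructed from the table, not copied from client.py: the argument types and the choice of boolean switches for the "off" defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags and defaults in the table above."""
    parser = argparse.ArgumentParser(prog="vocalyx")
    parser.add_argument("--tts-url", default="ws://localhost:8013",
                        help="TTS WebSocket server URL")
    parser.add_argument("--post-silence", type=float, default=1.0,
                        help="Silence after each utterance")
    # Flags whose default is "off" are modeled as boolean switches.
    parser.add_argument("--speech-end-detection", action="store_true",
                        help="Adaptive silence detection")
    parser.add_argument("--debug", action="store_true",
                        help="Print debug logs")
    parser.add_argument("--norealtime", action="store_true",
                        help="Disable live transcription display")
    parser.add_argument("--list", action="store_true",
                        help="List microphone devices")
    return parser
```

With this layout, `build_parser().parse_args(["--post-silence", "0.5", "--norealtime"])` yields `post_silence == 0.5` and `norealtime == True`, with every other option left at its default.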

List audio devices:

vocalyx --list

Select a specific mic by passing its device index from the --list output:

vocalyx -i 2

🧠 Notes

SopranoTTS is initialized once and reused for all requests.
GPT responses are streamed sentence-by-sentence to minimize latency.
Audio is streamed and played in near real time.
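Sentence-by-sentence streaming can be illustrated with a small buffer that emits each sentence as soon as its end is seen, so synthesis can start before the full reply has arrived. This is a sketch of the general technique, not the tts_server implementation; the splitting rule (end punctuation followed by whitespace) is a simplifying assumption and will also split after abbreviations such as "Dr.".

```python
import re

# A sentence is considered finished at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(fragments):
    """Yield complete sentences as they appear in a stream of text fragments."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be incomplete; keep it for the next fragment.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence would then be handed to the TTS engine while later fragments are still arriving from the model.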

🧹 Troubleshooting

Audio sounds distorted or slow

Ensure client playback uses paFloat32 at 32000 Hz.
Do not apply μ-law or int16 conversion.

No response from GPT

Verify OPENAI_API_KEY in .env.
Check internet connectivity.

STT not transcribing

Ensure RealtimeSTT is installed correctly.
Verify microphone index using --list.

🧾 License

This project is for personal and educational use.

💡 Future Improvements

VAD-based auto start/stop for more natural conversations
Opus/WebRTC streaming for browser clients
GUI frontend for controlling STT/TTS parameters
Interruptible (barge-in) speech handling

🏁 Summary

# Terminal 1
vocalyx-stt

# Terminal 2
vocalyx-tts

# Terminal 3
vocalyx

Release history

1.0.2 – Feb 15, 2026
1.0.1 – Feb 13, 2026
1.0.0 – Feb 11, 2026
0.3.0 – Feb 11, 2026
0.2.1 – Feb 11, 2026
0.2.0 – Feb 11, 2026 (this version)

