A modular, voice conversation pipeline using MLX.
Project description
voiceChain: A Local-First, Streaming, and Interruptible Voice Agent Framework
Introduction
voiceChain is a high-performance, modular Python library for building real-time, voice-based conversational AI agents. Its core mission is to provide a complete, local-first framework that runs entirely offline on consumer hardware, with a special focus on leveraging the full power of Apple Silicon via the MLX and llama.cpp Metal backends. The novel architecture is designed from the ground up for low-latency streaming and full user interruptibility, creating a truly responsive conversational experience.
Core Features
- End-to-End Local Pipeline: All components—including Speech-to-Text (Whisper), Large Language Model (Llama/Qwen), and Text-to-Speech (Kokoro)—run 100% offline, ensuring privacy and independence from cloud services.
- High-Performance on Apple Silicon: Leverages Apple's Metal framework via MLX and `llama-cpp-python` for massive GPU acceleration on all AI model inferences.
- Low-Latency Streaming: A fully asynchronous design using `asyncio` allows for the parallel processing of STT, LLM inference, and TTS synthesis. This minimizes "dead air" and begins audio playback as soon as the first sentence is generated.
- Hands-Free Operation: A sophisticated two-stage Voice Activity Detection (VAD) system (WebRTC VAD + Silero VAD) provides robust, continuous listening for a "wake-word free" experience.
- Full Interruptibility (Barge-In): Users can interrupt the agent at any point during its response. The pipeline gracefully cancels the in-flight generation and playback, immediately processing the user's new command.
- Software-Based Echo Cancellation: A pragmatic, text-based echo detection algorithm prevents the agent from accidentally transcribing and responding to its own speech, a common issue in full-duplex systems.
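The text-based echo check described above can be approximated with the standard library's fuzzy matching. The function name and threshold below are illustrative assumptions, not voiceChain's actual API:

```python
import difflib
import re


def is_probable_echo(transcript: str, recent_tts_text: str, threshold: float = 0.8) -> bool:
    """Illustrative sketch (not voiceChain's real implementation): treat a new
    transcript as self-echo if it closely matches what the agent just spoke."""

    def normalize(s: str) -> str:
        # Lowercase and strip punctuation so only the words are compared.
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()

    ratio = difflib.SequenceMatcher(
        None, normalize(transcript), normalize(recent_tts_text)
    ).ratio()
    return ratio >= threshold


# The agent just said this, and the microphone picked it back up:
print(is_probable_echo("Sure, I can help with that!", "Sure I can help with that."))  # True
# A genuinely new user utterance is not flagged:
print(is_probable_echo("What's the weather tomorrow?", "Sure I can help with that."))  # False
```

In a full-duplex loop, a check like this would run on every STT result while the agent's own playback text is still fresh, dropping transcripts that are probable echoes.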
Architecture Overview
voiceChain is built on a clean, decoupled architecture that separates concerns for maintainability and scalability.
- ServiceManager: Manages all background hardware interactions, I/O, and threading. This includes the microphone input thread, the VAD processor, the persistent audio output stream, and dedicated thread pools for STT and TTS tasks.
- PipelineRunner: Orchestrates a single, complete conversational "turn" from user audio to agent audio response. It manages the flow of data through the STT, LLM, and TTS models.
- ConversationManager: The main state machine of the application. It listens for user speech from the `ServiceManager` and decides when to initiate a new turn with the `PipelineRunner`, when to handle a barge-in, and when to return to an idle listening state.
- Composition Root: The `examples/run_agent.py` script acts as the composition root, where all components are instantiated with their dependencies and the application is started.
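The composition-root wiring above can be sketched with stub classes. The constructor signatures here are assumptions for illustration; see `examples/run_agent.py` for the real wiring:

```python
import asyncio


# Stub components standing in for the real classes. Their constructor
# signatures are illustrative assumptions, not voiceChain's actual API.
class ServiceManager:
    """Owns hardware I/O: microphone, VAD, audio output, worker pools."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate


class PipelineRunner:
    """Runs one conversational turn: STT -> LLM -> TTS."""

    def __init__(self, services: ServiceManager) -> None:
        self.services = services


class ConversationManager:
    """Top-level state machine: start turns, handle barge-in, return to idle."""

    def __init__(self, services: ServiceManager, runner: PipelineRunner) -> None:
        self.services = services
        self.runner = runner

    async def run(self) -> str:
        # The real loop would be: listen -> run a turn -> handle barge-in.
        return "idle"


async def main() -> str:
    # Composition root: the whole object graph is built in one place, so
    # every dependency is explicit and easy to swap out in tests.
    services = ServiceManager()
    runner = PipelineRunner(services)
    conversation = ConversationManager(services, runner)
    return await conversation.run()


print(asyncio.run(main()))  # idle
```

Because nothing constructs its own dependencies, any component (for example, the `ServiceManager`) can be replaced with a mock when testing the others.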
Getting Started
Prerequisites
- Python 3.10+
- ffmpeg: Required by `mlx-whisper` for audio processing.

  ```shell
  # On macOS with Homebrew
  brew install ffmpeg
  ```

- PortAudio: Required by `PyAudio` and `sounddevice` for microphone and speaker access.

  ```shell
  # On macOS with Homebrew
  brew install portaudio
  ```
Installation
- Clone the repository and navigate to the root directory.
- Create and activate a Python virtual environment:

  ```shell
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the library in "editable" mode, which also installs all dependencies from `pyproject.toml`:

  ```shell
  pip install -e .
  ```
Model Setup
The agent requires several pre-trained models to function. You must download them and place them in a `models/` directory at the root of the project:
project_root/
├── models/
│ ├── whisper-large-v3-turbo/ # MLX Whisper model
│ ├── Qwen3-4B-Instruct-2507-Q4_K_M.gguf # GGUF format LLM
│ └── Kokoro/ # MLX Kokoro TTS model
└── ...
Running the Agent
Once the environment is set up and the models are in place, you can run the agent with a single command from the project root:
```shell
python examples/run_agent.py
```
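Under the hood, the low-latency streaming described above hinges on chunking the LLM's token stream into complete sentences so TTS can start before generation finishes. A minimal sketch of that idea (an illustration, not voiceChain's actual implementation):

```python
import re
from typing import Iterable, Iterator


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens and yield each sentence as soon as it
    closes, so TTS synthesis can begin before the LLM has finished.
    Illustrative sketch, not voiceChain's actual implementation."""
    buffer = ""
    for token in tokens:
        buffer += token
        # A sentence is "closed" once its terminal punctuation is followed
        # by whitespace (i.e. the next sentence has started).
        while (m := re.search(r"[.!?]\s+", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream


tokens = ["Hello", " there", ". How", " are", " you", "?", " Good."]
print(list(sentences_from_tokens(tokens)))
# ['Hello there.', 'How are you?', 'Good.']
```

In a pipeline like the one described here, each yielded sentence would be handed to the TTS worker immediately, while the LLM keeps generating the rest of the reply in parallel.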
Project Structure
The library uses a modern `src` layout to cleanly separate library code from tests and examples.
src/
└── voiceChain/
├── __init__.py
├── audio/ # Hardware I/O: Player, Recorder, VAD
│ ├── player.py
│ ├── recorder.py
│ └── vad.py
├── core/ # Core AI Engines: LLM, STT, TTS
│ ├── llm.py
│ ├── stt.py
│ └── tts.py
├── pipeline/ # Orchestration and State Management
│ ├── manager.py
│ ├── runner.py
│ └── services.py
└── utils/ # Shared utilities (Logging, State Enums)
├── logging.py
└── state.py
Download files
Source Distribution
Built Distribution
File details
Details for the file voicechain-0.1.1.tar.gz.
File metadata
- Download URL: voicechain-0.1.1.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `80670c28125494b6f6d406ea30ae46487fc4a0104b1dcd6d1db66eeb6960b58e` |
| MD5 | `d10d459ab0cd8cbbfceedef7d3d24980` |
| BLAKE2b-256 | `c8d81f635a5dd13898e669df876e59ab23dee90e04bdbcd68ee9a895be9a78f1` |
File details
Details for the file voicechain-0.1.1-py3-none-any.whl.
File metadata
- Download URL: voicechain-0.1.1-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a87270c0615aa2d704fef5236b2e5c1f140b88acc40c065ac3097abe84a7331f` |
| MD5 | `6e7b124df234b0c45188737ade8f7735` |
| BLAKE2b-256 | `ac40bffa73996fe5307392666394ce1864c31ceaeaf4f7d09b3a49185f9f3d1a` |