A modular, voice conversation pipeline using MLX.
Project description
voiceChain: A Local-First, Streaming, and Interruptible Voice Agent Framework
Introduction
voiceChain is a high-performance, modular Python library for building real-time, voice-based conversational AI agents. Its core mission is to provide a complete, local-first framework that runs entirely offline on consumer hardware, with a special focus on leveraging the full power of Apple Silicon via the MLX and llama.cpp Metal backends. The novel architecture is designed from the ground up for low-latency streaming and full user interruptibility, creating a truly responsive conversational experience.
Core Features
- End-to-End Local Pipeline: All components—including Speech-to-Text (Whisper), Large Language Model (Llama/Qwen), and Text-to-Speech (Kokoro)—run 100% offline, ensuring privacy and independence from cloud services.
- High-Performance on Apple Silicon: Leverages Apple's Metal framework via MLX and `llama-cpp-python` for massive GPU acceleration on all AI model inferences.
- Low-Latency Streaming: A fully asynchronous design using `asyncio` allows for the parallel processing of STT, LLM inference, and TTS synthesis. This minimizes "dead air" and begins audio playback as soon as the first sentence is generated.
- Hands-Free Operation: A sophisticated two-stage Voice Activity Detection (VAD) system (WebRTC VAD + Silero VAD) provides robust, continuous listening for a "wake-word free" experience.
- Full Interruptibility (Barge-In): Users can interrupt the agent at any point during its response. The pipeline gracefully cancels the in-flight generation and playback, immediately processing the user's new command.
- Software-Based Echo Cancellation: A pragmatic, text-based echo detection algorithm prevents the agent from accidentally transcribing and responding to its own speech, a common issue in full-duplex systems.
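The text-based echo check described above can be approximated with the standard library's fuzzy matching. The function name and threshold below are illustrative assumptions, not voiceChain's actual API:

```python
import difflib
import re


def is_probable_echo(transcript: str, recent_tts_text: str, threshold: float = 0.8) -> bool:
    """Illustrative sketch (not voiceChain's real implementation): treat a new
    transcript as self-echo if it closely matches what the agent just spoke."""

    def normalize(s: str) -> str:
        # Lowercase and strip punctuation so only the words are compared.
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()

    ratio = difflib.SequenceMatcher(
        None, normalize(transcript), normalize(recent_tts_text)
    ).ratio()
    return ratio >= threshold


# The agent just said this, and the microphone picked it back up:
print(is_probable_echo("Sure, I can help with that!", "Sure I can help with that."))  # True
# A genuinely new user utterance is not flagged:
print(is_probable_echo("What's the weather tomorrow?", "Sure I can help with that."))  # False
```

In a full-duplex loop, a check like this would run on every STT result while the agent's own playback text is still fresh, dropping transcripts that are probable echoes.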
Architecture Overview
voiceChain is built on a clean, decoupled architecture that separates concerns for maintainability and scalability.
- ServiceManager: Manages all background hardware interactions, I/O, and threading. This includes the microphone input thread, the VAD processor, the persistent audio output stream, and dedicated thread pools for STT and TTS tasks.
- PipelineRunner: Orchestrates a single, complete conversational "turn" from user audio to agent audio response. It manages the flow of data through the STT, LLM, and TTS models.
- ConversationManager: The main state machine of the application. It listens for user speech from the `ServiceManager` and decides when to initiate a new turn with the `PipelineRunner`, when to handle a barge-in, and when to return to an idle listening state.
- Composition Root: The `examples/run_agent.py` script acts as the composition root, where all components are instantiated with their dependencies and the application is started.
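The composition-root wiring above can be sketched with stub classes. The constructor signatures here are assumptions for illustration; see `examples/run_agent.py` for the real wiring:

```python
import asyncio


# Stub components standing in for the real classes. Their constructor
# signatures are illustrative assumptions, not voiceChain's actual API.
class ServiceManager:
    """Owns hardware I/O: microphone, VAD, audio output, worker pools."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate


class PipelineRunner:
    """Runs one conversational turn: STT -> LLM -> TTS."""

    def __init__(self, services: ServiceManager) -> None:
        self.services = services


class ConversationManager:
    """Top-level state machine: start turns, handle barge-in, return to idle."""

    def __init__(self, services: ServiceManager, runner: PipelineRunner) -> None:
        self.services = services
        self.runner = runner

    async def run(self) -> str:
        # The real loop would be: listen -> run a turn -> handle barge-in.
        return "idle"


async def main() -> str:
    # Composition root: the whole object graph is built in one place, so
    # every dependency is explicit and easy to swap out in tests.
    services = ServiceManager()
    runner = PipelineRunner(services)
    conversation = ConversationManager(services, runner)
    return await conversation.run()


print(asyncio.run(main()))  # idle
```

Because nothing constructs its own dependencies, any component (for example, the `ServiceManager`) can be replaced with a mock when testing the others.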
Getting Started
Prerequisites
- Python 3.10+
- ffmpeg: Required by `mlx-whisper` for audio processing.

  ```shell
  # On macOS with Homebrew
  brew install ffmpeg
  ```

- PortAudio: Required by `PyAudio` and `sounddevice` for microphone and speaker access.

  ```shell
  # On macOS with Homebrew
  brew install portaudio
  ```
Installation
- Clone the repository and navigate to the root directory.
- Create and activate a Python virtual environment:

  ```shell
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the library in "editable" mode, which also installs all dependencies from `pyproject.toml`:

  ```shell
  pip install -e .
  ```
Model Setup
The agent requires several pre-trained models to function. You must download them and place them in a `models/` directory at the root of the project:
project_root/
├── models/
│ ├── whisper-large-v3-turbo/ # MLX Whisper model
│ ├── Qwen3-4B-Instruct-2507-Q4_K_M.gguf # GGUF format LLM
│ └── Kokoro/ # MLX Kokoro TTS model
└── ...
Running the Agent
Once the environment is set up and the models are in place, you can run the agent with a single command from the project root:
```shell
python examples/run_agent.py
```
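Under the hood, the low-latency streaming described above hinges on chunking the LLM's token stream into complete sentences so TTS can start before generation finishes. A minimal sketch of that idea (an illustration, not voiceChain's actual implementation):

```python
import re
from typing import Iterable, Iterator


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens and yield each sentence as soon as it
    closes, so TTS synthesis can begin before the LLM has finished.
    Illustrative sketch, not voiceChain's actual implementation."""
    buffer = ""
    for token in tokens:
        buffer += token
        # A sentence is "closed" once its terminal punctuation is followed
        # by whitespace (i.e. the next sentence has started).
        while (m := re.search(r"[.!?]\s+", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream


tokens = ["Hello", " there", ". How", " are", " you", "?", " Good."]
print(list(sentences_from_tokens(tokens)))
# ['Hello there.', 'How are you?', 'Good.']
```

In a pipeline like the one described here, each yielded sentence would be handed to the TTS worker immediately, while the LLM keeps generating the rest of the reply in parallel.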
Project Structure
The library uses a modern `src` layout to cleanly separate library code from tests and examples.
src/
└── voiceChain/
├── __init__.py
├── audio/ # Hardware I/O: Player, Recorder, VAD
│ ├── player.py
│ ├── recorder.py
│ └── vad.py
├── core/ # Core AI Engines: LLM, STT, TTS
│ ├── llm.py
│ ├── stt.py
│ └── tts.py
├── pipeline/ # Orchestration and State Management
│ ├── manager.py
│ ├── runner.py
│ └── services.py
└── utils/ # Shared utilities (Logging, State Enums)
├── logging.py
└── state.py
Download files
Source Distribution
Built Distribution
File details
Details for the file voicechain-0.1.1.tar.gz.
File metadata
- Download URL: voicechain-0.1.1.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `80670c28125494b6f6d406ea30ae46487fc4a0104b1dcd6d1db66eeb6960b58e` |
| MD5 | `d10d459ab0cd8cbbfceedef7d3d24980` |
| BLAKE2b-256 | `c8d81f635a5dd13898e669df876e59ab23dee90e04bdbcd68ee9a895be9a78f1` |
File details
Details for the file voicechain-0.1.1-py3-none-any.whl.
File metadata
- Download URL: voicechain-0.1.1-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a87270c0615aa2d704fef5236b2e5c1f140b88acc40c065ac3097abe84a7331f` |
| MD5 | `6e7b124df234b0c45188737ade8f7735` |
| BLAKE2b-256 | `ac40bffa73996fe5307392666394ce1864c31ceaeaf4f7d09b3a49185f9f3d1a` |