Macaw OpenVoice
Voice runtime (STT + TTS) with OpenAI-compatible API
Production Voice Runtime Infrastructure
Real-time Speech-to-Text and Text-to-Speech with an OpenAI-compatible API, streaming session control, and an extensible execution architecture.
Overview
Macaw OpenVoice is a production-grade runtime for voice systems.
It standardizes and operationalizes the execution of Speech-to-Text (STT) and Text-to-Speech (TTS) models in real environments by providing:
- a unified execution interface for multiple inference engines
- real-time audio streaming with controlled latency
- continuous session management
- bidirectional speech interaction
- operational observability
- production-ready APIs
Macaw acts as the infrastructure layer between voice models and production applications, abstracting complexity related to streaming, synchronization, state management, and execution control.
Technology Positioning
Macaw OpenVoice plays the same role for voice systems that:
- vLLM plays for LLM serving
- Triton Inference Server plays for GPU inference
- Ollama plays for local model execution
It transforms voice models into operational services.
Core Capabilities
Unified Interface
- OpenAI-compatible Audio API
- Real-time full-duplex WebSocket streaming
- Local runtime CLI
Bidirectional Speech Streaming
- simultaneous STT and TTS in the same session
- automatic speech detection
- barge-in support (interruptible speech)
- automatic mute during synthesis
Session Management
- state machine for continuous audio processing
- ring buffer with persistence
- crash recovery without context loss
- cross-segment coherence
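To make the ring-buffer idea in the list above concrete, here is a minimal in-memory sketch. It is not Macaw's implementation: the class name `AudioRingBuffer` and its methods are hypothetical, and persistence and crash recovery are deliberately left out.

```python
class AudioRingBuffer:
    """Fixed-capacity byte ring buffer that overwrites the oldest audio.

    Hypothetical illustration only; not the Macaw session buffer.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._total_written = 0  # absolute count of bytes ever written

    def write(self, data: bytes) -> None:
        # Only the last `capacity` bytes of an oversized write can survive.
        skipped = max(0, len(data) - self._capacity)
        data = data[skipped:]
        self._total_written += skipped
        pos = self._total_written % self._capacity
        first = min(len(data), self._capacity - pos)
        self._buf[pos:pos + first] = data[:first]       # up to the end of the buffer
        self._buf[0:len(data) - first] = data[first:]   # wrapped remainder, if any
        self._total_written += len(data)

    def read_last(self, n: int) -> bytes:
        """Return the most recent n bytes, oldest first."""
        n = min(n, self._capacity, self._total_written)
        if n == 0:
            return b""
        end = self._total_written % self._capacity
        start = (end - n) % self._capacity
        if start < end:
            return bytes(self._buf[start:end])
        return bytes(self._buf[start:] + self._buf[:end])
```

Tracking an absolute write offset rather than a wrapped index is one way a reader could later ask for "audio since offset X", which is the kind of lookup a recovery path would likely need.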
Audio Processing Pipeline
- automatic resampling
- DC offset removal
- gain normalization
- voice activity detection
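Two of the steps listed above, DC offset removal and gain normalization, can be sketched in a few lines of standard-library Python. This is an illustration under assumptions: the function names are hypothetical, peak normalization is used here although the source does not say whether Macaw normalizes by peak or by RMS, and resampling is omitted.

```python
import array


def _clip16(value: int) -> int:
    """Clamp an integer to the signed 16-bit PCM range."""
    return max(-32768, min(32767, value))


def remove_dc_offset(samples: array.array) -> array.array:
    """Subtract the mean so the waveform is centered on zero."""
    if not samples:
        return array.array("h")
    mean = sum(samples) / len(samples)
    return array.array("h", (_clip16(round(s - mean)) for s in samples))


def normalize_gain(samples: array.array, target_peak: float = 0.9) -> array.array:
    """Scale so the loudest sample reaches target_peak of full scale.

    Peak normalization is an assumption made for this sketch.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # silence: nothing to scale
    scale = target_peak * 32767 / peak
    return array.array("h", (_clip16(round(s * scale)) for s in samples))


# Usage: 16-bit PCM with a constant +1000 offset
pcm = array.array("h", [1100, 900, 1100, 900])
cleaned = normalize_gain(remove_dc_offset(pcm), target_peak=0.5)
```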
Multi-Engine Execution
- multiple STT and TTS engines
- subprocess isolation
- declarative model registry
- pluggable architecture
Operational Control
- priority-based scheduler
- dynamic batching
- latency tracking
- Prometheus metrics
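A priority-based scheduler that favors realtime work still has to keep batch requests from starving. The sketch below shows one possible policy, assuming that a batch request which has waited at least `aging_s` seconds is served ahead of realtime traffic. The class and field names are hypothetical and the aging semantics are an assumption, not Macaw's documented behavior.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class Request:
    name: str
    realtime: bool
    enqueued_at: float


class AgingScheduler:
    """Two-class queue: realtime first, unless a batch request has aged out."""

    def __init__(self, aging_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._aging_s = aging_s
        self._clock = clock  # injectable so the policy can be tested
        self._realtime: Deque[Request] = deque()
        self._batch: Deque[Request] = deque()

    def submit(self, name: str, realtime: bool) -> Request:
        req = Request(name, realtime, self._clock())
        (self._realtime if realtime else self._batch).append(req)
        return req

    def next(self) -> Optional[Request]:
        now = self._clock()
        # Promote the oldest batch request once it has waited long enough.
        if self._batch and now - self._batch[0].enqueued_at >= self._aging_s:
            return self._batch.popleft()
        if self._realtime:
            return self._realtime.popleft()
        if self._batch:
            return self._batch.popleft()
        return None
```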
Production Use Cases
Macaw is designed for real-world voice workloads:
- real-time conversational voice agents
- telephony automation (SIP / VoIP)
- live transcription systems
- embedded voice interfaces
- multimodal assistants
- interactive media streaming
- continuous audio processing pipelines
Quick Start
# Install
pip install "macaw-openvoice[server,grpc,faster-whisper]"
# Pull a model
macaw pull faster-whisper-tiny
# Start the runtime
macaw serve
$ macaw serve
╔══════════════════════════════════════════════╗
║ Macaw OpenVoice v1.0.0 ║
╚══════════════════════════════════════════════╝
INFO Scanning models in ~/.macaw/models
INFO Found 2 model(s): faster-whisper-tiny (STT), kokoro-v1 (TTS)
INFO Spawning STT worker faster-whisper-tiny port=50051 engine=faster-whisper
INFO Spawning TTS worker kokoro-v1 port=50052 engine=kokoro
INFO Scheduler started aging=30.0s batch_ms=75.0 batch_max=8
INFO Uvicorn running on http://127.0.0.1:8000
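The `batch_ms=75.0` and `batch_max=8` values in the scheduler log line suggest a batching window bounded by both time and size. As a rough sketch of that idea, assume (this is not confirmed by the source) that a batch closes when it already holds `batch_max` requests or when a new request arrives more than `batch_ms` after the first one in the batch. The function name is hypothetical.

```python
from typing import List, Sequence


def group_into_batches(arrivals_ms: Sequence[float],
                       batch_ms: float = 75.0,
                       batch_max: int = 8) -> List[List[float]]:
    """Group request arrival times (milliseconds) into batches."""
    batches: List[List[float]] = []
    current: List[float] = []
    for t in sorted(arrivals_ms):
        window_expired = bool(current) and t - current[0] > batch_ms
        if window_expired or len(current) >= batch_max:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Three requests land inside the first 75 ms window, two in the next one.
batches = group_into_batches([0, 10, 20, 100, 110])
```

A real scheduler would close the window with a timer rather than waiting for the next arrival; the offline grouping above only illustrates how the two limits interact.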
Transcribe a file
# Via REST API
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F file=@audio.wav \
-F model=faster-whisper-tiny
# Via CLI
macaw transcribe audio.wav --model faster-whisper-tiny
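For environments without curl or an SDK, the same REST call can be sketched with only the Python standard library. The helper names `build_multipart` and `transcribe` are hypothetical, and the assumption that the endpoint returns a JSON body follows from the OpenAI-style contract rather than anything stated here.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "faster-whisper-tiny",
               base_url: str = "http://localhost:8000") -> dict:
    """POST an audio file to /v1/audio/transcriptions and parse the JSON reply."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"model": model}, os.path.basename(path), fh.read())
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions", data=body,
        headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```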
Streaming via WebSocket
wscat -c "ws://localhost:8000/v1/realtime?model=faster-whisper-tiny"
# Send binary audio frames, receive JSON transcript events
Text-to-Speech
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"model": "kokoro-v1", "input": "Hello, how can I help you?", "voice": "default"}' \
--output speech.wav
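The same speech request can be built from Python without third-party packages. This is a sketch: `build_speech_request` and `synthesize_to_file` are hypothetical helper names, and the payload simply mirrors the curl example.

```python
import json
import urllib.request


def build_speech_request(text: str, model: str = "kokoro-v1",
                         voice: str = "default",
                         base_url: str = "http://localhost:8000"
                         ) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str = "speech.wav") -> None:
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(build_speech_request(text)) as resp, \
            open(out_path, "wb") as out:
        out.write(resp.read())
```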
Architecture
                       Clients
         CLI / REST / WebSocket (full-duplex)
                          |
                          v
+----------------------------------------------------+
|                API Server (FastAPI)                |
|                                                    |
|  POST /v1/audio/transcriptions   (STT batch)       |
|  POST /v1/audio/translations     (STT translate)   |
|  POST /v1/audio/speech           (TTS)             |
|  WS   /v1/realtime               (STT+TTS)         |
+----------------------------------------------------+
|                     Scheduler                      |
|  Priority queue (realtime > batch), cancellation,  |
|  dynamic batching, latency tracking                |
+----------------------------------------------------+
|                   Model Registry                   |
|  Declarative manifest (macaw.yaml), lifecycle      |
+----------+-------------------+---------------------+
           |                   |
  +--------+--------+   +------+-------+
  | STT Workers     |   | TTS Workers  |
  | (subprocess     |   | (subprocess  |
  |  gRPC)          |   |  gRPC)       |
  |                 |   |              |
  | Faster-Whisper  |   | Kokoro       |
  | WeNet           |   |              |
  +-----------------+   +--------------+
           |
+----------+-------------------------------------+
|          Audio Preprocessing Pipeline          |
|  Resample -> DC Remove -> Gain Normalize       |
+------------------------------------------------+
|           Session Manager (STT only)           |
|  6 states, ring buffer, WAL, LocalAgreement,   |
|  cross-segment context, crash recovery         |
+------------------------------------------------+
|      VAD (Energy Pre-filter + Silero VAD)      |
+------------------------------------------------+
|         Post-Processing (ITN via NeMo)         |
+------------------------------------------------+
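The VAD stage in the diagram pairs a cheap energy pre-filter with Silero VAD. A plausible reading is that the energy check discards obviously silent frames before the neural model runs. The sketch below shows only such a gate; the function names and the -40 dBFS threshold are assumptions for illustration, and the Silero step is not shown.

```python
import array
import math


def frame_rms(frame: array.array) -> float:
    """Root-mean-square amplitude of a frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def passes_energy_gate(frame: array.array, threshold_dbfs: float = -40.0) -> bool:
    """Return True if the frame is loud enough to be worth a full VAD pass.

    The threshold is a hypothetical default, not a Macaw setting.
    """
    rms = frame_rms(frame)
    if rms == 0:
        return False  # digital silence
    dbfs = 20 * math.log10(rms / 32768.0)  # level relative to full scale
    return dbfs >= threshold_dbfs
```

Frames that fail the gate can be treated as non-speech immediately, so the more expensive model only sees candidates.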
Supported Models
| Engine | Type | Architecture | Partials | Hot Words | Status |
|---|---|---|---|---|---|
| Faster-Whisper | STT | encoder-decoder | LocalAgreement | via initial_prompt | Supported |
| WeNet | STT | CTC | native | native keyword boosting | Supported |
| Kokoro | TTS | neural | — | — | Supported |
Adding a new engine requires ~400-700 lines of code and zero changes to the runtime core. See the Adding an Engine guide.
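The "LocalAgreement" entry in the Partials column refers to a policy for producing stable partial transcripts from a model that re-decodes a growing audio buffer. In one common formulation, a token is confirmed once two consecutive hypotheses agree on it as part of their shared prefix. The sketch below illustrates that formulation with hypothetical names; it is not taken from Macaw's source.

```python
from typing import List, Sequence


def common_prefix(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest shared prefix of two token sequences."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement:
    """Confirm tokens that two consecutive hypotheses agree on."""

    def __init__(self) -> None:
        self._previous: List[str] = []
        self._committed: List[str] = []

    def update(self, hypothesis: Sequence[str]) -> List[str]:
        """Feed the latest full hypothesis; return newly confirmed tokens."""
        agreed = common_prefix(self._previous, hypothesis)
        self._previous = list(hypothesis)
        newly = agreed[len(self._committed):]  # empty if nothing new is stable
        self._committed.extend(newly)
        return newly

    @property
    def committed(self) -> List[str]:
        return list(self._committed)
```

Already committed tokens are never retracted in this sketch, even if a later hypothesis disagrees with them.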
API Compatibility
Macaw implements the OpenAI Audio API contract, so existing OpenAI SDKs work once their base URL points at the Macaw server:
from openai import OpenAI

# Point the SDK at the local Macaw server; the API key is not checked.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Transcription
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="faster-whisper-tiny",
        file=audio_file,
    )
print(result.text)

# Text-to-Speech
response = client.audio.speech.create(
    model="kokoro-v1",
    input="Hello, how can I help you?",
    voice="default",
)
response.stream_to_file("output.wav")
WebSocket Protocol
The /v1/realtime endpoint supports full-duplex STT + TTS:
Client -> Server:
  Binary frames         PCM 16-bit audio (any sample rate)
  session.configure     Configure VAD, language, hot words, TTS model
  tts.speak             Trigger text-to-speech synthesis
  tts.cancel            Cancel active TTS

Server -> Client:
  session.created       Session established
  vad.speech_start      Speech detected
  transcript.partial    Intermediate hypothesis
  transcript.final      Confirmed segment (with ITN)
  vad.speech_end        Speech ended
  tts.speaking_start    TTS started (STT muted)
  Binary frames         TTS audio output
  tts.speaking_end      TTS finished (STT unmuted)
  error                 Error with recoverable flag
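A client has to separate binary TTS audio from JSON events and track when STT is muted. The sketch below shows one way to do that. The event names come from the list above, but the JSON layout is an assumption: the `type`, `text`, and `recoverable` keys are hypothetical field names chosen for illustration, so check the protocol documentation for the real schema.

```python
import json
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SessionView:
    """Client-side view of a /v1/realtime session (illustrative only)."""
    stt_muted: bool = False
    user_speaking: bool = False
    finals: List[str] = field(default_factory=list)
    tts_audio: bytearray = field(default_factory=bytearray)

    def handle(self, message: Union[str, bytes]) -> None:
        # Binary frames from the server carry synthesized audio.
        if isinstance(message, (bytes, bytearray)):
            self.tts_audio.extend(message)
            return
        event = json.loads(message)
        kind = event.get("type")  # assumed key name
        if kind == "vad.speech_start":
            self.user_speaking = True
        elif kind == "vad.speech_end":
            self.user_speaking = False
        elif kind == "transcript.final":
            self.finals.append(event.get("text", ""))  # assumed key name
        elif kind == "tts.speaking_start":
            self.stt_muted = True
        elif kind == "tts.speaking_end":
            self.stt_muted = False
        elif kind == "error" and not event.get("recoverable", True):
            raise RuntimeError(f"unrecoverable session error: {event}")


def tts_speak_message(text: str) -> str:
    """Build a tts.speak command (payload shape is an assumption)."""
    return json.dumps({"type": "tts.speak", "text": text})
```

Feeding every received WebSocket message through `handle` keeps audio and control traffic apart without tying the logic to a particular WebSocket library.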
CLI
macaw serve                               # Start API server
macaw transcribe audio.wav                # Transcribe file
macaw transcribe audio.wav --format srt   # Generate subtitles
macaw transcribe --stream                 # Stream from microphone
macaw translate audio.wav                 # Translate to English
macaw list                                # List installed models
macaw pull faster-whisper-tiny            # Download a model
macaw inspect faster-whisper-tiny         # Model details
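For readers unfamiliar with the subtitle format behind `--format srt`, the sketch below shows how timed segments map to SRT text. It illustrates the SRT layout only; the `(start, end, text)` segment tuples and function names are hypothetical and say nothing about how Macaw represents segments internally.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)  # blank line between cues


print(to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")]))
```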
Demo
An interactive demo with a React/Next.js frontend is included:
./demo/start.sh
This starts the FastAPI backend (port 9000) and the Next.js frontend (port 3000) together. The demo includes a dashboard for batch transcriptions, real-time streaming STT with VAD visualization, and a TTS playground. See demo/README.md for details.
Development
# Setup (requires Python 3.11+ and uv)
uv venv --python 3.12
uv sync --all-extras
# Development workflow
make check # format + lint + typecheck
make test-unit # unit tests (preferred during development)
make test # all tests (1686 passing)
make ci # full pipeline: format + lint + typecheck + test
Documentation
Full documentation is available at usemacaw.github.io/macaw-openvoice.
Contributing
We welcome contributions! Please read our Contributing Guide before submitting a pull request.
Contact
- Website: usemacaw.io
- Email: hello@usemacaw.io
- GitHub: github.com/usemacaw/macaw-openvoice
License