# EdgeVox

Sub-second local voice AI for robots and edge devices.

No cloud APIs. No internet after setup. Fully private. Powered by Gemma 4.
```
    ______    __         _    __
   / ____/___/ /___ ____| |  / /___ _  __
  / __/ / __  / __ `/ _ \ | / / __ \| |/_/
 / /___/ /_/ / /_/ /  __/ |/ / /_/ />  <
/_____/\__,_/\__, /\___/|___/\____/_/|_|
            /____/
```
**Stack:** Silero VAD -> faster-whisper (STT) -> Gemma 4 E2B IT via llama.cpp (LLM) -> Kokoro 82M (TTS)

**Tested latency:** 0.80s end-to-end on an RTX 3080 (STT 0.40s + LLM 0.33s + TTS 0.08s)
## Features
- Streaming pipeline — speaks first sentence while LLM generates the rest
- Interrupt support — speak while bot is talking to cut it off
- Wake word detection — "Hey Jarvis" / "Lily" (optional, via OpenWakeWord)
- Beautiful TUI — ASCII logo, sparkline waveform, latency history, GPU/RAM monitor, model info panel
- ROS2 bridge — publishes STT/TTS/state to ROS2 topics for robotics integration
- Slash commands — `/reset`, `/lang`, `/voice`, `/say`, `/mictest`, `/model` in the TUI
- Chat export — Ctrl+S to save conversation as markdown
- 15 languages — English, Vietnamese, French, Spanish, Hindi, Italian, Portuguese, Japanese, Chinese, Korean, German, Thai, Russian, Arabic, Indonesian
- Auto-detects hardware — GPU layers, model size, STT model
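Interrupt support comes down to watching the VAD while TTS is playing: if the user starts speaking mid-playback, the bot stops and hands the turn back. A minimal sketch of that barge-in logic, with the class, state names, and `on_vad` callback invented for illustration (EdgeVox's internal API may differ):

```python
# Minimal barge-in sketch: a pipeline state machine that cuts off TTS
# playback as soon as the VAD reports user speech while the bot is
# talking. Names here are illustrative, not EdgeVox's actual API.

class BargeInController:
    """Tracks pipeline state and decides when to interrupt TTS."""

    def __init__(self):
        self.state = "listening"      # listening | thinking | speaking
        self.interrupted = False

    def start_speaking(self):
        self.state = "speaking"
        self.interrupted = False

    def on_vad(self, speech_detected: bool) -> bool:
        """Return True if playback should be stopped (user barged in)."""
        if speech_detected and self.state == "speaking":
            self.interrupted = True
            self.state = "listening"  # hand the turn back to the user
            return True
        return False


ctrl = BargeInController()
ctrl.start_speaking()
assert ctrl.on_vad(False) is False    # silence: keep talking
assert ctrl.on_vad(True) is True      # user speaks: cut TTS off
assert ctrl.state == "listening"
```

In a real pipeline the `True` return would trigger stopping the audio output stream and discarding any queued TTS sentences.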
## Hardware Requirements
| Device | RAM | GPU | Expected Latency |
|---|---|---|---|
| PC (i9 + RTX 3080 16GB) | 64GB | CUDA | ~0.8s |
| Jetson Orin Nano | 8GB | CUDA | ~1.5-2s |
| MacBook Air M1 | 8GB | Metal | ~2-3s |
| Any modern laptop | 16GB+ | CPU only | ~2-4s |
## Quick Start

```bash
# 1. Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create a virtual environment
uv venv --python 3.12
source .venv/bin/activate

# 3. Install llama-cpp-python with CUDA (prebuilt wheels)
uv pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124

# For Apple Silicon (Metal):
# CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python

# For CPU only:
# uv pip install llama-cpp-python

# 4. Install EdgeVox
uv pip install -e .

# 5. Download all models (~3GB total)
edgevox-setup

# 6. Run!
edgevox
```
## Usage

```bash
# TUI mode (default, recommended)
edgevox

# With wake word
edgevox --wakeword "hey jarvis"

# With ROS2 bridge (for robotics)
edgevox --ros2

# CLI mode (simpler, no TUI)
edgevox-cli

# Text mode (no microphone)
edgevox-cli --text-mode

# Custom options
edgevox \
  --whisper-model large-v3-turbo \
  --voice am_adam \
  --language en
```
## TUI Controls

| Key | Action |
|---|---|
| `Q` | Quit |
| `R` | Reset conversation |
| `M` | Mute/unmute mic |
| `/` | Open command input |
| `Ctrl+S` | Export chat to markdown |
## Slash Commands

| Command | Action |
|---|---|
| `/reset` | Reset conversation |
| `/lang XX` | Switch language (en, vi, fr, ko, ...) |
| `/langs` | List all supported languages |
| `/say TEXT` | TTS preview — speak text directly |
| `/mictest` | Record 3s + playback to test audio |
| `/model SIZE` | Switch Whisper model (small/medium/large-v3-turbo) |
| `/voice XX` | Switch TTS voice |
| `/voices` | List available voices |
| `/export` | Export chat to markdown |
| `/mute` | Mute microphone |
| `/unmute` | Unmute microphone |
| `/help` | Show all commands |
## ROS2 Integration

EdgeVox can publish voice pipeline events to ROS2 topics, making it easy to add voice interaction to any robot.

```bash
# Install with ROS2 support
uv pip install -e ".[ros2]"

# Run with ROS2 bridge
edgevox --ros2
```
### Published Topics

| Topic | Type | Description |
|---|---|---|
| `/edgevox/transcription` | `std_msgs/String` | User's speech (STT output) |
| `/edgevox/response` | `std_msgs/String` | Bot's response text |
| `/edgevox/state` | `std_msgs/String` | Pipeline state (listening, thinking, speaking) |
| `/edgevox/audio_level` | `std_msgs/Float32` | Mic level (0.0-1.0) |
| `/edgevox/metrics` | `std_msgs/String` | JSON latency metrics |
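The `/edgevox/metrics` payload is a JSON string, so a subscriber only needs `json.loads` to consume it. A hedged sketch of a decoder: the field names (`stt`, `llm`, `tts`, `total`) are assumptions for illustration — check the payload your build actually emits.

```python
import json

# Decode a metrics message from /edgevox/metrics and compute the
# total latency if the payload omits it. Field names are assumed,
# not confirmed against EdgeVox's actual schema.

def parse_metrics(payload: str) -> dict:
    """Return a dict of per-stage latencies with a 'total' key."""
    m = json.loads(payload)
    if "total" not in m:
        m["total"] = round(m.get("stt", 0.0) + m.get("llm", 0.0) + m.get("tts", 0.0), 3)
    return m

# Example using the stage latencies quoted in the README
sample = '{"stt": 0.40, "llm": 0.33, "tts": 0.08}'
metrics = parse_metrics(sample)
assert metrics["total"] == 0.81
```

In a ROS2 node, this function would be called on `msg.data` inside the subscription callback for `/edgevox/metrics`.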
### Subscribed Topics

| Topic | Type | Description |
|---|---|---|
| `/edgevox/tts_request` | `std_msgs/String` | Send text for the bot to speak |
| `/edgevox/command` | `std_msgs/String` | Commands: reset, mute, unmute |
### Example: Robot Integration

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

rclpy.init()
node = Node('my_robot')

# Listen to what the user says
def on_user_speech(msg):
    print(f"User said: {msg.data}")

node.create_subscription(String, '/edgevox/transcription', on_user_speech, 10)

# Make the robot say something
pub = node.create_publisher(String, '/edgevox/tts_request', 10)
msg = String()
msg.data = "I detected an obstacle ahead."
pub.publish(msg)

rclpy.spin(node)
```
## Architecture

```
EdgeVox Pipeline

+-----------+     +------------+     +----------------+
| Microphone|---->| Silero VAD |---->| faster-whisper |
|           |     |   (32ms)   |     |     (STT)      |
+-----------+     +------------+     +--------+-------+
                                              |
                                              v
                                     +----------------+
                                     | Gemma 4 E2B IT |
                                     |  (streaming)   |
                                     +--------+-------+
                                              | sentence by sentence
                                              v
+-----------+     +------------+     +----------------+
|  Speaker  |<----| Kokoro 82M |<----|    Sentence    |
|           |     |   (TTS)    |     |    Splitter    |
+-----------+     +------------+     +----------------+
      |
      v (optional)
+------------+
| ROS2 Bridge|----> /edgevox/* topics
+------------+
```
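The "sentence by sentence" step above is what makes the pipeline feel sub-second: the sentence splitter buffers the LLM's streamed text and hands each complete sentence to TTS as soon as it ends, so playback starts before generation finishes. A minimal sketch of that idea, using a simple regex rule rather than EdgeVox's actual splitting logic:

```python
import re

# Buffer streamed LLM text chunks and yield each complete sentence as
# soon as it ends, so TTS can start speaking while the LLM is still
# generating. The regex split rule here is a simplification, not
# EdgeVox's actual implementation.

def split_sentences(chunks):
    """Yield complete sentences from an iterable of text chunks."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            # A sentence is complete once ., !, or ? is followed by whitespace
            m = re.search(r"(.+?[.!?])\s+", buf)
            if not m:
                break
            yield m.group(1).strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush the trailing partial sentence

# Chunks arrive mid-word, as they would from a streaming LLM
stream = ["Hello the", "re! I see an obst", "acle. Turning now"]
assert list(split_sentences(stream)) == ["Hello there!", "I see an obstacle.", "Turning now"]
```

Each yielded sentence would be pushed onto the TTS queue immediately, which is why the first words can play while the rest of the response is still being generated.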
## Model Sizes
| Component | Model | Size | RAM |
|---|---|---|---|
| VAD | Silero VAD v6 | ~2MB | ~10MB |
| STT | whisper-small | 500MB | ~600MB |
| STT | whisper-large-v3-turbo | 1.5GB | ~2GB |
| LLM | Gemma 4 E2B IT Q4_K_M | 1.8GB | ~2.5GB |
| TTS | Kokoro 82M | 200MB | ~300MB |
| Wake | OpenWakeWord | ~2MB | ~10MB |
- M1 Air (8GB): whisper-small + Q4_K_M = 3.4GB
- PC with GPU: whisper-large-v3-turbo + Q4_K_M = 5.8GB
## Documentation

Full docs: EdgeVox Docs (built with VitePress)

```bash
cd website && npm run dev
```
## License

MIT