
EdgeVox

Sub-second local voice AI for robots and edge devices.

No cloud APIs. No internet after setup. Fully private. Powered by Gemma 4.

    ______    __         _    __
   / ____/___/ /___ ____| |  / /___  _  __
  / __/ / __  / __ `/ _ \ | / / __ \| |/_/
 / /___/ /_/ / /_/ /  __/ |/ / /_/ />  <
/_____/\__,_/\__, /\___/|___/\____/_/|_|
            /____/

Stack: Silero VAD -> faster-whisper (STT) -> Gemma 4 E2B IT via llama.cpp (LLM) -> Kokoro 82M (TTS)

Tested latency: ~0.8s end-to-end on RTX 3080 (STT 0.40s + LLM 0.33s + TTS 0.08s)

Features

  • Streaming pipeline — speaks first sentence while LLM generates the rest
  • Interrupt support — speak while bot is talking to cut it off
  • Wake word detection — "Hey Jarvis" / "Lily" (optional, via OpenWakeWord)
  • Beautiful TUI — ASCII logo, sparkline waveform, latency history, GPU/RAM monitor, model info panel
  • ROS2 bridge — publishes STT/TTS/state to ROS2 topics for robotics integration
  • Slash commands in the TUI — /reset, /lang, /voice, /say, /mictest, /model
  • Chat export — Ctrl+S to save conversation as markdown
  • 15 languages — English, Vietnamese, French, Spanish, Hindi, Italian, Portuguese, Japanese, Chinese, Korean, German, Thai, Russian, Arabic, Indonesian
  • Auto-detects hardware — GPU layers, model size, STT model
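The streaming feature above depends on splitting the LLM's token stream into sentences so TTS can start before generation finishes. A minimal sketch of that idea in pure Python (this is an illustration, not EdgeVox's actual implementation; `split_sentences` is a hypothetical name):

```python
import re

def split_sentences(token_stream):
    """Yield complete sentences from an incremental token stream,
    so TTS can start speaking before generation finishes."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # A sentence ends at . ! or ? (optionally followed by a
        # closing quote/bracket) plus whitespace.
        while (m := re.search(r"[.!?][\"')\]]?\s", buffer)):
            yield buffer[:m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush any trailing partial sentence

tokens = ["Hello ", "there. ", "How ", "are ", "you? ", "Fine"]
print(list(split_sentences(tokens)))  # ['Hello there.', 'How are you?', 'Fine']
```

In the real pipeline, each yielded sentence would be handed to the TTS engine while the LLM keeps generating the rest of the reply.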

Hardware Requirements

| Device | RAM | GPU | Expected Latency |
|---|---|---|---|
| PC (i9 + RTX 3080 16GB) | 64GB | CUDA | ~0.8s |
| Jetson Orin Nano | 8GB | CUDA | ~1.5-2s |
| MacBook Air M1 | 8GB | Metal | ~2-3s |
| Any modern laptop | 16GB+ | CPU only | ~2-4s |

Quick Start

# 1. Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create virtual environment
uv venv --python 3.12
source .venv/bin/activate

# 3. Install llama-cpp-python with CUDA (prebuilt wheels)
uv pip install llama-cpp-python \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124

# For Apple Silicon (Metal):
# CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python

# For CPU only:
# uv pip install llama-cpp-python

# 4. Install EdgeVox (run from a clone of the repo,
#    or use `uv pip install edgevox` for the PyPI release)
uv pip install -e .

# 5. Download all models (~3GB total)
edgevox-setup

# 6. Run!
edgevox

Usage

# TUI mode (default, recommended)
edgevox

# With wake word
edgevox --wakeword "hey jarvis"

# With ROS2 bridge (for robotics)
edgevox --ros2

# CLI mode (simpler, no TUI)
edgevox-cli

# Text mode (no microphone)
edgevox-cli --text-mode

# Custom options
edgevox \
    --whisper-model large-v3-turbo \
    --voice am_adam \
    --language en

TUI Controls

| Key | Action |
|---|---|
| Q | Quit |
| R | Reset conversation |
| M | Mute/Unmute mic |
| / | Open command input |
| Ctrl+S | Export chat to markdown |

Slash Commands

| Command | Action |
|---|---|
| /reset | Reset conversation |
| /lang XX | Switch language (en, vi, fr, ko, ...) |
| /langs | List all supported languages |
| /say TEXT | TTS preview: speak text directly |
| /mictest | Record 3s + playback to test audio |
| /model SIZE | Switch Whisper model (small/medium/large-v3-turbo) |
| /voice XX | Switch TTS voice |
| /voices | List available voices |
| /export | Export chat to markdown |
| /mute | Mute microphone |
| /unmute | Unmute microphone |
| /help | Show all commands |
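A command table like this is typically implemented as a map from slash prefix to handler. The sketch below is illustrative only (the `state` dict and handler bodies are invented for the example, not EdgeVox internals):

```python
def handle_command(line, state):
    """Parse a '/command [args]' line, mutate a state dict,
    and return a short status message for the TUI."""
    cmd, _, arg = line.strip().partition(" ")
    # dict.update returns None, so `... or "msg"` evaluates to the message
    handlers = {
        "/reset":  lambda: state.update(history=[]) or "conversation reset",
        "/lang":   lambda: state.update(language=arg) or f"language set to {arg}",
        "/mute":   lambda: state.update(muted=True) or "microphone muted",
        "/unmute": lambda: state.update(muted=False) or "microphone unmuted",
    }
    if cmd in handlers:
        return handlers[cmd]()
    return f"unknown command: {cmd}"

state = {"history": ["hi"], "language": "en", "muted": False}
print(handle_command("/lang vi", state))  # language set to vi
print(handle_command("/mute", state))     # microphone muted
```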

ROS2 Integration

EdgeVox can publish voice pipeline events to ROS2 topics, making it easy to add voice interaction to any robot.

# Install with ROS2 support
uv pip install -e ".[ros2]"

# Run with ROS2 bridge
edgevox --ros2

Published Topics

| Topic | Type | Description |
|---|---|---|
| /edgevox/transcription | std_msgs/String | User's speech (STT output) |
| /edgevox/response | std_msgs/String | Bot's response text |
| /edgevox/state | std_msgs/String | Pipeline state (listening, thinking, speaking) |
| /edgevox/audio_level | std_msgs/Float32 | Mic level (0.0-1.0) |
| /edgevox/metrics | std_msgs/String | JSON latency metrics |
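Since /edgevox/metrics carries a JSON string, a consumer can decode it and total the per-stage latencies. The key names `stt`, `llm`, and `tts` below are assumed for illustration; the actual payload schema is not documented here:

```python
import json

def total_latency(msg_data):
    """Sum per-stage latencies (seconds) from a metrics JSON payload."""
    metrics = json.loads(msg_data)
    return sum(metrics.get(stage, 0.0) for stage in ("stt", "llm", "tts"))

payload = '{"stt": 0.40, "llm": 0.33, "tts": 0.08}'
print(round(total_latency(payload), 2))  # 0.81
```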

Subscribed Topics

| Topic | Type | Description |
|---|---|---|
| /edgevox/tts_request | std_msgs/String | Send text for the bot to speak |
| /edgevox/command | std_msgs/String | Commands: reset, mute, unmute |

Example: Robot Integration

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

rclpy.init()
node = Node('edgevox_demo')

# Listen to what the user says
def on_user_speech(msg):
    print(f"User said: {msg.data}")

node.create_subscription(String, '/edgevox/transcription', on_user_speech, 10)

# Make the robot say something
pub = node.create_publisher(String, '/edgevox/tts_request', 10)
msg = String()
msg.data = "I detected an obstacle ahead."
pub.publish(msg)

rclpy.spin(node)

Architecture

                        EdgeVox Pipeline
 +-----------+     +------------+     +----------------+
 | Microphone|---->| Silero VAD |---->| faster-whisper |
 |           |     | (32ms)     |     | (STT)          |
 +-----------+     +------------+     +--------+-------+
                                               |
                                               v
                                      +----------------+
                                      | Gemma 4 E2B IT |
                                      | (streaming)    |
                                      +--------+-------+
                                               | sentence by sentence
                                               v
 +-----------+     +------------+     +----------------+
 |  Speaker  |<----| Kokoro 82M |<----| Sentence       |
 |           |     | (TTS)      |     | Splitter       |
 +-----------+     +------------+     +----------------+
                         |
                         v (optional)
                   +------------+
                   | ROS2 Bridge|----> /edgevox/* topics
                   +------------+
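The diagram's data flow can be read as a short orchestration loop. Below is a toy sketch in which every stage is a stand-in callable; none of these functions exist in EdgeVox, and they only demonstrate the control flow, including per-sentence TTS:

```python
def run_pipeline(frames, is_speech, transcribe, generate_tokens, split, speak):
    """Route mic frames through VAD -> STT -> LLM -> TTS, speaking
    each sentence as soon as it is complete."""
    utterance = [f for f in frames if is_speech(f)]   # VAD gate
    text = transcribe(utterance)                      # STT
    for sentence in split(generate_tokens(text)):     # streaming LLM
        speak(sentence)                               # per-sentence TTS

# Toy stand-ins that only demonstrate the control flow:
spoken = []
run_pipeline(
    frames=[b"..", b"", b".."],
    is_speech=lambda f: bool(f),
    transcribe=lambda fs: "hello robot",
    generate_tokens=lambda text: ["Hi. ", "Ready."],
    split=lambda tokens: [t.strip() for t in tokens],
    speak=spoken.append,
)
print(spoken)  # ['Hi.', 'Ready.']
```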

Model Sizes

| Component | Model | Size | RAM |
|---|---|---|---|
| VAD | Silero VAD v6 | ~2MB | ~10MB |
| STT | whisper-small | 500MB | ~600MB |
| STT | whisper-large-v3-turbo | 1.5GB | ~2GB |
| LLM | Gemma 4 E2B IT Q4_K_M | 1.8GB | ~2.5GB |
| TTS | Kokoro 82M | 200MB | ~300MB |
| Wake word | OpenWakeWord | ~2MB | ~10MB |

  • M1 Air (8GB): whisper-small + Q4_K_M = 3.4GB
  • PC with GPU: whisper-large-v3-turbo + Q4_K_M = 5.8GB

Documentation

Full docs: EdgeVox Docs (built with VitePress). To preview them locally:

cd website && npm run dev

License

MIT


