EdgeVox

Sub-second local voice AI for robots and edge devices.

No cloud APIs. No internet after setup. Fully private. Powered by Gemma 4.

    ______    __         _    __
   / ____/___/ /___ ____| |  / /___  _  __
  / __/ / __  / __ `/ _ \ | / / __ \| |/_/
 / /___/ /_/ / /_/ /  __/ |/ / /_/ />  <
/_____/\__,_/\__, /\___/|___/\____/_/|_|
            /____/

Stack: Silero VAD -> faster-whisper (STT) -> Gemma 4 E2B IT via llama.cpp (LLM) -> Kokoro 82M (TTS)

Tested latency: 0.80s end-to-end on RTX 3080 (STT 0.40s + LLM 0.33s + TTS 0.08s)

Features

  • Streaming pipeline — speaks first sentence while LLM generates the rest
  • Interrupt support — speak while bot is talking to cut it off
  • Wake word detection — "Hey Jarvis" / "Lily" (optional, via OpenWakeWord)
  • Beautiful TUI — ASCII logo, sparkline waveform, latency history, GPU/RAM monitor, model info panel
  • ROS2 bridge — publishes STT/TTS/state to ROS2 topics for robotics integration
  • Slash commands — /reset, /lang, /voice, /say, /mictest, /model in the TUI
  • Chat export — Ctrl+S to save conversation as markdown
  • 15 languages — English, Vietnamese, French, Spanish, Hindi, Italian, Portuguese, Japanese, Chinese, Korean, German, Thai, Russian, Arabic, Indonesian
  • Auto-detects hardware — GPU layers, model size, STT model
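The streaming pipeline's overlap comes from cutting the LLM's token stream at sentence boundaries and handing each finished sentence to TTS while generation continues. A minimal sketch of that idea (this splitter is illustrative, not EdgeVox's actual implementation):

```python
import re

def split_sentences(stream):
    """Yield complete sentences from an incremental text stream.

    Buffers incoming chunks and emits a sentence as soon as a
    terminator (. ! ?) followed by whitespace appears, so TTS can
    start speaking while the LLM is still generating.
    """
    buffer = ""
    for chunk in stream:
        buffer += chunk
        while True:
            match = re.search(r"[.!?]\s+", buffer)
            if not match:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush any trailing partial sentence

# Tokens arrive a few characters at a time, as from a streaming LLM
tokens = ["Hello", " there. ", "I can ", "hear you.", " Go ahead."]
print(list(split_sentences(tokens)))
# ['Hello there.', 'I can hear you.', 'Go ahead.']
```

The first sentence is ready for TTS after the second chunk, long before the stream ends — that gap is where the sub-second perceived latency comes from.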

Hardware Requirements

| Device | RAM | GPU | Expected Latency |
|---|---|---|---|
| PC (i9 + RTX 3080 16GB) | 64GB | CUDA | ~0.8s |
| Jetson Orin Nano | 8GB | CUDA | ~1.5-2s |
| MacBook Air M1 | 8GB | Metal | ~2-3s |
| Any modern laptop | 16GB+ | CPU only | ~2-4s |
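Hardware auto-detection reduces to a selection rule like the table above: pick the largest STT model and GPU offload the device can afford. A hypothetical sketch of that logic (thresholds and names are illustrative, not EdgeVox's actual defaults):

```python
def pick_config(ram_gb: float, has_gpu: bool) -> dict:
    """Choose an STT model and LLM GPU offload for the device.

    Mirrors the hardware table: big-RAM GPU machines get the large
    Whisper model; 8GB devices fall back to whisper-small.
    """
    if has_gpu and ram_gb >= 16:
        stt = "large-v3-turbo"
        gpu_layers = -1          # offload all LLM layers to the GPU
    elif ram_gb >= 16:
        stt = "medium"
        gpu_layers = 0           # CPU-only inference
    else:
        stt = "small"
        gpu_layers = -1 if has_gpu else 0
    return {"stt_model": stt, "gpu_layers": gpu_layers}

print(pick_config(64, True))   # {'stt_model': 'large-v3-turbo', 'gpu_layers': -1}
print(pick_config(8, False))   # {'stt_model': 'small', 'gpu_layers': 0}
```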

Quick Start

# 1. Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create virtual environment
uv venv --python 3.12
source .venv/bin/activate

# 3. Install llama-cpp-python with CUDA (prebuilt wheels)
uv pip install llama-cpp-python \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124

# For Apple Silicon (Metal):
# CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python

# For CPU only:
# uv pip install llama-cpp-python

# 4. Install EdgeVox
uv pip install -e .

# 5. Download all models (~3GB total)
edgevox-setup

# 6. Run!
edgevox

Usage

# TUI mode (default, recommended)
edgevox

# With wake word
edgevox --wakeword "hey jarvis"

# With ROS2 bridge (for robotics)
edgevox --ros2

# CLI mode (simpler, no TUI)
edgevox-cli

# Text mode (no microphone)
edgevox-cli --text-mode

# Custom options
edgevox \
    --whisper-model large-v3-turbo \
    --voice am_adam \
    --language en

TUI Controls

| Key | Action |
|---|---|
| Q | Quit |
| R | Reset conversation |
| M | Mute/Unmute mic |
| / | Open command input |
| Ctrl+S | Export chat to markdown |

Slash Commands

| Command | Action |
|---|---|
| /reset | Reset conversation |
| /lang XX | Switch language (en, vi, fr, ko, ...) |
| /langs | List all supported languages |
| /say TEXT | TTS preview — speak text directly |
| /mictest | Record 3s + playback to test audio |
| /model SIZE | Switch Whisper model (small/medium/large-v3-turbo) |
| /voice XX | Switch TTS voice |
| /voices | List available voices |
| /export | Export chat to markdown |
| /mute | Mute microphone |
| /unmute | Unmute microphone |
| /help | Show all commands |

ROS2 Integration

EdgeVox can publish voice pipeline events to ROS2 topics, making it easy to add voice interaction to any robot.

# Install with ROS2 support
uv pip install -e ".[ros2]"

# Run with ROS2 bridge
edgevox --ros2

Published Topics

| Topic | Type | Description |
|---|---|---|
| /edgevox/transcription | std_msgs/String | User's speech (STT output) |
| /edgevox/response | std_msgs/String | Bot's response text |
| /edgevox/state | std_msgs/String | Pipeline state (listening, thinking, speaking) |
| /edgevox/audio_level | std_msgs/Float32 | Mic level (0.0-1.0) |
| /edgevox/metrics | std_msgs/String | JSON latency metrics |
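Because `/edgevox/metrics` carries JSON as a plain string, a subscriber just decodes it. A small sketch of such a callback — the field names (`stt_s`, `llm_s`, `tts_s`) are hypothetical placeholders, so inspect a real message for the actual schema:

```python
import json

def total_latency(msg_data: str) -> float:
    """Decode a metrics payload and return total latency in seconds.

    The keys used here (stt_s, llm_s, tts_s) are illustrative, not
    EdgeVox's documented schema.
    """
    metrics = json.loads(msg_data)
    return metrics["stt_s"] + metrics["llm_s"] + metrics["tts_s"]

payload = '{"stt_s": 0.40, "llm_s": 0.33, "tts_s": 0.08}'
print(round(total_latency(payload), 2))  # 0.81
```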

Subscribed Topics

| Topic | Type | Description |
|---|---|---|
| /edgevox/tts_request | std_msgs/String | Send text for the bot to speak |
| /edgevox/command | std_msgs/String | Commands: reset, mute, unmute |

Example: Robot Integration

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

rclpy.init()
node = Node('my_robot')

# Listen to what the user says
def on_user_speech(msg):
    print(f"User said: {msg.data}")

node.create_subscription(String, '/edgevox/transcription', on_user_speech, 10)

# Make the robot say something
pub = node.create_publisher(String, '/edgevox/tts_request', 10)
msg = String()
msg.data = "I detected an obstacle ahead."
pub.publish(msg)

rclpy.spin(node)

Architecture

                        EdgeVox Pipeline
 +-----------+     +------------+     +----------------+
 | Microphone|---->| Silero VAD |---->| faster-whisper |
 |           |     | (32ms)     |     | (STT)          |
 +-----------+     +------------+     +--------+-------+
                                               |
                                               v
                                      +----------------+
                                      | Gemma 4 E2B IT |
                                      | (streaming)    |
                                      +--------+-------+
                                               | sentence by sentence
                                               v
 +-----------+     +------------+     +----------------+
 |  Speaker  |<----| Kokoro 82M |<----| Sentence       |
 |           |     | (TTS)      |     | Splitter       |
 +-----------+     +------------+     +----------------+
                         |
                         v (optional)
                   +------------+
                   | ROS2 Bridge|----> /edgevox/* topics
                   +------------+

Model Sizes

| Component | Model | Size | RAM |
|---|---|---|---|
| VAD | Silero VAD v6 | ~2MB | ~10MB |
| STT | whisper-small | 500MB | ~600MB |
| STT | whisper-large-v3-turbo | 1.5GB | ~2GB |
| LLM | Gemma 4 E2B IT Q4_K_M | 1.8GB | ~2.5GB |
| TTS | Kokoro 82M | 200MB | ~300MB |
| Wake | OpenWakeWord | ~2MB | ~10MB |

Typical totals — M1 Air (8GB): whisper-small + Q4_K_M = 3.4GB. PC with GPU: whisper-large-v3-turbo + Q4_K_M = 5.8GB.

Documentation

Full docs: EdgeVox Docs (built with VitePress)

cd website && npm run dev

License

MIT

Download files

Download the file for your platform.

Source Distribution

edgevox-0.1.2.tar.gz (128.2 kB)

Built Distribution


edgevox-0.1.2-py3-none-any.whl (136.5 kB)

File details

Details for the file edgevox-0.1.2.tar.gz.

File metadata

  • Download URL: edgevox-0.1.2.tar.gz
  • Size: 128.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgevox-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | aa0bc56d539e38030820faa9bbedb5401f372c191b40414616c3ebcbfae6efa6 |
| MD5 | 8f4ba7f2f811d352fc145f308e26340f |
| BLAKE2b-256 | b5a25ffa00321ff01d1ae8c708c57670c9c2c816e430a45e386d8bba837bf9c9 |


Provenance

The following attestation bundles were made for edgevox-0.1.2.tar.gz:

Publisher: release.yml on vietanhdev/edgevox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file edgevox-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: edgevox-0.1.2-py3-none-any.whl
  • Size: 136.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for edgevox-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ad72590bff8e6998d18b063ba2b4b83204cbebf1d1828f57b2af6d1bc731d13 |
| MD5 | 63f03e941c17ceb665f4cab40917963e |
| BLAKE2b-256 | 387075e866116c349dac1403b16ec035c0d3805779ad0b9b616ba5b1322013f2 |


Provenance

The following attestation bundles were made for edgevox-0.1.2-py3-none-any.whl:

Publisher: release.yml on vietanhdev/edgevox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
