Skip to main content

Pipecat for the edge — edge-native, local-first real-time voice conversation library

Project description

voxedge

English | 中文

voxedge banner

PyPI License Python

Native TensorRT · RKNN · sherpa-onnx voice pipelines for Jetson, Rockchip, and Raspberry Pi — fully on-device, verified on real hardware, zero cloud.

What is voxedge?

voxedge is an embeddable Python library that drives real-time, on-device voice conversations by calling directly into each platform's native inference runtime — TensorRT on Jetson Orin, RKNN on RK3576/RK3588, sherpa-onnx on CPU. No cloud STT/TTS APIs, no internet at runtime, no intermediate abstraction overhead. The same ConversationEngine API works across all three backends; you swap only the backend constructor — N=2 concurrent sessions verified on Orin Nano 8 GB, byte-identical output, zero CUDA errors.

voxedge is the open-core engine behind OpenVoiceStream — the deployable FastAPI/WebSocket server, device profiles, and agent gallery. Want a container? Start there. Want to embed real-time edge voice in your own app? You're in the right place.

Key Features

  • Native runtimes, full performance — calls directly into TensorRT (Jetson), RKNN (Rockchip), and sherpa-onnx (CPU); no wrapper overhead, no cross-platform abstraction tax
  • Fully on-device — no speech API key, no per-call bill, no internet dependency at runtime
  • Verified on real hardware — N=2 concurrent sessions on Orin Nano 8 GB: byte-identical output vs. single-stream, zero CUDA errors
  • Streaming + barge-in — partial + final ASR while the user speaks; sentence-level TTS streaming with first-audio latency low enough for live dialogue and cooperative barge-in
  • Swap hardware, not code — same ConversationEngine API across Jetson, Rockchip, and sherpa-onnx CPU; only the backend constructor changes
  • Test on any machine — mock backends require only numpy; the whole engine runs end-to-end on a Mac with no CUDA or GPU

Quickstart

Runs on any machine — no GPU needed. Swap the backend constructors for a real device; the engine, transport, and event contract never change.

pip install voxedge
import asyncio
from voxedge.engine import ConversationEngine
from voxedge.transport import InProcessTransport
from voxedge.backends.mock import MockASR, MockTTS, MockVAD

engine = ConversationEngine(
    backends={"asr": MockASR(transcript="hello world"), "tts": MockTTS(), "vad": MockVAD()},
    multi_utterance=True,
)

async def main():
    t = InProcessTransport()
    await t.feed_audio(b"\x01\x02" * 8000)   # speech frames (int16 PCM)
    await t.feed_audio(b"\x00\x00" * 8000)   # silence → VAD endpoints the utterance
    t.end_input()
    await engine.run(t)                       # drives ASR → (LLM) → TTS
    for ev in t.drain_events_nowait():        # asr_final / tts_* / ...
        print(ev["type"], ev.get("text", ""))

asyncio.run(main())

On a real device, swap only the backend constructors — everything else is identical:

# Jetson Orin — pip install voxedge[jetson]
from voxedge.backends.jetson import (
    TRTEdgeLLMASRBackend, TRTEdgeLLMASRConfig,
    TRTEdgeLLMTTSBackend, TRTEdgeLLMTTSConfig,
)

engine = ConversationEngine(backends={
    "asr": TRTEdgeLLMASRBackend(TRTEdgeLLMASRConfig(...)),   # Qwen3-ASR, native TRT
    "tts": TRTEdgeLLMTTSBackend(TRTEdgeLLMTTSConfig(...)),   # Qwen3-TTS, streaming
}, multi_utterance=True)

import voxedge is numpy-only — TensorRT, RKNN, and sherpa-onnx are lazy-imported by their backend adapters and pulled in via extras. The example above imports cleanly on a Mac even though the TRT engine only runs on a Jetson.

Install

pip install voxedge            # pure-Python core (numpy only)
pip install voxedge[sherpa]    # sherpa-onnx CPU ASR/TTS
pip install voxedge[jetson]    # Jetson TensorRT backends (aarch64)
pip install voxedge[rk]        # Rockchip RK3576/RK3588 NPU (aarch64)
pip install voxedge[llm]       # OpenAI-compatible LLM backend (httpx)

The jetson / rk extras declare only pure-Python deps; the CUDA/TensorRT and RKNN runtime wheels ship from the platform (JetPack L4T / Rockchip NPU userspace) or the engine repos — you bring the platform runtime.

Architecture

Four layers, all importable without CUDA.

Backends (voxedge/backends/)

Clean ABCs in backends/base.py — every constructor takes explicit params only, no env coupling:

  • ASRBackend / ASRStream — streaming recognition
  • TTSBackendsynthesize() (batch) + generate_streaming() (sentence-level chunks, cooperative cancel via cancel_token for barge-in)
  • VADBackend / VADSession — voice-activity detection for speech / barge-in segmentation
  • LLMBackend / LLMEvent — token-streaming LLM for the conversation loop

Concrete adapters live under backends/{jetson,rk,sherpa}/ and import their heavy runtimes lazily (inside methods), so all modules import on any machine:

Backend Platform Models Extra Source engine
backends/jetson/ Jetson Orin (TensorRT) Qwen3-ASR/TTS, Matcha, Kokoro, Paraformer, SenseVoice, MOSS-TTS-Nano voxedge[jetson] aarch64 jetson-voice-engine
backends/rk/ Rockchip RK3576/RK3588 (RKNN) Qwen3-ASR, Matcha, Piper, Kokoro, Paraformer, SenseVoice voxedge[rk] aarch64 rkvoice-stream
backends/sherpa/ CPU (any arch) Paraformer, Zipformer, SenseVoice, Matcha, Kokoro ONNX voxedge[sherpa]
backends/llm/ Any OpenAI-compatible LLM over httpx voxedge[llm]
backends/mock.py Dev / CI MockASR, MockTTS, MockVAD, MockLLM core

Transport (voxedge/transport/)

Transport ABC + two implementations:

  • InProcessTransport — zero-IPC asyncio queues; default, used everywhere in tests
  • WebSocketTransport — duck-typed ws adapter with no FastAPI dependency; idle-watchdog timeout injected by caller, reads no env

Conversation Engine (voxedge/engine/)

ConversationEngine + per-connection Session coordinator, split into focused collaborators: audio_dispatcher (VAD → speech / barge-in), asr_loop, client_events, tts_sequencer / tts_buffer, session_state, and the LLM↔tool loop — llm_turn over the provider-agnostic turn_driver.run_turn pump, with tool_registry (@tool → JSON schema) and coordinator / concurrency_capability for multi-stream concurrency.

Capabilities (voxedge/capabilities/)

Optional, default-off, stateless add-ons (punctuation, speaker embedding) via sherpa-onnx. Opt in explicitly; byte-level no-op when off.

Design Constraints

  • Pure Python coreimport voxedge is numpy-only. Heavy adapters live under backends/{jetson,rk,sherpa}/ with deferred runtime imports.
  • No env reads in the library — all config injected as explicit params. Profiles and deployment knobs are the product's job (OpenVoiceStream).

Status

In production — the open-core engine behind a shipped edge voice stack. ~270 mock-based tests; the whole engine runs end-to-end on a Mac with no CUDA.

Contributing

Issues and PRs welcome. The mock backend suite runs on any machine with no hardware:

pip install voxedge
uv run pytest

Ecosystem

voxedge is one layer in a family of repos:

Repo Role When to go there
voxedge (this repo) Embeddable Python engine Embedding real-time voice in your own app
openvoicestream Deployable FastAPI/WebSocket server, Docker profiles, agent gallery Deployed use-cases and end-to-end demos; ready-to-run containers
rkvoice-stream Rockchip NPU engine (backends/rk/ wraps this) RK3576/RK3588 model formats, RKNN perf numbers, TTS/ASR backend internals
jetson-voice-engine Jetson TensorRT build scripts, model export, artifacts (backends/jetson/ wraps this) Jetson model conversion, TRT engine build, Orin-specific optimisations

Acknowledgements

License

Apache-2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voxedge-0.0.2a0.tar.gz (205.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

voxedge-0.0.2a0-py3-none-any.whl (234.5 kB view details)

Uploaded Python 3

File details

Details for the file voxedge-0.0.2a0.tar.gz.

File metadata

  • Download URL: voxedge-0.0.2a0.tar.gz
  • Upload date:
  • Size: 205.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for voxedge-0.0.2a0.tar.gz
Algorithm Hash digest
SHA256 5e21f5419a560997c112c0141e3c05b50ad6f28a8db0c3cf7e139f277160fbe5
MD5 fd5f9bfd909b765018f25da2d5d173b9
BLAKE2b-256 04eeb2ed6312c12996072b3fbb9016e77009c84d1555042bd9e9d0fe1330dac7

See more details on using hashes here.

File details

Details for the file voxedge-0.0.2a0-py3-none-any.whl.

File metadata

  • Download URL: voxedge-0.0.2a0-py3-none-any.whl
  • Upload date:
  • Size: 234.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for voxedge-0.0.2a0-py3-none-any.whl
Algorithm Hash digest
SHA256 619c322d9479a6b5a365a5224c4a24fb9637edc42b06dfa2c984b43063c58fdd
MD5 929db38977e53f5bfea1c8e345fe3a4f
BLAKE2b-256 9044062a9e140cbc846cd71fc85e28cfe11286eb72adf4220ce2617c082778ca

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page