# RoohAI

Open-source, modular framework for building real-time voice agents with swappable STT, LLM, TTS, and VAD components. Swap models with a single line of config.

*"Hearing your voice brings peace to my soul."*
## Features
- Real-time voice — WebRTC and WebSocket transports with sub-second latency
- Swappable models — Mix and match STT, TTS, LLM, and VAD providers via YAML config
- Hot-swap at runtime — Change models without restarting the server
- Agent wizard UI — Browser-based GUI to create agents, pick models, and start talking
- LLM streaming — Token-by-token responses with sentence-boundary TTS overlap
- Barge-in — Interrupt the AI mid-sentence by speaking
- Hooks & extensibility — Plug in custom LLM logic, tool use (Strands SDK), and observability
- Built-in frontend — Dark-themed vanilla HTML/CSS/JS UI, no build step required
## Quick Start

```bash
pip install roohai
```

Then start the server:

```bash
roohai
# Open http://localhost:8000
```

Use the web UI to create an agent, select your models, and start a conversation.

All STT, TTS, and LLM providers are included by default. For NVIDIA models, install the extra:

```bash
pip install "roohai[nvidia]"
```
## Supported Models

### Speech-to-Text

| Name | Provider | Notes |
|---|---|---|
| `deepgram` | Deepgram Nova | Cloud API, streaming support |
| `nvidia-parakeet` | NVIDIA | Local, high accuracy |
| `whisper-tiny` | HuggingFace | Local, fast, English-focused. Default |
| `whisper-base` | HuggingFace | Better accuracy, still lightweight |
| `whisper-small` | HuggingFace | Best local accuracy |
### Text-to-Speech

| Name | Provider | Notes |
|---|---|---|
| `cartesia` | Cartesia Sonic | Cloud API, natural voices |
| `deepgram` | Deepgram Aura | Cloud API, natural voices |
| `piper` | Piper TTS | Local ONNX, multiple voices. Default |
| `speecht5` | HuggingFace | Local, lightweight |
| `bark` | HuggingFace | Local, expressive |
### LLM

| Name | Provider | Notes |
|---|---|---|
| `bedrock` | AWS Bedrock | Claude Haiku/Sonnet/Opus. Default |
| `openai` | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| `anthropic` | Anthropic | Claude models via Strands SDK |
| `gemini` | Google | Gemini Flash/Pro via Strands SDK |
| `ollama` | Ollama | Any local model (Llama 3, Mistral, etc.) |
| `local` | HuggingFace | Any local causal LM (direct, no Strands) |
### VAD

| Name | Provider |
|---|---|
| `silero` | Silero VAD |
## Configuration

### Environment Variables

| Variable | Required for |
|---|---|
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | Bedrock LLM |
| `AWS_DEFAULT_REGION` | Bedrock (default: `us-east-1`) |
| `OPENAI_API_KEY` | OpenAI LLM |
| `ANTHROPIC_API_KEY` | Anthropic LLM |
| `GOOGLE_API_KEY` | Gemini LLM |
| `DEEPGRAM_API_KEY` | Deepgram STT/TTS |
| `CARTESIA_API_KEY` | Cartesia TTS |

API keys can also be set through the agent wizard UI; they're stored in `~/.roohai/secrets.yaml` with `0600` permissions.
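The secrets file is plain YAML. A hypothetical sketch, assuming the keys mirror the environment variable names above (the actual layout may differ):

```yaml
# ~/.roohai/secrets.yaml (hypothetical layout)
DEEPGRAM_API_KEY: "your-deepgram-key"
OPENAI_API_KEY: "your-openai-key"
```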
### Agent Config

Agents are defined as YAML files in `~/.roohai/agents/`. Each agent specifies its models, system prompt, and transport:

```yaml
name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
```
## Usage

### CLI

```bash
roohai                    # Start with defaults
roohai --port 3000        # Custom port
roohai --reload           # Auto-reload for development
roohai --log-level debug  # Verbose logging
```

### Python API

```python
from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()

# The calls below are async; run them inside an async function (e.g. via asyncio.run).

# Transcribe
text = await pipeline.transcribe(audio_bytes)

# Chat
response = await pipeline.chat("Hello, how are you?")

# Stream
async for chunk in pipeline.chat_stream("Tell me a story"):
    print(chunk, end="")

# Full pipeline: audio in -> text + audio out
transcription, response, audio = await pipeline.process_audio(audio_bytes)
```
### REST API

| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Health check with active model info |
| POST | `/api/transcribe` | Audio file -> transcription |
| POST | `/api/chat` | Text -> LLM response |
| POST | `/api/synthesize` | Text -> WAV audio |
| POST | `/api/voice-chat` | Audio in -> text + audio out |
| POST | `/api/webrtc/offer` | WebRTC SDP offer/answer |
| GET | `/api/models` | List available and active models |
| POST | `/api/models/swap` | Hot-swap a model at runtime |
## Examples

The `examples/` directory contains complete working apps:
- quickstart — Minimal voice agent
- barge-in-hook — Custom barge-in handling
- session-memory-agent — Per-session conversation memory with Strands SDK
- voice-weather-agent — Voice agent with tool use (weather API)
- skill-interview-agent — Structured interview agent
## Extending RoohAI

### Custom Models

Create a class extending `STTModel`, `TTSModel`, or `LLMModel`, then register it:

```python
from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...          # allocate the model or open the client
    def unload(self): ...        # release resources
    @property
    def is_loaded(self) -> bool: ...
    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)
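Once registered, the custom model should be selectable by its registry name like any built-in one, for example in an agent config (a sketch, assuming registration runs before the pipeline is built):

```yaml
pipeline:
  stt: my-stt      # the name passed to registry.register_stt
  tts: piper
  llm: bedrock-claude
  vad: silero
```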
### LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

```python
pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
```
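As an illustration, a minimal sketch of what such handlers might look like, assuming the batch hook maps the user's text to a reply string and the streaming hook is an async generator of text chunks. These signatures are assumptions, not the framework's actual contract:

```python
import asyncio

# Hypothetical handlers; the real hook signatures are shown in the
# Strands SDK example.

def my_batch_handler(user_text: str) -> str:
    # Batch hook: take the user's text, return the complete reply.
    # This is where you could call your own LLM or inject RAG context.
    return f"You said: {user_text}"

async def my_streaming_handler(user_text: str):
    # Streaming hook: an async generator yielding text chunks, so TTS
    # can start speaking before the full reply is ready.
    for word in f"You said: {user_text}".split():
        yield word + " "

async def demo() -> str:
    chunks = [chunk async for chunk in my_streaming_handler("hello")]
    return "".join(chunks).strip()

print(my_batch_handler("hello"))  # You said: hello
print(asyncio.run(demo()))        # You said: hello
```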
See the Strands SDK integration for a full example with tool use and conversation memory.
## Documentation
Full docs are available at http://localhost:8000/guide when the server is running, including architecture details, advanced configuration, and the complete model catalog.
## Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

```bash
git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e ".[all]"
pytest
```
## License

Apache 2.0 (see `LICENSE`).