Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components
RoohAI Framework
A modular voice AI framework with real-time WebRTC audio, swappable STT/TTS/LLM models, and a browser-based frontend.
Hearing your voice brings peace to my soul.
Quick Start
# 1. Create a Python 3.11 environment
conda create -n roohai python=3.11 -y
conda activate roohai
# 2. Install RoohAI
pip install "roohai[all]" # Everything (Deepgram, Cartesia, Piper, etc.)
# 3. Start the voice UI
roohai
# Open http://localhost:8000 in your browser
Use the web UI to create agents, pick models, and start talking.
Models
| Type | Name | Backend |
|---|---|---|
| STT | whisper-tiny | HuggingFace openai/whisper-tiny (default) |
| STT | wav2vec2 | HuggingFace facebook/wav2vec2-base-960h |
| STT | deepgram | Deepgram Nova (cloud API) |
| STT | nvidia-canary | NVIDIA Canary |
| STT | nvidia-parakeet | NVIDIA Parakeet |
| TTS | piper | Piper TTS (local ONNX) (default) |
| TTS | speecht5 | HuggingFace microsoft/speecht5_tts |
| TTS | bark | HuggingFace suno/bark-small |
| TTS | cartesia | Cartesia (cloud API) |
| LLM | bedrock-claude | Amazon Bedrock (anthropic.claude-3-haiku) (default) |
| LLM | local | HuggingFace local model |
Models can be hot-swapped at runtime via the REST API or the frontend UI.
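As a rough illustration of a runtime swap through the REST API, the sketch below builds a `POST /api/models/swap` request with the standard library. The endpoint path and the `http://localhost:8000` address come from this README; the JSON field names `type` and `name`, and the assumption that the server replies with JSON, are guesses rather than the documented schema, so check the RoohAI Guide before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address from the Quick Start


def build_swap_request(model_type: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/models/swap.

    The JSON field names "type" and "name" are assumptions, not the
    documented request schema.
    """
    payload = json.dumps({"type": model_type, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/models/swap",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def swap_model(model_type: str, name: str) -> dict:
    """Send the swap request; assumes the server answers with JSON."""
    with urllib.request.urlopen(build_swap_request(model_type, name)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `swap_model("stt", "deepgram")` would attempt the swap. Building the request separately from sending it keeps the payload easy to inspect.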
Environment Variables
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`: For Bedrock LLM
- `AWS_DEFAULT_REGION`: AWS region (default: `us-east-1`)
- `DEEPGRAM_API_KEY`: For Deepgram STT
- `CARTESIA_API_KEY`: For Cartesia TTS
Cloud API keys can also be provided through the web UI wizard — they are stored securely in ~/.roohai/secrets.yaml.
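Before starting the server, it can help to check which cloud credentials are present in the environment. The helper below is a hypothetical sketch: the variable names are the ones listed in this section, but the mapping from model name to required variables and the `missing_env` function are illustrative and not part of RoohAI.

```python
import os

# Variable names come from this README; grouping them by model name is an
# illustrative assumption.
REQUIRED_ENV = {
    "bedrock-claude": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "cartesia": ["CARTESIA_API_KEY"],
}


def missing_env(model_name, env=None):
    """Return the variables a model needs that are unset or empty."""
    env = os.environ if env is None else env
    return [var for var in REQUIRED_ENV.get(model_name, []) if not env.get(var)]


if __name__ == "__main__":
    for model in REQUIRED_ENV:
        print(model, "missing:", missing_env(model))
```

A non-empty result does not necessarily mean the model will fail, since keys may also have been entered through the web UI wizard.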
Adding a Custom Model
- Create a class extending `STTModel`, `TTSModel`, or `LLMModel` from `roohai.base`
- Implement the required abstract methods (`load`, `transcribe`/`synthesize`/`chat`, `unload`, `is_loaded`)
- Register it in `roohai/pipeline.py` (`_register_defaults()` and `_CLASS_MAP`)
- Add a config entry in `config.yaml` under `models:`
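The first two steps might look roughly like the sketch below. So that it runs on its own, it defines a local stand-in for `STTModel` instead of importing it from `roohai.base`; the method names follow the list above, but the signatures (for example `transcribe(audio: bytes) -> str`, and `is_loaded` as a method rather than a property) are assumptions. `FixedTextSTT` is a hypothetical toy model.

```python
from abc import ABC, abstractmethod


class STTModel(ABC):
    """Stand-in for roohai.base.STTModel; signatures are assumed."""

    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def transcribe(self, audio: bytes) -> str: ...

    @abstractmethod
    def unload(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...


class FixedTextSTT(STTModel):
    """Toy STT model that ignores the audio and returns a fixed transcript."""

    def __init__(self, transcript: str = "hello") -> None:
        self._transcript = transcript
        self._loaded = False

    def load(self) -> None:
        # A real model would load weights or open an API client here.
        self._loaded = True

    def transcribe(self, audio: bytes) -> str:
        if not self._loaded:
            raise RuntimeError("call load() before transcribe()")
        return self._transcript

    def unload(self) -> None:
        # A real model would release memory or close connections here.
        self._loaded = False

    def is_loaded(self) -> bool:
        return self._loaded
```

In a real integration the class would extend the base class from `roohai.base` directly, and the registration and config steps would follow the conventions already used in `roohai/pipeline.py` and `config.yaml`.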
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check with active model info |
| POST | /api/transcribe | Upload audio file, get transcription |
| POST | /api/chat | Send text, get LLM response |
| POST | /api/synthesize | Send text, get WAV audio |
| POST | /api/voice-chat | Full pipeline: audio in -> text + audio out |
| POST | /api/webrtc/offer | WebRTC SDP offer/answer exchange |
| GET | /api/models | List available and active models |
| POST | /api/models/swap | Hot-swap a model at runtime |
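A minimal client sketch for the JSON-style endpoints, using only the standard library. The paths and the `http://localhost:8000` address come from this README; the `text` field in the chat body and the assumption that responses are JSON are guesses, and `api_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # address from the Quick Start


def api_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for an endpoint; a dict body is sent as JSON."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


def call(req: urllib.request.Request) -> dict:
    """Send a request; assumes the server replies with a JSON object."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


health_req = api_request("GET", "/api/health")
# The "text" field name is an assumption about the /api/chat schema.
chat_req = api_request("POST", "/api/chat", {"text": "Hello"})
```

With the server running, `call(health_req)` would fetch the health report. The audio endpoints (`/api/transcribe`, `/api/voice-chat`) take file uploads and `/api/synthesize` returns WAV audio, so they need different handling than this JSON helper provides.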
Documentation
For full documentation including architecture, configuration, and advanced usage, see the RoohAI Guide (available when the server is running).
File details
Details for the file roohai-0.1.2.tar.gz.
File metadata
- Download URL: roohai-0.1.2.tar.gz
- Upload date:
- Size: 105.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f84510536c0ae9ff92d2c2e67d845fdea2736255509df1ca990ebe6746d17962 |
| MD5 | b96023315273c5c7bdd5757131594aa0 |
| BLAKE2b-256 | e96794ca94d963b22ac18676b86a6f3532be030f4af6b4a32581206e4d738122 |
File details
Details for the file roohai-0.1.2-py3-none-any.whl.
File metadata
- Download URL: roohai-0.1.2-py3-none-any.whl
- Upload date:
- Size: 121.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aad21b611ea73b15a17e5b6a62111de339b0b88605a5ebc16e4cccba07051a3e |
| MD5 | 0c35f86932276980e5d5298ea842834a |
| BLAKE2b-256 | d421119d31a91561c9b3295e15cce56df2b69c1c2f712fa1ee31069e8d78d0e3 |