
Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components

Project description

RoohAI Framework

A modular voice AI framework with real-time WebRTC audio, swappable STT/TTS/LLM models, and a browser-based frontend.

Teri awaaz sun kar, meri rooh ko sukoon milta hai. ("Hearing your voice, my soul finds peace.")

Quick Start

# 1. Create a Python 3.11 environment
conda create -n roohai python=3.11 -y
conda activate roohai

# 2. Install RoohAI
pip install "roohai[all]" # Everything (Deepgram, Cartesia, Piper, etc.)

# 3. Start the voice UI
roohai

# Open http://localhost:8000 in your browser

Use the web UI to create agents, pick models, and start talking.
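When scripting against the server (for example in CI or a notebook), it can help to wait until it is up before sending requests. The sketch below polls the GET /api/health endpoint listed under API Endpoints. It assumes only that a healthy server answers with HTTP 200. The wait_until_ready and http_ok helpers are hypothetical and not part of RoohAI.

```python
import time
import urllib.request
from collections.abc import Callable

# Default address from the Quick Start; /api/health is listed under API Endpoints.
HEALTH_URL = "http://localhost:8000/api/health"


def http_ok(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket errors all subclass OSError
        return False


def wait_until_ready(
    url: str = HEALTH_URL,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll url until probe() succeeds or timeout_s elapses; return whether it succeeded."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With roohai started in another terminal, wait_until_ready() should return True once the health endpoint answers. The probe parameter exists so the polling loop can be exercised without a live server.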

Models

| Type | Name | Backend |
|------|------|---------|
| STT | whisper-tiny | HuggingFace openai/whisper-tiny (default) |
| STT | wav2vec2 | HuggingFace facebook/wav2vec2-base-960h |
| STT | deepgram | Deepgram Nova (cloud API) |
| STT | nvidia-canary | NVIDIA Canary |
| STT | nvidia-parakeet | NVIDIA Parakeet |
| TTS | piper | Piper TTS, local ONNX (default) |
| TTS | speecht5 | HuggingFace microsoft/speecht5_tts |
| TTS | bark | HuggingFace suno/bark-small |
| TTS | cartesia | Cartesia (cloud API) |
| LLM | bedrock-claude | Amazon Bedrock, anthropic.claude-3-haiku (default) |
| LLM | local | Local HuggingFace model |

Models can be hot-swapped at runtime via the REST API or the frontend UI.
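A hot-swap over the REST API might look like the following standard-library sketch against POST /api/models/swap. The request body shape ({"type": ..., "name": ...}) and the JSON reply are assumptions made for illustration. Only the endpoint path and the model names in the table above come from this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_swap_request(model_type: str, name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /api/models/swap request.

    ASSUMPTION: the JSON field names "type" and "name" are illustrative guesses;
    check the RoohAI Guide for the actual request schema.
    """
    payload = json.dumps({"type": model_type, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/models/swap",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def swap_model(model_type: str, name: str, base_url: str = BASE_URL) -> dict:
    """Send the swap request to a running server and decode its reply (assumed to be JSON)."""
    request = build_swap_request(model_type, name, base_url)
    # Loading a new model can take a while, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())
```

With the server running, swap_model("stt", "deepgram") would ask it to switch speech-to-text to Deepgram. GET /api/models reports which names the server actually accepts.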

Environment Variables

  • AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY — For Bedrock LLM
  • AWS_DEFAULT_REGION — AWS region (default: us-east-1)
  • DEEPGRAM_API_KEY — For Deepgram STT
  • CARTESIA_API_KEY — For Cartesia TTS

Cloud API keys can also be entered in the web UI wizard, which stores them in ~/.roohai/secrets.yaml.
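Before launching, a small preflight check can report which cloud backends are missing credentials. This sketch only inspects the environment variables listed above. It will therefore flag a backend as missing even if its key was saved through the web UI wizard, or, for Bedrock, supplied by another AWS credential source. The backend-to-variable mapping mirrors the list. The helper names are hypothetical.

```python
import os
from collections.abc import Mapping

# Environment variables from the list above, grouped by the cloud backend that needs them.
REQUIRED_ENV = {
    "bedrock-claude": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "deepgram": ("DEEPGRAM_API_KEY",),
    "cartesia": ("CARTESIA_API_KEY",),
}


def missing_credentials(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return {backend: [unset variables]} for every backend that is not fully configured."""
    missing: dict[str, list[str]] = {}
    for backend, names in REQUIRED_ENV.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[backend] = absent
    return missing


def aws_region(env: Mapping[str, str] = os.environ) -> str:
    """Region used for Bedrock; us-east-1 is the documented default when unset."""
    return env.get("AWS_DEFAULT_REGION") or "us-east-1"
```

Calling missing_credentials() with no argument checks the current process environment. Passing a plain dict makes it easy to test.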

Adding a Custom Model

  1. Create a class extending STTModel, TTSModel, or LLMModel from roohai.base
  2. Implement the required abstract methods: load, unload, is_loaded, and the model-specific method (transcribe for STT, synthesize for TTS, chat for LLM)
  3. Register the class in _register_defaults() and _CLASS_MAP in roohai/pipeline.py
  4. Add a config entry for it under the models: key in config.yaml
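The steps above can be sketched roughly as follows. So that the block runs without RoohAI installed, it defines a stand-in STTModel with the four methods the list names instead of importing roohai.base. The exact signatures are assumptions, for example transcribe taking raw 16-bit PCM bytes plus a sample rate, and is_loaded being a method rather than a property. The DurationSTT class and the CLASS_MAP dict (standing in for _CLASS_MAP) are hypothetical.

```python
from abc import ABC, abstractmethod


# Stand-in for roohai.base.STTModel so this sketch is self-contained.
# ASSUMPTION: the real base class's method signatures may differ.
class STTModel(ABC):
    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def transcribe(self, audio: bytes, sample_rate: int) -> str: ...

    @abstractmethod
    def unload(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...


class DurationSTT(STTModel):
    """Hypothetical toy model: reports the audio's length instead of recognizing words."""

    def __init__(self, bytes_per_sample: int = 2) -> None:  # 2 bytes = 16-bit PCM
        self.bytes_per_sample = bytes_per_sample
        self._loaded = False

    def load(self) -> None:
        # A real model would load weights or open an API client here.
        self._loaded = True

    def transcribe(self, audio: bytes, sample_rate: int) -> str:
        if not self._loaded:
            raise RuntimeError("call load() before transcribe()")
        seconds = len(audio) / (self.bytes_per_sample * sample_rate)
        return f"[{seconds:.2f}s of audio]"

    def unload(self) -> None:
        self._loaded = False

    def is_loaded(self) -> bool:
        return self._loaded


# Hypothetical analogue of registering the class in _CLASS_MAP (roohai/pipeline.py).
CLASS_MAP = {"duration-stt": DurationSTT}
```

Keeping load and unload separate from construction matches the hot-swap workflow described earlier: a pipeline can release one model's resources before loading its replacement.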

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/health | Health check with active model info |
| POST | /api/transcribe | Upload an audio file, get a transcription |
| POST | /api/chat | Send text, get an LLM response |
| POST | /api/synthesize | Send text, get WAV audio |
| POST | /api/voice-chat | Full pipeline: audio in, text + audio out |
| POST | /api/webrtc/offer | WebRTC SDP offer/answer exchange |
| GET | /api/models | List available and active models |
| POST | /api/models/swap | Hot-swap a model at runtime |
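A minimal client for the audio-upload endpoints (/api/transcribe, and presumably /api/voice-chat) can hand-encode a multipart/form-data body using only the standard library. Several details are assumptions, not documented behavior: the form field name "file", the audio/wav content type, and a JSON response. The RoohAI Guide served by the running server is the authority on the real request format.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/wav") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random token that will not appear in the WAV data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(path: str, wav_bytes: bytes, field: str = "file",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    # ASSUMPTION: the form field name "file" is a guess.
    body, content_type = build_multipart(field, "input.wav", wav_bytes)
    return urllib.request.Request(
        f"{base_url}{path}", data=body, headers={"Content-Type": content_type}, method="POST"
    )


def upload_audio(path: str, wav_bytes: bytes, base_url: str = BASE_URL) -> dict:
    """POST a WAV file to a running server and decode its reply (assumed to be JSON)."""
    request = build_upload_request(path, wav_bytes, base_url=base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())
```

For example, upload_audio("/api/transcribe", open("hello.wav", "rb").read()) would send a local recording for transcription. Libraries such as requests can build the same multipart body with less code.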

Documentation

For full documentation, including architecture, configuration, and advanced usage, see the RoohAI Guide (available when the server is running).



Download files


Source Distribution

roohai-0.1.2.tar.gz (105.8 kB)

Built Distribution


roohai-0.1.2-py3-none-any.whl (121.1 kB)

File details

Details for the file roohai-0.1.2.tar.gz.

File metadata

  • Download URL: roohai-0.1.2.tar.gz
  • Upload date:
  • Size: 105.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for roohai-0.1.2.tar.gz
Algorithm Hash digest
SHA256 f84510536c0ae9ff92d2c2e67d845fdea2736255509df1ca990ebe6746d17962
MD5 b96023315273c5c7bdd5757131594aa0
BLAKE2b-256 e96794ca94d963b22ac18676b86a6f3532be030f4af6b4a32581206e4d738122


File details

Details for the file roohai-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: roohai-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 121.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for roohai-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 aad21b611ea73b15a17e5b6a62111de339b0b88605a5ebc16e4cccba07051a3e
MD5 0c35f86932276980e5d5298ea842834a
BLAKE2b-256 d421119d31a91561c9b3295e15cce56df2b69c1c2f712fa1ee31069e8d78d0e3
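After downloading either file, its SHA256 digest can be compared against the values published above. The following is a minimal sketch using hashlib; the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "roohai-0.1.2.tar.gz": "f84510536c0ae9ff92d2c2e67d845fdea2736255509df1ca990ebe6746d17962",
    "roohai-0.1.2-py3-none-any.whl": "aad21b611ea73b15a17e5b6a62111de339b0b88605a5ebc16e4cccba07051a3e",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published hash for {path.name}")
    return sha256_of(path) == expected
```

pip's hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file) can enforce the same check during installation. In that mode, however, every dependency must be pinned and hashed as well.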

