Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components

RoohAI

Open-source voice AI framework for building real-time voice agents.
Swap STT, TTS, and LLM models with a single line of config.

Teri awaaz sun kar, meri rooh ko sukoon milta hai. ("Hearing your voice, my soul finds peace.")

Why Rooh | Docs

Features

  • Real-time voice — WebRTC and WebSocket transports with sub-second latency
  • Swappable models — Mix and match STT, TTS, LLM, and VAD providers via YAML config
  • Hot-swap at runtime — Change models without restarting the server
  • Agent wizard UI — Browser-based GUI to create agents, pick models, and start talking
  • LLM streaming — Token-by-token responses with sentence-boundary TTS overlap
  • Barge-in — Interrupt the AI mid-sentence by speaking
  • Hooks & extensibility — Plug in custom LLM logic, tool use (Strands SDK), and observability
  • Built-in frontend — Dark-themed vanilla HTML/CSS/JS UI, no build step required
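The sentence-boundary TTS overlap above works by flushing streamed LLM tokens to the synthesizer as soon as a complete sentence is available, instead of waiting for the full response. A rough illustrative sketch of that buffering loop (not RoohAI's actual implementation):

```python
import re

def sentence_chunks(token_stream):
    """Accumulate streamed LLM tokens and yield complete sentences,
    so TTS synthesis can start before the full response arrives."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush whenever the buffer contains a sentence end (., !, ?).
        while (m := re.search(r"[.!?]\s", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever trails the last sentence

# Tokens as they might arrive from a streaming LLM:
tokens = ["Hel", "lo there. ", "How are ", "you? ", "Fine"]
print(list(sentence_chunks(tokens)))
# → ['Hello there.', 'How are you?', 'Fine']
```

Each yielded sentence can be handed to TTS while the LLM keeps streaming, which is what hides most of the generation latency.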

Quick Start

pip install roohai

Then start the server:

roohai
# Open http://localhost:8000

Use the web UI to create an agent, select your models, and start a conversation.

All cloud providers (Deepgram, Cartesia, Bedrock), the Strands Agent SDK (OpenAI, Anthropic, Gemini, Ollama), and Silero VAD are included. Local model dependencies (Whisper, SpeechT5, Bark, Piper) are installed automatically when you select them in the wizard.

For NVIDIA NeMo models (requires CUDA GPU):

pip install "roohai[nvidia]"

Supported Models

Speech-to-Text

| Name | Provider | Notes |
|------|----------|-------|
| deepgram | Deepgram Nova | Cloud API, streaming support |
| nvidia-parakeet | NVIDIA | Local, high accuracy |
| whisper-tiny | HuggingFace | Local, fast, English-focused. Default |
| whisper-base | HuggingFace | Better accuracy, still lightweight |
| whisper-small | HuggingFace | Best local accuracy |

Text-to-Speech

| Name | Provider | Notes |
|------|----------|-------|
| cartesia | Cartesia Sonic | Cloud API, natural voices |
| deepgram | Deepgram Aura | Cloud API, natural voices |
| piper | Piper TTS | Local ONNX, multiple voices. Default |
| speecht5 | HuggingFace | Local, lightweight |
| bark | HuggingFace | Local, expressive |

LLM

| Name | Provider | Notes |
|------|----------|-------|
| bedrock | AWS Bedrock | Claude Haiku/Sonnet/Opus. Default |
| openai | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| anthropic | Anthropic | Claude models via Strands SDK |
| gemini | Google | Gemini Flash/Pro via Strands SDK |
| ollama | Ollama | Any local model (Llama 3, Mistral, etc.) |
| local | HuggingFace | Any local causal LM (direct, no Strands) |

VAD

| Name | Provider |
|------|----------|
| silero | Silero VAD |

Configuration

Environment Variables

| Variable | Required for |
|----------|--------------|
| BEDROCK_API_KEY or AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY | Bedrock LLM |
| AWS_DEFAULT_REGION | Bedrock (default: us-east-1) |
| OPENAI_API_KEY | OpenAI LLM |
| ANTHROPIC_API_KEY | Anthropic LLM |
| GOOGLE_API_KEY | Gemini LLM |
| DEEPGRAM_API_KEY | Deepgram STT/TTS |
| CARTESIA_API_KEY | Cartesia TTS |

API keys can also be set through the agent wizard UI — they're stored in ~/.roohai/secrets.yaml with 0600 permissions.
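As an illustration of that storage pattern (the path and 0600 permission come from the README; the writing logic below is a sketch, not RoohAI's code):

```python
import os
import stat
import tempfile
from pathlib import Path

def write_secrets(path: Path, secrets: dict) -> None:
    """Write key: value pairs as minimal YAML and restrict the file
    to owner read/write (0600), mirroring RoohAI's secrets storage."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{k}: {v}\n" for k, v in secrets.items()))
    os.chmod(path, 0o600)  # -rw------- : readable by the owner only

# Using a temp dir here; RoohAI itself uses ~/.roohai/secrets.yaml.
p = Path(tempfile.mkdtemp()) / "secrets.yaml"
write_secrets(p, {"DEEPGRAM_API_KEY": "dg_example"})
print(oct(stat.S_IMODE(p.stat().st_mode)))  # → 0o600
```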

Agent Config

Agents are defined as YAML files in ~/.roohai/agents/. Each agent specifies its models, system prompt, and transport:

name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
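For illustration, the parsed form of that YAML can be sanity-checked with a small validator. The field names mirror the example above, but the validation logic is a sketch, not RoohAI's actual loader:

```python
def validate_agent(cfg: dict) -> list[str]:
    """Return a list of problems with an agent config dict
    (the parsed form of the YAML example above)."""
    errors = []
    for key in ("name", "system_prompt", "pipeline"):
        if key not in cfg:
            errors.append(f"missing required field: {key}")
    # Every pipeline stage must be named.
    for stage in ("stt", "tts", "llm", "vad"):
        if stage not in cfg.get("pipeline", {}):
            errors.append(f"pipeline missing stage: {stage}")
    # Transport is optional, but only these two exist.
    if cfg.get("transport") not in (None, "websocket", "webrtc"):
        errors.append(f"unknown transport: {cfg['transport']}")
    return errors

agent = {
    "name": "my-agent",
    "system_prompt": "You are a helpful voice assistant.",
    "llm_streaming": True,
    "pipeline": {"stt": "whisper-base", "tts": "piper",
                 "llm": "bedrock-claude", "vad": "silero"},
    "transport": "websocket",
}
print(validate_agent(agent))  # → []
```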

Usage

CLI

roohai                           # Start with defaults
roohai --port 3000               # Custom port
roohai --reload                  # Auto-reload for development
roohai --log-level debug         # Verbose logging

Python API

Builder Pattern

from roohai import Rooh

pipeline = (
    Rooh.builder()
    .stt("whisper-tiny")
    .tts("piper")
    .llm("bedrock", model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
    .vad("silero")
    .system_prompt("You are a helpful assistant.")
    .build()
)
pipeline.load()

# Transcribe
text = await pipeline.transcribe(audio_bytes)

# Chat
response = await pipeline.chat("Hello, how are you?")

# Stream
async for chunk in pipeline.chat_stream("Tell me a story"):
    print(chunk, end="")

# Full pipeline: audio in -> text + audio out
transcription, response, audio = await pipeline.process_audio(audio_bytes)

From Config

from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()

REST API

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/health | Health check with active model info |
| POST | /api/transcribe | Audio file -> transcription |
| POST | /api/chat | Text -> LLM response |
| POST | /api/synthesize | Text -> WAV audio |
| POST | /api/voice-chat | Audio in -> text + audio out |
| POST | /api/webrtc/offer | WebRTC SDP offer/answer |
| GET | /api/models | List available and active models |
| POST | /api/models/swap | Hot-swap a model at runtime |
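A minimal client sketch for /api/chat using only the standard library. The path comes from the table above; the JSON field name is an assumption, so check the live docs at /guide for the actual request schema:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def chat_request(text: str) -> urllib.request.Request:
    """Build a POST to /api/chat. The JSON field name ("text") is
    an assumption, not documented here."""
    payload = json.dumps({"text": text}).encode()
    return urllib.request.Request(
        f"{BASE}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Hello, how are you?")
print(req.full_url, req.get_method())
# With the server running: urllib.request.urlopen(req).read()
```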

Examples

The examples/ directory in the repository contains complete working apps.

Extending RoohAI

Custom Models

Create a class extending STTModel, TTSModel, or LLMModel:

from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...
    def unload(self): ...
    @property
    def is_loaded(self) -> bool: ...
    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)

LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
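The hook names above come from the snippet; the handler signature below is an assumption for illustration. A hypothetical batch hook that injects retrieved context (RAG-style) ahead of the user prompt:

```python
# Toy knowledge base standing in for a real retrieval backend.
KNOWLEDGE = {"weather": "It is sunny today."}

def lookup_context(prompt: str) -> str:
    """Toy retrieval: match a keyword against the knowledge base."""
    for key, fact in KNOWLEDGE.items():
        if key in prompt.lower():
            return fact
    return ""

def my_batch_handler(prompt: str, history: list) -> str:
    """Hypothetical batch hook (signature assumed): prepend retrieved
    context to the prompt before it reaches the underlying LLM."""
    context = lookup_context(prompt)
    return f"[context: {context}] {prompt}" if context else prompt

print(my_batch_handler("What's the weather?", []))
```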

See the Strands SDK integration for a full example with tool use and conversation memory.

Documentation

Full docs are available at http://localhost:8000/guide when the server is running, including architecture details, advanced configuration, and the complete model catalog.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e ".[nvidia]"
pytest

License

Apache 2.0 — see LICENSE.

Download files

Download the file for your platform.

Source Distribution

roohai-0.2.1.tar.gz (156.6 kB)

Built Distribution

roohai-0.2.1-py3-none-any.whl (173.4 kB)

File details

Details for the file roohai-0.2.1.tar.gz.

File metadata

  • Download URL: roohai-0.2.1.tar.gz
  • Upload date:
  • Size: 156.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for roohai-0.2.1.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | e01d866d5fb19bce2feefa8448bb44d40354c34be19b12a7a7121a82ecd07e0f |
| MD5 | 0e80f309c4b80e75c91bfc937db26751 |
| BLAKE2b-256 | 685fd586cdf575fbca4f2c7638a512409e386a10828ffe7569ebad32d36d8343 |

File details

Details for the file roohai-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: roohai-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 173.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for roohai-0.2.1-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 8406aa82922178e135f571f1a4720e07c2037375bad4be37c0bb306945cabf4a |
| MD5 | dbb78d2dce2aaa08616523e6f60cb271 |
| BLAKE2b-256 | f3f0d8910bbabf640a12ec32f41cc38b7b49b204b19a58dbfe667706b70712ca |
