Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components

RoohAI

Open-source voice AI framework for building real-time voice agents.
Swap STT, TTS, and LLM models with a single line of config.

Teri awaaz sun kar, meri rooh ko sukoon milta hai. ("Hearing your voice, my soul finds peace.")

Why Rooh | Docs

Features

  • Real-time voice — WebRTC and WebSocket transports with sub-second latency
  • Swappable models — Mix and match STT, TTS, LLM, and VAD providers via YAML config
  • Hot-swap at runtime — Change models without restarting the server
  • Agent wizard UI — Browser-based GUI to create agents, pick models, and start talking
  • LLM streaming — Token-by-token responses with sentence-boundary TTS overlap
  • Barge-in — Interrupt the AI mid-sentence by speaking
  • Hooks & extensibility — Plug in custom LLM logic, tool use (Strands SDK), and observability
  • Built-in frontend — Dark-themed vanilla HTML/CSS/JS UI, no build step required
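The sentence-boundary TTS overlap above works by flushing streamed LLM tokens to the synthesizer as soon as a complete sentence is available, instead of waiting for the full response. A rough illustrative sketch of that buffering loop (not RoohAI's actual implementation):

```python
import re

def sentence_chunks(token_stream):
    """Accumulate streamed LLM tokens and yield complete sentences,
    so TTS synthesis can start before the full response arrives."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush whenever the buffer contains a sentence end (., !, ?).
        while (m := re.search(r"[.!?]\s", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever trails the last sentence

# Tokens as they might arrive from a streaming LLM:
tokens = ["Hel", "lo there. ", "How are ", "you? ", "Fine"]
print(list(sentence_chunks(tokens)))
# → ['Hello there.', 'How are you?', 'Fine']
```

Each yielded sentence can be handed to TTS while the LLM keeps streaming, which is what hides most of the generation latency.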

Quick Start

pip install roohai

Then start the server:

roohai
# Open http://localhost:8000

Use the web UI to create an agent, select your models, and start a conversation.

All cloud providers (Deepgram, Cartesia, Bedrock), the Strands Agent SDK (OpenAI, Anthropic, Gemini, Ollama), and Silero VAD are included. Local model dependencies (Whisper, SpeechT5, Bark, Piper) are installed automatically when you select them in the wizard.

For NVIDIA NeMo models (requires CUDA GPU):

pip install "roohai[nvidia]"

Supported Models

Speech-to-Text

| Name | Provider | Notes |
|------|----------|-------|
| deepgram | Deepgram Nova | Cloud API, streaming support |
| nvidia-parakeet | NVIDIA | Local, high accuracy |
| whisper-tiny | HuggingFace | Local, fast, English-focused. Default |
| whisper-base | HuggingFace | Better accuracy, still lightweight |
| whisper-small | HuggingFace | Best local accuracy |

Text-to-Speech

| Name | Provider | Notes |
|------|----------|-------|
| cartesia | Cartesia Sonic | Cloud API, natural voices |
| deepgram | Deepgram Aura | Cloud API, natural voices |
| piper | Piper TTS | Local ONNX, multiple voices. Default |
| speecht5 | HuggingFace | Local, lightweight |
| bark | HuggingFace | Local, expressive |

LLM

| Name | Provider | Notes |
|------|----------|-------|
| bedrock | AWS Bedrock | Claude Haiku/Sonnet/Opus. Default |
| openai | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| anthropic | Anthropic | Claude models via Strands SDK |
| gemini | Google | Gemini Flash/Pro via Strands SDK |
| ollama | Ollama | Any local model (Llama 3, Mistral, etc.) |
| local | HuggingFace | Any local causal LM (direct, no Strands) |

VAD

| Name | Provider |
|------|----------|
| silero | Silero VAD |

Configuration

Environment Variables

| Variable | Required for |
|----------|--------------|
| BEDROCK_API_KEY or AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY | Bedrock LLM |
| AWS_DEFAULT_REGION | Bedrock (default: us-east-1) |
| OPENAI_API_KEY | OpenAI LLM |
| ANTHROPIC_API_KEY | Anthropic LLM |
| GOOGLE_API_KEY | Gemini LLM |
| DEEPGRAM_API_KEY | Deepgram STT/TTS |
| CARTESIA_API_KEY | Cartesia TTS |

API keys can also be set through the agent wizard UI — they're stored in ~/.roohai/secrets.yaml with 0600 permissions.
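As an illustration of that storage pattern (the path and 0600 permission come from the README; the writing logic below is a sketch, not RoohAI's code):

```python
import os
import stat
import tempfile
from pathlib import Path

def write_secrets(path: Path, secrets: dict) -> None:
    """Write key: value pairs as minimal YAML and restrict the file
    to owner read/write (0600), mirroring RoohAI's secrets storage."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{k}: {v}\n" for k, v in secrets.items()))
    os.chmod(path, 0o600)  # -rw------- : readable by the owner only

# Using a temp dir here; RoohAI itself uses ~/.roohai/secrets.yaml.
p = Path(tempfile.mkdtemp()) / "secrets.yaml"
write_secrets(p, {"DEEPGRAM_API_KEY": "dg_example"})
print(oct(stat.S_IMODE(p.stat().st_mode)))  # → 0o600
```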

Agent Config

Agents are defined as YAML files in ~/.roohai/agents/. Each agent specifies its models, system prompt, and transport:

name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
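For illustration, the parsed form of that YAML can be sanity-checked with a small validator. The field names mirror the example above, but the validation logic is a sketch, not RoohAI's actual loader:

```python
def validate_agent(cfg: dict) -> list[str]:
    """Return a list of problems with an agent config dict
    (the parsed form of the YAML example above)."""
    errors = []
    for key in ("name", "system_prompt", "pipeline"):
        if key not in cfg:
            errors.append(f"missing required field: {key}")
    # Every pipeline stage must be named.
    for stage in ("stt", "tts", "llm", "vad"):
        if stage not in cfg.get("pipeline", {}):
            errors.append(f"pipeline missing stage: {stage}")
    # Transport is optional, but only these two exist.
    if cfg.get("transport") not in (None, "websocket", "webrtc"):
        errors.append(f"unknown transport: {cfg['transport']}")
    return errors

agent = {
    "name": "my-agent",
    "system_prompt": "You are a helpful voice assistant.",
    "llm_streaming": True,
    "pipeline": {"stt": "whisper-base", "tts": "piper",
                 "llm": "bedrock-claude", "vad": "silero"},
    "transport": "websocket",
}
print(validate_agent(agent))  # → []
```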

Usage

CLI

roohai                           # Start with defaults
roohai --port 3000               # Custom port
roohai --reload                  # Auto-reload for development
roohai --log-level debug         # Verbose logging

Python API

Builder Pattern

from roohai import Rooh

pipeline = (
    Rooh.builder()
    .stt("whisper-tiny")
    .tts("piper")
    .llm("bedrock", model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
    .vad("silero")
    .system_prompt("You are a helpful assistant.")
    .build()
)
pipeline.load()

# Transcribe
text = await pipeline.transcribe(audio_bytes)

# Chat
response = await pipeline.chat("Hello, how are you?")

# Stream
async for chunk in pipeline.chat_stream("Tell me a story"):
    print(chunk, end="")

# Full pipeline: audio in -> text + audio out
transcription, response, audio = await pipeline.process_audio(audio_bytes)

From Config

from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()

REST API

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/health | Health check with active model info |
| POST | /api/transcribe | Audio file -> transcription |
| POST | /api/chat | Text -> LLM response |
| POST | /api/synthesize | Text -> WAV audio |
| POST | /api/voice-chat | Audio in -> text + audio out |
| POST | /api/webrtc/offer | WebRTC SDP offer/answer |
| GET | /api/models | List available and active models |
| POST | /api/models/swap | Hot-swap a model at runtime |
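A minimal client sketch for /api/chat using only the standard library. The path comes from the table above; the JSON field name is an assumption, so check the live docs at /guide for the actual request schema:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def chat_request(text: str) -> urllib.request.Request:
    """Build a POST to /api/chat. The JSON field name ("text") is
    an assumption, not documented here."""
    payload = json.dumps({"text": text}).encode()
    return urllib.request.Request(
        f"{BASE}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Hello, how are you?")
print(req.full_url, req.get_method())
# With the server running: urllib.request.urlopen(req).read()
```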

Examples

The examples/ directory in the repository contains complete working apps.

Extending RoohAI

Custom Models

Create a class extending STTModel, TTSModel, or LLMModel:

from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...
    def unload(self): ...
    @property
    def is_loaded(self) -> bool: ...
    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)

LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
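The hook names above come from the snippet; the handler signature below is an assumption for illustration. A hypothetical batch hook that injects retrieved context (RAG-style) ahead of the user prompt:

```python
# Toy knowledge base standing in for a real retrieval backend.
KNOWLEDGE = {"weather": "It is sunny today."}

def lookup_context(prompt: str) -> str:
    """Toy retrieval: match a keyword against the knowledge base."""
    for key, fact in KNOWLEDGE.items():
        if key in prompt.lower():
            return fact
    return ""

def my_batch_handler(prompt: str, history: list) -> str:
    """Hypothetical batch hook (signature assumed): prepend retrieved
    context to the prompt before it reaches the underlying LLM."""
    context = lookup_context(prompt)
    return f"[context: {context}] {prompt}" if context else prompt

print(my_batch_handler("What's the weather?", []))
```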

See the Strands SDK integration for a full example with tool use and conversation memory.

Documentation

Full docs are available at http://localhost:8000/guide when the server is running, including architecture details, advanced configuration, and the complete model catalog.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e ".[nvidia]"
pytest

License

Apache 2.0 — see LICENSE.

Download files

Download the file for your platform.

Source Distribution

roohai-0.2.1.tar.gz (156.6 kB)

Built Distribution

roohai-0.2.1-py3-none-any.whl (173.4 kB)

File details

Details for the file roohai-0.2.1.tar.gz.

File metadata

  • Download URL: roohai-0.2.1.tar.gz
  • Upload date:
  • Size: 156.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for roohai-0.2.1.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | e01d866d5fb19bce2feefa8448bb44d40354c34be19b12a7a7121a82ecd07e0f |
| MD5 | 0e80f309c4b80e75c91bfc937db26751 |
| BLAKE2b-256 | 685fd586cdf575fbca4f2c7638a512409e386a10828ffe7569ebad32d36d8343 |

File details

Details for the file roohai-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: roohai-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 173.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for roohai-0.2.1-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 8406aa82922178e135f571f1a4720e07c2037375bad4be37c0bb306945cabf4a |
| MD5 | dbb78d2dce2aaa08616523e6f60cb271 |
| BLAKE2b-256 | f3f0d8910bbabf640a12ec32f41cc38b7b49b204b19a58dbfe667706b70712ca |
