
Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components


RoohAI

Open-source voice AI framework for building real-time voice agents.
Swap STT, TTS, and LLM models with a single line of config.


Teri awaaz sun kar, meri rooh ko sukoon milta hai. ("Hearing your voice brings peace to my soul.")


Features

  • Real-time voice — WebRTC and WebSocket transports with sub-second latency
  • Swappable models — Mix and match STT, TTS, LLM, and VAD providers via YAML config
  • Hot-swap at runtime — Change models without restarting the server
  • Agent wizard UI — Browser-based GUI to create agents, pick models, and start talking
  • LLM streaming — Token-by-token responses with sentence-boundary TTS overlap
  • Barge-in — Interrupt the AI mid-sentence by speaking
  • Hooks & extensibility — Plug in custom LLM logic, tool use (Strands SDK), and observability
  • Built-in frontend — Dark-themed vanilla HTML/CSS/JS UI, no build step required

Quick Start

```shell
pip install roohai
```

Then start the server:

```shell
roohai
# Open http://localhost:8000
```

Use the web UI to create an agent, select your models, and start a conversation.

All STT, TTS, and LLM providers are included by default. For NVIDIA models, install the extra:

```shell
pip install "roohai[nvidia]"
```

Supported Models

Speech-to-Text

| Name | Provider | Notes |
| --- | --- | --- |
| `deepgram` | Deepgram Nova | Cloud API, streaming support |
| `nvidia-parakeet` | NVIDIA | Local, high accuracy |
| `whisper-tiny` | HuggingFace | Local, fast, English-focused. Default |
| `whisper-base` | HuggingFace | Better accuracy, still lightweight |
| `whisper-small` | HuggingFace | Best local accuracy |

Text-to-Speech

| Name | Provider | Notes |
| --- | --- | --- |
| `cartesia` | Cartesia Sonic | Cloud API, natural voices |
| `deepgram` | Deepgram Aura | Cloud API, natural voices |
| `piper` | Piper TTS | Local ONNX, multiple voices. Default |
| `speecht5` | HuggingFace | Local, lightweight |
| `bark` | HuggingFace | Local, expressive |

LLM

| Name | Provider | Notes |
| --- | --- | --- |
| `bedrock` | AWS Bedrock | Claude Haiku/Sonnet/Opus. Default |
| `openai` | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| `anthropic` | Anthropic | Claude models via Strands SDK |
| `gemini` | Google | Gemini Flash/Pro via Strands SDK |
| `ollama` | Ollama | Any local model (Llama 3, Mistral, etc.) |
| `local` | HuggingFace | Any local causal LM (direct, no Strands) |

VAD

| Name | Provider |
| --- | --- |
| `silero` | Silero VAD |

Configuration

Environment Variables

| Variable | Required for |
| --- | --- |
| `BEDROCK_API_KEY` or `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | Bedrock LLM |
| `AWS_DEFAULT_REGION` | Bedrock (default: `us-east-1`) |
| `OPENAI_API_KEY` | OpenAI LLM |
| `ANTHROPIC_API_KEY` | Anthropic LLM |
| `GOOGLE_API_KEY` | Gemini LLM |
| `DEEPGRAM_API_KEY` | Deepgram STT/TTS |
| `CARTESIA_API_KEY` | Cartesia TTS |

API keys can also be set through the agent wizard UI — they're stored in ~/.roohai/secrets.yaml with 0600 permissions.
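The key to that storage scheme is creating the file with owner-only permissions from the start. A minimal sketch of the technique, not RoohAI's actual implementation — the file name, key name, and YAML layout here are illustrative assumptions (the real file is `~/.roohai/secrets.yaml`):

```python
import os
import stat
import tempfile

def write_secret(path: str, key: str, value: str) -> None:
    # O_CREAT with mode 0o600 sets owner-only permission bits atomically,
    # so there is no window where the file is group/world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{key}: {value}\n")

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "secrets.yaml")
    write_secret(p, "DEEPGRAM_API_KEY", "dg-example")
    mode = stat.S_IMODE(os.stat(p).st_mode)
    print(oct(mode))  # 0o600
```

Passing the mode to `os.open` (rather than calling `chmod` afterwards) is what closes the race; the umask can only clear bits, and `0o600` has no group/other bits to clear.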

Agent Config

Agents are defined as YAML files in ~/.roohai/agents/. Each agent specifies its models, system prompt, and transport:

```yaml
name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
```

Usage

CLI

```shell
roohai                           # Start with defaults
roohai --port 3000               # Custom port
roohai --reload                  # Auto-reload for development
roohai --log-level debug         # Verbose logging
```

Python API

Builder Pattern

```python
from roohai import Rooh

pipeline = (
    Rooh.builder()
    .stt("whisper-tiny")
    .tts("piper")
    .llm("bedrock", model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
    .vad("silero")
    .system_prompt("You are a helpful assistant.")
    .build()
)
pipeline.load()

# The calls below are async — run them inside an async function / event loop.

# Transcribe
text = await pipeline.transcribe(audio_bytes)

# Chat
response = await pipeline.chat("Hello, how are you?")

# Stream
async for chunk in pipeline.chat_stream("Tell me a story"):
    print(chunk, end="")

# Full pipeline: audio in -> text + audio out
transcription, response, audio = await pipeline.process_audio(audio_bytes)
```

From Config

```python
from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()
```

REST API

| Method | Path | Description |
| --- | --- | --- |
| GET | `/api/health` | Health check with active model info |
| POST | `/api/transcribe` | Audio file -> transcription |
| POST | `/api/chat` | Text -> LLM response |
| POST | `/api/synthesize` | Text -> WAV audio |
| POST | `/api/voice-chat` | Audio in -> text + audio out |
| POST | `/api/webrtc/offer` | WebRTC SDP offer/answer |
| GET | `/api/models` | List available and active models |
| POST | `/api/models/swap` | Hot-swap a model at runtime |
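For example, a hot-swap request could be built like this. The endpoint path comes from the table above, but the JSON field names (`component`, `model`) are illustrative assumptions — check the live `/guide` docs for the actual schema:

```python
import json
from urllib.request import Request

# Hypothetical hot-swap payload; field names are assumptions.
payload = {"component": "tts", "model": "speecht5"}
req = Request(
    "http://localhost:8000/api/models/swap",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it against a running server.
```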

Examples

The `examples/` directory contains complete working apps.

Extending RoohAI

Custom Models

Create a class extending STTModel, TTSModel, or LLMModel:

```python
from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...
    def unload(self): ...
    @property
    def is_loaded(self) -> bool: ...
    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)
```
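To make the mechanism concrete, here is a self-contained toy version of the name-to-class registry pattern the snippet above relies on. This is not RoohAI's internals — the base class and registry are reimplemented locally so the sketch runs on its own:

```python
from abc import ABC, abstractmethod

class STTModel(ABC):
    # Toy stand-in for the framework's STT interface.
    @abstractmethod
    def load(self): ...
    @abstractmethod
    def transcribe(self, audio, sample_rate) -> str: ...

class Registry:
    def __init__(self):
        self._stt = {}
    def register_stt(self, name, cls):
        self._stt[name] = cls
    def create_stt(self, name):
        # A config string like "echo" resolves to an instance here.
        return self._stt[name]()

registry = Registry()

class EchoSTT(STTModel):
    def load(self):
        pass
    def transcribe(self, audio, sample_rate) -> str:
        return f"<{len(audio)} bytes @ {sample_rate} Hz>"

registry.register_stt("echo", EchoSTT)

stt = registry.create_stt("echo")
stt.load()
print(stt.transcribe(b"\x00" * 4, 16000))  # <4 bytes @ 16000 Hz>
```

This is why a pipeline YAML can refer to models purely by name: the registry maps each registered string to a constructor.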

LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

```python
pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
```
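A stream hook plausibly takes the conversation and yields text chunks; that signature is an assumption here, not documented API. A standalone sketch of such a handler:

```python
import asyncio

async def my_streaming_handler(messages):
    # Hypothetical stream hook: e.g. consult a RAG index or run a tool
    # before answering, then yield chunks for sentence-boundary TTS.
    reply = "Hello from the hook!"
    for word in reply.split():
        yield word + " "

async def main():
    msgs = [{"role": "user", "content": "hi"}]
    chunks = [c async for c in my_streaming_handler(msgs)]
    return "".join(chunks).strip()

print(asyncio.run(main()))  # Hello from the hook!
```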

See the Strands SDK integration for a full example with tool use and conversation memory.

Documentation

Full docs are available at http://localhost:8000/guide when the server is running, including architecture details, advanced configuration, and the complete model catalog.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

```shell
git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e ".[all]"
pytest
```

License

Apache 2.0 — see LICENSE.



Download files

Download the file for your platform.

Source Distribution

roohai-0.1.7.tar.gz (138.1 kB)


Built Distribution


roohai-0.1.7-py3-none-any.whl (155.3 kB)


File details

Details for the file roohai-0.1.7.tar.gz.

File metadata

  • Download URL: roohai-0.1.7.tar.gz
  • Upload date:
  • Size: 138.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for roohai-0.1.7.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0c0bf73d095731a35ddd5fd10cde046e79b1dac8ffe095078fbc465a5f90e15a |
| MD5 | b987a0a64cd386b2ab16a474a562f947 |
| BLAKE2b-256 | 55c0e4fa85c462f08455a486c68d1e1438a2bebb0224ff708ab534bbf500ef5d |


File details

Details for the file roohai-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: roohai-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 155.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for roohai-0.1.7-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6f2bacf805b2309205bc460621b4a6bca42e0fb402ed73fbaa863ef34cb2bf2f |
| MD5 | f313e07a7e2b8e849be9ea2662e7f3ca |
| BLAKE2b-256 | 00e67f0b8643420cc31097d8024e1cbe615f8a6193cce34246da834e2890742d |

