
RoohAI

Open-source voice AI framework for building real-time voice agents.
Swap STT, TTS, and LLM models with a single line of config.


*Teri awaaz sun kar, meri rooh ko sukoon milta hai.* ("Hearing your voice, my soul finds peace.")


Features

  • Real-time voice — WebRTC and WebSocket transports with sub-second latency
  • Swappable models — Mix and match STT, TTS, LLM, and VAD providers via YAML config
  • Hot-swap at runtime — Change models without restarting the server
  • Agent wizard UI — Browser-based GUI to create agents, pick models, and start talking
  • LLM streaming — Token-by-token responses with sentence-boundary TTS overlap
  • Barge-in — Interrupt the AI mid-sentence by speaking
  • Hooks & extensibility — Plug in custom LLM logic, tool use (Strands SDK), and observability
  • Built-in frontend — Dark-themed vanilla HTML/CSS/JS UI, no build step required
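The sentence-boundary TTS overlap mentioned above can be sketched in a few lines: buffer streamed LLM tokens and emit a full sentence to TTS as soon as terminal punctuation appears, so synthesis of sentence N overlaps generation of sentence N+1. The names here are illustrative, not RoohAI's internals:

```python
import re

def sentences_from_tokens(token_stream):
    """Yield complete sentences from an incremental token stream.

    Buffers tokens until terminal punctuation (., !, ?) followed by
    whitespace is seen, so each finished sentence can be handed to TTS
    while the LLM is still generating the next one.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        # Split on sentence-ending punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", buffer)
        # Everything except the last part is a complete sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()
```

For example, the token stream `["Hi", " there", ". How", " are", " you", "?"]` yields `"Hi there."` as soon as the third token arrives, without waiting for the rest of the response.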

Quick Start

```shell
pip install roohai
```

Then start the server:

```shell
roohai
# Open http://localhost:8000
```

Use the web UI to create an agent, select your models, and start a conversation.

All STT, TTS, and LLM providers are included by default. For NVIDIA models, install the extra:

```shell
pip install "roohai[nvidia]"
```

Supported Models

Speech-to-Text

| Name | Provider | Notes |
|------|----------|-------|
| deepgram | Deepgram Nova | Cloud API, streaming support |
| nvidia-parakeet | NVIDIA | Local, high accuracy |
| whisper-tiny | HuggingFace | Local, fast, English-focused. Default |
| whisper-base | HuggingFace | Better accuracy, still lightweight |
| whisper-small | HuggingFace | Best local accuracy |

Text-to-Speech

| Name | Provider | Notes |
|------|----------|-------|
| cartesia | Cartesia Sonic | Cloud API, natural voices |
| deepgram | Deepgram Aura | Cloud API, natural voices |
| piper | Piper TTS | Local ONNX, multiple voices. Default |
| speecht5 | HuggingFace | Local, lightweight |
| bark | HuggingFace | Local, expressive |

LLM

| Name | Provider | Notes |
|------|----------|-------|
| bedrock | AWS Bedrock | Claude Haiku/Sonnet/Opus. Default |
| openai | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| anthropic | Anthropic | Claude models via Strands SDK |
| gemini | Google | Gemini Flash/Pro via Strands SDK |
| ollama | Ollama | Any local model (Llama 3, Mistral, etc.) |
| local | HuggingFace | Any local causal LM (direct, no Strands) |

VAD

| Name | Provider |
|------|----------|
| silero | Silero VAD |

Configuration

Environment Variables

| Variable | Required for |
|----------|--------------|
| AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY | Bedrock LLM |
| AWS_DEFAULT_REGION | Bedrock (default: us-east-1) |
| OPENAI_API_KEY | OpenAI LLM |
| ANTHROPIC_API_KEY | Anthropic LLM |
| GOOGLE_API_KEY | Gemini LLM |
| DEEPGRAM_API_KEY | Deepgram STT/TTS |
| CARTESIA_API_KEY | Cartesia TTS |

API keys can also be set through the agent wizard UI — they're stored in ~/.roohai/secrets.yaml with 0600 permissions.
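For example, a shell session for a cloud-backed setup might export the relevant keys before launching (the values below are placeholders, not real credentials):

```shell
# Placeholder values; substitute your own credentials.
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export DEEPGRAM_API_KEY="..."
roohai
```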

Agent Config

Agents are defined as YAML files in ~/.roohai/agents/. Each agent specifies its models, system prompt, and transport:

```yaml
name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
```
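For illustration, the same structure maps naturally onto a small typed object with basic validation. This is a standalone sketch mirroring the YAML field names above, not RoohAI's actual config schema:

```python
from dataclasses import dataclass

VALID_TRANSPORTS = {"websocket", "webrtc"}

@dataclass
class AgentConfig:
    """Typed mirror of the agent YAML above (illustrative sketch)."""
    name: str
    system_prompt: str
    pipeline: dict  # expected keys: stt, tts, llm, vad
    transport: str = "websocket"
    llm_streaming: bool = True

    def validate(self):
        # Every pipeline stage must be present.
        missing = {"stt", "tts", "llm", "vad"} - set(self.pipeline)
        if missing:
            raise ValueError(f"pipeline is missing components: {sorted(missing)}")
        if self.transport not in VALID_TRANSPORTS:
            raise ValueError(f"unknown transport: {self.transport}")
        return self

cfg = AgentConfig(
    name="my-agent",
    system_prompt="You are a helpful voice assistant.",
    pipeline={"stt": "whisper-base", "tts": "piper",
              "llm": "bedrock-claude", "vad": "silero"},
).validate()
```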

Usage

CLI

```shell
roohai                           # Start with defaults
roohai --port 3000               # Custom port
roohai --reload                  # Auto-reload for development
roohai --log-level debug         # Verbose logging
```

Python API

```python
from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()

# The calls below are coroutines; run them inside an async function or event loop.

# Transcribe
text = await pipeline.transcribe(audio_bytes)

# Chat
response = await pipeline.chat("Hello, how are you?")

# Stream
async for chunk in pipeline.chat_stream("Tell me a story"):
    print(chunk, end="")

# Full pipeline: audio in -> text + audio out
transcription, response, audio = await pipeline.process_audio(audio_bytes)
```

REST API

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/health | Health check with active model info |
| POST | /api/transcribe | Audio file -> transcription |
| POST | /api/chat | Text -> LLM response |
| POST | /api/synthesize | Text -> WAV audio |
| POST | /api/voice-chat | Audio in -> text + audio out |
| POST | /api/webrtc/offer | WebRTC SDP offer/answer |
| GET | /api/models | List available and active models |
| POST | /api/models/swap | Hot-swap a model at runtime |
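As an illustration, the endpoints can be exercised with curl against a running server. The JSON field names below are plausible sketches, not documented schemas; check the /guide docs for the exact request formats:

```shell
# Health check (assumes the server is running on the default port)
curl http://localhost:8000/api/health

# Text chat; the "text" field name is an assumption
curl -X POST http://localhost:8000/api/chat \
     -H "Content-Type: application/json" \
     -d '{"text": "Hello, how are you?"}'
```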

Examples

The examples/ directory in the repository contains complete working apps.

Extending RoohAI

Custom Models

Create a class extending STTModel, TTSModel, or LLMModel:

```python
from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...
    def unload(self): ...

    @property
    def is_loaded(self) -> bool: ...

    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)
```
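Under the hood, a registry like this typically just maps names to classes. Here is a standalone sketch of the pattern, independent of RoohAI's actual implementation, with a toy STT model to show registration and lookup:

```python
class ModelRegistry:
    """Minimal name -> class registry; a sketch of the pattern, not RoohAI's."""

    def __init__(self):
        self._stt = {}

    def register_stt(self, name, cls):
        if name in self._stt:
            raise ValueError(f"STT model already registered: {name}")
        self._stt[name] = cls

    def create_stt(self, name, **kwargs):
        try:
            return self._stt[name](**kwargs)
        except KeyError:
            raise ValueError(f"unknown STT model: {name}") from None

registry = ModelRegistry()

class EchoSTT:
    """Toy STT that 'transcribes' by decoding the audio bytes as text."""
    def transcribe(self, audio, sample_rate):
        return audio.decode("utf-8", errors="ignore")

registry.register_stt("echo", EchoSTT)
stt = registry.create_stt("echo")
```

Looking models up by name is what makes the single-line config swap possible: the pipeline only ever holds a string until load time.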

LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

```python
pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
```
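As an illustration, a batch hook for RAG might prepend retrieved context to the prompt before the underlying LLM sees it. The signature below is an assumption for the sketch, not RoohAI's documented hook API:

```python
def make_rag_hook(retrieve, base_llm):
    """Build a batch hook that augments the prompt with retrieved context.

    retrieve(query) -> list[str]: your own retrieval function.
    base_llm(prompt) -> str: the underlying LLM call being wrapped.
    """
    def hook(prompt: str) -> str:
        docs = retrieve(prompt)
        context = "\n".join(f"- {d}" for d in docs)
        augmented = f"Context:\n{context}\n\nQuestion: {prompt}"
        return base_llm(augmented)
    return hook

# Toy wiring: a canned retriever and a stand-in LLM that echoes its prompt.
hook = make_rag_hook(
    retrieve=lambda q: ["RoohAI supports barge-in."],
    base_llm=lambda p: p.upper(),
)
```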

See the Strands SDK integration for a full example with tool use and conversation memory.

Documentation

Full docs are available at http://localhost:8000/guide when the server is running, including architecture details, advanced configuration, and the complete model catalog.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

```shell
git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e ".[all]"
pytest
```

License

Apache 2.0 — see LICENSE.
