
RoohAI

Open-source voice AI framework for building real-time voice agents.
Swap STT, TTS, and LLM models with a single line of config.


Hearing your voice brings peace to my soul.


Features

  • Real-time voice — WebRTC and WebSocket transports with sub-second latency
  • Swappable models — Mix and match STT, TTS, LLM, and VAD providers via YAML config
  • Hot-swap at runtime — Change models without restarting the server
  • Agent wizard UI — Browser-based GUI to create agents, pick models, and start talking
  • LLM streaming — Token-by-token responses with sentence-boundary TTS overlap
  • Barge-in — Interrupt the AI mid-sentence by speaking
  • Hooks & extensibility — Plug in custom LLM logic, tool use (Strands SDK), and observability
  • Built-in frontend — Dark-themed vanilla HTML/CSS/JS UI, no build step required

Quick Start

pip install roohai

Then start the server:

roohai
# Open http://localhost:8000

Use the web UI to create an agent, select your models, and start a conversation.

All STT, TTS, and LLM providers are included by default. For NVIDIA models, install the extra:

pip install "roohai[nvidia]"

Supported Models

Speech-to-Text

| Name | Provider | Notes |
|------|----------|-------|
| deepgram | Deepgram Nova | Cloud API, streaming support |
| nvidia-parakeet | NVIDIA | Local, high accuracy |
| whisper-tiny | HuggingFace | Local, fast, English-focused. Default |
| whisper-base | HuggingFace | Better accuracy, still lightweight |
| whisper-small | HuggingFace | Best local accuracy |

Text-to-Speech

| Name | Provider | Notes |
|------|----------|-------|
| cartesia | Cartesia Sonic | Cloud API, natural voices |
| deepgram | Deepgram Aura | Cloud API, natural voices |
| piper | Piper TTS | Local ONNX, multiple voices. Default |
| speecht5 | HuggingFace | Local, lightweight |
| bark | HuggingFace | Local, expressive |

LLM

| Name | Provider | Notes |
|------|----------|-------|
| bedrock | AWS Bedrock | Claude Haiku/Sonnet/Opus. Default |
| openai | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| anthropic | Anthropic | Claude models via Strands SDK |
| gemini | Google | Gemini Flash/Pro via Strands SDK |
| ollama | Ollama | Any local model (Llama 3, Mistral, etc.) |
| local | HuggingFace | Any local causal LM (direct, no Strands) |

VAD

| Name | Provider |
|------|----------|
| silero | Silero VAD |

Configuration

Environment Variables

| Variable | Required for |
|----------|--------------|
| BEDROCK_API_KEY or AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY | Bedrock LLM |
| AWS_DEFAULT_REGION | Bedrock (default: us-east-1) |
| OPENAI_API_KEY | OpenAI LLM |
| ANTHROPIC_API_KEY | Anthropic LLM |
| GOOGLE_API_KEY | Gemini LLM |
| DEEPGRAM_API_KEY | Deepgram STT/TTS |
| CARTESIA_API_KEY | Cartesia TTS |

API keys can also be set through the agent wizard UI — they're stored in ~/.roohai/secrets.yaml with 0600 permissions.

Agent Config

Agents are defined as YAML files in ~/.roohai/agents/. Each agent specifies its models, system prompt, and transport:

name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
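Because agents are plain YAML files, they can also be created programmatically. A minimal sketch, assuming only the directory layout and field names shown above (the example writes into a temporary directory rather than a real ~/.roohai/agents/):

```python
import tempfile
import textwrap
from pathlib import Path

def write_agent(agents_dir: Path, name: str, prompt: str) -> Path:
    """Render an agent definition in the YAML shape documented above."""
    config = textwrap.dedent(f"""\
        name: {name}
        system_prompt: "{prompt}"
        llm_streaming: true
        pipeline:
          stt: whisper-base
          tts: piper
          llm: bedrock-claude
          vad: silero
        transport: websocket
        """)
    path = agents_dir / f"{name}.yaml"
    path.write_text(config)
    return path

# Demo: write to a temp dir; in practice you would pass Path.home() / ".roohai" / "agents"
tmp = Path(tempfile.mkdtemp())
agent_file = write_agent(tmp, "my-agent", "You are a helpful voice assistant.")
print(agent_file.read_text())
```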

Usage

CLI

roohai                           # Start with defaults
roohai --port 3000               # Custom port
roohai --reload                  # Auto-reload for development
roohai --log-level debug         # Verbose logging

Python API

from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()

# Transcribe
text = await pipeline.transcribe(audio_bytes)

# Chat
response = await pipeline.chat("Hello, how are you?")

# Stream
async for chunk in pipeline.chat_stream("Tell me a story"):
    print(chunk, end="")

# Full pipeline: audio in -> text + audio out
transcription, response, audio = await pipeline.process_audio(audio_bytes)
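The await calls in the snippets above must run inside an event loop. A minimal end-to-end sketch; the make_config helper is just illustrative, while the config dict shape and pipeline methods follow the example above:

```python
import asyncio

def make_config(stt="whisper-tiny", tts="piper", llm="bedrock-claude",
                vad="silero", prompt="You are a helpful assistant."):
    """Build the dict shape that Rooh.from_config expects (per the example above)."""
    return {
        "pipeline": {"stt": stt, "tts": tts, "llm": llm, "vad": vad},
        "system_prompt": prompt,
    }

async def main():
    from roohai import Rooh  # requires: pip install roohai
    pipeline = Rooh.from_config(make_config())
    pipeline.load()
    print(await pipeline.chat("Hello, how are you?"))

# To run: asyncio.run(main())
```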

REST API

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/health | Health check with active model info |
| POST | /api/transcribe | Audio file -> transcription |
| POST | /api/chat | Text -> LLM response |
| POST | /api/synthesize | Text -> WAV audio |
| POST | /api/voice-chat | Audio in -> text + audio out |
| POST | /api/webrtc/offer | WebRTC SDP offer/answer |
| GET | /api/models | List available and active models |
| POST | /api/models/swap | Hot-swap a model at runtime |

Examples

The examples/ directory in the repository contains complete working apps.

Extending RoohAI

Custom Models

Create a class extending STTModel, TTSModel, or LLMModel:

from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...
    def unload(self): ...
    @property
    def is_loaded(self) -> bool: ...
    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)

LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
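The exact hook signatures aren't documented on this page. As an illustration only, assume the batch hook receives the user text and returns a reply, and the stream hook is an async generator yielding chunks (so TTS can start before the full reply is ready):

```python
import asyncio

# Hypothetical signatures -- check the RoohAI hook docs for the real ones.
def my_batch_handler(text: str) -> str:
    """Answer in one shot, e.g. after a RAG lookup or tool call."""
    return f"You said: {text}"

async def my_streaming_handler(text: str):
    """Yield the reply chunk by chunk so TTS overlap can begin early."""
    for word in f"You said: {text}".split():
        yield word + " "

async def demo():
    chunks = [c async for c in my_streaming_handler("hello")]
    return "".join(chunks)

print(my_batch_handler("hello"))
print(asyncio.run(demo()))
```

These handlers would then be wired in via pipeline.set_llm_hooks as shown above.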

See the Strands SDK integration for a full example with tool use and conversation memory.

Documentation

Full docs are available at http://localhost:8000/guide when the server is running, including architecture details, advanced configuration, and the complete model catalog.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e ".[all]"
pytest

License

Apache 2.0 — see LICENSE.
