
Provider-agnostic voice RAG pipeline. Plug in your voice provider, LLM, vector store, and document parsers.

Project description

voice-rag

PyPI version License: MIT Python CI

Ingest your docs. Answer questions by voice. Deploy in minutes.

voice-rag is a Python library and CLI for building voice-powered RAG pipelines. Point it at a folder of documents, choose your LLM and voice provider, and get an OpenAI-compatible webhook ready to wire into ElevenLabs, Deepgram, or any voice platform.

pip install "voice-rag[elevenlabs]"
export OPENAI_API_KEY=sk-...
voice-rag init && voice-rag ingest ./docs --recreate && voice-rag serve
# → serving at http://localhost:8000/v1

What it does

your docs  →  Qdrant (hybrid dense + BM25)  →  retrieved chunks
                                                       ↓
voice platform  →  speech-to-text  →  /v1/chat/completions  →  LLM  →  TTS

Each turn from your voice platform hits the webhook, embeds the user utterance, retrieves the most relevant chunks, injects them into the system prompt, and streams the LLM response back as SSE — all in one pip install.
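The per-turn flow above can be sketched in plain Python. This is illustrative only: `embed`, `retrieve`, and the chunk format are hypothetical stand-ins, not voice-rag's actual internals.

```python
# Illustrative sketch of one webhook turn: embed the utterance,
# retrieve the top chunks, and inject them into the system prompt.
# `embed` and `retrieve` are stand-ins, not voice-rag's API.

def embed(text: str) -> list[float]:
    # Stand-in for an embedding call (e.g. text-embedding-3-small).
    return [float(len(text))]

def retrieve(vector: list[float], limit: int = 3) -> list[str]:
    # Stand-in for a hybrid dense + BM25 Qdrant search.
    corpus = ["Refunds are processed within 5 days.",
              "Support hours are 9am-5pm UTC."]
    return corpus[:limit]

def build_messages(utterance: str) -> list[dict]:
    chunks = retrieve(embed(utterance))
    context = "\n\n".join(chunks)
    system = f"Answer using only this context:\n{context}"
    return [{"role": "system", "content": system},
            {"role": "user", "content": utterance}]

messages = build_messages("How long do refunds take?")
```

The real pipeline then forwards these messages to the configured LLM and streams the reply back to the voice platform.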


Install

# ElevenLabs voice + OpenAI LLM (most common)
pip install "voice-rag[elevenlabs]"

# All providers
pip install "voice-rag[all]"

# Pick only what you need
pip install "voice-rag[anthropic,pdf]"
Extra        Adds
elevenlabs   ElevenLabs voice adapter
deepgram     Deepgram voice adapter
anthropic    Anthropic (Claude) LLM client
gemini       Google Gemini LLM client
pdf          PDF parser (PyMuPDF)
docx         Word document parser
all          Everything above

Quickstart

# 1. Create a config file
voice-rag init

# 2. Ingest your documents (supports .md, .txt, .pdf, .docx)
voice-rag ingest ./docs --recreate

# 3. Start the webhook server
voice-rag serve

Point your ElevenLabs agent's Custom LLM URL to http://localhost:8000/v1.

By default, vectors are stored locally in .qdrant — no separate Qdrant server needed. Set vector_store.url to connect to a remote instance.
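Because the webhook is OpenAI-compatible, any client that speaks the chat-completions protocol can call it. A minimal request body uses only standard OpenAI fields; whether the server honors the `model` value is an assumption here, since retrieval and model selection happen server-side.

```python
import json

# Minimal OpenAI-style chat-completions request body for the webhook.
body = {
    "model": "gpt-4o-mini",
    "stream": True,  # voice platforms expect an SSE stream back
    "messages": [
        {"role": "user", "content": "What are your support hours?"},
    ],
}
payload = json.dumps(body)
```

POST this payload to `http://localhost:8000/v1/chat/completions` to exercise the server without a voice platform in the loop.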


CLI reference

voice-rag init [--dir PATH]               # create voice-rag.yaml
voice-rag ingest <path> [--recreate]      # ingest a file or directory
voice-rag serve [--host] [--port] [--reload]
voice-rag query <text> [--limit N]        # test retrieval without a server
voice-rag inspect                         # show collection stats
voice-rag doctor                          # check API keys and Qdrant connectivity

Python API

from voice_rag import KnowledgeAgent, VoiceRagConfig

config = VoiceRagConfig()          # reads from voice-rag.yaml or env vars
agent = KnowledgeAgent(config=config)

agent.ingest("./docs", recreate=True)

app = agent.create_app()           # returns a FastAPI app
# run with: uvicorn app:app --port 8000
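The response streams back as OpenAI-style SSE. A small client-side parser for the `data:` lines, assuming the standard `chat.completion.chunk` delta shape, might look like:

```python
import json

def collect_deltas(sse_body: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE stream."""
    text = []
    for line in sse_body.splitlines():
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # sentinel ending the stream
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            text.append(delta)
    return "".join(text)

sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    "data: [DONE]\n"
)
result = collect_deltas(sample)  # "Hello"
```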

Configuration

Config is loaded from voice-rag.yaml (run voice-rag init to generate one) or environment variables. Environment variables override the YAML file.

Key                              Env var                        Default
llm.provider                     LLM_PROVIDER                   openai
llm.model                        LLM_MODEL                      gpt-4o-mini
llm.api_key / embedding.api_key  OPENAI_API_KEY                 (none)
llm.base_url                     LLM_BASE_URL                   https://api.openai.com/v1
embedding.model                  EMBEDDING_MODEL                text-embedding-3-small
vector_store.url                 VECTOR_STORE_URL               empty → local .qdrant
vector_store.collection_name     VECTOR_STORE_COLLECTION_NAME   knowledge_base
server.port                      SERVER_PORT                    8000
server.enable_debug_retrieval    SERVER_ENABLE_DEBUG_RETRIEVAL  false

See voice-rag.yaml for the full annotated schema.
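Built from the keys in the table above, a voice-rag.yaml for a remote Qdrant instance might look like this. The nesting is inferred from the dotted key names; run voice-rag init for the authoritative schema.

```yaml
llm:
  provider: openai
  model: gpt-4o-mini
  base_url: https://api.openai.com/v1
embedding:
  model: text-embedding-3-small
vector_store:
  url: http://qdrant.internal:6333   # leave empty to use local .qdrant
  collection_name: knowledge_base
server:
  port: 8000
  enable_debug_retrieval: false
```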


Providers

Category      Supported
LLM           OpenAI, Anthropic, Gemini (any OpenAI-compatible URL via llm.base_url)
Voice         ElevenLabs, Deepgram
Embeddings    OpenAI
Vector store  Qdrant (local embedded or remote)
Parsers       .txt, .md, .pdf, .docx

Starter kit

Want a full working demo with a Next.js frontend and Railway deploy button? See kytona/elevenlabs-knowledge-agent — a thin wrapper around voice-rag with an ElevenLabs voice UI.


Development

git clone https://github.com/kytona/voice-rag
cd voice-rag
pip install -e ".[all,dev]"
pytest tests/ -v

See CONTRIBUTING.md for how to add new LLM, voice, embedding, or vector store connectors.

Project details


Download files

Download the file for your platform.

Source Distribution

voice_rag-0.1.3.tar.gz (98.0 kB)


Built Distribution


voice_rag-0.1.3-py3-none-any.whl (30.5 kB)


File details

Details for the file voice_rag-0.1.3.tar.gz.

File metadata

  • Download URL: voice_rag-0.1.3.tar.gz
  • Size: 98.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_rag-0.1.3.tar.gz
Algorithm Hash digest
SHA256 bf2c943ef08f9e405560cc2488eb6ff768ebd457bf4439e8406b7e328cdfed4a
MD5 cb39d8314099632c763b64108eb59fc5
BLAKE2b-256 d9e724e567e4894d3fcc40fae8b1dd9066674225eae0b380b9a791e60182be3f


Provenance

The following attestation bundles were made for voice_rag-0.1.3.tar.gz:

Publisher: publish.yml on kytona/voice-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file voice_rag-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: voice_rag-0.1.3-py3-none-any.whl
  • Size: 30.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_rag-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 893af42dc19d12fd7b9297f9291522c9058640c84cec12314e00906119d4d251
MD5 f8214b187ca7d214181eae304e1bb9f3
BLAKE2b-256 d688273ce72fa17927697f7e0860adac6aa408eac87fc8b882b5d64cd6829616


Provenance

The following attestation bundles were made for voice_rag-0.1.3-py3-none-any.whl:

Publisher: publish.yml on kytona/voice-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
