# voice-rag

Provider-agnostic voice RAG pipeline: plug in your voice provider, LLM, vector store, and document parsers. Ingest your docs. Answer questions by voice. Deploy in minutes.
voice-rag is a Python library and CLI for building voice-powered RAG pipelines. Point it at a folder of documents, choose your LLM and voice provider, and get an OpenAI-compatible webhook ready to wire into ElevenLabs, Deepgram, or any voice platform.
```bash
pip install "voice-rag[elevenlabs]"
export OPENAI_API_KEY=sk-...
voice-rag init && voice-rag ingest ./docs --recreate && voice-rag serve
# → serving at http://localhost:8000/v1
```
## What it does

```
your docs → Qdrant (hybrid dense + BM25) → retrieved chunks
                                                 ↓
voice platform → speech-to-text → /v1/chat/completions → LLM → TTS
```
Each turn from your voice platform hits the webhook, embeds the user utterance, retrieves the most relevant chunks, injects them into the system prompt, and streams the LLM response back as SSE — all in one pip install.
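The "inject them into the system prompt" step is the core of any RAG turn. A generic sketch of that step (illustrative only, not voice-rag's actual internals; the prompt wording and chunk numbering are assumptions):

```python
def build_messages(user_text: str, chunks: list[str]) -> list[dict]:
    # Number each retrieved chunk and fold them into a single system prompt.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    system = f"Answer using only the context below.\n\nContext:\n{context}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]

msgs = build_messages(
    "How do I reset SSO?",
    ["SSO resets live under Settings → Security."],
)
```

The resulting list is exactly the OpenAI-style `messages` array that the webhook forwards to the configured LLM.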
## Install

```bash
# ElevenLabs voice + OpenAI LLM (most common)
pip install "voice-rag[elevenlabs]"

# All providers
pip install "voice-rag[all]"

# Pick only what you need
pip install "voice-rag[anthropic,pdf]"
```
| Extra | Adds |
|---|---|
| `elevenlabs` | ElevenLabs voice adapter |
| `deepgram` | Deepgram voice adapter |
| `anthropic` | Anthropic (Claude) LLM client |
| `gemini` | Google Gemini LLM client |
| `pdf` | PDF parser (PyMuPDF) |
| `docx` | Word document parser |
| `all` | Everything above |
## Quickstart

```bash
# 1. Create a config file
voice-rag init

# 2. Ingest your documents (supports .md, .txt, .pdf, .docx)
voice-rag ingest ./docs --recreate

# 3. Start the webhook server
voice-rag serve
```
Point your ElevenLabs agent's Custom LLM URL to `http://localhost:8000/v1`.

By default, vectors are stored locally in `.qdrant`; no separate Qdrant server is needed. Set `vector_store.url` to connect to a remote instance.
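Before wiring up a voice platform, you can smoke-test the endpoint with any OpenAI-compatible client. A minimal sketch using only the standard library (the payload shape follows the OpenAI chat-completions protocol; it assumes `voice-rag serve` is running on the default port before you actually send it):

```python
import json
import urllib.request

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "gpt-4o-mini",
    "stream": False,  # set True to receive SSE chunks instead
    "messages": [{"role": "user", "content": "ping"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the server running, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```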
## CLI reference

```bash
voice-rag init [--dir PATH]                   # create voice-rag.yaml
voice-rag ingest <path> [--recreate]          # ingest a file or directory
voice-rag serve [--host] [--port] [--reload]
voice-rag query <text> [--limit N]            # test retrieval without a server
voice-rag inspect                             # show collection stats
voice-rag doctor                              # check API keys and Qdrant connectivity
```
## Python API

```python
from voice_rag import KnowledgeAgent, VoiceRagConfig

config = VoiceRagConfig()  # reads from voice-rag.yaml or env vars
agent = KnowledgeAgent(config=config)
agent.ingest("./docs", recreate=True)

app = agent.create_app()  # returns a FastAPI app
# run with: uvicorn app:app --port 8000
```
## Configuration

Config is loaded from `voice-rag.yaml` (run `voice-rag init` to generate one) or environment variables. Environment variables override the YAML file.
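The override order follows the usual env-over-file pattern. A generic sketch of how such precedence resolves (illustrative only, not voice-rag's actual loader; the dict literal stands in for parsed YAML):

```python
import os

# Values as if parsed from voice-rag.yaml (illustrative):
yaml_config = {"llm": {"model": "gpt-4o-mini"}}

def resolve(key_path: str, env_var: str, default=None):
    # An environment variable wins; otherwise fall back to the YAML value.
    if env_var in os.environ:
        return os.environ[env_var]
    node = yaml_config
    for part in key_path.split("."):
        node = node.get(part, {}) if isinstance(node, dict) else {}
    return node or default

os.environ["LLM_MODEL"] = "gpt-4o"
print(resolve("llm.model", "LLM_MODEL"))  # env var overrides the YAML value
```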
| Key | Env var | Default |
|---|---|---|
| `llm.provider` | `LLM_PROVIDER` | `openai` |
| `llm.model` | `LLM_MODEL` | `gpt-4o-mini` |
| `llm.api_key` / `embedding.api_key` | `OPENAI_API_KEY` | — |
| `llm.base_url` | `LLM_BASE_URL` | `https://api.openai.com/v1` |
| `embedding.model` | `EMBEDDING_MODEL` | `text-embedding-3-small` |
| `vector_store.url` | `VECTOR_STORE_URL` | empty → local `.qdrant` |
| `vector_store.collection_name` | `VECTOR_STORE_COLLECTION_NAME` | `knowledge_base` |
| `server.port` | `SERVER_PORT` | `8000` |
| `server.enable_debug_retrieval` | `SERVER_ENABLE_DEBUG_RETRIEVAL` | `false` |
See `voice-rag.yaml` for the full annotated schema.
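The dotted keys above suggest a nested layout; a sketch of what a minimal `voice-rag.yaml` might look like (the nesting and comments are assumptions; the file generated by `voice-rag init` is authoritative):

```yaml
llm:
  provider: openai
  model: gpt-4o-mini
  base_url: https://api.openai.com/v1
embedding:
  model: text-embedding-3-small
vector_store:
  url: ""                      # empty → local .qdrant
  collection_name: knowledge_base
server:
  port: 8000
  enable_debug_retrieval: false
```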
## Providers
| Category | Supported |
|---|---|
| LLM | OpenAI, Anthropic, Gemini (any OpenAI-compatible URL via llm.base_url) |
| Voice | ElevenLabs, Deepgram |
| Embeddings | OpenAI |
| Vector store | Qdrant (local embedded or remote) |
| Parsers | .txt, .md, .pdf, .docx |
## Starter kit

Want a full working demo with a Next.js frontend and a Railway deploy button? See kytona/elevenlabs-knowledge-agent, a thin wrapper around voice-rag with an ElevenLabs voice UI.
## Development

```bash
git clone https://github.com/kytona/voice-rag
cd voice-rag
pip install -e ".[all,dev]"
pytest tests/ -v
```

See CONTRIBUTING.md for how to add new LLM, voice, embedding, or vector store connectors.
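Conceptually, a connector is a small adapter that satisfies a provider-agnostic interface. A purely illustrative sketch of what such a contract could look like (the `LLMClient` protocol, `stream_chat` method, and `EchoLLM` class are hypothetical; the real interfaces live in the voice-rag source and CONTRIBUTING.md):

```python
from typing import Iterator, Protocol

class LLMClient(Protocol):
    # Hypothetical contract: stream text deltas for an OpenAI-style message list.
    def stream_chat(self, messages: list[dict]) -> Iterator[str]:
        ...

class EchoLLM:
    # Toy connector that satisfies the illustrative contract by echoing input.
    def stream_chat(self, messages: list[dict]) -> Iterator[str]:
        yield messages[-1]["content"]

client: LLMClient = EchoLLM()
reply = "".join(client.stream_chat([{"role": "user", "content": "hi"}]))
```

Structural typing (`Protocol`) lets third-party connectors plug in without inheriting from a library base class.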