
Provider-agnostic voice RAG pipeline. Plug in your voice provider, LLM, vector store, and document parsers.


voice-rag

PyPI version License: MIT Python CI

Ingest your docs. Answer questions by voice. Deploy in minutes.

voice-rag is a Python library and CLI for building voice-powered RAG pipelines. Point it at a folder of documents, choose your LLM and voice provider, and get an OpenAI-compatible webhook ready to wire into ElevenLabs, Deepgram, or any voice platform.

pip install "voice-rag[elevenlabs]"
export OPENAI_API_KEY=sk-...
voice-rag init && voice-rag ingest ./docs --recreate && voice-rag serve
# → serving at http://localhost:8000/v1

What it does

your docs  →  Qdrant (hybrid dense + BM25)  →  retrieved chunks
                                                       ↓
voice platform  →  speech-to-text  →  /v1/chat/completions  →  LLM  →  TTS

Each turn from your voice platform hits the webhook, embeds the user utterance, retrieves the most relevant chunks, injects them into the system prompt, and streams the LLM response back as SSE — all in one pip install.
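The context-injection step can be sketched in a few lines. This is an illustrative snippet, not voice-rag's actual internals; the function name and prompt wording are hypothetical:

```python
def build_system_prompt(base_prompt: str, chunks: list[str]) -> str:
    """Fold retrieved chunks into the system prompt for one turn."""
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        f"{base_prompt}\n\n"
        "Answer using only the context below; keep replies short enough to speak aloud.\n\n"
        f"{context}"
    )

prompt = build_system_prompt(
    "You are a support agent for Acme.",
    ["Acme ships worldwide.", "Returns are accepted within 30 days."],
)
```

The numbered-chunk layout makes it easy for the LLM to ground its answer in a specific passage.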


Install

# ElevenLabs voice + OpenAI LLM (most common)
pip install "voice-rag[elevenlabs]"

# All providers
pip install "voice-rag[all]"

# Pick only what you need
pip install "voice-rag[anthropic,pdf]"
Extra        Adds
elevenlabs   ElevenLabs voice adapter
deepgram     Deepgram voice adapter
anthropic    Anthropic (Claude) LLM client
gemini       Google Gemini LLM client
pdf          PDF parser (PyMuPDF)
docx         Word document parser
all          Everything above

Quickstart

# 1. Create a config file
voice-rag init

# 2. Ingest your documents (supports .md, .txt, .pdf, .docx)
voice-rag ingest ./docs --recreate

# 3. Start the webhook server
voice-rag serve

Point your ElevenLabs agent's Custom LLM URL to http://localhost:8000/v1.

By default, vectors are stored locally in .qdrant — no separate Qdrant server needed. Set vector_store.url to connect to a remote instance.
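Because the webhook speaks the OpenAI chat-completions protocol, responses stream back as standard `data:` lines. Here is a client-side sketch of extracting the text deltas; this is the generic OpenAI SSE shape, not a voice-rag-specific API:

```python
import json

def iter_deltas(sse_lines):
    """Yield text deltas from an OpenAI-style SSE stream."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

stream = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_deltas(stream))
# → "Hello world"
```

Any client that already understands OpenAI streaming (including the ElevenLabs Custom LLM integration) handles this format without changes.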


CLI reference

voice-rag init [--dir PATH]               # create voice-rag.yaml
voice-rag ingest <path> [--recreate]      # ingest a file or directory
voice-rag serve [--host] [--port] [--reload]
voice-rag query <text> [--limit N]        # test retrieval without a server
voice-rag inspect                         # show collection stats
voice-rag doctor                          # check API keys and Qdrant connectivity

Python API

from voice_rag import KnowledgeAgent, VoiceRagConfig

config = VoiceRagConfig()          # reads from voice-rag.yaml or env vars
agent = KnowledgeAgent(config=config)

agent.ingest("./docs", recreate=True)

app = agent.create_app()           # returns a FastAPI app
# run with: uvicorn app:app --port 8000

Configuration

Config is loaded from voice-rag.yaml (run voice-rag init to generate one) or environment variables. Environment variables override the YAML file.

Key                              Env var                        Default
llm.provider                     LLM_PROVIDER                   openai
llm.model                        LLM_MODEL                      gpt-4o-mini
llm.api_key / embedding.api_key  OPENAI_API_KEY
llm.base_url                     LLM_BASE_URL                   https://api.openai.com/v1
embedding.model                  EMBEDDING_MODEL                text-embedding-3-small
vector_store.url                 VECTOR_STORE_URL               empty → local .qdrant
vector_store.collection_name     VECTOR_STORE_COLLECTION_NAME   knowledge_base
server.port                      SERVER_PORT                    8000
server.enable_debug_retrieval    SERVER_ENABLE_DEBUG_RETRIEVAL  false
See voice-rag.yaml for the full annotated schema.
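A minimal voice-rag.yaml mirroring the defaults above might look like the following; this is an illustrative fragment, and voice-rag init generates the authoritative schema:

```yaml
llm:
  provider: openai
  model: gpt-4o-mini
  base_url: https://api.openai.com/v1

embedding:
  model: text-embedding-3-small

vector_store:
  url: ""            # empty → local embedded .qdrant
  collection_name: knowledge_base

server:
  port: 8000
  enable_debug_retrieval: false
```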


Providers

Category      Supported
LLM           OpenAI, Anthropic, Gemini (any OpenAI-compatible URL via llm.base_url)
Voice         ElevenLabs, Deepgram
Embeddings    OpenAI
Vector store  Qdrant (local embedded or remote)
Parsers       .txt, .md, .pdf, .docx
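Since any OpenAI-compatible endpoint works via llm.base_url, you can also point the pipeline at a self-hosted server. The URL and model name below are illustrative (e.g. a local Ollama instance):

```shell
# Route the LLM through an OpenAI-compatible server instead of api.openai.com.
export LLM_PROVIDER=openai
export LLM_MODEL=llama3.1
export LLM_BASE_URL=http://localhost:11434/v1
```

With these set, voice-rag serve sends chat-completion requests to the local server while the rest of the pipeline is unchanged.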

Starter kit

Want a full working demo with a Next.js frontend and Railway deploy button? See kytona/elevenlabs-knowledge-agent — a thin wrapper around voice-rag with an ElevenLabs voice UI.


Development

git clone https://github.com/kytona/voice-rag
cd voice-rag
pip install -e ".[all,dev]"
pytest tests/ -v

See CONTRIBUTING.md for how to add new LLM, voice, embedding, or vector store connectors.

Project details


Download files


Source Distribution

voice_rag-0.1.1.tar.gz (97.7 kB)


Built Distribution


voice_rag-0.1.1-py3-none-any.whl (30.1 kB)


File details

Details for the file voice_rag-0.1.1.tar.gz.

File metadata

  • Download URL: voice_rag-0.1.1.tar.gz
  • Upload date:
  • Size: 97.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_rag-0.1.1.tar.gz

Algorithm    Hash digest
SHA256       38d760f06ee71f8befce9a36e654d704945b37ebbe257b3bf804a82d3943c592
MD5          3f62a23e395a49b7503c7cec47927e4a
BLAKE2b-256  f6bcc81ab72fdc706404e4116a34b10eefdc699b9ee8def39d4d66212270da96


Provenance

The following attestation bundles were made for voice_rag-0.1.1.tar.gz:

Publisher: publish.yml on kytona/voice-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file voice_rag-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: voice_rag-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 30.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_rag-0.1.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       ad690672fb91265b2126fe1a31f7c6da2e51f5986ac2c37fcfe7f4199516e677
MD5          554a9ed72bd53ccb158b48274a619fcf
BLAKE2b-256  fad913335bfad61adc7f8bb8bb1e744fc116fa6e2939a2507254f271d633b716


Provenance

The following attestation bundles were made for voice_rag-0.1.1-py3-none-any.whl:

Publisher: publish.yml on kytona/voice-rag

