Provider-agnostic voice RAG pipeline. Plug in your voice provider, LLM, vector store, and document parsers.

voice-rag


Ingest your docs. Answer questions by voice. Deploy in minutes.

voice-rag is a Python library and CLI for building voice-powered RAG pipelines. Point it at a folder of documents, choose your LLM and voice provider, and get an OpenAI-compatible webhook ready to wire into ElevenLabs, Deepgram, or any voice platform.

pip install "voice-rag[elevenlabs]"
export OPENAI_API_KEY=sk-...
voice-rag init && voice-rag ingest ./docs --recreate && voice-rag serve
# → serving at http://localhost:8000/v1

What it does

your docs  →  Qdrant (hybrid dense + BM25)  →  retrieved chunks
                                                       ↓
voice platform  →  speech-to-text  →  /v1/chat/completions  →  LLM  →  TTS

Each turn from your voice platform hits the webhook, embeds the user utterance, retrieves the most relevant chunks, injects them into the system prompt, and streams the LLM response back as SSE — all in one pip install.


Install

# ElevenLabs voice + OpenAI LLM (most common)
pip install "voice-rag[elevenlabs]"

# All providers
pip install "voice-rag[all]"

# Pick only what you need
pip install "voice-rag[anthropic,pdf]"

Extra        Adds
elevenlabs   ElevenLabs voice adapter
deepgram     Deepgram voice adapter
anthropic    Anthropic (Claude) LLM client
gemini       Google Gemini LLM client
pdf          PDF parser (PyMuPDF)
docx         Word document parser
all          Everything above

Quickstart

# 1. Create a config file
voice-rag init

# 2. Ingest your documents (supports .md, .txt, .pdf, .docx)
voice-rag ingest ./docs --recreate

# 3. Start the webhook server
voice-rag serve

Point your ElevenLabs agent's Custom LLM URL to http://localhost:8000/v1.

By default, vectors are stored locally in .qdrant — no separate Qdrant server needed. Set vector_store.url to connect to a remote instance.
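To point at a remote Qdrant instance, set vector_store.url in voice-rag.yaml. A sketch of the relevant fragment, using the keys from the configuration reference; the URL and collection name are placeholders, and voice-rag init generates the canonical schema:

```yaml
# voice-rag.yaml (fragment) -- switches from the embedded .qdrant
# store to a remote Qdrant server.
vector_store:
  url: http://qdrant.internal:6333
  collection_name: knowledge_base
```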


CLI reference

voice-rag init [--dir PATH]               # create voice-rag.yaml
voice-rag ingest <path> [--recreate]      # ingest a file or directory
voice-rag serve [--host] [--port] [--reload]
voice-rag query <text> [--limit N]        # test retrieval without a server
voice-rag inspect                         # show collection stats
voice-rag doctor                          # check API keys and Qdrant connectivity

Python API

from voice_rag import KnowledgeAgent, VoiceRagConfig

config = VoiceRagConfig()          # reads from voice-rag.yaml or env vars
agent = KnowledgeAgent(config=config)

agent.ingest("./docs", recreate=True)

app = agent.create_app()           # returns a FastAPI app
# run with: uvicorn app:app --port 8000

Configuration

Config is loaded from voice-rag.yaml (run voice-rag init to generate one) or environment variables. Environment variables override the YAML file.

Key                              Env var                        Default
llm.provider                     LLM_PROVIDER                   openai
llm.model                        LLM_MODEL                      gpt-4o-mini
llm.api_key / embedding.api_key  OPENAI_API_KEY
llm.base_url                     LLM_BASE_URL                   https://api.openai.com/v1
embedding.model                  EMBEDDING_MODEL                text-embedding-3-small
vector_store.url                 VECTOR_STORE_URL               empty → local .qdrant
vector_store.collection_name     VECTOR_STORE_COLLECTION_NAME   knowledge_base
server.port                      SERVER_PORT                    8000
server.enable_debug_retrieval    SERVER_ENABLE_DEBUG_RETRIEVAL  false

See voice-rag.yaml for the full annotated schema.
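Because environment variables take precedence, a deployment can override individual keys without editing voice-rag.yaml. For example (the values here are illustrative):

```shell
# Override the model and port for this shell only; every other key
# still comes from voice-rag.yaml.
export LLM_MODEL=gpt-4o
export SERVER_PORT=9000
echo "model=$LLM_MODEL port=$SERVER_PORT"
# → model=gpt-4o port=9000
```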


Providers

Category      Supported
LLM           OpenAI, Anthropic, Gemini (any OpenAI-compatible URL via llm.base_url)
Voice         ElevenLabs, Deepgram
Embeddings    OpenAI
Vector store  Qdrant (local embedded or remote)
Parsers       .txt, .md, .pdf, .docx

Starter kit

Want a full working demo with a Next.js frontend and Railway deploy button? See kytona/elevenlabs-knowledge-agent — a thin wrapper around voice-rag with an ElevenLabs voice UI.


Development

git clone https://github.com/kytona/voice-rag
cd voice-rag
pip install -e ".[all,dev]"
pytest tests/ -v

See CONTRIBUTING.md for how to add new LLM, voice, embedding, or vector store connectors.

Project details


Download files

Download the file for your platform.

Source Distribution

voice_rag-0.1.2.tar.gz (97.9 kB)


Built Distribution


voice_rag-0.1.2-py3-none-any.whl (30.2 kB)


File details

Details for the file voice_rag-0.1.2.tar.gz.

File metadata

  • Download URL: voice_rag-0.1.2.tar.gz
  • Size: 97.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_rag-0.1.2.tar.gz

Algorithm    Hash digest
SHA256       9765cc224180d84ec4bb2a4f6931623950ca7507c0e91f02e8520add918ee548
MD5          30cd5a56c01147c5f46c3e0178b5d5d6
BLAKE2b-256  d8c5ff0b4ca561a2a75c9c2724c96b7daa3e70e4ae24a3e838706fd1e3221431


Provenance

The following attestation bundles were made for voice_rag-0.1.2.tar.gz:

Publisher: publish.yml on kytona/voice-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file voice_rag-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: voice_rag-0.1.2-py3-none-any.whl
  • Size: 30.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voice_rag-0.1.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       1ba406949b2579eb17abca35e8731d4cecdb7b63d3dd3a8d32765e113312b25d
MD5          7c7640b3db4e4de94ff48d2895e36e4a
BLAKE2b-256  dd7ea2eac6b6ec36e3c185642fdff98f8130bf696d8a8182c771c6a753583e9c


Provenance

The following attestation bundles were made for voice_rag-0.1.2-py3-none-any.whl:

Publisher: publish.yml on kytona/voice-rag

