Sub-millisecond predictive retrieval plugin for Pipecat. Open-source VoiceAgentRAG.
pipecat-primd
Sub-millisecond predictive retrieval for Pipecat. Open-source VoiceAgentRAG.
pipecat-primd is a Pipecat FrameProcessor that connects a voice pipeline to primd — a Rust retrieval runtime that speculates on partial transcripts during STT and pre-warms next-turn answers during TTS.
End-of-utterance retrieval drops from ~157 µs (naive SIMD scan) to ~1.6 µs when speculation matches — 98× faster, deterministically, on a 100k-doc corpus.
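As a rough guide to what those two figures mean on average, the sketch below blends them by speculation hit rate. It assumes a miss costs the same as the naive scan (~157 µs), which the numbers above do not state; `expected_latency_us` and the sample hit rates are illustrative only.

```python
NAIVE_SCAN_US = 157.0  # naive SIMD scan, from the figures above
CACHE_HIT_US = 1.6     # finalize when speculation matched, from the figures above


def expected_latency_us(hit_rate: float, miss_us: float = NAIVE_SCAN_US) -> float:
    """Average finalize latency for a given speculation hit rate.

    Assumption: a miss falls back to a full scan costing `miss_us`.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHE_HIT_US + (1.0 - hit_rate) * miss_us


if __name__ == "__main__":
    print(f"speedup on a hit: {NAIVE_SCAN_US / CACHE_HIT_US:.0f}x")
    for rate in (0.0, 0.5, 0.9, 1.0):
        print(f"hit rate {rate:.0%}: {expected_latency_us(rate):.1f} µs")
```

The 98× figure is the ratio at a 100% hit rate; the real average depends on how often partial transcripts converge on the final one.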
Install
pip install pipecat-primd
You also need primd running locally:
cargo install primd-cli
primd index --input examples/faq.jsonl --out /tmp/primd-faq --embedder hashed
primd serve --index /tmp/primd-faq --bind 127.0.0.1:8080
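If you start primd from a script, it can help to wait until the server accepts connections before building the pipeline. The sketch below is a generic TCP probe using only the standard library; `wait_for_port` is a hypothetical helper, not part of primd or pipecat-primd, and the host and port simply mirror the --bind value above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # server not accepting yet; retry shortly
    return False


# Example (assumes primd was started with --bind 127.0.0.1:8080):
#     if not wait_for_port("127.0.0.1", 8080):
#         raise SystemExit("primd did not come up in time")
```

This only confirms the port is open; it does not check that the index has finished loading.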
Use
from pipecat.pipeline.pipeline import Pipeline
from pipecat_primd import PrimdRetriever

retriever = PrimdRetriever(
    primd_url="http://127.0.0.1:8080",
    top_k=5,
    corpus_text={"faq-001": "We offer a 14-day free trial...", ...},
)

pipeline = Pipeline([
    transport.input(),
    stt,
    retriever,  # ← interim → /observe, final → /finalize, bot speak → /warm
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])
That's it. The retriever:
- feeds InterimTranscriptionFrame into primd's /session/{id}/observe so primd starts retrieving while the user is still speaking,
- hits /session/{id}/finalize on TranscriptionFrame; if the partial converged on the final, it returns from a sub-microsecond cache,
- calls /session/{id}/warm on BotStartedSpeakingFrame so the next turn is already pre-scoped before STT lands.
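The frame-to-endpoint mapping in the list above can be sketched without Pipecat installed. This is not the actual PrimdRetriever code: the three dataclasses are stand-ins for the Pipecat frame classes, and `RecordingSession` and `route_frame` are hypothetical names that record calls instead of making HTTP requests.

```python
import asyncio
from dataclasses import dataclass


# Stand-ins for the Pipecat frame classes named above (assumption: only the
# text field matters for this sketch).
@dataclass
class InterimTranscriptionFrame:
    text: str


@dataclass
class TranscriptionFrame:
    text: str


@dataclass
class BotStartedSpeakingFrame:
    pass


class RecordingSession:
    """Hypothetical session client that records calls instead of doing HTTP."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    async def observe(self, text: str) -> None:
        self.calls.append(("observe", text))

    async def finalize(self, text: str) -> None:
        self.calls.append(("finalize", text))

    async def warm(self) -> None:
        self.calls.append(("warm", ""))


async def route_frame(frame: object, session: RecordingSession) -> None:
    """Map one frame to the session endpoint described in the list above."""
    if isinstance(frame, InterimTranscriptionFrame):
        await session.observe(frame.text)    # user still speaking
    elif isinstance(frame, TranscriptionFrame):
        await session.finalize(frame.text)   # end of utterance
    elif isinstance(frame, BotStartedSpeakingFrame):
        await session.warm()                 # bot started talking
    # Any other frame type is ignored here; a real processor would pass it on.


async def demo() -> list[tuple[str, str]]:
    session = RecordingSession()
    frames = [
        InterimTranscriptionFrame("is there a"),
        InterimTranscriptionFrame("is there a free trial"),
        TranscriptionFrame("is there a free trial"),
        BotStartedSpeakingFrame(),
    ]
    for frame in frames:
        await route_frame(frame, session)
    return session.calls


if __name__ == "__main__":
    for call in asyncio.run(demo()):
        print(call)
```

In a real pipeline the processor would also forward every frame downstream so the rest of the pipeline keeps running.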
Standalone Client
If you want primd retrieval without Pipecat (CLI tools, batch jobs, custom pipelines):
import asyncio

from pipecat_primd import PrimdClient


async def main() -> None:
    async with PrimdClient("http://127.0.0.1:8080") as primd:
        result = await primd.query("is there a free trial", top_k=3)
        for hit in result.hits:
            print(f"{hit.rank}: {hit.id} ({hit.event}) dist={hit.distance}")


asyncio.run(main())
The client also exposes the session methods directly: observe, finalize, warm, reset.
Why session calls?
/query is convenient but doesn't unlock the latency win — primd has no idea what's coming until you call it. The session API lets primd:
- Speculate on partial transcripts during STT (no critical-path cost).
- Short-circuit /finalize if the partial converged on the final: cache hit, microseconds.
- Pre-warm the predictor's scope during TTS so the next turn's /observe is already constrained.
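The speculate-then-short-circuit idea can be illustrated with a toy in-memory model. This is a sketch, not primd's implementation: "converged" is modeled here as exact equality after normalization, which is an assumption, the pre-warm step is left out, and `SpeculativeSession` and `toy_retrieve` are hypothetical names.

```python
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different transcripts match."""
    return " ".join(text.lower().split())


class SpeculativeSession:
    """Toy model: retrieve on partials, reuse the result if the final matches."""

    def __init__(self, retrieve: Callable[[str], list[str]]) -> None:
        self._retrieve = retrieve
        self._spec_key: Optional[str] = None
        self._spec_hits: list[str] = []
        self.cache_hits = 0
        self.cache_misses = 0

    def observe(self, partial: str) -> None:
        # Off the critical path: runs while the user is still speaking.
        key = normalize(partial)
        if key != self._spec_key:
            self._spec_key = key
            self._spec_hits = self._retrieve(key)

    def finalize(self, final: str) -> list[str]:
        key = normalize(final)
        if key == self._spec_key:
            self.cache_hits += 1    # partial converged on the final
            return self._spec_hits
        self.cache_misses += 1      # speculation missed: retrieve now
        return self._retrieve(key)


def toy_retrieve(query: str) -> list[str]:
    """Hypothetical retriever: ids of docs sharing at least one word with the query."""
    corpus = {"faq-001": "free trial length", "faq-002": "refund policy"}
    words = set(query.split())
    return [doc_id for doc_id, text in corpus.items() if words & set(text.split())]


if __name__ == "__main__":
    session = SpeculativeSession(toy_retrieve)
    session.observe("is there a free")
    session.observe("is there a free trial")
    print(session.finalize("Is there a free trial"), session.cache_hits)
```

The point of the model is that the expensive call happens in observe, so finalize only pays for retrieval when the final transcript differs from the last partial.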
This is the dual-agent fast-talker / slow-thinker pattern from Salesforce VoiceAgentRAG, wired up.
Compatibility
Tested against Pipecat 0.0.49+. Frame names occasionally shift between Pipecat releases — if your version renames InterimTranscriptionFrame, TranscriptionFrame, or BotStartedSpeakingFrame, update the imports in pipecat_primd/retriever.py.
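One way to tolerate a rename is to resolve each frame class from a list of candidate names. The sketch below is an assumption about how you might patch your own copy, not what pipecat_primd/retriever.py currently does; `resolve_frame` and `load_frames` are hypothetical helpers, and `load_frames` assumes the frame classes live in `pipecat.frames.frames`.

```python
import importlib
from types import ModuleType, SimpleNamespace
from typing import Union


def resolve_frame(module: Union[ModuleType, SimpleNamespace], *candidates: str) -> type:
    """Return the first attribute in `candidates` that exists on `module`."""
    for name in candidates:
        frame_cls = getattr(module, name, None)
        if frame_cls is not None:
            return frame_cls
    raise ImportError(f"none of {candidates} found on {module!r}")


def load_frames() -> dict[str, type]:
    """Resolve the three frame classes from an installed Pipecat.

    Assumption: frames live in `pipecat.frames.frames`; add alternate names
    after the current one if your release renamed them.
    """
    frames = importlib.import_module("pipecat.frames.frames")
    return {
        "interim": resolve_frame(frames, "InterimTranscriptionFrame"),
        "final": resolve_frame(frames, "TranscriptionFrame"),
        "bot_started": resolve_frame(frames, "BotStartedSpeakingFrame"),
    }
```

Listing the current name first keeps behavior unchanged on tested versions, and a missing class fails loudly at import time rather than silently skipping frames.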
License
Apache-2.0