Sub-millisecond predictive retrieval plugin for Pipecat. Open-source VoiceAgentRAG.

pipecat-primd

Sub-millisecond predictive retrieval for Pipecat. Open-source VoiceAgentRAG.

pipecat-primd is a Pipecat FrameProcessor that connects a voice pipeline to primd — a Rust retrieval runtime that speculates on partial transcripts during STT and pre-warms next-turn answers during TTS.

End-of-utterance retrieval drops from ~157 µs (naive SIMD scan) to ~1.6 µs when speculation matches — 98× faster, deterministically, on a 100k-doc corpus.
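The arithmetic behind those figures can be checked in a few lines. This is a back-of-the-envelope sketch, not a benchmark: the two latencies are the ones quoted above, while the hit rate and the assumption that a miss costs the same as the naive scan are hypothetical and not stated by the project.

```python
NAIVE_SCAN_US = 157.0  # quoted above: naive SIMD scan at end of utterance
CACHE_HIT_US = 1.6     # quoted above: finalize when speculation matched


def speedup(naive_us: float = NAIVE_SCAN_US, hit_us: float = CACHE_HIT_US) -> float:
    """Ratio between the naive scan and a speculation hit."""
    return naive_us / hit_us


def expected_latency_us(
    hit_rate: float,
    hit_us: float = CACHE_HIT_US,
    miss_us: float = NAIVE_SCAN_US,  # assumption: a miss falls back to the naive scan
) -> float:
    """Average finalize latency for a given (hypothetical) speculation hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us


if __name__ == "__main__":
    print(f"speedup on a hit: {speedup():.1f}x")
    for rate in (0.5, 0.8, 0.95):
        print(f"hit rate {rate:.0%}: {expected_latency_us(rate):.1f} us on average")
```

Under these assumptions the average latency is dominated by misses, so the practical gain depends on how often the partial transcript converges on the final one.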

Install

pip install pipecat-primd

You also need primd running locally:

cargo install primd-cli
primd index --input examples/faq.jsonl --out /tmp/primd-faq --embedder hashed
primd serve --index /tmp/primd-faq --bind 127.0.0.1:8080
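Since the pipeline depends on the server started above, it can help to wait for the port before building the pipeline. The helper below is a generic standard-library sketch; `wait_for_port` is a hypothetical name and not part of pipecat-primd or primd, and it only checks that something accepts TCP connections on the address, not that the index has loaded.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 5.0, interval_s: float = 0.1) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Address taken from the `primd serve --bind` command above.
    if wait_for_port("127.0.0.1", 8080):
        print("primd is accepting connections")
    else:
        print("primd did not come up in time")
```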

Use

from pipecat.pipeline.pipeline import Pipeline
from pipecat_primd import PrimdRetriever

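retriever = PrimdRetriever(
    primd_url="http://127.0.0.1:8080",
    top_k=5,
    corpus_text={
        "faq-001": "We offer a 14-day free trial...",
        # ... more {doc_id: text} entries
    },
)

# transport, stt, llm, tts, and context_aggregator are the services your pipeline already defines.
pipeline = Pipeline([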
    transport.input(),
    stt,
    retriever,                # ← interim → /observe, final → /finalize, bot speak → /warm
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])

That's it. The retriever:

  • feeds InterimTranscriptionFrame into primd's /session/{id}/observe so primd starts retrieving while the user is still speaking,
  • hits /session/{id}/finalize on TranscriptionFrame; if the partial converged on the final, the result comes back from a microsecond-scale cache (~1.6 µs in the benchmark above),
  • calls /session/{id}/warm on BotStartedSpeakingFrame so the next turn is already pre-scoped before STT lands.
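The frame-to-endpoint mapping in the list above can be sketched as a small lookup table. This is an illustration only, not the code in pipecat_primd/retriever.py: the three dataclasses are stand-ins named after the Pipecat frames so the sketch runs without Pipecat installed, and `endpoint_for` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins for the Pipecat frame classes (assumption: only the type matters here).
@dataclass
class InterimTranscriptionFrame:
    text: str


@dataclass
class TranscriptionFrame:
    text: str


@dataclass
class BotStartedSpeakingFrame:
    pass


# Frame type -> primd session action, as described in the list above.
_ROUTES = {
    InterimTranscriptionFrame: "observe",
    TranscriptionFrame: "finalize",
    BotStartedSpeakingFrame: "warm",
}


def endpoint_for(frame: object, session_id: str) -> Optional[str]:
    """Return the session path a frame would trigger, or None for unrelated frames."""
    action = _ROUTES.get(type(frame))
    if action is None:
        return None  # other frames pass through without a primd call
    return f"/session/{session_id}/{action}"


if __name__ == "__main__":
    turn = [
        InterimTranscriptionFrame("is there a"),
        InterimTranscriptionFrame("is there a free trial"),
        TranscriptionFrame("is there a free trial"),
        BotStartedSpeakingFrame(),
    ]
    for frame in turn:
        print(type(frame).__name__, "->", endpoint_for(frame, "demo"))
```

The lookup uses the exact frame type rather than `isinstance`, so an interim transcript can never be mistaken for a final one even if the two classes share a base.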

Standalone Client

If you want primd retrieval without Pipecat (CLI tools, batch jobs, custom pipelines):

import asyncio
from pipecat_primd import PrimdClient

async def main() -> None:
    async with PrimdClient("http://127.0.0.1:8080") as primd:
        result = await primd.query("is there a free trial", top_k=3)
        for hit in result.hits:
            print(f"{hit.rank}: {hit.id} ({hit.event}) dist={hit.distance}")

asyncio.run(main())

The client also exposes the session methods directly: observe, finalize, warm, reset.

Why session calls?

/query is convenient but doesn't unlock the latency win — primd has no idea what's coming until you call it. The session API lets primd:

  1. Speculate on partial transcripts during STT (no critical-path cost).
  2. Short-circuit /finalize if the partial converged on the final — cache hit, microseconds.
  3. Pre-warm the predictor's scope during TTS so the next turn's /observe is already constrained.
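Steps 1 and 2 can be illustrated with a toy in-memory model. This is not how primd is implemented: `ToySpeculativeSession` is a hypothetical class, "converged" is assumed here to mean the partial and final transcripts are equal after lowercasing and whitespace normalization, and the pre-warming in step 3 is not modeled.

```python
from typing import Callable, List, Optional, Tuple


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different transcripts compare equal."""
    return " ".join(text.lower().split())


class ToySpeculativeSession:
    """Caches the result for the latest partial and reuses it if the final matches."""

    def __init__(self, search: Callable[[str], List[str]]) -> None:
        self._search = search  # the expensive retrieval call
        self._speculated: Optional[Tuple[str, List[str]]] = None

    def observe(self, partial: str) -> None:
        """Step 1: run retrieval on a partial transcript, off the critical path."""
        key = _normalize(partial)
        self._speculated = (key, self._search(key))

    def finalize(self, final: str) -> Tuple[List[str], bool]:
        """Step 2: return (hits, cache_hit); search again only when speculation missed."""
        key = _normalize(final)
        if self._speculated is not None and self._speculated[0] == key:
            return self._speculated[1], True
        return self._search(key), False


if __name__ == "__main__":
    def fake_search(query: str) -> List[str]:
        return [f"doc-for:{query}"]

    session = ToySpeculativeSession(fake_search)
    session.observe("is there a free")
    session.observe("is there a free trial")
    print(session.finalize("Is there a free trial"))  # speculation matched
    print(session.finalize("cancel my plan"))         # speculation missed
```

On a hit, `finalize` does no retrieval work at all, which is the reason the session API can beat a single `/query` call at end of utterance.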

This is the dual-agent fast-talker / slow-thinker pattern from Salesforce VoiceAgentRAG, wired up.

Compatibility

Tested against Pipecat 0.0.49+. Frame names occasionally shift between Pipecat releases — if your version renames InterimTranscriptionFrame, TranscriptionFrame, or BotStartedSpeakingFrame, update the imports in pipecat_primd/retriever.py.
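One way to tolerate a renamed frame without editing imports for every release is to resolve the class by trying several candidate names. The sketch below is an assumption-laden illustration, not something pipecat-primd ships: `resolve_frame` is a hypothetical helper, the demo uses a stand-in namespace instead of Pipecat, and `LegacyTranscriptionFrame` is a made-up name used only to show the fallback.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def resolve_frame(module: Any, candidates: Iterable[str]) -> Any:
    """Return the first attribute in `candidates` that exists on `module`."""
    names = list(candidates)
    for name in names:
        frame_cls = getattr(module, name, None)
        if frame_cls is not None:
            return frame_cls
    raise ImportError(f"none of {names} found in {module!r}")


if __name__ == "__main__":
    # Stand-in for a frames module so the sketch runs without Pipecat installed.
    fake_frames = SimpleNamespace(TranscriptionFrame=type("TranscriptionFrame", (), {}))

    # Try the made-up legacy name first, then fall back to the current one.
    cls = resolve_frame(fake_frames, ["LegacyTranscriptionFrame", "TranscriptionFrame"])
    print(cls.__name__)
```

With Pipecat installed, the same helper would be called with the real frames module (assumed here to be `pipecat.frames.frames`) in place of the stand-in.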

License

Apache-2.0
