Faster and smarter Retrieval Augmented Generation using Speculative Retrieval and Context Tetris.
Project description
Quira
Lightning-Fast, Context-Dense RAG Framework for Python
Stop waiting. Start predicting.
Quickstart · How It Works · Benchmarks · API · Contributing
🔥 The Problem
Traditional RAG is slow and wasteful:
User types query → Hits Enter → WAIT → Vector search → WAIT → Stuff 10 chunks → WAIT → LLM response
⏱️ 1.5s avg latency, 65% of context is noise
✨ The Quira Solution
Quira predicts what users need before they finish typing, compresses context to maximize density, and tracks conversation state to eliminate redundant fetches:
User starts typing → Quira searches speculatively → User hits Enter → Context already cached!
→ Differential fetch (only new chunks) → Context Tetris (compress + score)
⏱️ 210ms avg latency, 94% context density
📦 Quickstart
Install
pip install quira
Usage
import asyncio
from quira import quiraPipeline, UserSession
async def main():
# Initialize with your own clients
pipeline = quiraPipeline(
qdrant_client=qdrant,
redis_client=redis,
groq_client=groq,
embed_func=my_embed_func,
spacy_model=my_spacy_model
)
session = UserSession(user_id="user_123")
# 🏎️ Speculative fetch while user types
await pipeline.handle_typing_event(session, "What is the re")
# 🎯 Submit — context is already warm!
answer = await pipeline.process_submission(
session, "What is the return policy?"
)
print(answer)
asyncio.run(main())
Ingest PDFs
# Parse, chunk, embed, and store — one line.
chunks = await pipeline.ingestor.ingest_pdf("user_123", "docs/return_policy.pdf")
print(f"Indexed {chunks} chunks into Qdrant")
⚙️ How It Works
Quira is built on 4 core modules that work together as a unified pipeline:
🏎️ Module 1 — Speculative RetrievalListens to user keystrokes via WebSocket. Uses adaptive debouncing (250ms–600ms based on typing speed) to fire Qdrant searches before the user submits. Results are cached in Redis with SHA-256 hashed keys. |
🧩 Module 2 — Context TetrisScores every chunk on 4 dimensions: Relevance, Recency, Uniqueness, and Density. Uses Groq LLM to compress filler text. Orders chunks in a U-shape (best chunks at start and end) to combat "Lost in the Middle" syndrome. |
🔄 Module 3 — Differential RetrievalMaintains a stateful Context Pool across conversation turns. Measures cosine similarity between consecutive queries. If similarity > 0.6, fetches only delta chunks. Garbage-collects stale context when topics shift. |
📄 Module 4 — Document IngestionParses PDFs with PyMuPDF. Splits text into overlapping chunks (1000 chars / 200 overlap by default) to prevent sentence fragmentation. Generates embeddings and upserts directly into Qdrant. |
Architecture
┌──────────────────────────────────────────────────────────────┐
│ QUIRA PIPELINE │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Speculative │───▶│ Differential │───▶│ Context Tetris │ │
│ │ Retriever │ │ Retriever │ │ (Compress + │ │
│ │ (Predict) │ │ (Delta) │ │ Score + Pack)│ │
│ └──────┬───────┘ └──────┬───────┘ └───────┬────────┘ │
│ │ │ │ │
│ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ │
│ │ Redis │ │ Qdrant │ │ Groq │ │
│ │ (Cache) │ │(Vectors)│ │ (LLM) │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└──────────────────────────────────────────────────────────────┘
📊 Benchmarks
| Metric | Traditional RAG | Quira | Improvement |
|---|---|---|---|
| Avg Latency | 1,450 ms | 210 ms | 🚀 85% faster |
| Context Density | 35% | 94% | 🧠 2.6× denser |
| Token Cost | Baseline | -40% | 💰 40% cheaper |
| Redundant Fetches | Every turn | Delta only | ♻️ ~70% fewer |
📚 API Reference
quiraPipeline(qdrant, redis, groq, embed_func, spacy_model)
The main pipeline class. Accepts your own client instances.
| Method | Description |
|---|---|
handle_typing_event(session, keystrokes) |
Trigger speculative retrieval on keystrokes |
process_submission(session, query) |
Full retrieval + compression pipeline |
ingestor.ingest_pdf(user_id, path) |
Parse, chunk, embed, and store a PDF |
ingestor.ingest_text(user_id, text) |
Chunk, embed, and store raw text |
UserSession(user_id, websocket=None)
Tracks per-user conversation state, context pools, and turn history.
🔒 Security
Quira is regularly audited with Bandit (Python AST security linter):
- ✅ 0 vulnerabilities across all severity levels
- ✅ SHA-256 hashing for all cache keys (no weak hashes)
- ✅ No hardcoded secrets or credentials
- ✅ Safe file I/O with proper exception handling
🤝 Contributing
Contributions are welcome! Please open an issue or submit a pull request.
# Clone the repo
git clone https://github.com/DevDarsh26/quira.git
cd quira
# Create a virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS/Linux
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file quira-0.1.0.tar.gz.
File metadata
- Download URL: quira-0.1.0.tar.gz
- Upload date:
- Size: 19.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c24309fbfdcaf8fe32f91fafb60c5a56a5a0959cda4f0dddc697b39cf298dd85
|
|
| MD5 |
7ec483aa543891339bfde2c481efc64d
|
|
| BLAKE2b-256 |
eaebfb0c005c447eb759f24b222fb0bcb103273184c0cce98301d9d7e3f6ce98
|
Provenance
The following attestation bundles were made for quira-0.1.0.tar.gz:
Publisher:
publish.yml on DevDarsh26/Quira
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
quira-0.1.0.tar.gz -
Subject digest:
c24309fbfdcaf8fe32f91fafb60c5a56a5a0959cda4f0dddc697b39cf298dd85 - Sigstore transparency entry: 1838028800
- Sigstore integration time:
-
Permalink:
DevDarsh26/Quira@30b8c8608ada7e3aca33562a36ae1bd6b7678786 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/DevDarsh26
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@30b8c8608ada7e3aca33562a36ae1bd6b7678786 -
Trigger Event:
push
-
Statement type:
File details
Details for the file quira-0.1.0-py3-none-any.whl.
File metadata
- Download URL: quira-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1bb78165062bc963c5f2cdb93f48e9d8b6e796bd366d4737592cb99f03086639
|
|
| MD5 |
96336654a9658ac1b78255da2fb6ca30
|
|
| BLAKE2b-256 |
13821772e34414555d391b29fae80f4839e592ceec029ddcaa9efbc8185ed7b7
|
Provenance
The following attestation bundles were made for quira-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on DevDarsh26/Quira
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
quira-0.1.0-py3-none-any.whl -
Subject digest:
1bb78165062bc963c5f2cdb93f48e9d8b6e796bd366d4737592cb99f03086639 - Sigstore transparency entry: 1838028974
- Sigstore integration time:
-
Permalink:
DevDarsh26/Quira@30b8c8608ada7e3aca33562a36ae1bd6b7678786 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/DevDarsh26
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@30b8c8608ada7e3aca33562a36ae1bd6b7678786 -
Trigger Event:
push
-
Statement type: