Skip to main content

Faster and smarter Retrieval Augmented Generation using Speculative Retrieval and Context Tetris.

Project description

Quira Logo

Quira

Lightning-Fast, Context-Dense RAG Framework for Python

Stop waiting. Start predicting.


PyPI License Python GitHub



Quickstart  ·  How It Works  ·  Benchmarks  ·  API  ·  Contributing



🔥 The Problem

Traditional RAG is slow and wasteful:

User types query → Hits Enter → WAIT → Vector search → WAIT → Stuff 10 chunks → WAIT → LLM response
                                 ⏱️ 1.5s avg latency, 65% of context is noise

✨ The Quira Solution

Quira predicts what users need before they finish typing, compresses context to maximize density, and tracks conversation state to eliminate redundant fetches:

User starts typing → Quira searches speculatively → User hits Enter → Context already cached!
                     → Differential fetch (only new chunks) → Context Tetris (compress + score)
                                 ⏱️ 210ms avg latency, 94% context density

📦 Quickstart

Install

pip install quira

Usage

import asyncio
from quira import quiraPipeline, UserSession

async def main():
    # Initialize with your own clients
    pipeline = quiraPipeline(
        qdrant_client=qdrant,
        redis_client=redis,
        groq_client=groq,
        embed_func=my_embed_func,
        spacy_model=my_spacy_model
    )

    session = UserSession(user_id="user_123")

    # 🏎️ Speculative fetch while user types
    await pipeline.handle_typing_event(session, "What is the re")

    # 🎯 Submit — context is already warm!
    answer = await pipeline.process_submission(
        session, "What is the return policy?"
    )
    print(answer)

asyncio.run(main())

Ingest PDFs

# Parse, chunk, embed, and store — one line.
chunks = await pipeline.ingestor.ingest_pdf("user_123", "docs/return_policy.pdf")
print(f"Indexed {chunks} chunks into Qdrant")

⚙️ How It Works

Quira is built on 4 core modules that work together as a unified pipeline:

🏎️ Module 1 — Speculative Retrieval

Listens to user keystrokes via WebSocket. Uses adaptive debouncing (250ms–600ms based on typing speed) to fire Qdrant searches before the user submits. Results are cached in Redis with SHA-256 hashed keys.

🧩 Module 2 — Context Tetris

Scores every chunk on 4 dimensions: Relevance, Recency, Uniqueness, and Density. Uses Groq LLM to compress filler text. Orders chunks in a U-shape (best chunks at start and end) to combat "Lost in the Middle" syndrome.

🔄 Module 3 — Differential Retrieval

Maintains a stateful Context Pool across conversation turns. Measures cosine similarity between consecutive queries. If similarity > 0.6, fetches only delta chunks. Garbage-collects stale context when topics shift.

📄 Module 4 — Document Ingestion

Parses PDFs with PyMuPDF. Splits text into overlapping chunks (1000 chars / 200 overlap by default) to prevent sentence fragmentation. Generates embeddings and upserts directly into Qdrant.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        QUIRA PIPELINE                        │
│                                                              │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────────┐  │
│  │  Speculative │───▶│ Differential │───▶│ Context Tetris │  │
│  │  Retriever   │    │  Retriever   │    │  (Compress +   │  │
│  │  (Predict)   │    │  (Delta)     │    │   Score + Pack)│  │
│  └──────┬───────┘    └──────┬───────┘    └───────┬────────┘  │
│         │                   │                    │           │
│    ┌────▼────┐         ┌────▼────┐          ┌────▼────┐     │
│    │  Redis  │         │ Qdrant  │          │  Groq   │     │
│    │ (Cache) │         │(Vectors)│          │  (LLM)  │     │
│    └─────────┘         └─────────┘          └─────────┘     │
└──────────────────────────────────────────────────────────────┘

📊 Benchmarks

Metric Traditional RAG Quira Improvement
Avg Latency 1,450 ms 210 ms 🚀 85% faster
Context Density 35% 94% 🧠 2.6× denser
Token Cost Baseline -40% 💰 40% cheaper
Redundant Fetches Every turn Delta only ♻️ ~70% fewer

📚 API Reference

quiraPipeline(qdrant, redis, groq, embed_func, spacy_model)

The main pipeline class. Accepts your own client instances.

Method Description
handle_typing_event(session, keystrokes) Trigger speculative retrieval on keystrokes
process_submission(session, query) Full retrieval + compression pipeline
ingestor.ingest_pdf(user_id, path) Parse, chunk, embed, and store a PDF
ingestor.ingest_text(user_id, text) Chunk, embed, and store raw text

UserSession(user_id, websocket=None)

Tracks per-user conversation state, context pools, and turn history.


🔒 Security

Quira is regularly audited with Bandit (Python AST security linter):

  • 0 vulnerabilities across all severity levels
  • ✅ SHA-256 hashing for all cache keys (no weak hashes)
  • ✅ No hardcoded secrets or credentials
  • ✅ Safe file I/O with proper exception handling

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

# Clone the repo
git clone https://github.com/DevDarsh26/quira.git
cd quira

# Create a virtual environment
python -m venv .venv
.venv\Scripts\activate   # Windows
source .venv/bin/activate  # macOS/Linux

# Install in editable mode with dev dependencies
pip install -e ".[dev]"


Built with ❤️ by darshmodii.in

GitHub   Website

If you like Quira, drop a ⭐ on GitHub — it means the world!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quira-0.1.0.tar.gz (19.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

quira-0.1.0-py3-none-any.whl (18.0 kB view details)

Uploaded Python 3

File details

Details for the file quira-0.1.0.tar.gz.

File metadata

  • Download URL: quira-0.1.0.tar.gz
  • Upload date:
  • Size: 19.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for quira-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c24309fbfdcaf8fe32f91fafb60c5a56a5a0959cda4f0dddc697b39cf298dd85
MD5 7ec483aa543891339bfde2c481efc64d
BLAKE2b-256 eaebfb0c005c447eb759f24b222fb0bcb103273184c0cce98301d9d7e3f6ce98

See more details on using hashes here.

Provenance

The following attestation bundles were made for quira-0.1.0.tar.gz:

Publisher: publish.yml on DevDarsh26/Quira

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file quira-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: quira-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for quira-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1bb78165062bc963c5f2cdb93f48e9d8b6e796bd366d4737592cb99f03086639
MD5 96336654a9658ac1b78255da2fb6ca30
BLAKE2b-256 13821772e34414555d391b29fae80f4839e592ceec029ddcaa9efbc8185ed7b7

See more details on using hashes here.

Provenance

The following attestation bundles were made for quira-0.1.0-py3-none-any.whl:

Publisher: publish.yml on DevDarsh26/Quira

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page