Faster and smarter Retrieval Augmented Generation using Speculative Retrieval and Context Tetris.

Maintainers

darshmodii

Project description

Quira

Lightning-Fast, Context-Dense RAG Framework for Python

Stop waiting. Start predicting.

Quickstart · How It Works · Cost Savings · API · Contributing

🔥 The Problem with Traditional RAG

Traditional Retrieval-Augmented Generation (RAG) is slow and expensive:

High Latency: User types query → Hits Enter → WAIT → Vector search → WAIT → Stuff 10 large chunks into LLM → WAIT → Response.
"Lost in the Middle" Syndrome: You stuff massive chunks of text into the context window, most of which is useless filler. The LLM loses track of the actual facts.
Expensive Redundancy: On every turn of the conversation, you re-fetch and re-process the exact same context over and over again.

✨ The Quira Solution

Quira solves this by predicting what users need before they finish typing, dynamically compressing context to maximize density, and statefully tracking the conversation.

⏱️ 85% faster latency | 🧠 2.6× denser context | 💰 40% cheaper token costs

🏗️ Architecture

graph TD
    User([User Typing]) -->|WebSocket Stream| Speculative[1. Speculative Retriever]
    Speculative -->|Predictive Search| Cache[(Redis Cache)]
    UserSubmit([User Hits Enter]) --> Diff[3. Differential Retriever]
    Diff -->|Cosine Similarity > 0.6?| DeltaFetch{Fetch Delta Chunks Only}
    Cache --> DeltaFetch
    DeltaFetch --> Tetris[2. Context Tetris]
    Tetris -->|Relevance, Recency, Density| Groq[Groq LLM Compression]
    Groq -->|U-Shape Order| FinalContext[Packed Context]
    FinalContext --> MainLLM{Your Main LLM}

📦 Quickstart

1. Install via pip

pip install quira

2. Basic Setup

Quira does not hardcode API keys. You bring your own clients, meaning you have full control over your usage and billing.

import asyncio
from quira import quiraPipeline, UserSession
from qdrant_client import QdrantClient
from groq import Groq
import spacy
from fastembed import TextEmbedding

async def main():
    # 1. Initialize your clients (Bring Your Own Keys)
    qdrant = QdrantClient(":memory:") # Or your cloud Qdrant URL
    redis_mock = None # Pass a real Upstash Redis client in production
    groq = Groq(api_key="your_groq_api_key") 
    spacy_model = spacy.load("en_core_web_sm")
    
    embed_model = TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
    embed_func = lambda text: list(embed_model.embed([text]))[0]

    # 2. Initialize Quira Pipeline
    pipeline = quiraPipeline(
        qdrant_client=qdrant,
        redis_client=redis_mock,
        groq_client=groq,
        embed_func=embed_func,
        spacy_model=spacy_model
    )

    # 3. Create a session for a specific user
    session = UserSession(user_id="user_123")

    # 4. Ingest some documents!
    print("Ingesting document...")
    await pipeline.ingestor.ingest_text("user_123", "Our return policy allows returns within 30 days of purchase.")

    # 5. 🏎️ Speculative fetch (triggers while user is typing in the UI)
    await pipeline.handle_typing_event(session, "What is the re")

    # 6. 🎯 Submit (Context is already warm from the speculative fetch!)
    answer = await pipeline.process_submission(
        session, "What is the return policy?"
    )
    print(answer)

if __name__ == "__main__":
    asyncio.run(main())

⚙️ How It Works: The 4 Core Modules

Quira is built on four cooperating modules:

🏎️ Module 1: Speculative Retrieval

Instead of waiting for the user to hit "Enter", Quira listens to keystrokes. Using adaptive debouncing (a 250–600 ms delay chosen from the user's typing speed), it fires Qdrant searches in the background. By the time the user hits Enter, the search results are already cached in Redis.
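A minimal sketch of how adaptive debouncing of this kind could work. The names `adaptive_delay` and `Debouncer`, and the rule "twice the average key interval, clamped to the bounds", are illustrative assumptions and not Quira's actual implementation; only the 250 ms and 600 ms bounds come from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Optional

MIN_DELAY = 0.25  # seconds, lower bound quoted above
MAX_DELAY = 0.60  # seconds, upper bound quoted above


def adaptive_delay(avg_key_interval: float) -> float:
    """Map the average seconds between keystrokes to a debounce delay.

    Assumption: fast typists get the short delay, slow typists the long one.
    The delay is twice the key interval, clamped to the two bounds.
    """
    return max(MIN_DELAY, min(MAX_DELAY, 2.0 * avg_key_interval))


class Debouncer:
    """Run `callback(text)` only after typing pauses for the computed delay."""

    def __init__(self, callback: Callable[[str], Awaitable[None]]) -> None:
        self._callback = callback
        self._pending: Optional[asyncio.Task] = None

    def on_keystroke(self, text: str, avg_key_interval: float) -> None:
        # A new keystroke cancels the search scheduled for the older prefix.
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        delay = adaptive_delay(avg_key_interval)
        self._pending = asyncio.create_task(self._fire(text, delay))

    async def _fire(self, text: str, delay: float) -> None:
        await asyncio.sleep(delay)
        await self._callback(text)
```

In this sketch, `on_keystroke` must be called from inside a running event loop, and the callback would be whatever performs the background vector search and writes the result to the cache.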

🧩 Module 2: Context Tetris

Not all retrieved context is equal. Quira scores every chunk on 4 dimensions:

Relevance (Cosine similarity)
Recency (Half-life decay for older chunks)
Uniqueness (Penalizes duplicate information)
Density (Entity-to-token ratio)
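The four dimensions above could be combined roughly as follows. This is a sketch under stated assumptions: the `Chunk` fields, the one-hour half-life, the weights, and the weighted-sum formula are all hypothetical placeholders, since the README does not specify how Quira combines the scores.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    similarity: float    # cosine similarity to the query, 0..1
    age_seconds: float   # time since the chunk was last relevant
    entity_count: int    # e.g. named entities found by an NLP model
    token_count: int
    max_overlap: float   # highest similarity to an already selected chunk, 0..1


def recency(age_seconds: float, half_life: float = 3600.0) -> float:
    """Half-life decay: the score halves every `half_life` seconds."""
    return 0.5 ** (age_seconds / half_life)


def density(entity_count: int, token_count: int) -> float:
    """Entity-to-token ratio; 0 for an empty chunk."""
    return entity_count / token_count if token_count else 0.0


def score(chunk: Chunk, weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Weighted sum of the four dimensions (weights are placeholders)."""
    w_rel, w_rec, w_uniq, w_den = weights
    return (
        w_rel * chunk.similarity
        + w_rec * recency(chunk.age_seconds)
        + w_uniq * (1.0 - chunk.max_overlap)   # uniqueness penalizes duplicates
        + w_den * density(chunk.entity_count, chunk.token_count)
    )
```

With these placeholder weights, a fresh, non-duplicated chunk outranks an older near-duplicate with the same cosine similarity.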

Quira then uses a Groq-hosted LLM to compress filler text out of the chunks and orders them in a U-shape (best chunks at the very start and end), so the main LLM is less likely to "lose" facts in the middle of the prompt.
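One simple way to produce a U-shaped order is to alternate ranked chunks between the front and the back of the prompt. The `u_shape` helper below is an illustrative sketch of that idea, not necessarily the ordering rule Quira uses.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def u_shape(ranked: Sequence[T]) -> List[T]:
    """Reorder items ranked best-first so the weakest end up in the middle.

    Items alternate between the front and the back of the result:
    rank 1 goes first, rank 2 goes last, rank 3 goes second, and so on.
    """
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


print(u_shape(["A", "B", "C", "D", "E"]))  # → ['A', 'C', 'E', 'D', 'B']
```

Here "A" is the best chunk and "E" the worst: the two strongest chunks sit at the edges and the weakest lands in the center.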

🔄 Module 3: Differential Retrieval

In a typical RAG chat, every follow-up question triggers a completely new vector search. Quira instead maintains a Context Pool and measures the cosine similarity between the current and previous query. If the topic hasn't changed drastically (similarity above 0.6 in the architecture diagram), Quira fetches only Delta Chunks (new information) and merges them into the pool, avoiding redundant retrieval and processing.
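The decision described above can be sketched as follows. The 0.6 threshold is taken from the architecture diagram; the `update_pool` function, the dictionary-based pool, and the "clear the pool on a topic change" behavior are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

SIMILARITY_THRESHOLD = 0.6  # value shown in the architecture diagram


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_pool(
    pool: Dict[str, str],
    prev_vec: Optional[Sequence[float]],
    curr_vec: Sequence[float],
    search_results: Dict[str, str],
) -> List[str]:
    """Merge search results into the pool; return the IDs newly added."""
    if prev_vec is None or cosine(prev_vec, curr_vec) <= SIMILARITY_THRESHOLD:
        pool.clear()  # topic changed: discard the old context entirely
    # Only chunks not already in the pool count as the delta.
    delta = [cid for cid in search_results if cid not in pool]
    for cid in delta:
        pool[cid] = search_results[cid]
    return delta
```

A production version would more likely exclude already-pooled IDs in the vector-store query itself, so that the duplicate chunks are never transferred at all.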

📄 Module 4: Document Ingestion

Quira includes PyMuPDF-based PDF parsing with overlapping text chunking (default: 1000-character chunks with a 200-character overlap) to reduce sentence fragmentation. It automatically generates embeddings and upserts them into Qdrant.
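Overlapping character chunking with these defaults can be sketched in a few lines. The `chunk_text` function is a hypothetical stand-in for whatever the ingestor does internally; only the 1000/200 defaults come from the text above.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this far after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

With the defaults, a 2000-character document yields three chunks starting at offsets 0, 800, and 1600, so a sentence cut at one chunk boundary still appears whole in the neighboring chunk as long as it is shorter than the overlap.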

💰 Why Quira Saves You Money

You might wonder: "Doesn't using Groq for Context Tetris cost extra money?"

No, it actually saves you up to 40% on your bill. Here's why:

Groq is Hyper-Cheap: The llama-3.1-8b-instant model used to compress context costs fractions of a penny.
Your Main LLM is Expensive: You are likely sending your final prompt to a heavy model like GPT-4o or Claude 3.5 Sonnet. By using cheap Groq tokens to compress the context, you send significantly fewer tokens to the expensive main LLM.
Differential Caching: You stop re-fetching and re-sending identical chunks of text on every single conversational turn.
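The compression argument above can be checked with simple arithmetic. The sketch below is illustrative only: the function name, the prices, and the compression ratio are placeholder assumptions, not real provider pricing or measured Quira results, and the main model's output tokens are ignored.

```python
def cost_per_request(
    context_tokens: int,
    compression_ratio: float,
    main_price: float,
    compressor_in_price: float,
    compressor_out_price: float,
) -> dict:
    """Compare main-LLM input cost with and without a compression pass.

    Prices are in dollars per million tokens. `compression_ratio` is the
    fraction of tokens that survive compression (0.5 keeps half).
    """
    compressed = context_tokens * compression_ratio
    baseline = context_tokens * main_price / 1e6
    # The cheap model reads the full context and writes the compressed one.
    compression = (
        context_tokens * compressor_in_price + compressed * compressor_out_price
    ) / 1e6
    with_compression = compression + compressed * main_price / 1e6
    return {
        "baseline": baseline,
        "with_compression": with_compression,
        "saving_fraction": 1 - with_compression / baseline,
    }


# Placeholder numbers: 8,000 context tokens, half of them kept,
# a $5.00/M main model and a $0.05/M in, $0.10/M out compressor.
result = cost_per_request(8000, 0.5, 5.0, 0.05, 0.10)
```

With these placeholder inputs the saving fraction comes out at about 0.48. The saving shrinks as the compressor's price approaches the main model's price or as the compression ratio approaches 1.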

📊 Benchmarks

| Metric | Traditional RAG | Quira | Improvement |
|---|---|---|---|
| Avg Latency | 1,450 ms | 210 ms | 🚀 85% faster |
| Context Density | 35% | 94% | 🧠 2.6× denser |
| Token Cost | Baseline | -40% | 💰 40% cheaper |
| Redundant Fetches | Every turn | Delta only | ♻️ ~70% fewer |

📚 API Reference

`quiraPipeline(qdrant, redis, groq, embed_func, spacy_model)`

The main pipeline class. Accepts your own client instances.

| Method | Description |
|---|---|
| `handle_typing_event(session, keystrokes)` | Trigger speculative retrieval on keystrokes |
| `process_submission(session, query)` | Full retrieval + compression pipeline |
| `ingestor.ingest_pdf(user_id, path)` | Parse, chunk, embed, and store a PDF |
| `ingestor.ingest_text(user_id, text)` | Chunk, embed, and store raw text |

`UserSession(user_id, websocket=None)`

Tracks per-user conversation state, context pools, and turn history. Keeps different users' data strictly isolated.

🔒 Security

Quira is regularly audited:

✅ 0 vulnerabilities across all severity levels via Bandit
✅ SHA-256 hashing for all cache keys (no weak hashes)
✅ Bring Your Own Keys architecture — absolutely zero API keys or credentials are included or required by the library itself. You retain 100% control over your API secrets.
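As an illustration of the SHA-256 cache-key item above, a key for a speculative search result might be derived as in the sketch below. The `cache_key` function, the `quira:spec` namespace, and the normalization step are hypothetical; the README only states that cache keys are hashed with SHA-256.

```python
import hashlib


def cache_key(user_id: str, query_prefix: str, namespace: str = "quira:spec") -> str:
    """Build a cache key from the user ID and a normalized query prefix."""
    # Lowercase and collapse whitespace so trivial variations share a key.
    normalized = " ".join(query_prefix.lower().split())
    # A NUL separator prevents ("ab", "c") and ("a", "bc") from colliding.
    payload = f"{user_id}\x00{normalized}".encode("utf-8")
    return f"{namespace}:{hashlib.sha256(payload).hexdigest()}"
```

Including the user ID in the hashed payload keeps one user's cached results from being served to another, which matches the per-user isolation described for `UserSession`.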

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

# Clone the repo
git clone https://github.com/DevDarsh26/Quira.git
cd Quira

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

Built with ❤️ by darshmodii.in

If you like Quira, drop a ⭐ on GitHub. It means the world!

Project details

Release history

This version

0.2.0

Jun 16, 2026

0.1.0

Jun 16, 2026

Download files

Source Distribution

quira-0.2.0.tar.gz (30.7 kB view details)

Uploaded Jun 16, 2026 Source

Built Distribution

quira-0.2.0-py3-none-any.whl (34.4 kB view details)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file quira-0.2.0.tar.gz.

File metadata

Download URL: quira-0.2.0.tar.gz
Upload date: Jun 16, 2026
Size: 30.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for quira-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a6824a97e5986ac76dc66e7463745c72c3c5afdf346cf9642b14b488948708d1`
MD5	`4f4bb1c77fbf0fdc1f5322a7df5c1c3d`
BLAKE2b-256	`0cc605f6cd35d33fb6a62825f3bfbfe330af73cb682774edc48b41ed612e48c5`

Provenance

The following attestation bundles were made for quira-0.2.0.tar.gz:

Publisher: publish.yml on DevDarsh26/Quira

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quira-0.2.0.tar.gz
- Subject digest: a6824a97e5986ac76dc66e7463745c72c3c5afdf346cf9642b14b488948708d1
- Sigstore transparency entry: 1838538056
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: DevDarsh26/Quira@a4f9801a9224f189afdfe1168e5526e727ac821d
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DevDarsh26
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a4f9801a9224f189afdfe1168e5526e727ac821d
- Trigger Event: push

File details

Details for the file quira-0.2.0-py3-none-any.whl.

File metadata

Download URL: quira-0.2.0-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 34.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for quira-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f6a7a3187465073c909be6b4341b3a026ca6fccf068d9ff448730f605903dd7`
MD5	`16d6b1756795f7b9f40f4bd3718ab492`
BLAKE2b-256	`f25b1649fb45d11c85b155e46edc5d8a8142968397a0e7473a922f20fa846e38`

Provenance

The following attestation bundles were made for quira-0.2.0-py3-none-any.whl:

Publisher: publish.yml on DevDarsh26/Quira

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quira-0.2.0-py3-none-any.whl
- Subject digest: 6f6a7a3187465073c909be6b4341b3a026ca6fccf068d9ff448730f605903dd7
- Sigstore transparency entry: 1838538169
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: DevDarsh26/Quira@a4f9801a9224f189afdfe1168e5526e727ac821d
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DevDarsh26
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a4f9801a9224f189afdfe1168e5526e727ac821d
- Trigger Event: push
