Index-Heavy, Query-Light RAG Engine — Put in docs, ask questions, it just works.

QuantumRAG

Open-source RAG engine that deeply indexes your documents so every answer is accurate, cited, and fast.

from quantumrag import Engine

engine = Engine()                # Zero config — auto-detects your API key
engine.ingest("./docs")          # PDF, DOCX, HWP, XLSX, PPTX, HTML, MD, CSV, TXT
result = engine.query("What are the key security findings?")
print(result.answer)
# The audit identified 3 critical findings: ... [1][2]
# Confidence: STRONGLY_SUPPORTED

Put documents in. Ask questions. Get cited answers. No configuration needed.

Why QuantumRAG

Most RAG systems embed documents into vectors and hope the right chunks come back. When questions are phrased differently, when answers span multiple documents, or when you need exact entity matches — they fail silently.

QuantumRAG takes a different approach: understand documents deeply at indexing time, so queries are fast and accurate by default.

Every document is understood through multiple lenses — semantic meaning, hypothetical questions it could answer, keywords and synonyms, structured facts, entity relationships — and these perspectives are fused at query time to find the right answer regardless of how you phrase the question.

The approach is validated by 176 scenario tests and 105 QA questions across 4 datasets, covering multi-hop reasoning, cross-document verification, numerical calculation, hallucination prevention, and entity-specific filtering, all areas where conventional RAG often struggles.

Three Ways to Use It

Mode	What you do	What happens
Just use it	`engine.ingest("./docs")` → `engine.query("...")`	Parser, chunker, index, routing — all auto-selected
Tune it	Adjust fusion weights, pick models, set domain	Better results for your specific use case
Own it	Custom parsers, chunkers, retrievers, generators	Every layer is replaceable via plugins

Zero configuration to first answer. Full control when you need it.

Quick Start

Installation

pip install quantumrag

# With all dependencies (recommended)
pip install quantumrag[all]

# Minimal + Korean support only
pip install quantumrag[korean]

CLI

# Initialize a project
quantumrag init

# Ingest documents
quantumrag ingest ./docs --recursive

# Ask a question
quantumrag query "What chunking strategies are available?"

# Interactive multi-turn chat
quantumrag chat

# Start HTTP API server with web playground
quantumrag serve --port 8000

Zero Configuration

Engine() auto-detects your environment and picks the best available provider:

Detected Key	Provider	Embedding	Generation
`OPENAI_API_KEY`	OpenAI	text-embedding-3-small	gpt-4.1-nano / gpt-4.1-mini
`GOOGLE_API_KEY`	Gemini	text-embedding-004	gemini-2.5-flash-lite / flash
`ANTHROPIC_API_KEY`	Anthropic	local (bge-m3)	claude-haiku / claude-sonnet
(none)	Ollama	local (bge-m3)	llama3.2:3b
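
The table above can be read as a simple environment lookup. The following is a minimal sketch of that idea, not QuantumRAG's actual detection code: `detect_provider` and `PROVIDER_DEFAULTS` are hypothetical names, and the precedence (first matching key wins, in table order) is an assumption.

```python
import os
from typing import Mapping, Optional

# (env var, provider, embedding model), copied from the table above.
# Assumption: keys are checked in table order and the first one set wins.
PROVIDER_DEFAULTS = [
    ("OPENAI_API_KEY", "openai", "text-embedding-3-small"),
    ("GOOGLE_API_KEY", "gemini", "text-embedding-004"),
    ("ANTHROPIC_API_KEY", "anthropic", "local (bge-m3)"),
]


def detect_provider(environ: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (provider, embedding) for the first API key that is set."""
    env = os.environ if environ is None else environ
    for key, provider, embedding in PROVIDER_DEFAULTS:
        if env.get(key):  # an empty string counts as "not set"
            return provider, embedding
    # No key found: fall back to local Ollama, as in the last table row.
    return "ollama", "local (bge-m3)"


print(detect_provider({"GOOGLE_API_KEY": "example"}))
# ('gemini', 'text-embedding-004')
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the lookup easy to test.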

Local Models (No API Key)

from quantumrag import Engine

engine = Engine(
    embedding_model="nomic-embed-text",
    generation_model="llama3.2",
)
engine.ingest("./docs")
result = engine.query("Summarize the documents")

Web Playground

Start the API server and use the built-in web playground:

quantumrag serve --port 8000

Open http://localhost:8000/playground to ingest documents and ask questions interactively.

Korean Support

QuantumRAG treats Korean as a first language, not a translation.

Feature	Description
HWP/HWPX Parsing	Native parsing for Korean government/office documents
Kiwi Morphology	Accurate Korean tokenization for BM25 indexing
EUC-KR Encoding	Automatic legacy encoding detection and conversion
Mixed Script	Optimal tokenizer selection for Korean-English mixed text
Bilingual Prompts	System prompts switch based on query language
Korean Query Patterns	Agglutinative morphology-aware query routing

pip install kiwipiepy  # Required for Korean morphology
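
To illustrate what legacy encoding handling involves, here is a minimal sketch that tries UTF-8 first and falls back to EUC-KR. It uses only the Python standard library; `decode_korean` is a hypothetical helper, and the try-in-order strategy is an assumption rather than a description of QuantumRAG's detector.

```python
def decode_korean(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to the legacy EUC-KR encoding."""
    for encoding in ("utf-8", "euc-kr"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the text, replacing undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


legacy_bytes = "한글 문서".encode("euc-kr")
print(decode_korean(legacy_bytes))
# 한글 문서
```

Trying UTF-8 first works because EUC-KR encoded Hangul is almost never valid UTF-8, so the first attempt fails cleanly and the fallback takes over.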

How It Works

Index-Heavy, Query-Light

The core design: expensive computation happens once at ingestion, enabling cheap and precise queries.

Indexing Pipeline (ingest time — heavy)

Documents (PDF, DOCX, HWP, PPTX, XLSX, HTML, MD, CSV, TXT)
  ├─ Auto-select parser & chunking strategy
  ├─ Multi-Resolution Summaries (document → section → chunk)
  ├─ Structured Fact Extraction (entities, attributes, relations)
  ├─ Derived Index Enrichment (synonyms, hierarchy terms)
  ├─ Entity-Centric Reverse Index (entity → chunk_id mapping)
  └─ Triple Index Build
       ├─ Original Embedding (semantic meaning)
       ├─ HyPE Embedding (hypothetical questions → embeddings)
       └─ Contextual BM25 (morphology-aware keyword index)

Query Pipeline (query time — light)

User Query
  ├─ Query Rewrite / Expansion
  ├─ Entity Detection & Attribute Filtering
  ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
  ├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
  ├─ Reranking (FlashRank / BGE / Cohere / Jina)
  ├─ Context Compression
  ├─ Source-Grounded Generation → Answer [1][2] + Confidence
  └─ Post-Correction (Retrieval Retry → Self-Correct → Fact Verify → Completeness)

Triple Index Fusion

Three retrieval methods combined via Reciprocal Rank Fusion — each catches what the others miss:

Index	What it finds	Why it matters
Original Embedding	Semantically similar content	Handles paraphrasing and conceptual queries
HyPE Embedding	Content that answers similar questions	Bridges the question↔document gap
Contextual BM25	Exact keyword and entity matches	Precise when you know what you're looking for
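
As a rough illustration of how three ranked lists can be fused, the sketch below implements weighted Reciprocal Rank Fusion using the default weights from the configuration section (0.4 / 0.35 / 0.25). The function name `weighted_rrf`, the smoothing constant `k=60` (a common convention), and the exact way weights enter the score are assumptions; the library's own fusion code may differ.

```python
from collections import defaultdict


def weighted_rrf(rankings: dict[str, list[str]],
                 weights: dict[str, float],
                 k: int = 60) -> list[str]:
    """Fuse ranked chunk-id lists: score = sum of weight / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for index_name, ranked_ids in rankings.items():
        weight = weights[index_name]
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


rankings = {
    "original": ["c1", "c2", "c3"],  # semantic embedding hits
    "hype": ["c2", "c1", "c4"],      # hypothetical-question hits
    "bm25": ["c4", "c2"],            # keyword hits
}
weights = {"original": 0.4, "hype": 0.35, "bm25": 0.25}

print(weighted_rrf(rankings, weights))
# ['c2', 'c1', 'c4', 'c3']
```

Chunk `c2` ranks first because it appears near the top of all three lists, even though it is never ranked first by the highest-weighted index.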

4-Level Indexing

All rule-based, zero LLM cost at ingest time:

Multi-Resolution Summaries — Document, section, and chunk-level for breadth
Structured Fact Extraction — Domain-aware patterns (IDs, severity, versions, contracts)
Derived Index Enrichment — Synonyms and hierarchy terms boost BM25 recall
Entity-Centric Reverse Index — Complete recall for entity queries and attribute filters
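
The entity-centric reverse index can be pictured as a mapping from each entity string to the set of chunks that mention it. The sketch below is a simplified, rule-based illustration: `ENTITY_PATTERN` and `build_reverse_index` are hypothetical, and the ID-style pattern is only one example of a domain-aware rule.

```python
import re
from collections import defaultdict

# Hypothetical pattern for ID-like entities such as "SEC-001" or "CVE-2024-1234".
ENTITY_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+(?:-\d+)*\b")


def build_reverse_index(chunks: dict[str, str]) -> dict[str, set[str]]:
    """Map each detected entity to the ids of all chunks that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for entity in ENTITY_PATTERN.findall(text):
            index[entity].add(chunk_id)
    return dict(index)


chunks = {
    "c1": "SEC-001 is rated critical.",
    "c2": "The patch for CVE-2024-1234 resolves SEC-001.",
}
index = build_reverse_index(chunks)
print(sorted(index["SEC-001"]))
# ['c1', 'c2']
```

Because the lookup is exact, a query that names an entity can retrieve every chunk mentioning it, independent of embedding similarity.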

HTTP API

quantumrag serve --port 8000

Method	Endpoint	Description
`POST`	`/v1/ingest`	Ingest documents from path
`POST`	`/v1/ingest/upload`	Upload and ingest files
`POST`	`/v1/ingest/text`	Ingest raw text
`POST`	`/v1/query`	Query (sync)
`POST`	`/v1/query/stream`	Query (SSE streaming)
`GET`	`/v1/documents`	List documents
`DELETE`	`/v1/documents/{id}`	Delete a document
`GET`	`/v1/status`	Engine status
`POST`	`/v1/evaluate`	Run evaluation
`POST`	`/v1/feedback`	Submit feedback
`GET`	`/health`	Health check

Web playground: http://localhost:8000/playground
Interactive API docs: http://localhost:8000/docs
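
A client call to the sync query endpoint might look like the sketch below, which uses only the standard library. The request body shape (a JSON object with a `query` field) is an assumption, and `build_query_request` is a hypothetical helper; check the interactive API docs for the actual schema.

```python
import json
import urllib.request


def build_query_request(question: str,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for /v1/query without sending it."""
    # Assumption: the endpoint accepts a JSON body with a "query" field.
    payload = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("What chunking strategies are available?")
print(request.get_method(), request.full_url)
# POST http://localhost:8000/v1/query

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```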

Configuration

# quantumrag.yaml
project_name: "my-knowledge-base"
language: "ko"                          # ko, en, auto
domain: "general"                       # general, legal, medical, financial, technical

models:
  embedding:
    provider: "openai"                  # openai, gemini, ollama, local
    model: "text-embedding-3-small"
  generation:
    simple:
      provider: "openai"
      model: "gpt-5.4-nano"            # Low-cost for simple queries (~70%)
    medium:
      provider: "openai"
      model: "gpt-5.4-mini"            # Mid-tier for moderate queries (~20%)
    complex:
      provider: "anthropic"
      model: "claude-sonnet-4-20250514" # Full model for complex queries (~10%)
  reranker:
    provider: "flashrank"               # flashrank (free/CPU), bge, cohere, jina
  hype:
    provider: "openai"
    model: "gpt-5.4-nano"
    questions_per_chunk: 3

retrieval:
  top_k: 7
  fusion_weights:
    original: 0.4
    hype: 0.35
    bm25: 0.25
  rerank: true
  compression: true

storage:
  vector_db: "lancedb"
  document_store: "sqlite"
  data_dir: "./quantumrag_data"

Environment variables override config (prefix: QUANTUMRAG_):

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export QUANTUMRAG_LANGUAGE=ko
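
The override rule can be sketched as follows. This is an illustration only: `apply_env_overrides` is a hypothetical function, it handles only top-level keys such as `language`, and how QuantumRAG maps nested keys to variable names is not covered here.

```python
import os
from typing import Mapping, Optional

PREFIX = "QUANTUMRAG_"


def apply_env_overrides(config: dict,
                        environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of config with QUANTUMRAG_* variables applied on top."""
    env = os.environ if environ is None else environ
    merged = dict(config)
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue  # e.g. OPENAI_API_KEY is not a config override
        key = name[len(PREFIX):].lower()
        if key in merged:  # only override keys the config already defines
            merged[key] = value
    return merged


config = {"project_name": "my-knowledge-base", "language": "auto"}
env = {"QUANTUMRAG_LANGUAGE": "ko", "OPENAI_API_KEY": "sk-..."}
print(apply_env_overrides(config, env))
# {'project_name': 'my-knowledge-base', 'language': 'ko'}
```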

Evaluation

Built-in Evaluation

6 metrics, 176 scenario tests, and 4 QA datasets (105 questions):

from quantumrag import Engine

engine = Engine()
result = engine.evaluate()
print(result.summary)
# retrieval_recall: 0.92, faithfulness: 0.95
# answer_relevancy: 0.88, completeness: 0.85

QA Datasets

Real-world web content used for systematic RAG validation:

Dataset	Focus	Questions	Pass Rate
ds-001	Multilingual + numerical precision	20	85-100%
ds-002	Type system + cross-topic confusion	25	88%
ds-003	Dense technical + cross-document	30	83-87%
ds-004	Table extraction + contradiction detection	30	77-90%
Combined	All sources merged (retrieval stress test)	105	29%

The Combined QA test shows that retrieval precision is the key bottleneck at scale: 75 of the 105 questions fail, and 68 of those 75 failures are retrieval-caused. This is the primary area for improvement.
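
The figures for the Combined run can be cross-checked with a few lines of arithmetic, using only the numbers reported above (105 questions, 75 failures, 68 retrieval-caused):

```python
total_questions = 105
failures = 75
retrieval_failures = 68

passed = total_questions - failures
pass_rate = passed / total_questions
retrieval_share = retrieval_failures / failures

print(f"passed: {passed}/{total_questions} ({pass_rate:.0%})")
print(f"retrieval-caused failures: {retrieval_share:.0%}")
# passed: 30/105 (29%)
# retrieval-caused failures: 91%
```

The 29% pass rate in the table matches 30 passing questions, and roughly nine in ten failures trace back to retrieval.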

# Individual dataset
.venv/bin/python datasets/run_qa.py ds-001

# Combined (retrieval precision test)
.venv/bin/python datasets/run_qa_combined.py

# Scenario tests
make scenario-test

Comparison

LangChain and LlamaIndex give you building blocks. OpenAI gives you a black box. QuantumRAG gives you an engine — opinionated defaults that work, with every layer customizable.

Feature	QuantumRAG	LangChain	LlamaIndex	OpenAI file_search
Triple Index (Embedding + HyPE + BM25)	Yes	No	No	No
4-Level Indexing	Yes	No	No	No
Entity-Centric Reverse Index	Yes	No	No	No
Korean Language (HWP, Kiwi)	Native	Plugin	Plugin	No
Adaptive Query Routing	Yes	Manual	No	No
Offline / Local LLM	Yes (Ollama)	Yes	Yes	No
Built-in Evaluation	Yes	Via LangSmith	Yes	No
Zero GPU Required	Yes	Depends	Depends	N/A
Zero Config to First Answer	Yes	No	No	Partial

Project Structure

quantumrag/
├── core/
│   ├── engine.py              # Single entry point for all operations
│   ├── config.py              # Configuration (Pydantic + YAML + env vars)
│   ├── models.py              # Data models (Chunk, Source, QueryResult, ...)
│   ├── ingest/
│   │   ├── parser/            # PDF, DOCX, PPTX, XLSX, HWP, HTML, MD, CSV, TXT
│   │   ├── chunker/           # Strategies: auto, semantic, fixed, structural
│   │   ├── indexer/           # Triple Index + 4-Level Indexing + fact extraction
│   │   └── denoiser.py        # Input quality filtering
│   ├── retrieve/
│   │   ├── fusion.py          # RRF triple index fusion search
│   │   ├── reranker.py        # FlashRank, BGE, Cohere, Jina
│   │   ├── query_classifier.py # Adaptive complexity routing
│   │   ├── entity_detector.py # Entity query detection + attribute filtering
│   │   └── fact_index.py      # Structured fact lookup
│   ├── generate/
│   │   ├── generator.py       # Source-grounded generation with citations
│   │   ├── router.py          # Simple/Medium/Complex query routing
│   │   ├── fact_verifier.py   # Hallucination detection (zero LLM cost)
│   │   ├── completeness.py    # Multi-part answer verification
│   │   ├── map_reduce.py      # Aggregation query processing
│   │   └── query_expander.py  # Colloquial → formal query expansion
│   ├── pipeline/
│   │   ├── postprocess.py     # Correction chain (retry → verify → complete)
│   │   └── context.py         # Pipeline context and document profiling
│   ├── storage/               # SQLite, LanceDB, Tantivy, Chroma, FAISS
│   ├── llm/                   # OpenAI, Anthropic, Gemini, Ollama
│   ├── autotune/              # Parameter optimization framework
│   ├── cache/                 # Semantic cache with TTL
│   └── evaluate/              # Metrics, synthetic QA generation
├── api/                       # FastAPI HTTP server + web playground
├── cli/                       # Typer CLI (init, ingest, query, serve, status)
├── connectors/                # File, S3, URL, Google Drive, Notion
├── korean/                    # Kiwi morphology, EUC-KR encoding
├── plugins/                   # Plugin registry & hook system
datasets/                      # QA datasets (4 datasets, 105 questions)
├── run_qa.py                  # Individual dataset runner
├── run_qa_combined.py         # Combined retrieval stress test
└── STATUS.md                  # Auto-generated dashboard
tests/
├── unit/                      # 782 unit tests
├── scenarios/                 # 176 scenario test cases (v1-v4)
├── security/                  # SSRF, path traversal, injection tests
└── scale/                     # Scale testing framework

Development

git clone https://github.com/quantumaikr/quantumrag.git
cd quantumrag
uv sync --dev

# Tiered testing
make quick           # Lint only (0.1s)
make smoke           # Lint + core tests (2s)
make check           # Lint + all unit tests (7s)
make scenario-test   # Scenario tests (requires API keys)

# Target-specific tests
make test-gen        # Generation tests
make test-ret        # Retrieval tests
make test-ingest     # Ingest tests
make test-api        # API/CLI tests

# Utilities
make fix             # Auto-fix lint issues
make help            # All available commands

System Requirements

Python: 3.10, 3.11, 3.12
RAM: 2GB minimum, 4GB+ recommended
GPU: Not required (CPU-only by default)
Storage: SQLite + LanceDB + Tantivy (all local, no external services)
OS: Linux, macOS, Windows (WSL2)

Documentation

Full documentation is available in English and Korean.

Contact

Developer: QuantumAI Inc.
Email: hi@quantumai.kr

License

Apache License 2.0. See LICENSE.

Release history

0.4.4

Mar 31, 2026

0.4.3

Mar 29, 2026

0.4.2

Mar 29, 2026

0.4.1

Mar 29, 2026

0.4.0

Mar 29, 2026

0.3.3

Mar 29, 2026

0.3.2

Mar 29, 2026

0.3.1

Mar 29, 2026

0.3.0

Mar 29, 2026

0.2.1

Mar 28, 2026

0.2.0

Mar 28, 2026

0.1.0

Mar 26, 2026

Download files

Source Distribution

quantumrag-0.4.1.tar.gz (2.8 MB)

Uploaded Mar 29, 2026 Source

Built Distribution

quantumrag-0.4.1-py3-none-any.whl (294.8 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file quantumrag-0.4.1.tar.gz.

File metadata

Download URL: quantumrag-0.4.1.tar.gz
Upload date: Mar 29, 2026
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quantumrag-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`25df26c5f2f6a1db16b97baeba0e081db6c72c744835dba7d5dcd5cb7b64b27f`
MD5	`4f162ff3be09fad17dc57f65a69cfd8e`
BLAKE2b-256	`a5c36ec7dc3d1a09b251f5850d4a8d8f4e6af8a5eda9ae9962fd4409eb552619`

File details

Details for the file quantumrag-0.4.1-py3-none-any.whl.

File metadata

Download URL: quantumrag-0.4.1-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 294.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quantumrag-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`52be1ffb78cf643c5cd5bd8b8ba7233ffd0b1a71d49d63de1d280a7b7bf04e6a`
MD5	`0af00e58934420c09d46a1a7199c66e1`
BLAKE2b-256	`b53a741acebb69ea33cace874575927e139940dd1c99bab8e833440287e90347`
