Skip to main content

Index-Heavy, Query-Light RAG Engine — Put in docs, ask questions, it just works.

Project description

QuantumRAG

Python 3.10+ License: Apache 2.0 Status: Alpha Scenario Tests Code style: ruff

English | 한국어

Index-Heavy, Query-Light RAG Engine — Put in docs, ask questions, it just works.

QuantumRAG is an open-source Retrieval-Augmented Generation engine that pushes expensive computation to indexing time for fast, accurate queries. It features Triple Index Fusion (Original Embedding + HyPE + Contextual BM25) combined with 4-Level Indexing (Multi-Resolution Summaries, Structured Fact Extraction, Derived Index Enrichment, Entity-Centric Reverse Index) to achieve 98.9% accuracy across 87 real-world scenario tests.


Key Features

Retrieval

  • Triple Index Fusion — Original Embedding + HyPE (Hypothetical Prompt Embedding) + Contextual BM25, combined via Reciprocal Rank Fusion (RRF)
  • 4-Level Indexing — Multi-Resolution summaries, Structured Fact extraction, Derived synonym/hierarchy terms, Entity-Centric reverse index
  • Adaptive Query Routing — Automatic classification into Simple / Medium / Complex paths with per-tier model selection
  • Entity-Centric Reverse Index — Exact-match recall for entity IDs (SEC-001, PAT-003), attribute filters (severity:Critical), and range queries (severity >= High)

Generation

  • Source-Grounded Generation — Every answer cites specific chunks with inline [1], [2] references
  • Map-Reduce RAG — Parallel extraction + aggregation for enumeration and cross-document queries
  • Query Decomposition — Compound questions are split into sub-queries for independent retrieval
  • Confidence AssessmentSTRONGLY_SUPPORTED, PARTIALLY_SUPPORTED, or INSUFFICIENT_EVIDENCE with configurable thresholds

Infrastructure

  • Korean-First — Native HWP/HWPX parsing, Kiwi morphological analysis, EUC-KR encoding, bilingual prompts
  • Multi-Format Parsing — PDF, DOCX, PPTX, XLSX, HTML, Markdown, CSV, HWP/HWPX, plain text
  • Multi-Provider LLM — OpenAI, Anthropic, Google Gemini, Ollama (local) with per-tier configuration
  • HTTP API — FastAPI server with SSE streaming, API key auth, rate limiting
  • Built-in Evaluation — Synthetic QA generation, Recall@K, Faithfulness, Answer Relevancy, Completeness metrics
  • Plugin System — Extend with custom parsers, chunkers, retrievers, generators
  • Multi-Tenant — Isolated storage per tenant with configurable limits
  • Data Connectors — Local filesystem, Google Drive, Notion, AWS S3, Web URL

How It Works

Indexing Pipeline (ingest time — heavy)

Documents (PDF, DOCX, HWP, ...)
  ├─ Parse & Chunk (auto/semantic/fixed/structural strategies)
  ├─ Multi-Resolution Summaries (document → section → chunk)
  ├─ Structured Fact Extraction (entities, attributes, relations)
  ├─ Derived Index Enrichment (synonyms, hierarchy terms for BM25)
  ├─ Entity-Centric Reverse Index (entity → chunk_id mapping)
  └─ Triple Index Build
       ├─ Original Embedding (text-embedding-3-small)
       ├─ HyPE Embedding (hypothetical questions → embeddings)
       └─ Contextual BM25 (Kiwi morphology tokenized terms)

Query Pipeline (query time — light)

User Query
  ├─ Query Rewrite / Decomposition
  ├─ Entity Detection (IDs, severity filters, status filters)
  ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
  ├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
  ├─ Entity Index Injection (exact-match chunks merged into results)
  ├─ Reranking (FlashRank / BGE / Cohere / Jina)
  ├─ Context Compression (extractive, query-aware)
  ├─ Source-Grounded Generation (with citations)
  └─ Confidence Assessment → Answer [1][2]

Quick Start

Installation

pip install quantumrag

# With all dependencies (recommended)
pip install quantumrag[all]

# Minimal + Korean support only
pip install quantumrag[korean]

Python SDK

from quantumrag import Engine

engine = Engine()
engine.ingest("./docs")
result = engine.query("What is the main topic?")
print(result.answer)
# Sources: [1] report.pdf (p.3), [2] summary.docx (p.1)

CLI

# Initialize a project
quantumrag init

# Ingest documents
quantumrag ingest ./docs --recursive

# Ask a question
quantumrag query "What is the revenue?"

# Watch mode — auto-ingest on file changes
quantumrag ingest ./docs --watch

# Start HTTP API server
quantumrag serve --port 8000

Local Models (No API Key)

from quantumrag import Engine

engine = Engine(
    embedding_model="nomic-embed-text",
    generation_model="llama3.2",
)
engine.ingest("./docs")
result = engine.query("Summarize the documents")

Configuration

# quantumrag.yaml
project_name: "my-knowledge-base"
language: "ko"                          # ko, en, auto
domain: "general"                       # general, legal, medical, financial, technical

models:
  embedding:
    provider: "openai"                  # openai, gemini, ollama, local
    model: "text-embedding-3-small"
  generation:
    simple:
      provider: "openai"
      model: "gpt-5.4-nano"            # Low-cost for simple queries (~70%)
    medium:
      provider: "openai"
      model: "gpt-5.4-mini"            # Mid-tier for moderate queries (~20%)
    complex:
      provider: "anthropic"
      model: "claude-sonnet-4-20250514" # Full model for complex queries (~10%)
  reranker:
    provider: "flashrank"               # flashrank (free/CPU), bge, cohere, jina
  hype:
    provider: "openai"
    model: "gpt-5.4-nano"
    questions_per_chunk: 3

retrieval:
  top_k: 7
  fusion_weights:
    original: 0.4
    hype: 0.35
    bm25: 0.25
  rerank: true
  compression: true

storage:
  vector_db: "lancedb"
  document_store: "sqlite"
  data_dir: "./quantumrag_data"

Environment variables override config (prefix: QUANTUMRAG_):

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export QUANTUMRAG_LANGUAGE=ko
# Nested: QUANTUMRAG_MODELS__EMBEDDING__PROVIDER=gemini

HTTP API

quantumrag serve --port 8000
Method Endpoint Description
POST /v1/ingest Ingest documents from path
POST /v1/ingest/upload Upload and ingest files
POST /v1/ingest/text Ingest raw text
POST /v1/query Query (sync)
POST /v1/query/stream Query (SSE streaming)
GET /v1/documents List documents
DELETE /v1/documents/{id} Delete a document
GET /v1/status Engine status
POST /v1/evaluate Run evaluation
POST /v1/feedback Submit feedback
GET /health Health check

Interactive docs: http://localhost:8000/docs

Korean Support

QuantumRAG is built with first-class Korean language support:

Feature Description
HWP/HWPX Parsing Native parsing for Korean government/office documents
Kiwi Morphology Accurate Korean tokenization for BM25 indexing
EUC-KR Encoding Automatic legacy encoding detection and conversion
Mixed Script Optimal tokenizer selection for Korean-English mixed text
Bilingual Prompts System prompts switch between Korean/English based on query language
Korean Query Patterns Agglutinative morphology-aware query routing and decomposition
pip install kiwipiepy  # Required for Korean morphology

Evaluation

QuantumRAG includes a built-in evaluation system with 6 metrics:

engine = Engine()
result = engine.evaluate()
print(result.summary)
# retrieval_recall: 0.92
# faithfulness: 0.95
# answer_relevancy: 0.88
# completeness: 0.85
# latency: 1.2s avg
# cost: $0.003/query avg

Scenario Test Suite

87 end-to-end scenario tests across 16 categories with 4 difficulty levels:

Category Tests Description
Factual Confirmation 7 Basic fact retrieval, personnel, dates
Multi-Hop Reasoning 6 Cross-document information fusion
Numerical Calculations 6 Math, percentages, comparisons
Temporal Reasoning 6 Timeline, changelog, version tracking
Negation/Exclusion 5 "Not supported", incomplete features
Cross-Document Synthesis 5 Multi-source data integration
Paraphrase Robustness 6 Colloquial and rephrased queries
Multi-Turn Conversation 5 Coreference resolution, entity tracking
Edge Cases 7 Boundary inputs, adversarial queries
Precision Search 6 Fine-grained detail extraction
Implicit Inference 5 Information not directly stated
Competitive Analysis 3 Market positioning, competitor comparison
Conditional Reasoning 5 IF/THEN scenarios, sufficiency checks
Multi-Constraint Filtering 5 Multiple criteria intersection
Derived Quantitative 5 Calculations from multiple sources
Cross-Verification 4 Consistency checks across documents
uv run python tests/run_scenario_tests.py

Project Structure

quantumrag/
├── core/
│   ├── engine.py              # Single entry point
│   ├── config.py              # Configuration (Pydantic + YAML)
│   ├── models.py              # Data models (Chunk, QueryResult, ...)
│   ├── ingest/
│   │   ├── parser/            # Multi-format document parsing
│   │   ├── chunker/           # 6 chunking strategies
│   │   └── indexer/           # Triple Index + 4-Level Indexing
│   │       ├── triple_index_builder.py
│   │       ├── multi_resolution.py
│   │       ├── fact_extractor.py
│   │       ├── derived_index.py
│   │       └── entity_index.py
│   ├── retrieve/
│   │   ├── fusion.py          # RRF triple index fusion
│   │   ├── reranker.py        # Multi-provider reranking
│   │   ├── compressor.py      # Context compression
│   │   ├── entity_detector.py # Entity query detection
│   │   └── constellation.py   # Chunk relationship graph
│   ├── generate/
│   │   ├── generator.py       # Source-grounded generation
│   │   ├── router.py          # Query complexity routing
│   │   ├── rewriter.py        # Query rewriting
│   │   ├── map_reduce.py      # Map-Reduce for aggregation
│   │   └── decomposer.py      # Query decomposition
│   ├── storage/               # SQLite + LanceDB + Tantivy
│   ├── llm/                   # Provider abstraction layer
│   │   └── providers/         # OpenAI, Anthropic, Gemini, Ollama
│   ├── evaluate/              # Evaluation metrics & synthetic QA
│   ├── cache/                 # Semantic cache
│   ├── security/              # Input sanitization, API auth
│   ├── observability/         # Structured logging, tracing
│   └── multitenancy/          # Tenant isolation
├── api/                       # FastAPI HTTP server
├── cli/                       # Typer CLI
├── connectors/                # File, GDrive, Notion, S3, URL
├── korean/                    # Kiwi morphology, encoding
└── plugins/                   # Plugin registry & hooks

Comparison

Feature QuantumRAG LangChain LlamaIndex OpenAI file_search
Triple Index (Embedding + HyPE + BM25) Yes No No No
4-Level Indexing Yes No No No
Entity-Centric Reverse Index Yes No No No
Index-Heavy Architecture Yes No Partial No
Korean Language (HWP, Kiwi) Native Plugin Plugin No
Adaptive Query Routing Yes Manual No No
Map-Reduce RAG Yes Yes Yes No
Offline / Local LLM Yes (Ollama) Yes Yes No
Built-in Evaluation Yes Via LangSmith Yes No
Zero GPU Required Yes Depends Depends N/A

Development

git clone https://github.com/quantumrag/quantumrag.git
cd quantumrag
pip install -e ".[dev,all]"

# Run unit tests
pytest tests/ -q

# Run scenario tests
uv run python tests/run_scenario_tests.py

# Lint
ruff check quantumrag/ tests/

System Requirements

  • Python: 3.10, 3.11, 3.12
  • RAM: 2GB minimum, 4GB+ recommended
  • GPU: Not required (CPU-only by default)
  • Storage: SQLite + LanceDB + Tantivy (all local, no external services)
  • OS: Linux, macOS, Windows (WSL2)

License

Apache License 2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quantumrag-0.1.0.tar.gz (372.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

quantumrag-0.1.0-py3-none-any.whl (234.1 kB view details)

Uploaded Python 3

File details

Details for the file quantumrag-0.1.0.tar.gz.

File metadata

  • Download URL: quantumrag-0.1.0.tar.gz
  • Upload date:
  • Size: 372.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quantumrag-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d6506fa1190f4d6cf26b815a9556c76d376118a9275931de7806d87aba58770f
MD5 b6d1ca6b0b33d7342872e59f89f2eed5
BLAKE2b-256 605905a0e4c96940b3aefc5804f862e960683ec293546c05bcb1abc8751efd2b

See more details on using hashes here.

File details

Details for the file quantumrag-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: quantumrag-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 234.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quantumrag-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7c494e64205d5ee5152260799ec47233f503582ce70e457424ebecf09ba4e0f9
MD5 71c2b1a34a233bd63ca3200407a64c65
BLAKE2b-256 6049dd04b5eb8112e4486c7fdb19da27369d4cbf0760f09556e5f32c609b1722

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page