HydRAG — Multi-Headed Retrieval-Augmented Generation with CRAG supervision

These details have not been verified by PyPI

Project links

Project description

hydrag-core

HydRAG — Multi-Headed Retrieval-Augmented Generation with CRAG supervision.

A standalone, domain-agnostic retrieval pipeline that fuses multiple retrieval heads via Reciprocal Rank Fusion (RRF) and uses a Corrective RAG (CRAG) supervisor to judge context sufficiency before triggering fallback strategies.

Features

Multi-headed retrieval — Five pipeline heads: BM25 fast-path → Primary (hybrid / code-aware) → CRAG supervisor → Semantic fallback → Web fallback
Per-head control — Enable or disable individual heads at runtime via config or environment variables
Domain-agnostic by default — prose profile works with any corpus (documentation, legal, medical, etc.)
Code-aware opt-in — code profile detects symbols (CamelCase, snake_case, dotted paths) and routes to code-aware search
CRAG supervisor — LLM-graded context sufficiency with streaming verdict parsing and confidence-gated fast-path skip
CRAG classifier — Optional distilled DistilBERT binary classifier (<15 ms inference) replaces LLM-based CRAG via teacher-student training pipeline
Pluggable adapters — Implement the VectorStoreAdapter protocol to connect any vector store
Multi-provider LLM support — Select built-in ollama, huggingface, or openai_compat via config, or inject any custom LLMProvider
Zero required dependencies — stdlib-only core; optional extras for ChromaDB, Firecrawl, fine-tuning, and dev tools
Fully typed — PEP 561 compatible with py.typed marker; strict mypy passes

Installation

pip install hydrag-core

With optional extras:

pip install hydrag-core[chromadb]     # ChromaDB adapter support
pip install hydrag-core[firecrawl]    # Web fallback via Firecrawl
pip install hydrag-core[tune]         # CRAG classifier fine-tuning (transformers, torch, onnxruntime)
pip install hydrag-core[dev]          # Development tools (pytest, ruff, mypy)

Quick Start

from hydrag import HydRAG, HydRAGConfig

# 1. Implement the VectorStoreAdapter protocol for your store
class MyAdapter:
    def semantic_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def keyword_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def hybrid_search(self, query: str, n_results: int = 5) -> list[str]: ...

# 2. Create engine and search
adapter = MyAdapter()
engine = HydRAG(adapter)
results = engine.search("How do I configure logging?")

for r in results:
    print(f"[{r.head_origin}] score={r.score:.4f}: {r.text[:80]}")

Code Profile

config = HydRAGConfig(profile="code")
engine = HydRAG(adapter, config=config)
# Queries with symbols (e.g. "HttpRequest class") trigger code-aware retrieval
results = engine.search("How does `HttpRequest` handle timeouts?")

Custom LLM Provider

from hydrag import LLMProvider

class MyLLM:
    def generate(self, prompt: str, model: str = "", timeout: int = 30) -> str | None:
        # Your LLM inference here
        ...

engine = HydRAG(adapter, llm=MyLLM())

Built-In LLM Providers

HydRAG can construct the LLM provider from config without custom wiring.

from hydrag import HydRAG, HydRAGConfig

# Default: Ollama
cfg = HydRAGConfig(llm_provider="ollama", ollama_host="http://localhost:11434")
engine = HydRAG(adapter, config=cfg)

# Hugging Face TGI-compatible endpoint
cfg = HydRAGConfig(
    llm_provider="huggingface",
    hf_api_base="http://localhost:8080",
    hf_model_id="meta-llama/Llama-3.1-8B-Instruct",
)
engine = HydRAG(adapter, config=cfg)

# OpenAI-compatible endpoint (vLLM, LM Studio, OpenRouter-compatible gateway, etc.)
cfg = HydRAGConfig(
    llm_provider="openai_compat",
    openai_compat_api_base="http://localhost:8000",
    openai_compat_model="Qwen/Qwen2.5-7B-Instruct",
)
engine = HydRAG(adapter, config=cfg)

Tokens are read from environment by default:

HYDRAG_HF_API_TOKEN for llm_provider="huggingface"
HYDRAG_OPENAI_COMPAT_API_KEY for llm_provider="openai_compat"

Streaming CRAG Provider

For lower-latency CRAG verdict parsing, implement StreamingLLMProvider:

from hydrag import StreamingLLMProvider

class MyStreamingLLM:
    def generate(self, prompt: str, model: str = "", timeout: int = 30) -> str | None: ...
    def generate_stream(self, prompt: str, model: str = "", timeout: int = 30) -> str | None:
        # Return full response; first token is parsed for early verdict
        ...

engine = HydRAG(adapter, llm=MyStreamingLLM())

Per-Head Control

# Run only BM25 fast-path + primary retrieval (skip CRAG, semantic, web)
config = HydRAGConfig(
    enable_head_0=True,
    enable_head_1=True,
    enable_head_2_crag=False,
    enable_head_3a_semantic=False,
    enable_head_3b_web=False,
)
engine = HydRAG(adapter, config=config)
results = engine.search("quick keyword lookup")

CRAG Classifier (Optional)

from hydrag import tune, CRAGClassifier

# Fine-tune a fast classifier from your corpus
tune(
    adapter=adapter,
    llm=llm,
    output_dir="models/crag_classifier",
    n_samples=500,
)

# Use it — auto-detected when crag_mode="auto" and path is set
config = HydRAGConfig(
    crag_mode="auto",
    crag_classifier_path="models/crag_classifier",
)
engine = HydRAG(adapter, config=config)

Configuration

All settings can be set via environment variables (HYDRAG_ prefix) or passed directly:

Setting	Env Var	Default	Description
`profile`	`HYDRAG_PROFILE`	`"prose"`	`"prose"` or `"code"`
`embedding_model`	`HYDRAG_EMBEDDING_MODEL`	`"Alibaba-NLP/gte-Qwen2-7B-instruct"`	Embedding model name
`crag_model`	`CRAG_MODEL`	`"qwen3:4b"`	Model for CRAG supervisor
`crag_timeout`	`CRAG_TIMEOUT`	`30`	Timeout (seconds) for CRAG calls
`ollama_host`	`OLLAMA_HOST`	`"http://localhost:11434"`	Ollama API endpoint
`llm_provider`	`HYDRAG_LLM_PROVIDER`	`"ollama"`	LLM backend selector: `ollama`, `huggingface`, `openai_compat`
`hf_model_id`	`HYDRAG_HF_MODEL_ID`	`""`	Optional model id for Hugging Face provider
`hf_api_base`	`HYDRAG_HF_API_BASE`	`""`	Hugging Face/TGI base URL (required when `llm_provider="huggingface"`)
`hf_timeout`	`HYDRAG_HF_TIMEOUT`	`30`	Timeout (seconds) for Hugging Face provider calls
`openai_compat_api_base`	`HYDRAG_OPENAI_COMPAT_API_BASE`	`""`	OpenAI-compatible API base URL (required when `llm_provider="openai_compat"`)
`openai_compat_model`	`HYDRAG_OPENAI_COMPAT_MODEL`	`""`	Model name for OpenAI-compatible provider (required when `llm_provider="openai_compat"`)
`openai_compat_timeout`	`HYDRAG_OPENAI_COMPAT_TIMEOUT`	`30`	Timeout (seconds) for OpenAI-compatible provider calls
`openai_compat_endpoint`	`HYDRAG_OPENAI_COMPAT_ENDPOINT`	`"/v1/chat/completions"`	Override OpenAI-compatible chat completions endpoint path
`enable_web_fallback`	`HYDRAG_ENABLE_WEB_FALLBACK`	`false`	Enable Firecrawl web fallback
`allow_web_on_empty_primary`	`HYDRAG_ALLOW_WEB_ON_EMPTY`	`false`	Allow web fallback when Head 1 returns empty
`allow_markdown_in_web_fallback`	`HYDRAG_ALLOW_MARKDOWN_WEB`	`false`	Preserve markdown in web fallback content
`rrf_k`	`HYDRAG_RRF_K`	`60`	RRF smoothing constant
`min_candidate_pool`	`HYDRAG_MIN_CANDIDATE_POOL`	`8`	Minimum candidates per head
`web_chunk_limit`	`HYDRAG_WEB_CHUNK_LIMIT`	`3000`	Max chars per web chunk after sanitization
`crag_min_relevance`	`HYDRAG_CRAG_MIN_RELEVANCE`	`0.67`	Minimum relevance threshold for CRAG
`crag_context_chunks`	`HYDRAG_CRAG_CONTEXT_CHUNKS`	`5`	Chunks sent to CRAG supervisor
`crag_char_limit`	`HYDRAG_CRAG_CHAR_LIMIT`	`1500`	Per-chunk character limit in CRAG prompt
`enable_fast_path`	`HYDRAG_ENABLE_FAST_PATH`	`true`	Head 0 BM25 fast-path
`fast_path_bm25_threshold`	`HYDRAG_FAST_PATH_BM25_THRESHOLD`	`0.67`	BM25 score threshold for fast-path
`fast_path_confidence_threshold`	`HYDRAG_FAST_PATH_CONFIDENCE_THRESHOLD`	`0.8`	Score threshold to skip CRAG entirely
`crag_stream`	`HYDRAG_CRAG_STREAM`	`true`	Parse first token for early CRAG verdict
`crag_mode`	`HYDRAG_CRAG_MODE`	`"auto"`	`"auto"`, `"llm"`, or `"classifier"`
`crag_classifier_path`	`HYDRAG_CRAG_CLASSIFIER_PATH`	`""`	Path to ONNX classifier model
`enable_head_0`	`HYDRAG_ENABLE_HEAD_0`	`true`	BM25 fast-path head
`enable_head_1`	`HYDRAG_ENABLE_HEAD_1`	`true`	Primary retrieval head
`enable_head_2_crag`	`HYDRAG_ENABLE_HEAD_2_CRAG`	`true`	CRAG supervisor head
`enable_head_3a_semantic`	`HYDRAG_ENABLE_HEAD_3A_SEMANTIC`	`true`	Semantic fallback head
`enable_head_3b_web`	`HYDRAG_ENABLE_HEAD_3B_WEB`	`false`	Web fallback head
`fallback_timeout_s`	`HYDRAG_FALLBACK_TIMEOUT_S`	`5.0`	Timeout for fallback head futures
`rrf_head_weights`	`HYDRAG_RRF_HEAD_WEIGHTS`	(JSON)	Per-head RRF weight map

# From environment
config = HydRAGConfig.from_env()

# Direct
config = HydRAGConfig(profile="code", rrf_k=100, enable_web_fallback=True)

Adapter Protocol

Required methods (must implement all three):

class VectorStoreAdapter(Protocol):
    def semantic_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def keyword_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def hybrid_search(self, query: str, n_results: int = 5) -> list[str]: ...

Optional methods (gracefully handled if missing):

    def crag_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def graph_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def rewrite_query(self, query: str) -> str: ...

Pipeline Architecture

Query → [Profile Router]
           │
           ├─ Head 0: BM25 Fast-Path
           │    └─ score ≥ threshold? → early return
           │    └─ score ≥ confidence threshold? → skip CRAG
           │
           ├─ Head 1: Primary Retrieval
           │    ├─ prose → hybrid_search()
           │    └─ code + symbols → code-aware (semantic + keyword RRF)
           │
           ├─ Head 2: CRAG Supervisor
           │    ├─ classifier (ONNX, <15ms) ← crag_mode="auto"|"classifier"
           │    └─ LLM (streaming verdict)  ← crag_mode="auto"|"llm"
           │         ├─ SUFFICIENT → return Head 1 results
           │         └─ INSUFFICIENT ↓
           │
           ├─ Head 3a: Semantic Fallback
           │    └─ rewrite + CRAG search + keyword + graph → RRF
           │
           └─ Head 3b: Web Fallback (optional)
                └─ Firecrawl search → sanitize → RRF

           Final: RRF Fusion across all active heads → list[RetrievalResult]

Development

# Clone and install in development mode
git clone https://github.com/gromanchenko/hydrag.git
cd hydrag-core
pip install -e ".[dev]"

# Run tests
python -m pytest tests/

# Lint
ruff check src/ tests/

# Type check
mypy src/

Versioning

HydRAG uses SemVer. Version sources that must stay in sync:

pyproject.toml ([project].version)
src/hydrag/_version.py (__version__)

Type Checking

This package ships a PEP 561 py.typed marker. Downstream projects using mypy --strict will get full type coverage out of the box.

License

Apache-2.0 — see LICENSE for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.4.18

Apr 26, 2026

1.4.0

Apr 26, 2026

1.3.3

Apr 8, 2026

This version

1.3.1

Apr 7, 2026

1.3.0

Apr 6, 2026

1.2.49

Mar 27, 2026

1.2.34

Mar 21, 2026

1.2.12

Mar 18, 2026

1.2.11

Mar 18, 2026

1.2.8

Mar 18, 2026

1.2.4

Mar 18, 2026

1.0.5

Mar 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hydrag_core-1.3.1.tar.gz (87.6 kB view details)

Uploaded Apr 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

hydrag_core-1.3.1-py3-none-any.whl (62.2 kB view details)

Uploaded Apr 7, 2026 Python 3

File details

Details for the file hydrag_core-1.3.1.tar.gz.

File metadata

Download URL: hydrag_core-1.3.1.tar.gz
Upload date: Apr 7, 2026
Size: 87.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for hydrag_core-1.3.1.tar.gz
Algorithm	Hash digest
SHA256	`0d7fc5a8f8b9ab6bfb0f71762344a3f181cccb1d99c5191f54c2c31c6716304e`
MD5	`d62f9d932e33cbf73b82501c3ccc1f49`
BLAKE2b-256	`b99e2330cfbecaed1dbc39ea0fc32f57a666be3e9285497230df06823b31b86d`

See more details on using hashes here.

File details

Details for the file hydrag_core-1.3.1-py3-none-any.whl.

File metadata

Download URL: hydrag_core-1.3.1-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 62.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for hydrag_core-1.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`66a494b6037feafbd0ebc608c363bd6559cf25db729a0dc7c329da958f53e8e4`
MD5	`52d786743956379097a78fe90e4de0b5`
BLAKE2b-256	`2c14b6904e015ae806e251b3ba48b88cdd57c81d5f82b5997c053e20e274fa73`

See more details on using hashes here.

hydrag-core 1.3.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

hydrag-core

Features

Installation

Quick Start

Code Profile

Custom LLM Provider

Built-In LLM Providers

Streaming CRAG Provider

Per-Head Control

CRAG Classifier (Optional)

Configuration

Adapter Protocol

Pipeline Architecture

Development

Versioning

Type Checking

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes