# σ-RAG · Sigma-RAG
Stop injecting noise into your LLM context. σ-RAG gates retrieval with a statistical significance threshold so your model only sees chunks that are actually relevant — not just the least-bad ones.
## The Problem with Standard RAG

Standard RAG always returns the top-k chunks, regardless of whether any of them are relevant to the query.

```
Query:  "What caused the 2008 financial crisis?"
Corpus: Python tutorials, particle-physics papers, cooking recipes

Top-3 RAG returns: chunk_47 (sim=0.31), chunk_12 (sim=0.29), chunk_89 (sim=0.28)
                   ← ALL noise. LLM hallucinates an answer anyway.

σ-RAG returns:     ⚠️ No significant evidence found. Response suppressed.
                   ← Hallucination prevented.
```
When no chunk is relevant, top-k RAG silently feeds the LLM garbage context. The LLM, trained to be helpful, fabricates a plausible-sounding answer. σ-RAG breaks this failure mode.
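The contrast fits in a few lines of numpy. This is an illustrative sketch with made-up similarity numbers and hypothetical helper names, not the package's API:

```python
import numpy as np

def top_k(sims: np.ndarray, k: int = 3) -> np.ndarray:
    """Standard RAG: always hands back k chunks, however weak."""
    return np.argsort(-sims)[:k]

def sigma_gate(sims: np.ndarray, mu: float, sigma: float,
               n_sigma: float = 2.0) -> np.ndarray:
    """σ-style gate: only chunks above the noise floor survive,
    so the result may legitimately be empty."""
    return np.flatnonzero(sims > mu + n_sigma * sigma)

sims = np.array([0.31, 0.29, 0.28, 0.05])     # all near the noise floor
print(top_k(sims))                            # [0 1 2] ← three noise chunks
print(sigma_gate(sims, mu=0.10, sigma=0.12))  # []      ← generation suppressed
```

With a noise floor of μ=0.10 and σ=0.12, the 2σ bar sits at 0.34; no chunk clears it, so the gate returns nothing instead of the least-bad three.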
## How It Works
σ-RAG characterises the noise floor of your embedding space — the distribution of cosine similarities between random, unrelated document pairs. This is analogous to estimating the background noise level before declaring a signal detection.
1. Sample N random cross-document pairs from your corpus
2. Fit a Gaussian: μ_noise, σ_noise
3. Threshold = μ_noise + n·σ_noise (default n=2, FAR ≈ 2.3%)
4. At query time: only chunks with similarity > threshold are "significant"
5. If zero chunks clear the bar → suppress generation entirely
The threshold has a principled interpretation: at n=2σ, the false alarm rate (probability a random noise chunk clears the bar) is ≈ 2.3%. At n=3σ, it drops to ≈ 0.13%.
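The five steps above can be sketched in plain numpy. This is a minimal illustration under the stated recipe, not the package's internal implementation; the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def calibrate_noise_floor(embeddings: np.ndarray, n_pairs: int = 1000):
    """Steps 1-2: sample random cross-pairs and fit a Gaussian
    to their cosine similarities. Returns (mu_noise, sigma_noise)."""
    # Unit-normalise so a dot product equals cosine similarity
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    i = rng.integers(0, len(e), n_pairs)
    j = rng.integers(0, len(e), n_pairs)
    mask = i != j                                # exclude self-pairs
    sims = np.sum(e[i[mask]] * e[j[mask]], axis=1)
    return sims.mean(), sims.std()

def significant_chunks(query_emb: np.ndarray, embeddings: np.ndarray,
                       mu: float, sigma: float, n_sigma: float = 2.0):
    """Steps 3-5: keep only chunks whose similarity clears mu + n·sigma,
    best first. An empty result means generation should be suppressed."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = e @ q
    hits = np.flatnonzero(sims > mu + n_sigma * sigma)
    return hits[np.argsort(-sims[hits])]
```

The key design point is that calibration looks only at *cross-document* pairs, so the fitted Gaussian describes unrelated content; any chunk that beats it by n·σ is unlikely to be noise.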
## Benchmark
Evaluated on a mixed corpus (physics papers + cooking articles) with answerable and unanswerable questions:
| Metric | Standard Top-3 | σ-RAG (2σ) |
|---|---|---|
| Precision@3 (answerable) | 1.00 | 1.00 |
| Recall@3 (answerable) | 1.00 | 0.95 |
| Hallucination risk (unanswerable) | 100% | 0% |
| Avg chunks passed to LLM | 3.0 | 1.8 |
σ-RAG matches top-k on answerable questions while eliminating hallucination risk on unanswerable ones.
## Installation

```bash
# Minimal (numpy only — uses HashEmbedder, good for testing)
pip install sigma-rag

# Recommended (local sentence-transformers embeddings)
pip install "sigma-rag[local]"

# With Anthropic LLM backend
pip install "sigma-rag[local,anthropic]"

# Everything
pip install "sigma-rag[all]"
```
## Quick Start

```python
from sigma_rag import SigmaIndex, SigmaRAGPipeline

# 1. Build the index
index = SigmaIndex()
index.add_documents([
    "The Higgs boson was discovered at the LHC in 2012 by ATLAS and CMS at 5σ significance...",
    "A discovery in particle physics requires a local p-value below 2.87e-7 (5σ)...",
    "The Standard Model describes quarks, leptons, gauge bosons, and the Higgs field...",
])
index.calibrate()  # fits the background distribution

# 2. Query (offline echo mode — no API key needed)
pipeline = SigmaRAGPipeline(index, llm="echo")

# Answerable query → returns answer
response = pipeline.query("What significance was required to claim the Higgs discovery?")
print(response.has_evidence)  # True
print(f"Used {len(response.retrieval.significant)} chunks")

# Unanswerable query → suppressed
response = pipeline.query("What is the best pasta carbonara recipe?")
print(response.has_evidence)  # False ← hallucination prevented
print(response.answer)        # "⚠️ σ-RAG: No significant evidence..."
```
## API Overview

### SigmaIndex

```python
index = SigmaIndex(
    chunk_size=512,    # max chars per chunk
    chunk_overlap=64,  # overlap between consecutive chunks
    n_sigma=2.0,       # default significance threshold
)
index.add_documents(docs)  # list of strings or (text, metadata) tuples
index.calibrate()          # REQUIRED before querying
```
### SigmaRAGPipeline

```python
pipeline = SigmaRAGPipeline(
    index,
    n_sigma=2.0,      # threshold (override per-query with pipeline.query(..., n_sigma=3.0))
    max_results=5,    # max chunks to pass to the LLM
    llm="anthropic",  # "anthropic" | "openai" | "echo"
    model="claude-haiku-4-5-20251001",
    temperature=0.1,
)
response = pipeline.query("Your question here")
```
### RAGResponse fields

```python
response.answer        # str — the answer (or suppression message)
response.has_evidence  # bool — False means generation was suppressed
response.retrieval     # RetrievalResult with .significant and .noise lists
response.retrieval.significant[0].z_score  # how many σ above the noise floor
response.retrieval.significant[0].p_value  # probability under the null
```
### Side-by-side comparison

```python
comparison = pipeline.compare_with_topk("What is dark matter?", k=5)
print(comparison["sigma_rag"].answer)
print(comparison["top_k"].answer)
```
## Embedder Backends

| Embedder | Install | Quality | API Key |
|---|---|---|---|
| `HashEmbedder` | built-in | Testing only | No |
| `SentenceTransformerEmbedder` | `pip install "sigma-rag[local]"` | Good | No |
| `OpenAIEmbedder` | `pip install "sigma-rag[openai]"` | Excellent | Yes |

```python
from sigma_rag import SigmaIndex, OpenAIEmbedder

index = SigmaIndex(embedder=OpenAIEmbedder(model="text-embedding-3-large"))
```
## Adjusting the Threshold

```python
# More permissive: catch more relevant chunks, higher false-alarm rate
response = pipeline.query(question, n_sigma=1.5)  # FAR ≈ 6.7%

# More conservative: fewer false positives, may miss weak signals
response = pipeline.query(question, n_sigma=3.0)  # FAR ≈ 0.13%
```
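The quoted false-alarm rates follow directly from the one-sided Gaussian tail. A small self-contained check (`false_alarm_rate` is a hypothetical helper written here for illustration, not part of the package API):

```python
from math import erfc, sqrt

def false_alarm_rate(n_sigma: float) -> float:
    """One-sided upper-tail probability of a standard normal beyond
    n_sigma: the chance a pure-noise chunk clears the threshold."""
    return 0.5 * erfc(n_sigma / sqrt(2))

for n in (1.5, 2.0, 3.0):
    print(f"{n}σ → FAR ≈ {100 * false_alarm_rate(n):.2f}%")
# 1.5σ → FAR ≈ 6.68%
# 2.0σ → FAR ≈ 2.28%
# 3.0σ → FAR ≈ 0.13%
```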
## Running the Demo

```bash
git clone https://github.com/kpal002/sigma-rag
cd sigma-rag
pip install -e ".[dev]"

# Offline demo (no API key)
python demo.py --llm echo

# With Anthropic
ANTHROPIC_API_KEY=sk-... python demo.py --llm anthropic
```
## Running Tests

```bash
pytest                # all tests
pytest -m "not slow"  # skip slow tests
pytest tests/test_retriever.py -v
```
## Project Structure

```
sigma-rag/
├── sigma_rag/
│   ├── __init__.py      # public API exports
│   ├── types.py         # Chunk, ScoredChunk, RetrievalResult, RAGResponse
│   ├── stats.py         # pure-numpy norm_cdf, ks_test (scipy optional)
│   ├── noise_floor.py   # NoiseFloor — fits & queries the null distribution
│   ├── embedder.py      # Embedder ABC + SentenceTransformer/OpenAI/Hash backends
│   ├── index.py         # SigmaIndex — document ingestion, chunking, calibration
│   ├── retriever.py     # SigmaRetriever + TopKRetriever baseline
│   └── pipeline.py      # SigmaRAGPipeline — end-to-end QA
├── tests/
│   ├── conftest.py
│   ├── test_embedder.py
│   ├── test_noise_floor.py
│   ├── test_index.py
│   ├── test_retriever.py
│   └── test_pipeline.py
├── notebooks/
│   └── demo.ipynb       # σ-RAG vs top-k visual comparison
├── demo.py              # CLI demo script
├── benchmark.py         # benchmark vs top-k
├── pyproject.toml
└── README.md
```
## The Physics Backstory
The idea comes from signal significance testing in particle physics. When the ATLAS or CMS experiments search for a new particle at the LHC, they don't declare a discovery just because they see "the biggest excess we've found today." They declare a discovery only when the local significance — how many standard deviations above the estimated background the observed excess is — reaches 5σ (local p-value < 2.87 × 10⁻⁷). Below that bar, the excess is considered consistent with a background fluctuation, and no claim is made.
The procedure has two distinct steps:
- Background estimation — measure the expected yield from known Standard Model processes (QCD multijet, W/Z+jets, top pairs…) using control regions or sidebands in data, before looking at the signal region.
- Significance gate — only if the observed excess clears the threshold does the experiment report evidence of a new signal.
Standard RAG lacks both steps. It has no background model and no significance gate — it always returns the top-k chunks regardless of whether any of them are actually relevant. σ-RAG imports the same two-step logic into the retrieval layer: estimate the background distribution of cosine similarities from random document pairs, set a threshold with interpretable false-alarm semantics (default 2σ ≈ 2.3% FAR), and refuse to pass sub-threshold context to the LLM.
## Citation

If you use σ-RAG in research, please cite:

```bibtex
@software{pal2025sigmarag,
  author = {Pal, Kuntal},
  title  = {σ-RAG: Significance-Threshold Retrieval for RAG Pipelines},
  year   = {2025},
  url    = {https://github.com/kpal002/sigma-rag},
}
```
## License
MIT © Kuntal Pal