Significance-threshold retrieval for RAG pipelines — stop injecting noise into your LLM context

σ-RAG · Sigma-RAG

Stop injecting noise into your LLM context. σ-RAG gates retrieval with a statistical significance threshold so your model only sees chunks that are actually relevant — not just the least-bad ones.


The Problem with Standard RAG

Standard RAG always returns the top-k chunks, regardless of whether any of them are relevant to the query.

Query: "What caused the 2008 financial crisis?"
Corpus: Python tutorials, particle-physics papers, cooking recipes

Top-3 RAG returns:  chunk_47 (sim=0.31), chunk_12 (sim=0.29), chunk_89 (sim=0.28)
                    ← ALL noise. LLM hallucinates an answer anyway.

σ-RAG returns:      ⚠️  No significant evidence found. Response suppressed.
                    ← Hallucination prevented.

When no chunk is relevant, top-k RAG silently feeds the LLM garbage context. The LLM, trained to be helpful, fabricates a plausible-sounding answer. σ-RAG breaks this failure mode.


How It Works

σ-RAG characterises the noise floor of your embedding space — the distribution of cosine similarities between random, unrelated document pairs. This is analogous to estimating the background noise level before declaring a signal detection.

1. Sample N random cross-document pairs from your corpus
2. Fit a Gaussian: μ_noise, σ_noise
3. Threshold = μ_noise + n·σ_noise   (default n=2, FAR ≈ 2.3%)
4. At query time: only chunks with similarity > threshold are "significant"
5. If zero chunks clear the bar → suppress generation entirely

The threshold has a principled interpretation: at n=2σ, the false alarm rate (probability a random noise chunk clears the bar) is ≈ 2.3%. At n=3σ, it drops to ≈ 0.13%.
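Steps 1–3 above can be sketched in a few lines of plain numpy. This is an illustrative sketch of the calibration idea, not σ-RAG's internal code; the random unit vectors stand in for real chunk embeddings, and `fit_noise_floor` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_noise_floor(embeddings: np.ndarray, n_pairs: int = 10_000):
    """Estimate the background similarity distribution from random pairs.

    `embeddings` is an (n_chunks, dim) array of L2-normalized vectors.
    Returns (mu_noise, sigma_noise) of cosine similarities between
    randomly chosen, distinct chunks.
    """
    n = len(embeddings)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    mask = i != j                                  # drop self-pairs
    sims = np.sum(embeddings[i[mask]] * embeddings[j[mask]], axis=1)
    return sims.mean(), sims.std()

# Toy corpus: random unit vectors in place of real chunk embeddings
emb = rng.normal(size=(500, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

mu, sigma = fit_noise_floor(emb)
threshold = mu + 2.0 * sigma                       # step 3: the 2-sigma gate
print(f"noise floor: mu={mu:.3f}, sigma={sigma:.3f}, threshold={threshold:.3f}")
```

For genuinely unrelated vectors the background mean sits near zero, so the threshold is driven almost entirely by σ_noise — which is why calibrating on your actual corpus matters.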


Benchmark

Evaluated on a mixed corpus (physics papers + cooking articles) with answerable and unanswerable questions:

| Metric | Standard Top-3 | σ-RAG (2σ) |
|---|---|---|
| Precision@3 (answerable) | 1.00 | 1.00 |
| Recall@3 (answerable) | 1.00 | 0.95 |
| Hallucination risk (unanswerable) | 100% | 0% |
| Avg chunks passed to LLM | 3.0 | 1.8 |

σ-RAG matches top-k on answerable questions while eliminating hallucination risk on unanswerable ones.


Installation

# Minimal (numpy only — uses HashEmbedder, good for testing)
pip install sigma-rag

# Recommended (local sentence-transformers embeddings)
pip install "sigma-rag[local]"

# With Anthropic LLM backend
pip install "sigma-rag[local,anthropic]"

# Everything
pip install "sigma-rag[all]"

Quick Start

from sigma_rag import SigmaIndex, SigmaRAGPipeline

# 1. Build the index
index = SigmaIndex()
index.add_documents([
    "The Higgs boson was discovered at the LHC in 2012 by ATLAS and CMS at 5σ significance...",
    "A discovery in particle physics requires a local p-value below 2.87e-7 (5σ)...",
    "The Standard Model describes quarks, leptons, gauge bosons, and the Higgs field...",
])
index.calibrate()   # fits the background distribution

# 2. Query (offline echo mode — no API key needed)
pipeline = SigmaRAGPipeline(index, llm="echo")

# Answerable query → returns answer
response = pipeline.query("What significance was required to claim the Higgs discovery?")
print(response.has_evidence)     # True
print(f"Used {len(response.retrieval.significant)} chunks")

# Unanswerable query → suppressed
response = pipeline.query("What is the best pasta carbonara recipe?")
print(response.has_evidence)     # False  ← hallucination prevented
print(response.answer)           # "⚠️  σ-RAG: No significant evidence..."

API Overview

SigmaIndex

index = SigmaIndex(
    chunk_size=512,       # max chars per chunk
    chunk_overlap=64,     # overlap between consecutive chunks
    n_sigma=2.0,          # default significance threshold
)
index.add_documents(docs)   # list of strings or (text, metadata) tuples
index.calibrate()            # REQUIRED before querying

SigmaRAGPipeline

pipeline = SigmaRAGPipeline(
    index,
    n_sigma=2.0,           # threshold (override per-query with pipeline.query(..., n_sigma=3.0))
    max_results=5,         # max chunks to pass to LLM
    llm="anthropic",       # "anthropic" | "openai" | "echo"
    model="claude-haiku-4-5-20251001",
    temperature=0.1,
)
response = pipeline.query("Your question here")

RAGResponse fields

response.answer           # str — the answer (or suppression message)
response.has_evidence     # bool — False means generation was suppressed
response.retrieval        # RetrievalResult with .significant and .noise lists
response.retrieval.significant[0].z_score    # how many σ above noise floor
response.retrieval.significant[0].p_value    # probability under null
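The z_score and p_value fields follow directly from the calibrated noise floor. A sketch of the arithmetic — the μ_noise/σ_noise values below are made up for illustration, not real calibration output:

```python
import math

mu_noise, sigma_noise = 0.12, 0.05   # hypothetical calibrate() output
similarity = 0.31                    # query-chunk cosine similarity

z_score = (similarity - mu_noise) / sigma_noise
# one-sided p-value under the Gaussian null: P(noise similarity >= observed)
p_value = 0.5 * math.erfc(z_score / math.sqrt(2))

print(f"z = {z_score:.2f}, p = {p_value:.2e}")   # z = 3.80, p = 7.23e-05
```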

Side-by-side comparison

comparison = pipeline.compare_with_topk("What is dark matter?", k=5)
print(comparison["sigma_rag"].answer)
print(comparison["top_k"].answer)

Embedder Backends

| Embedder | Install | Quality | API Key |
|---|---|---|---|
| HashEmbedder | built-in | Testing only | No |
| SentenceTransformerEmbedder | pip install "sigma-rag[local]" | Good | No |
| OpenAIEmbedder | pip install "sigma-rag[openai]" | Excellent | Yes |
from sigma_rag import SigmaIndex, OpenAIEmbedder

index = SigmaIndex(embedder=OpenAIEmbedder(model="text-embedding-3-large"))

Adjusting the Threshold

# More permissive: catch more relevant chunks, higher false-alarm rate
response = pipeline.query(question, n_sigma=1.5)   # FAR ≈ 6.7%

# More conservative: fewer false positives, may miss weak signals
response = pipeline.query(question, n_sigma=3.0)   # FAR ≈ 0.13%
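The FAR figures quoted above are just the one-sided Gaussian tail probability at n σ, reproducible with the standard normal survival function (a sketch, not σ-RAG's API):

```python
import math

def far(n_sigma: float) -> float:
    """One-sided false-alarm rate: P(standard normal > n_sigma)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

for n in (1.5, 2.0, 3.0):
    print(f"n_sigma={n}: FAR = {far(n):.2%}")   # 6.68%, 2.28%, 0.13%
```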

Running the Demo

git clone https://github.com/kpal002/sigma-rag
cd sigma-rag
pip install -e ".[dev]"

# Offline demo (no API key)
python demo.py --llm echo

# With Anthropic
ANTHROPIC_API_KEY=sk-... python demo.py --llm anthropic

Running Tests

pytest                        # all tests
pytest -m "not slow"          # skip slow tests
pytest tests/test_retriever.py -v

Project Structure

sigma-rag/
├── sigma_rag/
│   ├── __init__.py       # public API exports
│   ├── types.py          # Chunk, ScoredChunk, RetrievalResult, RAGResponse
│   ├── stats.py          # pure-numpy norm_cdf, ks_test (scipy optional)
│   ├── noise_floor.py    # NoiseFloor — fits & queries the null distribution
│   ├── embedder.py       # Embedder ABC + SentenceTransformer/OpenAI/Hash backends
│   ├── index.py          # SigmaIndex — document ingestion, chunking, calibration
│   ├── retriever.py      # SigmaRetriever + TopKRetriever baseline
│   └── pipeline.py       # SigmaRAGPipeline — end-to-end QA
├── tests/
│   ├── conftest.py
│   ├── test_embedder.py
│   ├── test_noise_floor.py
│   ├── test_index.py
│   ├── test_retriever.py
│   └── test_pipeline.py
├── notebooks/
│   └── demo.ipynb        # σ-RAG vs top-k visual comparison
├── demo.py               # CLI demo script
├── benchmark.py          # benchmark vs top-k
├── pyproject.toml
└── README.md

The Physics Backstory

The idea comes from signal significance testing in particle physics. When the ATLAS or CMS experiments search for a new particle at the LHC, they don't declare a discovery just because they see "the biggest excess we've found today." They declare a discovery only when the local significance — how many standard deviations above the estimated background the observed excess is — reaches 5σ (local p-value < 2.87 × 10⁻⁷). Below that bar, the excess is considered consistent with a background fluctuation, and no claim is made.

The procedure has two distinct steps:

  1. Background estimation — measure the expected yield from known Standard Model processes (QCD multijet, W/Z+jets, top pairs…) using control regions or sidebands in data, before looking at the signal region.
  2. Significance gate — only if the observed excess clears the threshold does the experiment report evidence of a new signal.

Standard RAG lacks both steps. It has no background model and no significance gate — it always returns the top-k chunks regardless of whether any of them are actually relevant. σ-RAG imports the same two-step logic into the retrieval layer: estimate the background distribution of cosine similarities from random document pairs, set a threshold with interpretable false-alarm semantics (default 2σ ≈ 2.3% FAR), and refuse to pass sub-threshold context to the LLM.
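Put together, the gate itself is tiny. A schematic sketch using the similarities from the opening example (function name and numbers are illustrative, not the package's internals):

```python
import numpy as np

def significant_chunks(similarities, mu_noise, sigma_noise, n_sigma=2.0):
    """Keep only chunks whose similarity clears the significance bar.

    Returns indices sorted by descending similarity; an empty result
    means 'suppress generation', not 'return the least-bad chunks'.
    """
    sims = np.asarray(similarities)
    threshold = mu_noise + n_sigma * sigma_noise
    idx = np.where(sims > threshold)[0]
    return idx[np.argsort(sims[idx])[::-1]]

# Background estimated offline: mu=0.12, sigma=0.10 → 2σ threshold = 0.32.
# None of the top-3 similarities (0.31, 0.29, 0.28) clears the bar.
hits = significant_chunks([0.31, 0.29, 0.28], mu_noise=0.12, sigma_noise=0.10)
print("suppress" if len(hits) == 0 else f"use chunks {hits}")   # suppress
```

Top-k retrieval would have passed all three of those chunks to the LLM; the gate passes none.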


Citation

If you use σ-RAG in research, please cite:

@software{pal2025sigmarag,
  author  = {Pal, Kuntal},
  title   = {σ-RAG: Significance-Threshold Retrieval for RAG Pipelines},
  year    = {2025},
  url     = {https://github.com/kpal002/sigma-rag},
}

License

MIT © Kuntal Pal
