A production-grade custom LangChain retriever combining Chroma vector search and local BM25 search with Reciprocal Rank Fusion (RRF).

Project description

🚀 Chroma-Hybrid-RRF

chroma-hybrid-rrf is a production-grade custom LangChain-compatible retriever that merges dense semantic vector search (using ChromaDB) with sparse keyword search (using BM25) through Reciprocal Rank Fusion (RRF).

Combining keyword matching with vector embeddings improves retrieval precision and robustness because each method covers the other's blind spots: keyword matching misses synonyms and paraphrases, while embeddings can miss exact terms such as identifiers or rare names.

📐 How It Works: Reciprocal Rank Fusion (RRF)

Reciprocal Rank Fusion is a highly reliable algorithm that scores documents solely based on their rank order from different retrievers (rather than comparing raw similarity scores or distances, which vary widely between vector spaces and keyword counters).

The RRF score for a document $d$ across retrieval models $M$ is calculated as:

$$\mathrm{RRF}(d) = \sum_{m \in M} \frac{1}{k + r_m(d)}, \quad d \in D$$

Where:

$M$: The set of retrievers (Dense vector search + Sparse BM25 keyword search).
$r_m(d)$: The 1-based rank position of document $d$ in the result list returned by retriever $m$.
$k$: A constant smoothing parameter (default 60) that flattens the score differences between top positions, so a single retriever's top-ranked outlier cannot dominate the fused score.
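
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the package's actual implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the two input rankings are all hypothetical, and the sketch assumes that a document missing from one retriever's list simply receives no contribution from that retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs into (doc_id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Enumerate from 1 because r_m(d) is a 1-based rank
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc_id for a deterministic order
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


dense_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical Chroma result order
sparse_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 result order

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking], k=60)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

Here doc_b comes out on top (1/62 + 1/61) ahead of doc_a (1/61 + 1/63), because doc_b holds ranks 2 and 1 while doc_a holds ranks 1 and 3. Documents found by only one retriever (doc_c, doc_d) score roughly half as much.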

🛠️ Key Features

Dual-retrieval pipelines: Performs dense search via ChromaDB and sparse keyword search via BM25.
Auto-Sync indexing: Dynamically pulls and indexes documents from ChromaDB to construct the BM25 search corpus automatically.
Metadata preservation: Retains all original source metadata and appends the calculated rrf_score for debugging and evaluation.
LangChain BaseRetriever compliance: Drop-in integration with LangChain chains built with LCEL (LangChain Expression Language) and its pipe operator (|).
Async-ready: Supports standard async calling conventions (ainvoke).
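
To make the sparse half of the pipeline concrete, here is a minimal sketch of Okapi BM25 ranking in plain Python. It is not the package's code: the function names, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the three-sentence corpus are assumptions for illustration. In the retriever, the corpus texts would instead be the documents pulled from the Chroma collection.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would likely normalize more
    return text.lower().split()


def bm25_rank(query, corpus, k1=1.5, b=0.75):
    """Return document indices ordered by BM25 score, best first."""
    docs = [tokenize(text) for text in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for tokens in docs:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF variant that never goes negative
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: -scores[i])


corpus = [  # hypothetical documents standing in for the Chroma collection
    "LangGraph orchestrates multi-agent workflows",
    "Chroma stores dense vector embeddings",
    "BM25 ranks documents by keyword overlap",
]
ranking = bm25_rank("keyword ranks", corpus)
print(ranking)
```

Only the third document contains the query terms, so its index (2) is ranked first. The resulting index order is exactly the kind of rank list that RRF consumes, which is why raw BM25 scores never need to be compared with vector distances.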

📦 Installation

To install chroma-hybrid-rrf locally in editable mode for development:

git clone https://github.com/Raj2001A/chroma-hybrid-rrf.git
cd chroma-hybrid-rrf
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate
pip install -e ".[dev]"  # Quotes keep shells such as zsh from expanding the brackets

⚡ Quick Start

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from chroma_hybrid_rrf import ChromaHybridRRFRetriever

# 1. Initialize dense Chroma Vector Store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    collection_name="my_docs", 
    embedding_function=embeddings, 
    persist_directory="./chroma_db"
)

# 2. Create the Custom Hybrid RRF Retriever
retriever = ChromaHybridRRFRetriever(
    chroma_vectorstore=vectorstore,
    rrf_k=60,       # RRF constant k
    top_n=4         # Return top 4 fused documents
)

# 3. Retrieve fused documents
query = "Explain LangGraph multi-agent orchestration"
fused_docs = retriever.invoke(query)

for rank, doc in enumerate(fused_docs, start=1):
    print(f"Rank {rank} | Score: {doc.metadata['rrf_score']:.6f}")
    print(f"Content: {doc.page_content}\n")
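
The loop above reads rrf_score from each document's metadata. One way such a score could be attached while preserving the original source metadata is sketched below. The `Doc` dataclass is a stand-in for LangChain's Document so the example runs without dependencies, and `with_rrf_score` is a hypothetical helper, not a function exported by this package.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def with_rrf_score(doc, score):
    # Copy the metadata so the document stored upstream is left untouched
    return Doc(doc.page_content, {**doc.metadata, "rrf_score": score})


original = Doc("LangGraph coordinates agents.", {"source": "guide.md"})
scored = with_rrf_score(original, 1 / 61 + 1 / 62)

print(scored.metadata)    # keeps "source" and adds "rrf_score"
print(original.metadata)  # unchanged
```

Copying rather than mutating avoids leaking a stale score into documents that are reused across queries.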

🧪 Evaluation via RAGAS

Evaluating retrieval precision is critical for building production-grade RAG systems. Using the RAGAS framework, you can evaluate the effectiveness of this retriever on two key retrieval metrics:

Context Precision: Measures how well the retriever ranks relevant documents at the top.
Context Recall: Verifies if all relevant ground-truth facts are successfully retrieved.
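
To show what "ranks relevant documents at the top" means numerically, here is a minimal sketch of a rank-weighted precision score in the spirit of Context Precision: precision@k is averaged over the positions that hold a relevant context. The function name and the hand-written 0/1 relevance flags are assumptions for illustration. RAGAS derives relevance with an LLM judge, so its reported numbers will not come from code like this.

```python
def context_precision_at_k(relevance):
    """relevance: 0/1 flags for the retrieved contexts, in rank order."""
    hits = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # precision@k is only counted at ranks holding a relevant context
            weighted_sum += hits / k
    return weighted_sum / hits if hits else 0.0


# Same two relevant contexts, placed at different ranks
print(f"{context_precision_at_k([1, 0, 1, 0]):.4f}")  # relevant at ranks 1 and 3
print(f"{context_precision_at_k([0, 1, 0, 1]):.4f}")  # relevant at ranks 2 and 4
```

The first ordering scores 0.8333 and the second 0.5000, even though both retrieve the same two relevant contexts. That sensitivity to ordering is what makes the metric useful for judging a rank-fusion retriever.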

Set up the RAGAS evaluation:

from ragas import evaluate
from ragas.metrics import context_precision, context_recall
from datasets import Dataset

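# Construct your evaluation dataset.
# The question must be the same query that produced the retrieved contexts.
# Column names follow older RAGAS releases; newer versions may expect different names.
eval_data = {
    "question": [query],
    "contexts": [[doc.page_content for doc in fused_docs]],
    "ground_truth": ["LangGraph is used for building stateful, multi-actor applications with LLMs."]
}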

dataset = Dataset.from_dict(eval_data)
results = evaluate(dataset, metrics=[context_precision, context_recall])
print(results)

🧪 Testing

To run the test suite and verify calculation correctness:

pytest tests/

🤝 Contributing

Contributions are highly welcome! To contribute:

Fork the repository.
Create a new feature branch: git checkout -b feat/your-feature.
Write your changes and add tests.
Run pytest to make sure all tests pass.
Push to your branch and open a Pull Request.

📜 License

Distributed under the MIT License. See LICENSE for details.

Project details

This version

0.1.0

May 20, 2026

Download files

Source Distribution

chroma_hybrid_rrf-0.1.0.tar.gz (8.1 kB)

Uploaded May 20, 2026 Source

Built Distribution

chroma_hybrid_rrf-0.1.0-py3-none-any.whl (7.4 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file chroma_hybrid_rrf-0.1.0.tar.gz.

File metadata

Download URL: chroma_hybrid_rrf-0.1.0.tar.gz
Upload date: May 20, 2026
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for chroma_hybrid_rrf-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`907ed75d4a06baf145169b7df9235c3892087c5df80d04d693d6f800afb373f0`
MD5	`3e571ce8f859e930e5e90b9750721abb`
BLAKE2b-256	`0a155c808702461f61fdb48648808bfb9bebf4ba364dd30ebd7662a084632cc0`

File details

Details for the file chroma_hybrid_rrf-0.1.0-py3-none-any.whl.

File metadata

Download URL: chroma_hybrid_rrf-0.1.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 7.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for chroma_hybrid_rrf-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ed6de9458f7e96b5a682fd05a2b00413b268bcbb9a8ee9df618f0d8523ca42a4`
MD5	`4be562b9cdf61335b02af64538066ad4`
BLAKE2b-256	`6873327d55911fcdeadb06f10d885322215cdceaf5d21c0554857ad1600e8ece`
