🔬 langchain-longprobe
Sub-second RAG retrieval regression testing for LangChain
The first retrieval regression testing integration for LangChain. Not another evaluation framework — a test runner that catches lost chunks in milliseconds.
Why langchain-longprobe?
Every RAG developer faces the same problem: you upgrade LangChain, swap a vector store, or tweak a chunking strategy — and your retrieval silently degrades. Existing evaluation tools (Ragas, DeepEval) tell you the LLM answer got worse, but they can't tell you which specific chunks were lost.
langchain-longprobe bridges this gap:
| Feature | LangChain Evaluators / Ragas | langchain-longprobe |
|---|---|---|
| Focus | LLM response quality | Retrieval stability |
| Speed | 10s–60s (LLM-as-judge) | Sub-second (chunk match) |
| Feedback | Pass/Fail score | Visual diff of lost/gained chunks |
| Workflow | Batch analysis | pytest / CI integration |
| Detects | Bad answers | Why retrieval failed |
Installation
```bash
pip install langchain-longprobe
```
This installs both langchain-longprobe and the core longprobe library.
Quick Start
1. Define Golden Questions
Create a goldens.yaml with your expected retrieval results:
```yaml
name: "my-rag-golden-set"
version: "1.0"

questions:
  - id: "q1"
    question: "What is the refund policy?"
    match_mode: "id"
    required_chunks:
      - "chunk_refund_01"
      - "chunk_refund_02"
    top_k: 5

  - id: "q2"
    question: "What are the payment terms?"
    match_mode: "text"
    required_chunks:
      - "net 30 days from invoice"
    top_k: 5
```
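For intuition, the two match modes differ only in what they compare: `"id"` goldens check that specific chunk IDs appear among the retrieved documents, while `"text"` goldens check that a snippet occurs inside any retrieved chunk's text. The following is an illustrative sketch of that idea only — the function name and exact semantics are assumptions, not longprobe's actual matching code:

```python
# Illustrative sketch: how "id" vs "text" match modes could be evaluated.
# find_missing_chunks is a hypothetical helper, not part of the library.

def find_missing_chunks(required, retrieved_ids, retrieved_texts, match_mode):
    """Return the required chunks the retriever failed to surface."""
    if match_mode == "id":
        # An "id" golden passes when the chunk ID is among retrieved IDs.
        return [c for c in required if c not in set(retrieved_ids)]
    # A "text" golden passes when the snippet occurs in any retrieved chunk.
    return [
        snippet for snippet in required
        if not any(snippet in text for text in retrieved_texts)
    ]

missing = find_missing_chunks(
    required=["chunk_refund_01", "chunk_refund_02"],
    retrieved_ids=["chunk_refund_01", "chunk_faq_09"],
    retrieved_texts=[],
    match_mode="id",
)
print(missing)  # ['chunk_refund_02']
```

Because both modes reduce to set membership or substring checks, a run over a golden set needs no LLM call, which is what keeps checks sub-second.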
2. Probe Your Retriever
```python
from langchain_longprobe import RetrievalProbe
from langchain_community.vectorstores import Chroma

# Your existing LangChain setup
vectorstore = Chroma(persist_directory="./db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Create a probe
probe = RetrievalProbe.from_retriever(
    retriever=retriever,
    goldens_path="goldens.yaml",
)

# Run regression check (sub-second)
report = probe.run()
print(f"Recall: {report.overall_recall:.2%}")
print(f"Pass Rate: {report.pass_rate:.2%}")

# See exactly what's missing
for qid, chunks in probe.get_missing_chunks().items():
    print(f"  {qid}: lost {chunks}")
```
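The two report fields can be read as simple aggregates over the golden questions: recall counts required chunks actually retrieved, and pass rate counts questions where nothing was missing. This sketch shows one plausible aggregation — the exact formula longprobe uses is an assumption here:

```python
# Conceptual sketch: aggregating overall_recall and pass_rate from
# per-question results. The aggregation rule is assumed, not the library's.

def aggregate(results):
    """results: list of (found_count, required_count) per golden question."""
    total_found = sum(found for found, _ in results)
    total_required = sum(required for _, required in results)
    overall_recall = total_found / total_required
    # A question "passes" only when every required chunk was retrieved.
    pass_rate = sum(1 for f, r in results if f == r) / len(results)
    return overall_recall, pass_rate

recall, pass_rate = aggregate([(2, 2), (1, 2), (3, 3)])
print(f"Recall: {recall:.2%}")        # Recall: 85.71%
print(f"Pass Rate: {pass_rate:.2%}")  # Pass Rate: 66.67%
```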
3. Track Regressions Over Time
```python
# Save a baseline after your first successful run
probe.run()
probe.save_baseline("v1.0")

# After code changes, compare against baseline
probe.run()
diff = probe.diff("v1.0")
print(f"Regressions: {len(diff['regressions'])}")
print(f"Improvements: {len(diff['improvements'])}")
```
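Conceptually, diffing against a baseline is set arithmetic per question: chunks present in the baseline but absent now are regressions, and the reverse are improvements. A minimal sketch of that idea (the helper name and the simplified return shape are hypothetical; the library's `diff()` output is richer):

```python
# Sketch of baseline diffing as per-question set differences.
# diff_baselines is an illustrative helper, not the library's diff().

def diff_baselines(baseline, current):
    """Each argument maps question_id -> set of chunks found for it."""
    regressions, improvements = {}, {}
    for qid in baseline:
        lost = baseline[qid] - current.get(qid, set())
        gained = current.get(qid, set()) - baseline[qid]
        if lost:
            regressions[qid] = sorted(lost)
        if gained:
            improvements[qid] = sorted(gained)
    return {"regressions": regressions, "improvements": improvements}

result = diff_baselines(
    baseline={"q1": {"chunk_a", "chunk_b"}},
    current={"q1": {"chunk_b", "chunk_c"}},
)
print(result["regressions"])   # {'q1': ['chunk_a']}
print(result["improvements"])  # {'q1': ['chunk_c']}
```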
API Reference
RetrievalProbe — Main Entry Point
The recommended way to use langchain-longprobe. Wraps any LangChain BaseRetriever.
```python
from langchain_longprobe import RetrievalProbe

probe = RetrievalProbe.from_retriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
)

report = probe.run()
probe.save_baseline("v1.0")
diff = probe.diff("v1.0")
missing = probe.get_missing_chunks()
```
ProbedRetriever — Drop-in Retriever Wrapper
A LangChain BaseRetriever that wraps your retriever and adds regression testing.
Use as a drop-in replacement in your existing chains.
```python
from langchain_longprobe import ProbedRetriever

probed = ProbedRetriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    check_on_invoke=False,  # set True for automatic checks
)

# Use exactly like a normal retriever
docs = probed.invoke("What is the refund policy?")

# Manually trigger a regression check
report = probed.check()
probed.save_baseline("v1.0")
```
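The drop-in pattern above boils down to delegation: the wrapper forwards every call to the inner retriever unchanged and, when asked, piggybacks a regression check on each invocation. A LangChain-free sketch of that pattern (class and parameter names are illustrative, not the actual `ProbedRetriever` internals):

```python
# Minimal delegation sketch of the drop-in wrapper pattern.
# CheckingWrapper is hypothetical; it only mirrors the idea above.

class CheckingWrapper:
    def __init__(self, retrieve_fn, check_fn, check_on_invoke=False):
        self.retrieve_fn = retrieve_fn
        self.check_fn = check_fn
        self.check_on_invoke = check_on_invoke
        self.last_report = None

    def invoke(self, query):
        docs = self.retrieve_fn(query)          # behave like the wrapped retriever
        if self.check_on_invoke:
            self.last_report = self.check_fn()  # optional automatic check
        return docs

wrapper = CheckingWrapper(
    retrieve_fn=lambda q: [f"doc for {q}"],
    check_fn=lambda: "report",
    check_on_invoke=True,
)
print(wrapper.invoke("refunds"))  # ['doc for refunds']
print(wrapper.last_report)        # report
```

Because callers only ever see the forwarding `invoke`, existing chains keep working whether or not checks are enabled.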
LongProbeCallbackHandler — Passive Monitoring
Attach to any LangChain chain to passively record retrieval events.
```python
from langchain_longprobe import LongProbeCallbackHandler

handler = LongProbeCallbackHandler(
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
    fail_on_regression=True,
)

# Attach to retriever calls
docs = retriever.invoke("query", config={"callbacks": [handler]})

# Inspect results
print(handler.retrieval_log)

# Run a full check
report = handler.run_probe(retriever)
```
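Passive monitoring is essentially the observer pattern: the handler receives retrieval events and appends them to a log without touching the documents flowing through the chain. A stripped-down sketch of that pattern — the method name echoes LangChain's `on_retriever_end` callback hook, but the log entry shape here is an assumption:

```python
# Observer-pattern sketch of passive retrieval logging. The log entry
# fields are illustrative, not the handler's real retrieval_log format.

class PassiveLogger:
    def __init__(self):
        self.retrieval_log = []

    def on_retriever_end(self, query, documents):
        # Record the event; never modify the documents flowing through.
        self.retrieval_log.append({"query": query, "n_docs": len(documents)})

logger = PassiveLogger()
logger.on_retriever_end("query", ["doc1", "doc2"])
print(logger.retrieval_log)  # [{'query': 'query', 'n_docs': 2}]
```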
RetrievalRegressionRunnable — LCEL Integration
Use LongProbe as a composable step in LangChain Expression Language chains.
```python
from langchain_longprobe import RetrievalRegressionRunnable

runnable = RetrievalRegressionRunnable(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    fail_on_regression=True,
)

# Invoke with options
result = runnable.invoke({
    "top_k": 10,
    "save_baseline": "v2.0",
    "baseline_label": "v1.0",  # compare against this
})
print(result["overall_recall"])
print(result["missing_chunks"])
```
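The invoke-options dict above is a small dispatch: run the check at the requested `top_k`, then optionally save the result as a new baseline and/or diff it against an older one. A self-contained sketch of that control flow (function names and the injected callables are hypothetical stand-ins, not the runnable's real internals):

```python
# Illustrative sketch of options-driven dispatch inside a composable
# regression step. regression_step and its callables are hypothetical.

def regression_step(options, run_check, save_baseline, diff_against):
    report = run_check(top_k=options.get("top_k", 5))
    if "save_baseline" in options:
        save_baseline(options["save_baseline"], report)
    if "baseline_label" in options:
        report["diff"] = diff_against(options["baseline_label"], report)
    return report

saved = {}
result = regression_step(
    {"top_k": 10, "save_baseline": "v2.0", "baseline_label": "v1.0"},
    run_check=lambda top_k: {"overall_recall": 1.0, "top_k": top_k},
    save_baseline=lambda label, rep: saved.update({label: rep}),
    diff_against=lambda label, rep: {"vs": label},
)
print(result["overall_recall"])  # 1.0
print("v2.0" in saved)           # True
```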
Pytest Integration
conftest.py
```python
import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture
def probe(my_retriever):
    return RetrievalProbe.from_retriever(
        retriever=my_retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )
```
Writing Tests
```python
def test_retrieval_recall(probe):
    """Ensure retrieval recall stays above threshold."""
    report = probe.run()
    assert report.overall_recall >= 0.85, (
        f"Recall dropped to {report.overall_recall:.2f}. "
        f"Missing: {probe.get_missing_chunks()}"
    )

def test_no_regression(probe):
    """Ensure no chunks were lost vs. baseline."""
    probe.assert_no_regression("v1.0")
```
Command Line
```bash
pytest --langchain-longprobe-goldens goldens.yaml --langchain-longprobe-threshold 0.85
```
GitHub Actions
```yaml
name: RAG Regression Check
on: [push, pull_request]

jobs:
  rag-probe:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install langchain-longprobe
      - name: Run regression check
        run: pytest tests/test_rag_regression.py -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
Examples
Basic Regression Check
```python
from langchain_longprobe import RetrievalProbe
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

probe = RetrievalProbe.from_retriever(retriever, goldens_path="goldens.yaml")
report = probe.run()

if report.regression_detected:
    print("⚠️ Regression detected!")
    for qid, chunks in probe.get_missing_chunks().items():
        print(f"  Question {qid}: lost chunks {chunks}")
else:
    print("✅ All chunks present")
```
CI/CD Pipeline Integration
```python
# tests/test_rag_regression.py
import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture(scope="session")
def probe():
    from langchain_community.vectorstores import Chroma

    retriever = Chroma(persist_directory="./db").as_retriever()
    return RetrievalProbe.from_retriever(
        retriever=retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )

def test_recall_above_threshold(probe):
    report = probe.run()
    assert report.overall_recall >= 0.85

def test_no_regressions_vs_baseline(probe):
    probe.assert_no_regression("production")

def test_critical_questions_pass(probe):
    report = probe.run()
    for result in report.results:
        if "critical" in result.question_id:
            assert result.passed, f"Critical question {result.question_id} failed"
```
Part of the Long Suite
langchain-longprobe is the official LangChain integration for LongProbe, part of the EnDevSols Long Suite of RAG tools:
- LongParser — Document ingestion and chunking
- LongTrainer — RAG chatbot framework
- LongTracer — Hallucination detection
- LongProbe — Retrieval regression testing (core library)
- langchain-longprobe — LangChain integration ← You are here
Contributing
We welcome contributions! Please see the LongProbe Contributing Guide.
License
MIT License — see LICENSE for details.
Developed by EnDevSols • GitHub • LongProbe Core