🔬 langchain-longprobe
Sub-second RAG retrieval regression testing for LangChain
The first retrieval regression testing integration for LangChain. Not another evaluation framework — a test runner that catches lost chunks in milliseconds.
Why langchain-longprobe?
Every RAG developer faces the same problem: you upgrade LangChain, swap a vector store, or tweak a chunking strategy — and your retrieval silently degrades. Existing evaluation tools (Ragas, DeepEval) tell you the LLM answer got worse, but they can't tell you which specific chunks were lost.
langchain-longprobe bridges this gap:
| Feature | LangChain Evaluators / Ragas | langchain-longprobe |
|---|---|---|
| Focus | LLM response quality | Retrieval stability |
| Speed | 10s–60s (LLM-as-judge) | Sub-second (chunk match) |
| Feedback | Pass/Fail score | Visual diff of lost/gained chunks |
| Workflow | Batch analysis | pytest / CI integration |
| Detects | Bad answers | Why retrieval failed |
Installation
```bash
pip install langchain-longprobe
```
This installs both langchain-longprobe and the core longprobe library.
Quick Start
1. Define Golden Questions
Create a goldens.yaml with your expected retrieval results:
```yaml
name: "my-rag-golden-set"
version: "1.0"

questions:
  - id: "q1"
    question: "What is the refund policy?"
    match_mode: "id"
    required_chunks:
      - "chunk_refund_01"
      - "chunk_refund_02"
    top_k: 5

  - id: "q2"
    question: "What are the payment terms?"
    match_mode: "text"
    required_chunks:
      - "net 30 days from invoice"
    top_k: 5
```
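For intuition, the two match modes differ only in what they compare: `"id"` goldens check that specific chunk IDs appear among the retrieved documents, while `"text"` goldens check that a snippet occurs inside any retrieved chunk's text. The following is an illustrative sketch of that idea only — the function name and exact semantics are assumptions, not longprobe's actual matching code:

```python
# Illustrative sketch: how "id" vs "text" match modes could be evaluated.
# find_missing_chunks is a hypothetical helper, not part of the library.

def find_missing_chunks(required, retrieved_ids, retrieved_texts, match_mode):
    """Return the required chunks the retriever failed to surface."""
    if match_mode == "id":
        # An "id" golden passes when the chunk ID is among retrieved IDs.
        return [c for c in required if c not in set(retrieved_ids)]
    # A "text" golden passes when the snippet occurs in any retrieved chunk.
    return [
        snippet for snippet in required
        if not any(snippet in text for text in retrieved_texts)
    ]

missing = find_missing_chunks(
    required=["chunk_refund_01", "chunk_refund_02"],
    retrieved_ids=["chunk_refund_01", "chunk_faq_09"],
    retrieved_texts=[],
    match_mode="id",
)
print(missing)  # ['chunk_refund_02']
```

Because both modes reduce to set membership or substring checks, a run over a golden set needs no LLM call, which is what keeps checks sub-second.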
2. Probe Your Retriever
```python
from langchain_longprobe import RetrievalProbe
from langchain_community.vectorstores import Chroma

# Your existing LangChain setup
vectorstore = Chroma(persist_directory="./db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Create a probe
probe = RetrievalProbe.from_retriever(
    retriever=retriever,
    goldens_path="goldens.yaml",
)

# Run regression check (sub-second)
report = probe.run()
print(f"Recall: {report.overall_recall:.2%}")
print(f"Pass Rate: {report.pass_rate:.2%}")

# See exactly what's missing
for qid, chunks in probe.get_missing_chunks().items():
    print(f"  {qid}: lost {chunks}")
```
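The two report fields can be read as simple aggregates over the golden questions: recall counts required chunks actually retrieved, and pass rate counts questions where nothing was missing. This sketch shows one plausible aggregation — the exact formula longprobe uses is an assumption here:

```python
# Conceptual sketch: aggregating overall_recall and pass_rate from
# per-question results. The aggregation rule is assumed, not the library's.

def aggregate(results):
    """results: list of (found_count, required_count) per golden question."""
    total_found = sum(found for found, _ in results)
    total_required = sum(required for _, required in results)
    overall_recall = total_found / total_required
    # A question "passes" only when every required chunk was retrieved.
    pass_rate = sum(1 for f, r in results if f == r) / len(results)
    return overall_recall, pass_rate

recall, pass_rate = aggregate([(2, 2), (1, 2), (3, 3)])
print(f"Recall: {recall:.2%}")        # Recall: 85.71%
print(f"Pass Rate: {pass_rate:.2%}")  # Pass Rate: 66.67%
```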
3. Track Regressions Over Time
```python
# Save a baseline after your first successful run
probe.run()
probe.save_baseline("v1.0")

# After code changes, compare against baseline
probe.run()
diff = probe.diff("v1.0")
print(f"Regressions: {len(diff['regressions'])}")
print(f"Improvements: {len(diff['improvements'])}")
```
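Conceptually, diffing against a baseline is set arithmetic per question: chunks present in the baseline but absent now are regressions, and the reverse are improvements. A minimal sketch of that idea (the helper name and the simplified return shape are hypothetical; the library's `diff()` output is richer):

```python
# Sketch of baseline diffing as per-question set differences.
# diff_baselines is an illustrative helper, not the library's diff().

def diff_baselines(baseline, current):
    """Each argument maps question_id -> set of chunks found for it."""
    regressions, improvements = {}, {}
    for qid in baseline:
        lost = baseline[qid] - current.get(qid, set())
        gained = current.get(qid, set()) - baseline[qid]
        if lost:
            regressions[qid] = sorted(lost)
        if gained:
            improvements[qid] = sorted(gained)
    return {"regressions": regressions, "improvements": improvements}

result = diff_baselines(
    baseline={"q1": {"chunk_a", "chunk_b"}},
    current={"q1": {"chunk_b", "chunk_c"}},
)
print(result["regressions"])   # {'q1': ['chunk_a']}
print(result["improvements"])  # {'q1': ['chunk_c']}
```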
API Reference
RetrievalProbe — Main Entry Point
The recommended way to use langchain-longprobe. Wraps any LangChain BaseRetriever.
```python
from langchain_longprobe import RetrievalProbe

probe = RetrievalProbe.from_retriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
)

report = probe.run()
probe.save_baseline("v1.0")
diff = probe.diff("v1.0")
missing = probe.get_missing_chunks()
```
ProbedRetriever — Drop-in Retriever Wrapper
A LangChain BaseRetriever that wraps your retriever and adds regression testing.
Use as a drop-in replacement in your existing chains.
```python
from langchain_longprobe import ProbedRetriever

probed = ProbedRetriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    check_on_invoke=False,  # set True for automatic checks
)

# Use exactly like a normal retriever
docs = probed.invoke("What is the refund policy?")

# Manually trigger a regression check
report = probed.check()
probed.save_baseline("v1.0")
```
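The drop-in pattern above boils down to delegation: the wrapper forwards every call to the inner retriever unchanged and, when asked, piggybacks a regression check on each invocation. A LangChain-free sketch of that pattern (class and parameter names are illustrative, not the actual `ProbedRetriever` internals):

```python
# Minimal delegation sketch of the drop-in wrapper pattern.
# CheckingWrapper is hypothetical; it only mirrors the idea above.

class CheckingWrapper:
    def __init__(self, retrieve_fn, check_fn, check_on_invoke=False):
        self.retrieve_fn = retrieve_fn
        self.check_fn = check_fn
        self.check_on_invoke = check_on_invoke
        self.last_report = None

    def invoke(self, query):
        docs = self.retrieve_fn(query)          # behave like the wrapped retriever
        if self.check_on_invoke:
            self.last_report = self.check_fn()  # optional automatic check
        return docs

wrapper = CheckingWrapper(
    retrieve_fn=lambda q: [f"doc for {q}"],
    check_fn=lambda: "report",
    check_on_invoke=True,
)
print(wrapper.invoke("refunds"))  # ['doc for refunds']
print(wrapper.last_report)        # report
```

Because callers only ever see the forwarding `invoke`, existing chains keep working whether or not checks are enabled.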
LongProbeCallbackHandler — Passive Monitoring
Attach to any LangChain chain to passively record retrieval events.
```python
from langchain_longprobe import LongProbeCallbackHandler

handler = LongProbeCallbackHandler(
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
    fail_on_regression=True,
)

# Attach to retriever calls
docs = retriever.invoke("query", config={"callbacks": [handler]})

# Inspect results
print(handler.retrieval_log)

# Run a full check
report = handler.run_probe(retriever)
```
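Passive monitoring is essentially the observer pattern: the handler receives retrieval events and appends them to a log without touching the documents flowing through the chain. A stripped-down sketch of that pattern — the method name echoes LangChain's `on_retriever_end` callback hook, but the log entry shape here is an assumption:

```python
# Observer-pattern sketch of passive retrieval logging. The log entry
# fields are illustrative, not the handler's real retrieval_log format.

class PassiveLogger:
    def __init__(self):
        self.retrieval_log = []

    def on_retriever_end(self, query, documents):
        # Record the event; never modify the documents flowing through.
        self.retrieval_log.append({"query": query, "n_docs": len(documents)})

logger = PassiveLogger()
logger.on_retriever_end("query", ["doc1", "doc2"])
print(logger.retrieval_log)  # [{'query': 'query', 'n_docs': 2}]
```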
RetrievalRegressionRunnable — LCEL Integration
Use LongProbe as a composable step in LangChain Expression Language chains.
```python
from langchain_longprobe import RetrievalRegressionRunnable

runnable = RetrievalRegressionRunnable(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    fail_on_regression=True,
)

# Invoke with options
result = runnable.invoke({
    "top_k": 10,
    "save_baseline": "v2.0",
    "baseline_label": "v1.0",  # compare against this
})
print(result["overall_recall"])
print(result["missing_chunks"])
```
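The invoke-options dict above is a small dispatch: run the check at the requested `top_k`, then optionally save the result as a new baseline and/or diff it against an older one. A self-contained sketch of that control flow (function names and the injected callables are hypothetical stand-ins, not the runnable's real internals):

```python
# Illustrative sketch of options-driven dispatch inside a composable
# regression step. regression_step and its callables are hypothetical.

def regression_step(options, run_check, save_baseline, diff_against):
    report = run_check(top_k=options.get("top_k", 5))
    if "save_baseline" in options:
        save_baseline(options["save_baseline"], report)
    if "baseline_label" in options:
        report["diff"] = diff_against(options["baseline_label"], report)
    return report

saved = {}
result = regression_step(
    {"top_k": 10, "save_baseline": "v2.0", "baseline_label": "v1.0"},
    run_check=lambda top_k: {"overall_recall": 1.0, "top_k": top_k},
    save_baseline=lambda label, rep: saved.update({label: rep}),
    diff_against=lambda label, rep: {"vs": label},
)
print(result["overall_recall"])  # 1.0
print("v2.0" in saved)           # True
```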
Pytest Integration
conftest.py
```python
import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture
def probe(my_retriever):
    return RetrievalProbe.from_retriever(
        retriever=my_retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )
```
Writing Tests
```python
def test_retrieval_recall(probe):
    """Ensure retrieval recall stays above threshold."""
    report = probe.run()
    assert report.overall_recall >= 0.85, (
        f"Recall dropped to {report.overall_recall:.2f}. "
        f"Missing: {probe.get_missing_chunks()}"
    )

def test_no_regression(probe):
    """Ensure no chunks were lost vs. baseline."""
    probe.assert_no_regression("v1.0")
```
Command Line
```bash
pytest --langchain-longprobe-goldens goldens.yaml --langchain-longprobe-threshold 0.85
```
GitHub Actions
```yaml
name: RAG Regression Check
on: [push, pull_request]

jobs:
  rag-probe:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install langchain-longprobe
      - name: Run regression check
        run: pytest tests/test_rag_regression.py -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
Examples
Basic Regression Check
```python
from langchain_longprobe import RetrievalProbe
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

probe = RetrievalProbe.from_retriever(retriever, goldens_path="goldens.yaml")
report = probe.run()

if report.regression_detected:
    print("⚠️ Regression detected!")
    for qid, chunks in probe.get_missing_chunks().items():
        print(f"  Question {qid}: lost chunks {chunks}")
else:
    print("✅ All chunks present")
```
CI/CD Pipeline Integration
```python
# tests/test_rag_regression.py
import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture(scope="session")
def probe():
    from langchain_community.vectorstores import Chroma

    retriever = Chroma(persist_directory="./db").as_retriever()
    return RetrievalProbe.from_retriever(
        retriever=retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )

def test_recall_above_threshold(probe):
    report = probe.run()
    assert report.overall_recall >= 0.85

def test_no_regressions_vs_baseline(probe):
    probe.assert_no_regression("production")

def test_critical_questions_pass(probe):
    report = probe.run()
    for result in report.results:
        if "critical" in result.question_id:
            assert result.passed, f"Critical question {result.question_id} failed"
```
Part of the Long Suite
langchain-longprobe is the official LangChain integration for LongProbe, part of the EnDevSols Long Suite of RAG tools:
- LongParser — Document ingestion and chunking
- LongTrainer — RAG chatbot framework
- LongTracer — Hallucination detection
- LongProbe — Retrieval regression testing (core library)
- langchain-longprobe — LangChain integration ← You are here
Contributing
We welcome contributions! Please see the LongProbe Contributing Guide.
License
MIT License — see LICENSE for details.
Developed by EnDevSols • GitHub • LongProbe Core