LangChain integration for LongProbe — sub-second RAG retrieval regression testing with chunk-level diffing


🔬 langchain-longprobe

Sub-second RAG retrieval regression testing for LangChain


The first retrieval regression testing integration for LangChain. Not another evaluation framework — a test runner that catches lost chunks in milliseconds.



Why langchain-longprobe?

Every RAG developer faces the same problem: you upgrade LangChain, swap a vector store, or tweak a chunking strategy — and your retrieval silently degrades. Existing evaluation tools (Ragas, DeepEval) tell you the LLM answer got worse, but they can't tell you which specific chunks were lost.

langchain-longprobe bridges this gap:

| Feature  | LangChain Evaluators / Ragas | langchain-longprobe               |
|----------|------------------------------|-----------------------------------|
| Focus    | LLM response quality         | Retrieval stability               |
| Speed    | 10s–60s (LLM-as-judge)       | Sub-second (chunk match)          |
| Feedback | Pass/fail score              | Visual diff of lost/gained chunks |
| Workflow | Batch analysis               | pytest / CI integration           |
| Detects  | Bad answers                  | Why retrieval failed              |

Installation

pip install langchain-longprobe

This installs both langchain-longprobe and the core longprobe library.

Quick Start

1. Define Golden Questions

Create a goldens.yaml with your expected retrieval results:

name: "my-rag-golden-set"
version: "1.0"

questions:
  - id: "q1"
    question: "What is the refund policy?"
    match_mode: "id"
    required_chunks:
      - "chunk_refund_01"
      - "chunk_refund_02"
    top_k: 5

  - id: "q2"
    question: "What are the payment terms?"
    match_mode: "text"
    required_chunks:
      - "net 30 days from invoice"
    top_k: 5
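The two match modes behave differently: `id` compares each required entry against the retrieved documents' chunk IDs exactly, while `text` checks whether the required phrase appears in a retrieved chunk's content. A minimal sketch of that distinction in plain Python (hypothetical helper and document shape, not the library's internals):

```python
def chunk_found(required: str, retrieved: list[dict], match_mode: str) -> bool:
    """Sketch: is a required chunk satisfied by the retrieved set?

    Each retrieved item is assumed to carry an "id" and a "text" field.
    Illustrative only -- not langchain-longprobe's actual matching code.
    """
    if match_mode == "id":
        # Exact match on the chunk identifier
        return any(doc["id"] == required for doc in retrieved)
    # "text" mode: case-insensitive substring match on chunk content
    return any(required.lower() in doc["text"].lower() for doc in retrieved)
```

Under this reading, `q1` above passes only if both `chunk_refund_01` and `chunk_refund_02` appear among the top 5 results by ID, while `q2` passes if any retrieved chunk contains the phrase "net 30 days from invoice".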

2. Probe Your Retriever

from langchain_longprobe import RetrievalProbe
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Your existing LangChain setup (an embedding function is needed
# to run queries against a persisted Chroma store)
vectorstore = Chroma(
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Create a probe
probe = RetrievalProbe.from_retriever(
    retriever=retriever,
    goldens_path="goldens.yaml",
)

# Run regression check (sub-second)
report = probe.run()
print(f"Recall: {report.overall_recall:.2%}")
print(f"Pass Rate: {report.pass_rate:.2%}")

# See exactly what's missing
for qid, chunks in probe.get_missing_chunks().items():
    print(f"  {qid}: lost {chunks}")

3. Track Regressions Over Time

# Save a baseline after your first successful run
probe.run()
probe.save_baseline("v1.0")

# After code changes, compare against baseline
probe.run()
diff = probe.diff("v1.0")
print(f"Regressions: {len(diff['regressions'])}")
print(f"Improvements: {len(diff['improvements'])}")
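Conceptually, a baseline diff reduces to set comparisons on the per-question missing-chunk maps: a regression is a chunk missing now that was found at baseline, an improvement is the reverse. A plain-Python sketch of that idea (assumed semantics, not the actual `probe.diff()` implementation):

```python
def diff_missing(baseline: dict[str, list[str]],
                 current: dict[str, list[str]]) -> dict:
    """Sketch: compare missing-chunk maps (question id -> chunk ids) from two runs.

    Regressions: chunks missing now but found at baseline.
    Improvements: chunks missing at baseline but found now.
    """
    regressions, improvements = {}, {}
    for qid in set(baseline) | set(current):
        was = set(baseline.get(qid, []))
        now = set(current.get(qid, []))
        if now - was:
            regressions[qid] = sorted(now - was)
        if was - now:
            improvements[qid] = sorted(was - now)
    return {"regressions": regressions, "improvements": improvements}
```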

API Reference

RetrievalProbe — Main Entry Point

The recommended way to use langchain-longprobe. Wraps any LangChain BaseRetriever.

from langchain_longprobe import RetrievalProbe

probe = RetrievalProbe.from_retriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
)

report = probe.run()
probe.save_baseline("v1.0")
diff = probe.diff("v1.0")
missing = probe.get_missing_chunks()
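The report's metrics can be read as simple ratios: per-question recall is the fraction of required chunks found, `overall_recall` aggregates across all required chunks, and `pass_rate` is the share of questions whose recall meets the threshold. A sketch under those assumed definitions (not the library's guaranteed semantics):

```python
def recall(required: list[str], found: set[str]) -> float:
    """Fraction of required chunks present in the retrieved set."""
    return sum(c in found for c in required) / len(required)

def summarize(questions: list[tuple[list[str], set[str]]],
              threshold: float = 0.85) -> dict:
    """Sketch: micro-averaged overall recall and per-question pass rate.

    Each entry pairs a question's required chunk ids with the ids
    actually retrieved for it.
    """
    total_required = sum(len(req) for req, _ in questions)
    total_found = sum(sum(c in got for c in req) for req, got in questions)
    passed = sum(recall(req, got) >= threshold for req, got in questions)
    return {
        "overall_recall": total_found / total_required,
        "pass_rate": passed / len(questions),
    }
```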

ProbedRetriever — Drop-in Retriever Wrapper

A LangChain BaseRetriever that wraps your retriever and adds regression testing. Use as a drop-in replacement in your existing chains.

from langchain_longprobe import ProbedRetriever

probed = ProbedRetriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    check_on_invoke=False,  # set True for automatic checks
)

# Use exactly like a normal retriever
docs = probed.invoke("What is the refund policy?")

# Manually trigger a regression check
report = probed.check()
probed.save_baseline("v1.0")
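The `check_on_invoke` flag suggests a straightforward wrapper pattern: delegate retrieval, then optionally run a regression check after each call. A minimal stand-in class illustrating that pattern (hypothetical; the real `ProbedRetriever` is a LangChain `BaseRetriever` built on the same idea):

```python
from typing import Callable

class CheckingRetriever:
    """Illustrative wrapper: retrieve, then optionally run a check.

    Stand-in for the check_on_invoke behavior described above,
    not the actual ProbedRetriever implementation.
    """

    def __init__(self, retrieve: Callable[[str], list],
                 check: Callable[[], None],
                 check_on_invoke: bool = False):
        self.retrieve = retrieve
        self.check = check
        self.check_on_invoke = check_on_invoke

    def invoke(self, query: str) -> list:
        docs = self.retrieve(query)
        if self.check_on_invoke:
            self.check()  # e.g. raise or log when a regression is detected
        return docs
```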

LongProbeCallbackHandler — Passive Monitoring

Attach to any LangChain chain to passively record retrieval events.

from langchain_longprobe import LongProbeCallbackHandler

handler = LongProbeCallbackHandler(
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
    fail_on_regression=True,
)

# Attach to retriever calls
docs = retriever.invoke("query", config={"callbacks": [handler]})

# Inspect results
print(handler.retrieval_log)

# Run a full check
report = handler.run_probe(retriever)

RetrievalRegressionRunnable — LCEL Integration

Use LongProbe as a composable step in LangChain Expression Language chains.

from langchain_longprobe import RetrievalRegressionRunnable

runnable = RetrievalRegressionRunnable(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    fail_on_regression=True,
)

# Invoke with options
result = runnable.invoke({
    "top_k": 10,
    "save_baseline": "v2.0",
    "baseline_label": "v1.0",  # compare against this
})

print(result["overall_recall"])
print(result["missing_chunks"])

Pytest Integration

conftest.py

import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture
def probe(my_retriever):
    return RetrievalProbe.from_retriever(
        retriever=my_retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )

Writing Tests

def test_retrieval_recall(probe):
    """Ensure retrieval recall stays above threshold."""
    report = probe.run()
    assert report.overall_recall >= 0.85, (
        f"Recall dropped to {report.overall_recall:.2f}. "
        f"Missing: {probe.get_missing_chunks()}"
    )

def test_no_regression(probe):
    """Ensure no chunks were lost vs. baseline."""
    probe.assert_no_regression("v1.0")

Command Line

pytest --langchain-longprobe-goldens goldens.yaml --langchain-longprobe-threshold 0.85
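Plugin flags like these are typically registered through pytest's `pytest_addoption` hook. A sketch of how they could be declared in a `conftest.py` (assumed defaults and help text, not the package's actual plugin code):

```python
# conftest.py (illustrative sketch, not the shipped plugin)

def pytest_addoption(parser):
    """Register the langchain-longprobe command-line options."""
    parser.addoption(
        "--langchain-longprobe-goldens",
        action="store",
        default="goldens.yaml",
        help="Path to the golden questions file",
    )
    parser.addoption(
        "--langchain-longprobe-threshold",
        action="store",
        type=float,
        default=0.85,
        help="Minimum acceptable overall recall",
    )
```

Tests can then read the values via `request.config.getoption(...)` when building the probe fixture.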

GitHub Actions

name: RAG Regression Check

on: [push, pull_request]

jobs:
  rag-probe:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install langchain-longprobe
      - name: Run regression check
        run: pytest tests/test_rag_regression.py -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Examples

Basic Regression Check

from langchain_longprobe import RetrievalProbe
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

probe = RetrievalProbe.from_retriever(retriever, goldens_path="goldens.yaml")
report = probe.run()

if report.regression_detected:
    print("⚠️  Regression detected!")
    for qid, chunks in probe.get_missing_chunks().items():
        print(f"  Question {qid}: lost chunks {chunks}")
else:
    print("✅ All chunks present")

CI/CD Pipeline Integration

# tests/test_rag_regression.py
import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture(scope="session")
def probe():
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings

    retriever = Chroma(
        persist_directory="./db",
        embedding_function=OpenAIEmbeddings(),
    ).as_retriever()
    return RetrievalProbe.from_retriever(
        retriever=retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )

def test_recall_above_threshold(probe):
    report = probe.run()
    assert report.overall_recall >= 0.85

def test_no_regressions_vs_baseline(probe):
    probe.assert_no_regression("production")

def test_critical_questions_pass(probe):
    report = probe.run()
    for result in report.results:
        if "critical" in result.question_id:
            assert result.passed, f"Critical question {result.question_id} failed"

Part of the Long Suite

langchain-longprobe is the official LangChain integration for LongProbe, part of the EnDevSols Long Suite of RAG tools.

Contributing

We welcome contributions! Please see the LongProbe Contributing Guide for guidelines.

License

MIT License — see LICENSE for details.


Developed by EnDevSols · GitHub · LongProbe Core
