LangChain integration for LongProbe — sub-second RAG retrieval regression testing with chunk-level diffing

🔬 langchain-longprobe

Sub-second RAG retrieval regression testing for LangChain

PyPI version Python Versions License: MIT LongProbe Core

The first retrieval regression testing integration for LangChain. Not another evaluation framework — a test runner that catches lost chunks in milliseconds.

Quick Start · API Reference · Pytest Integration · Examples


Why langchain-longprobe?

Every RAG developer faces the same problem: you upgrade LangChain, swap a vector store, or tweak a chunking strategy — and your retrieval silently degrades. Existing evaluation tools (Ragas, DeepEval) tell you the LLM answer got worse, but they can't tell you which specific chunks were lost.

langchain-longprobe bridges this gap:

| Feature  | LangChain Evaluators / Ragas | langchain-longprobe               |
|----------|------------------------------|-----------------------------------|
| Focus    | LLM response quality         | Retrieval stability               |
| Speed    | 10s–60s (LLM-as-judge)       | Sub-second (chunk match)          |
| Feedback | Pass/Fail score              | Visual diff of lost/gained chunks |
| Workflow | Batch analysis               | pytest / CI integration           |
| Detects  | Bad answers                  | Why retrieval failed              |
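The "chunk match" approach in the table can be sketched in a few lines. The following is a hypothetical recall computation over retrieved chunk IDs, not the library's actual implementation: required IDs come from the golden set, retrieved IDs from your retriever.

```python
def chunk_recall(required: set, retrieved: list) -> float:
    """Fraction of required chunk IDs that appear in the retrieved list."""
    if not required:
        return 1.0
    found = required & set(retrieved)
    return len(found) / len(required)

# A question requiring two chunks, of which only one was retrieved:
print(chunk_recall({"chunk_refund_01", "chunk_refund_02"},
                   ["chunk_refund_01", "chunk_faq_03"]))  # 0.5
```

Because this is pure set arithmetic, it runs in microseconds per question, which is why chunk-level checks can stay sub-second where LLM-as-judge evaluation cannot.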

Installation

pip install langchain-longprobe

This installs both langchain-longprobe and the core longprobe library.

Quick Start

1. Define Golden Questions

Create a goldens.yaml with your expected retrieval results:

name: "my-rag-golden-set"
version: "1.0"

questions:
  - id: "q1"
    question: "What is the refund policy?"
    match_mode: "id"
    required_chunks:
      - "chunk_refund_01"
      - "chunk_refund_02"
    top_k: 5

  - id: "q2"
    question: "What are the payment terms?"
    match_mode: "text"
    required_chunks:
      - "net 30 days from invoice"
    top_k: 5
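Once parsed (e.g. with `yaml.safe_load`), each question reduces to a small record. A hypothetical validator for the schema above might look like this; the field names follow the YAML, everything else is an assumption:

```python
def validate_question(q: dict) -> list:
    """Return a list of schema problems for one golden question (empty = valid)."""
    errors = []
    for field in ("id", "question", "match_mode", "required_chunks"):
        if field not in q:
            errors.append(f"missing field: {field}")
    if q.get("match_mode") not in ("id", "text"):
        errors.append(f"unknown match_mode: {q.get('match_mode')!r}")
    chunks = q.get("required_chunks")
    if not isinstance(chunks, list) or not chunks:
        errors.append("required_chunks must be a non-empty list")
    return errors

q1 = {"id": "q1", "question": "What is the refund policy?",
      "match_mode": "id", "required_chunks": ["chunk_refund_01"], "top_k": 5}
print(validate_question(q1))  # []
```

Note the two `match_mode` values: `"id"` compares chunk IDs exactly, while `"text"` matches a required substring against chunk content, per the examples above.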

2. Probe Your Retriever

from langchain_longprobe import RetrievalProbe
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Your existing LangChain setup (Chroma needs an embedding function to embed queries)
vectorstore = Chroma(
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Create a probe
probe = RetrievalProbe.from_retriever(
    retriever=retriever,
    goldens_path="goldens.yaml",
)

# Run regression check (sub-second)
report = probe.run()
print(f"Recall: {report.overall_recall:.2%}")
print(f"Pass Rate: {report.pass_rate:.2%}")

# See exactly what's missing
for qid, chunks in probe.get_missing_chunks().items():
    print(f"  {qid}: lost {chunks}")

3. Track Regressions Over Time

# Save a baseline after your first successful run
probe.run()
probe.save_baseline("v1.0")

# After code changes, compare against baseline
probe.run()
diff = probe.diff("v1.0")
print(f"Regressions: {len(diff['regressions'])}")
print(f"Improvements: {len(diff['improvements'])}")
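Conceptually, diffing against a baseline is set arithmetic over per-question recall. A sketch of the idea follows; the library's actual report format may differ:

```python
def diff_runs(baseline: dict, current: dict) -> dict:
    """Compare per-question recall between two runs.

    Questions whose recall dropped are regressions; those that rose are improvements.
    """
    regressions = {q for q in baseline if current.get(q, 0.0) < baseline[q]}
    improvements = {q for q in baseline if current.get(q, 0.0) > baseline[q]}
    return {"regressions": sorted(regressions), "improvements": sorted(improvements)}

print(diff_runs({"q1": 1.0, "q2": 0.5}, {"q1": 0.5, "q2": 1.0}))
# {'regressions': ['q1'], 'improvements': ['q2']}
```

Because the comparison is per question, a regression on one question is reported even if overall recall stayed flat, which is exactly the failure mode aggregate scores hide.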

API Reference

RetrievalProbe — Main Entry Point

The recommended way to use langchain-longprobe. Wraps any LangChain BaseRetriever.

from langchain_longprobe import RetrievalProbe

probe = RetrievalProbe.from_retriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
)

report = probe.run()
probe.save_baseline("v1.0")
diff = probe.diff("v1.0")
missing = probe.get_missing_chunks()

ProbedRetriever — Drop-in Retriever Wrapper

A LangChain BaseRetriever that wraps your retriever and adds regression testing. Use as a drop-in replacement in your existing chains.

from langchain_longprobe import ProbedRetriever

probed = ProbedRetriever(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    check_on_invoke=False,  # set True for automatic checks
)

# Use exactly like a normal retriever
docs = probed.invoke("What is the refund policy?")

# Manually trigger a regression check
report = probed.check()
probed.save_baseline("v1.0")

LongProbeCallbackHandler — Passive Monitoring

Attach to any LangChain chain to passively record retrieval events.

from langchain_longprobe import LongProbeCallbackHandler

handler = LongProbeCallbackHandler(
    goldens_path="goldens.yaml",
    recall_threshold=0.85,
    fail_on_regression=True,
)

# Attach to retriever calls
docs = retriever.invoke("query", config={"callbacks": [handler]})

# Inspect results
print(handler.retrieval_log)

# Run a full check
report = handler.run_probe(retriever)

RetrievalRegressionRunnable — LCEL Integration

Use LongProbe as a composable step in LangChain Expression Language chains.

from langchain_longprobe import RetrievalRegressionRunnable

runnable = RetrievalRegressionRunnable(
    retriever=your_retriever,
    goldens_path="goldens.yaml",
    fail_on_regression=True,
)

# Invoke with options
result = runnable.invoke({
    "top_k": 10,
    "save_baseline": "v2.0",
    "baseline_label": "v1.0",  # compare against this
})

print(result["overall_recall"])
print(result["missing_chunks"])

Pytest Integration

conftest.py

import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture
def probe(my_retriever):
    return RetrievalProbe.from_retriever(
        retriever=my_retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )

Writing Tests

def test_retrieval_recall(probe):
    """Ensure retrieval recall stays above threshold."""
    report = probe.run()
    assert report.overall_recall >= 0.85, (
        f"Recall dropped to {report.overall_recall:.2f}. "
        f"Missing: {probe.get_missing_chunks()}"
    )

def test_no_regression(probe):
    """Ensure no chunks were lost vs. baseline."""
    probe.assert_no_regression("v1.0")

Command Line

pytest --langchain-longprobe-goldens goldens.yaml --langchain-longprobe-threshold 0.85
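Flags like these are typically registered through pytest's `pytest_addoption` hook. A minimal, hypothetical sketch of how the two options above could be wired in a `conftest.py` (the plugin ships its own version):

```python
# conftest.py (illustrative sketch): register the command-line flags shown above
def pytest_addoption(parser):
    parser.addoption("--langchain-longprobe-goldens", action="store",
                     default="goldens.yaml",
                     help="Path to the golden questions file")
    parser.addoption("--langchain-longprobe-threshold", action="store",
                     type=float, default=0.85,
                     help="Minimum acceptable overall recall")
```

Tests can then read the values via `request.config.getoption("--langchain-longprobe-goldens")` inside a fixture.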

GitHub Actions

name: RAG Regression Check

on: [push, pull_request]

jobs:
  rag-probe:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install langchain-longprobe
      - name: Run regression check
        run: pytest tests/test_rag_regression.py -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Examples

Basic Regression Check

from langchain_longprobe import RetrievalProbe
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

probe = RetrievalProbe.from_retriever(retriever, goldens_path="goldens.yaml")
report = probe.run()

if report.regression_detected:
    print("⚠️  Regression detected!")
    for qid, chunks in probe.get_missing_chunks().items():
        print(f"  Question {qid}: lost chunks {chunks}")
else:
    print("✅ All chunks present")

CI/CD Pipeline Integration

# tests/test_rag_regression.py
import pytest
from langchain_longprobe import RetrievalProbe

@pytest.fixture(scope="session")
def probe():
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings
    retriever = Chroma(
        persist_directory="./db",
        embedding_function=OpenAIEmbeddings(),
    ).as_retriever()
    return RetrievalProbe.from_retriever(
        retriever=retriever,
        goldens_path="goldens.yaml",
        recall_threshold=0.85,
    )

def test_recall_above_threshold(probe):
    report = probe.run()
    assert report.overall_recall >= 0.85

def test_no_regressions_vs_baseline(probe):
    probe.assert_no_regression("production")

def test_critical_questions_pass(probe):
    report = probe.run()
    for result in report.results:
        if "critical" in result.question_id:
            assert result.passed, f"Critical question {result.question_id} failed"

Part of the Long Suite

langchain-longprobe is the official LangChain integration for LongProbe, part of the EnDevSols Long Suite of RAG tools.

Contributing

We welcome contributions! Please see the LongProbe Contributing Guide for guidelines.

License

MIT License — see LICENSE for details.


Developed by EnDevSols · GitHub · LongProbe Core
