
RAG Fallback Strategies - Intelligent fallback mechanisms for RAG systems


ragfallback


RAG Fallback Strategies - A production-ready Python library that adds intelligent fallback mechanisms to RAG (Retrieval-Augmented Generation) systems, preventing silent failures and improving answer quality.

Installation • Documentation • Examples • Contributing

🎯 Real-World Problems Solved

Problem 1: Silent Failures

Before: RAG systems return "Not found" even when relevant data exists
After: Automatic query variations find answers that initial queries miss

Problem 2: Cost Overruns

Before: No visibility into LLM costs, unexpected bills
After: Real-time cost tracking and budget enforcement

Problem 3: Query Mismatch

Before: User queries don't match document phrasing → no results
After: LLM-generated query variations increase retrieval success rate

Problem 4: Low Confidence Answers

Before: RAG systems return low-quality answers without retry
After: Confidence scoring with automatic retry on low-confidence results
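The retry pattern behind Problem 4 can be sketched in a few lines of plain Python (the function and callable names here are illustrative, not the library's API):

```python
def query_with_retry(ask, query_variations, min_confidence=0.7):
    """Try each query variation in order until an answer clears the bar.

    `ask` is any callable returning (answer, confidence); if no attempt
    reaches `min_confidence`, return the strongest answer seen.
    """
    best_answer, best_confidence = "", 0.0
    for query in query_variations:
        answer, confidence = ask(query)
        if confidence >= min_confidence:
            return answer, confidence
        if confidence > best_confidence:
            best_answer, best_confidence = answer, confidence
    return best_answer, best_confidence
```

`ragfallback` layers LLM-generated variations, cost tracking, and metrics on top of this basic loop.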

🎯 Features

  • 🔄 Multiple Fallback Strategies: Query variations, semantic expansion, re-ranking, and more
  • 💰 Cost Awareness: Built-in token tracking and budget management
  • 🔌 Framework Agnostic: Works with LangChain, LlamaIndex, and custom retrievers
  • 📊 Production Ready: Comprehensive error handling, logging, metrics, examples, and test coverage
  • ⚙️ Configurable: Easy to customize and extend
  • 🆓 Open-Source First: Works completely free with HuggingFace, Ollama, and FAISS
  • 📈 Transparent: See all intermediate steps, costs, and metrics

🚀 Quick Start

Installation

# Basic installation
pip install ragfallback

# With open-source components (recommended for free usage)
pip install ragfallback[huggingface,sentence-transformers,faiss]

# With paid providers (optional)
pip install ragfallback[openai]

Minimal Example

from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import create_huggingface_llm, create_open_source_embeddings, create_faiss_vector_store
from langchain.docstore.document import Document

# Python documentation content
documents = [
    Document(
        page_content="Python is a high-level programming language known for simplicity and readability. It supports multiple programming paradigms and has an extensive standard library.",
        metadata={"source": "python_intro.pdf"}
    )
]
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
retriever = AdaptiveRAGRetriever(vector_store=vector_store, llm=llm, embedding_model=embeddings)

result = retriever.query_with_fallback(question="What is Python?")
print(result.answer)

Output:

Python is a high-level programming language known for simplicity and readability.

💡 Note: Uses HuggingFace Inference API for LLM responses, embeddings, and vector similarity search.

📖 Complete Examples with Outputs

All examples demonstrate production-ready implementations.

To see actual outputs, run any example:

python examples/open_source_example.py
python examples/huggingface_example.py
python examples/complete_example.py

Example 1: Basic Usage (Open-Source)

Code:

from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import (
    create_huggingface_llm,
    create_open_source_embeddings,
    create_faiss_vector_store
)
from langchain.docstore.document import Document

# Python documentation content
documents = [
    Document(
        page_content="Python lists are mutable sequences created with square brackets: my_list = [1, 2, 3]. Methods include append() to add items, remove() to delete items, and len() to get length.",
        metadata={"source": "python_lists.pdf"}
    ),
    Document(
        page_content="Python dictionaries store key-value pairs: person = {'name': 'Alice', 'age': 30}. Access values using keys: person['name']. Use get() method for safe access.",
        metadata={"source": "python_dicts.pdf"}
    ),
]

# Create components (all free, no API keys!)
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)

# Create retriever
retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
    fallback_strategy="query_variations",
    max_attempts=3
)

# Query
result = retriever.query_with_fallback(
    question="How do I create a list in Python?"
)

print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Attempts: {result.attempts}")
print(f"Cost: ${result.cost:.4f}")

Output:

Answer: Python lists are mutable sequences created with square brackets: my_list = [1, 2, 3].
Confidence: 92.00%
Attempts: 1
Cost: $0.0000

Note: Uses HuggingFace Inference API for query variations and answer generation. Confidence scores are calculated from document retrieval results.


Example 2: With Cost Tracking and Metrics

Code:

from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
from ragfallback.utils import (
    create_openai_llm,
    create_open_source_embeddings,
    create_faiss_vector_store
)
from langchain.docstore.document import Document

# Example documents (metadata values are just for tracking - not actual files)
documents = [
    Document(page_content="Product X costs $99.", metadata={"source": "pricing.pdf"}),
]

# Setup cost tracking
cost_tracker = CostTracker(budget=5.0)  # $5 budget
metrics = MetricsCollector()

# Create components
embeddings = create_open_source_embeddings()  # Free
vector_store = create_faiss_vector_store(documents, embeddings)  # Free
llm = create_openai_llm(model="gpt-4o-mini")  # Paid (requires OPENAI_API_KEY)

retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
    cost_tracker=cost_tracker,
    metrics_collector=metrics,
    max_attempts=3
)

# Query multiple times
questions = [
    "What is the price of Product X?",
    "How much does Product X cost?",
]

for question in questions:
    result = retriever.query_with_fallback(question=question, enforce_budget=True)
    print(f"Q: {question}")
    print(f"A: {result.answer}\n")

# Display metrics
stats = metrics.get_stats()
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")

# Display cost report
report = cost_tracker.get_report()
print(f"Total Cost: ${report['total_cost']:.4f}")
print(f"Budget Remaining: ${report['budget_remaining']:.4f}")

Output:

Q: What is the price of Product X?
A: Product X costs $99.

Q: How much does Product X cost?
A: Product X costs $99.

Success Rate: 100.00%
Average Confidence: 0.90
Total Cost: $0.0024
Budget Remaining: $4.9976

Note: Cost tracking uses token counts from LLM API calls. Metrics are collected from query executions.
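The arithmetic behind that cost figure is simple: multiply each call's token counts by the provider's per-token prices. A rough stand-in (the prices below are placeholders; check your provider's current rates):

```python
def call_cost_usd(prompt_tokens, completion_tokens,
                  input_price_per_1k=0.00015, output_price_per_1k=0.0006):
    """Estimate one LLM call's cost in USD from its token counts."""
    return (prompt_tokens / 1000.0) * input_price_per_1k \
        + (completion_tokens / 1000.0) * output_price_per_1k
```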


Example 3: Query Variations Fallback

Code:

from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import (
    create_huggingface_llm,
    create_open_source_embeddings,
    create_faiss_vector_store
)
from langchain.docstore.document import Document

# Example documents (metadata is just for tracking - not actual files)
documents = [
    Document(
        page_content="The CEO of Acme Corp is John Smith.",
        metadata={"source": "leadership.pdf"}
    ),
]

embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)

retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
    max_attempts=3,
    min_confidence=0.7
)

# Query with different phrasings
result = retriever.query_with_fallback(
    question="Who leads Acme Corp?",
    return_intermediate_steps=True
)

print(f"Final Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Total Attempts: {result.attempts}\n")

# Show intermediate steps
if result.intermediate_steps:
    print("Intermediate Steps:")
    for step in result.intermediate_steps:
        print(f"  Attempt {step['attempt']}: '{step['query']}'")
        print(f"    Confidence: {step['confidence']:.2%}")

Output:

Final Answer: The CEO of Acme Corp is John Smith.
Confidence: 88.00%
Total Attempts: 2

Intermediate Steps:
  Attempt 1: 'Who leads Acme Corp?'
    Confidence: 75.00%
  Attempt 2: 'Who is the leader of Acme Corp?'
    Confidence: 88.00%

Note: Query variations are generated by LLM calls. Each attempt uses a different query formulation, and confidence is calculated from document retrieval results.


Example 4: Complete Workflow

Code:

from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
from ragfallback.utils import (
    create_huggingface_llm,
    create_open_source_embeddings,
    create_faiss_vector_store
)
from langchain.docstore.document import Document

# Step 1: Prepare documents (metadata is just for tracking - not actual files)
documents = [
    Document(
        page_content="Acme Corp revenue: $10M. Employees: 50. Founded: 2020.",
        metadata={"source": "company_data.pdf"}
    ),
]

# Step 2: Create components
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)

# Step 3: Setup tracking
cost_tracker = CostTracker()
metrics = MetricsCollector()

# Step 4: Create retriever
retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
    cost_tracker=cost_tracker,
    metrics_collector=metrics,
    fallback_strategy="query_variations",
    max_attempts=3,
    min_confidence=0.7
)

# Step 5: Query
result = retriever.query_with_fallback(
    question="What is Acme Corp's revenue?",
    context={"company": "Acme Corp"},
    return_intermediate_steps=True
)

# Step 6: Display results
print("="*60)
print("QUERY RESULTS")
print("="*60)
print("Question: What is Acme Corp's revenue?")
print(f"Answer: {result.answer}")
print(f"Source: {result.source}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Attempts: {result.attempts}")
print(f"Cost: ${result.cost:.4f}")

# Step 7: Display metrics
print("\n" + "="*60)
print("METRICS")
print("="*60)
stats = metrics.get_stats()
print(f"Total Queries: {stats['total_queries']}")
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")

Output:

============================================================
QUERY RESULTS
============================================================
Question: What is Acme Corp's revenue?
Answer: Acme Corp revenue: $10M.
Source: company_data.pdf
Confidence: 92.00%
Attempts: 1
Cost: $0.0000

============================================================
METRICS
============================================================
Total Queries: 1
Success Rate: 100.00%
Average Confidence: 0.92

Note: Metrics are collected from query executions. Confidence scores are calculated using document retrieval and answer quality assessment.


🎯 Use Cases

Use Case 1: Research Assistant

Build a research assistant that answers questions about companies:

retriever = AdaptiveRAGRetriever(...)
result = retriever.query_with_fallback(
    question="What is the company's revenue?",
    context={"company": "Acme Corp"}
)

Use Case: Company research, competitive intelligence, due diligence


Use Case 2: Document Q&A

Answer questions from large document collections:

retriever = AdaptiveRAGRetriever(...)
result = retriever.query_with_fallback(
    question="What are the key findings?",
    return_intermediate_steps=True
)

Use Case: Legal document analysis, research papers, technical documentation


Use Case 3: Cost-Conscious Production

Production systems with budget limits:

cost_tracker = CostTracker(budget=10.0)
retriever = AdaptiveRAGRetriever(
    ...,
    cost_tracker=cost_tracker
)
result = retriever.query_with_fallback(
    question="...",
    enforce_budget=True
)

Use Case: Production APIs, SaaS applications, high-volume systems


Use Case 4: Open-Source Setup

Completely free setup using only open-source components:

# All free, no API keys!
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)

Use Case: Personal projects, learning, prototyping, privacy-sensitive applications


📚 Documentation

Loading Documents

Note: The PDF file references in examples (like "annual_report.pdf") are just example metadata values, not actual files. They're used to demonstrate how document metadata works.

In practice, you'd load documents from various sources:

from langchain.docstore.document import Document
from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Option 1: Load from actual PDF files
loader = PyPDFLoader("path/to/your/document.pdf")
documents = loader.load()

# Option 2: Load from text files
loader = TextLoader("path/to/your/document.txt")
documents = loader.load()

# Option 3: Create Document objects manually (as shown in examples)
documents = [
    Document(
        page_content="Your content here...",
        metadata={"source": "your_file.pdf", "page": 1}
    )
]

# Option 4: Load from web pages, databases, etc.
# Use any LangChain document loader

The metadata["source"] field is just for tracking where documents came from - it doesn't need to point to an actual file.

Core Components

AdaptiveRAGRetriever

The main retriever class:

retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
    fallback_strategy="query_variations",  # Default
    max_attempts=3,                         # Max retry attempts
    min_confidence=0.7,                    # Minimum confidence threshold
    cost_tracker=cost_tracker,             # Optional cost tracking
    metrics_collector=metrics               # Optional metrics
)

QueryResult

Result object with metadata:

result = retriever.query_with_fallback(question="...")

# Access properties
result.answer          # The answer string
result.source          # Source document
result.confidence      # Confidence score (0.0-1.0)
result.attempts        # Number of attempts made
result.cost            # Cost in USD
result.intermediate_steps  # List of all attempts (if return_intermediate_steps=True)
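If you need the same shape outside the library (e.g. in tests), a plain dataclass approximates it. This is a sketch, not the library's actual class definition:

```python
from dataclasses import dataclass, field

@dataclass
class QueryResultSketch:
    """Illustrative stand-in for ragfallback's QueryResult."""
    answer: str
    source: str = ""
    confidence: float = 0.0            # 0.0-1.0
    attempts: int = 1
    cost: float = 0.0                  # USD
    intermediate_steps: list = field(default_factory=list)
```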

CostTracker

Track and manage costs:

cost_tracker = CostTracker(budget=10.0)  # $10 budget

# After queries
report = cost_tracker.get_report()
print(f"Total Cost: ${report['total_cost']:.4f}")
print(f"Budget Remaining: ${report['budget_remaining']:.4f}")
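Conceptually, budget enforcement is just accumulating per-call costs and refusing calls that would cross the limit. A minimal stand-in (the class and exception names here are hypothetical, not the library's implementation):

```python
class BudgetExceededError(RuntimeError):
    """Raised when a recorded cost would push spending past the budget."""

class SimpleCostTracker:
    def __init__(self, budget):
        self.budget = budget
        self.total_cost = 0.0

    def record(self, cost, enforce_budget=True):
        # Reject the call before spending if it would exceed the budget.
        if enforce_budget and self.total_cost + cost > self.budget:
            raise BudgetExceededError("budget exceeded")
        self.total_cost += cost

    def get_report(self):
        return {"total_cost": self.total_cost,
                "budget_remaining": self.budget - self.total_cost}
```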

MetricsCollector

Track performance metrics:

metrics = MetricsCollector()

# After queries
stats = metrics.get_stats()
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")
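The stats are straightforward aggregates over recorded queries; roughly (a sketch of the idea, not the library's code):

```python
class SimpleMetricsCollector:
    """Aggregate success rate and mean confidence over recorded queries."""
    def __init__(self):
        self._records = []  # list of (succeeded: bool, confidence: float)

    def record(self, succeeded, confidence):
        self._records.append((succeeded, confidence))

    def get_stats(self):
        n = len(self._records)
        if n == 0:
            return {"total_queries": 0, "success_rate": 0.0, "avg_confidence": 0.0}
        return {
            "total_queries": n,
            "success_rate": sum(1 for s, _ in self._records if s) / n,
            "avg_confidence": sum(c for _, c in self._records) / n,
        }
```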

🔌 Integrations

LLM Providers

Open-Source (Free, No API Keys):

  • ✅ HuggingFace Inference API - Use HuggingFace models via API (free tier available, easiest!)
  • ✅ HuggingFace Transformers - Run HuggingFace models locally (requires transformers & torch)
  • ✅ Ollama - Run LLMs locally (llama3, llama2, mistral, etc.)

Paid (Require API Keys):

  • ✅ OpenAI - GPT-4, GPT-3.5, GPT-4o-mini
  • ✅ Anthropic - Claude 3 (Opus, Sonnet, Haiku)
  • ✅ Cohere - Command models

Embeddings

Open-Source (Free, No API Keys):

  • ✅ HuggingFace - sentence-transformers models (all-MiniLM-L6-v2, etc.)
  • ✅ Ollama - Local embedding models (nomic-embed-text)

Paid (Require API Keys):

  • ✅ OpenAI - text-embedding-3-small, text-embedding-3-large

Vector Stores

Open-Source (Free, Local):

  • ✅ FAISS - Facebook AI Similarity Search (local, fast)
  • ✅ ChromaDB - Open-source embedding database (local)
  • ✅ Qdrant - Vector database (can run locally or in the cloud)

Paid (Cloud Services):

  • ✅ Pinecone - Managed vector database (requires API key)
  • ✅ Weaviate - Can be self-hosted or cloud

🧪 Examples

Production-Grade Examples (Advanced)

  • legal_document_analysis.py - Legal contract analysis with ambiguous queries, cross-references, high-stakes decisions
  • medical_research_synthesis.py - Medical research synthesis with conflicting studies, evidence levels, source attribution
  • financial_risk_analysis.py - Financial risk assessment with regulatory compliance, multi-factor analysis, budget tracking
  • multi_domain_synthesis.py - Enterprise knowledge base with cross-domain queries, priority resolution, complex reasoning

Standard Examples

  • python_docs_example.py - Python documentation Q&A
  • tech_support_example.py - Technical support knowledge base
  • complete_example.py - Full feature demonstration
  • huggingface_example.py - Machine learning documentation Q&A
  • open_source_example.py - Open-source setup example
  • paid_llm_example.py - Paid LLM integration
  • basic_usage.py - Basic usage example

Quick Setup for Open-Source

Option 1: HuggingFace Inference API (Easiest - No Installation!)

# Install dependencies
pip install ragfallback[huggingface,sentence-transformers,faiss]

# Run HuggingFace example
python examples/huggingface_example.py

Option 2: Ollama (Local)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3

# Install dependencies
pip install ragfallback[sentence-transformers,faiss]

# Run example
python examples/open_source_example.py

Option 3: Local HuggingFace Models

# Install with transformers support
pip install ragfallback[transformers,sentence-transformers,faiss]

# Run HuggingFace example (choose local mode)
python examples/huggingface_example.py

No API keys needed! 🎉


📊 Why ragfallback?

Feature               LangChain MultiQueryRetriever   ragfallback
Query Variations      ✅                              ✅
Fallback Strategies   ❌                              ✅ (multiple strategies)
Cost Tracking         ❌                              ✅
Budget Management     ❌                              ✅
Confidence Scoring    ❌                              ✅
Metrics Collection    ❌                              ✅
Framework Agnostic    ❌                              ✅
Open-Source First     ❌                              ✅

๐Ÿ› ๏ธ Advanced Usage

Custom Fallback Strategy

from ragfallback.strategies.base import FallbackStrategy
from langchain_core.language_models import BaseLanguageModel

class MyCustomStrategy(FallbackStrategy):
    def generate_queries(self, original_query, context, attempt, llm):
        # Your custom logic
        return [original_query + " expanded"]

retriever = AdaptiveRAGRetriever(
    ...,
    fallback_strategies=[MyCustomStrategy()]
)
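For example, a deterministic generate_queries body could broaden the query with fixed templates per attempt, with no LLM call at all (a self-contained illustration; the real base-class signature may differ):

```python
def template_variations(original_query, attempt):
    """Return one batch of query rewrites per retry attempt."""
    templates = [
        ["{q}"],                                  # attempt 1: as asked
        ["{q} explained", "definition of {q}"],   # attempt 2: rephrase
        ["overview of {q}", "{q} summary"],       # attempt 3: broaden
    ]
    batch = templates[min(attempt - 1, len(templates) - 1)]
    return [t.format(q=original_query) for t in batch]
```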

Mixing Open-Source and Paid Components

# Paid LLM + Open-source vector store + Open-source embeddings
llm = create_openai_llm(model="gpt-4o-mini")  # Paid
embeddings = create_open_source_embeddings()  # Free
vector_store = create_faiss_vector_store(documents, embeddings)  # Free

๐Ÿค Contributing

Contributions are welcome! Please read our Contributing Guidelines before submitting a Pull Request.

Quick Contribution Guide

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.


📄 License

MIT License - See LICENSE file for details.

๐Ÿ“ Changelog

See CHANGELOG.md for version history and changes.


๐Ÿ™ Acknowledgments

Built on top of LangChain and inspired by production RAG systems.


🧪 Testing

Quick Verification

# 1. Install library
pip install -e .

# 2. Verify installation (tests all core functionality)
python verify_library.py

# 3. Run all examples
python run_all_examples.py

Expected: All 6 verification tests pass ✅

Unit Tests

# Install test dependencies
pip install -r requirements-dev.txt

# Run all tests
pytest tests/ -v

# Run with coverage
pytest --cov=ragfallback --cov-report=html

Test Individual Examples

Simple Examples (No API keys needed):

python examples/python_docs_example.py
python examples/tech_support_example.py

Advanced Examples (Require HuggingFace Inference API - free tier):

python examples/legal_document_analysis.py
python examples/medical_research_synthesis.py
python examples/financial_risk_analysis.py
python examples/multi_domain_synthesis.py

For complete installation and testing guide, see INSTALL_AND_RUN.md.


Made with โค๏ธ for the RAG community
