RAG Fallback Strategies - Intelligent fallback mechanisms for RAG systems
ragfallback
RAG Fallback Strategies - A production-ready Python library that adds intelligent fallback mechanisms to RAG (Retrieval-Augmented Generation) systems, preventing silent failures and improving answer quality.
Installation • Documentation • Examples • Contributing
Real-World Problems Solved
Problem 1: Silent Failures
Before: RAG systems return "Not found" even when relevant data exists
After: Automatic query variations find answers that initial queries miss
Problem 2: Cost Overruns
Before: No visibility into LLM costs, unexpected bills
After: Real-time cost tracking and budget enforcement
Problem 3: Query Mismatch
Before: User queries don't match document phrasing → no results
After: LLM-generated query variations increase retrieval success rate
Problem 4: Low Confidence Answers
Before: RAG systems return low-quality answers without retry
After: Confidence scoring with automatic retry on low-confidence results
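All four before/after pairs above reduce to one control loop: retry with a rephrased query while confidence stays below a threshold, keeping the best answer seen so far. A minimal sketch of that loop in plain Python (`retrieve` and `generate_variation` are hypothetical stand-ins for the library's internals, and the 0.7 threshold mirrors the `min_confidence` default shown later):

```python
def query_with_fallback(question, retrieve, generate_variation,
                        max_attempts=3, min_confidence=0.7):
    """Try the original query, then rephrased variations, until an
    answer clears the confidence threshold or attempts run out."""
    query = question
    best = {"answer": None, "confidence": 0.0, "attempts": 0}
    for attempt in range(1, max_attempts + 1):
        answer, confidence = retrieve(query)
        # Keep the best result so far, so a failed retry never
        # downgrades the answer we already have.
        if confidence > best["confidence"]:
            best = {"answer": answer, "confidence": confidence,
                    "attempts": attempt}
        if confidence >= min_confidence:
            break
        # Below threshold: ask the LLM for an alternative phrasing.
        query = generate_variation(question, attempt)
    return best
```

This is why "Not found" on the first try need not be the final answer: the loop only gives up after `max_attempts` distinct phrasings.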
Features
- Multiple Fallback Strategies: Query variations, semantic expansion, re-ranking, and more
- Cost Awareness: Built-in token tracking and budget management
- Framework Agnostic: Works with LangChain, LlamaIndex, and custom retrievers
- Production Ready: Comprehensive error handling, logging, metrics, examples, and test coverage
- Configurable: Easy to customize and extend
- Open-Source First: Works completely free with HuggingFace, Ollama, and FAISS
- Transparent: See all intermediate steps, costs, and metrics
Quick Start
Installation
# Basic installation
pip install ragfallback
# With open-source components (recommended for free usage)
pip install ragfallback[huggingface,sentence-transformers,faiss]
# With paid providers (optional)
pip install ragfallback[openai]
Minimal Example
from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import create_huggingface_llm, create_open_source_embeddings, create_faiss_vector_store
from langchain.docstore.document import Document
# Python documentation content
documents = [
Document(
page_content="Python is a high-level programming language known for simplicity and readability. It supports multiple programming paradigms and has an extensive standard library.",
metadata={"source": "python_intro.pdf"}
)
]
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
retriever = AdaptiveRAGRetriever(vector_store=vector_store, llm=llm, embedding_model=embeddings)
result = retriever.query_with_fallback(question="What is Python?")
print(result.answer)
Output:
Python is a high-level programming language known for simplicity and readability.
Note: Uses the HuggingFace Inference API for LLM responses, embeddings, and vector similarity search.
Complete Examples with Outputs
All examples demonstrate production-ready implementations.
To see actual outputs, run any example:
python examples/open_source_example.py
python examples/huggingface_example.py
python examples/complete_example.py
Example 1: Basic Usage (Open-Source)
Code:
from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import (
create_huggingface_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Python documentation content
documents = [
Document(
page_content="Python lists are mutable sequences created with square brackets: my_list = [1, 2, 3]. Methods include append() to add items, remove() to delete items, and len() to get length.",
metadata={"source": "python_lists.pdf"}
),
Document(
page_content="Python dictionaries store key-value pairs: person = {'name': 'Alice', 'age': 30}. Access values using keys: person['name']. Use get() method for safe access.",
metadata={"source": "python_dicts.pdf"}
),
]
# Create components (all free, no API keys!)
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
# Create retriever
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
fallback_strategy="query_variations",
max_attempts=3
)
# Query
result = retriever.query_with_fallback(
question="How do I create a list in Python?"
)
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Attempts: {result.attempts}")
print(f"Cost: ${result.cost:.4f}")
Output:
Answer: Python lists are mutable sequences created with square brackets: my_list = [1, 2, 3].
Confidence: 92.00%
Attempts: 1
Cost: $0.0000
Note: Uses HuggingFace Inference API for query variations and answer generation. Confidence scores are calculated from document retrieval results.
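The note above says confidence is calculated from document retrieval results, but does not show the formula. One plausible heuristic combines the top similarity score with its margin over the runner-up, so an ambiguous retrieval (two near-identical scores) earns less trust. This is purely illustrative, not ragfallback's actual formula:

```python
def retrieval_confidence(similarities):
    """Illustrative confidence heuristic from similarity scores in [0, 1]:
    weight the best score, damped when the runner-up is close behind.
    Not the library's actual formula -- just one plausible approach."""
    if not similarities:
        return 0.0
    ranked = sorted(similarities, reverse=True)
    top = ranked[0]
    # Margin over the second-best hit; a lone hit keeps its full score.
    margin = top - ranked[1] if len(ranked) > 1 else top
    return max(0.0, min(1.0, 0.7 * top + 0.3 * margin))
```

Under this sketch, a clear winner like `[0.9, 0.2]` scores higher than an ambiguous `[0.9, 0.85]`, which is exactly the behaviour a retry-on-low-confidence loop needs.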
Example 2: With Cost Tracking and Metrics
Code:
from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
from ragfallback.utils import (
create_openai_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Example documents (metadata values are just for tracking - not actual files)
documents = [
Document(page_content="Product X costs $99.", metadata={"source": "pricing.pdf"}),
]
# Setup cost tracking
cost_tracker = CostTracker(budget=5.0) # $5 budget
metrics = MetricsCollector()
# Create components
embeddings = create_open_source_embeddings() # Free
vector_store = create_faiss_vector_store(documents, embeddings) # Free
llm = create_openai_llm(model="gpt-4o-mini") # Paid (requires OPENAI_API_KEY)
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
cost_tracker=cost_tracker,
metrics_collector=metrics,
max_attempts=3
)
# Query multiple times
questions = [
"What is the price of Product X?",
"How much does Product X cost?",
]
for question in questions:
result = retriever.query_with_fallback(question=question, enforce_budget=True)
print(f"Q: {question}")
print(f"A: {result.answer}\n")
# Display metrics
stats = metrics.get_stats()
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")
# Display cost report
report = cost_tracker.get_report()
print(f"Total Cost: ${report['total_cost']:.4f}")
print(f"Budget Remaining: ${report['budget_remaining']:.4f}")
Output:
Q: What is the price of Product X?
A: Product X costs $99.
Q: How much does Product X cost?
A: Product X costs $99.
Success Rate: 100.00%
Average Confidence: 0.90
Total Cost: $0.0024
Budget Remaining: $4.9976
Note: Cost tracking uses token counts from LLM API calls. Metrics are collected from query executions.
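The cost figures above come from multiplying token counts by per-token prices. The arithmetic can be sketched as follows; the price table is an assumption for illustration (not current provider pricing), and `SimpleCostTracker` is not the library's actual class:

```python
# Assumed USD prices per 1M tokens -- illustrative only.
PRICES = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

class SimpleCostTracker:
    """Sketch of token-based cost accounting with a budget."""

    def __init__(self, budget):
        self.budget = budget
        self.total_cost = 0.0

    def record(self, model, input_tokens, output_tokens):
        # Cost = tokens * per-token price, summed over input and output.
        p = PRICES[model]
        cost = (input_tokens * p["input"]
                + output_tokens * p["output"]) / 1_000_000
        self.total_cost += cost
        return cost

    def get_report(self):
        return {"total_cost": self.total_cost,
                "budget_remaining": self.budget - self.total_cost}
```

For example, 1,000 input tokens and 500 output tokens at the assumed rates cost well under a tenth of a cent, which is why the reports above show four decimal places.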
Example 3: Query Variations Fallback
Code:
from ragfallback import AdaptiveRAGRetriever
from ragfallback.utils import (
create_huggingface_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Example documents (metadata is just for tracking - not actual files)
documents = [
Document(
page_content="The CEO of Acme Corp is John Smith.",
metadata={"source": "leadership.pdf"}
),
]
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
max_attempts=3,
min_confidence=0.7
)
# Query with different phrasings
result = retriever.query_with_fallback(
question="Who leads Acme Corp?",
return_intermediate_steps=True
)
print(f"Final Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Total Attempts: {result.attempts}\n")
# Show intermediate steps
if result.intermediate_steps:
print("Intermediate Steps:")
for step in result.intermediate_steps:
print(f" Attempt {step['attempt']}: '{step['query']}'")
print(f" Confidence: {step['confidence']:.2%}")
Output:
Final Answer: The CEO of Acme Corp is John Smith.
Confidence: 88.00%
Total Attempts: 2
Intermediate Steps:
Attempt 1: 'Who leads Acme Corp?'
Confidence: 75.00%
Attempt 2: 'Who is the leader of Acme Corp?'
Confidence: 88.00%
Note: Query variations are generated by LLM calls. Each attempt uses a different query formulation, and confidence is calculated from document retrieval results.
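The second attempt above ("Who is the leader of Acme Corp?") came from an LLM call that rephrases the original question. The library's actual prompt is internal; the prompt-and-parse shape looks roughly like this (both functions are illustrative):

```python
def variation_prompt(question, n=3):
    """Build a prompt asking an LLM for n alternative phrasings.
    The exact prompt ragfallback uses is internal; this is a sketch."""
    return (
        f"Rephrase the question below in {n} different ways that might "
        f"match other document wordings. One rephrasing per line.\n\n"
        f"Question: {question}"
    )

def parse_variations(llm_output):
    """Parse one-rephrasing-per-line LLM output into a clean query list,
    tolerating blank lines and leading list markers like '- '."""
    lines = (line.strip(" -\t") for line in llm_output.splitlines())
    return [line for line in lines if line]
```

Each parsed variation then feeds one retry attempt in the fallback loop, which is why `max_attempts` bounds the number of LLM calls.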
Example 4: Complete Workflow
Code:
from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
from ragfallback.utils import (
create_huggingface_llm,
create_open_source_embeddings,
create_faiss_vector_store
)
from langchain.docstore.document import Document
# Step 1: Prepare documents (metadata is just for tracking - not actual files)
documents = [
Document(
page_content="Acme Corp revenue: $10M. Employees: 50. Founded: 2020.",
metadata={"source": "company_data.pdf"}
),
]
# Step 2: Create components
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
# Step 3: Setup tracking
cost_tracker = CostTracker()
metrics = MetricsCollector()
# Step 4: Create retriever
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
cost_tracker=cost_tracker,
metrics_collector=metrics,
fallback_strategy="query_variations",
max_attempts=3,
min_confidence=0.7
)
# Step 5: Query
result = retriever.query_with_fallback(
question="What is Acme Corp's revenue?",
context={"company": "Acme Corp"},
return_intermediate_steps=True
)
# Step 6: Display results
print("="*60)
print("QUERY RESULTS")
print("="*60)
print(f"Question: What is Acme Corp's revenue?")
print(f"Answer: {result.answer}")
print(f"Source: {result.source}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Attempts: {result.attempts}")
print(f"Cost: ${result.cost:.4f}")
# Step 7: Display metrics
print("\n" + "="*60)
print("METRICS")
print("="*60)
stats = metrics.get_stats()
print(f"Total Queries: {stats['total_queries']}")
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")
Output:
============================================================
QUERY RESULTS
============================================================
Question: What is Acme Corp's revenue?
Answer: Acme Corp revenue: $10M.
Source: company_data.pdf
Confidence: 92.00%
Attempts: 1
Cost: $0.0000
============================================================
METRICS
============================================================
Total Queries: 1
Success Rate: 100.00%
Average Confidence: 0.92
Note: Metrics are collected from query executions. Confidence scores are calculated using document retrieval and answer quality assessment.
Use Cases
Use Case 1: Research Assistant
Build a research assistant that answers questions about companies:
retriever = AdaptiveRAGRetriever(...)
result = retriever.query_with_fallback(
question="What is the company's revenue?",
context={"company": "Acme Corp"}
)
Use Case: Company research, competitive intelligence, due diligence
Use Case 2: Document Q&A
Answer questions from large document collections:
retriever = AdaptiveRAGRetriever(...)
result = retriever.query_with_fallback(
question="What are the key findings?",
return_intermediate_steps=True
)
Use Case: Legal document analysis, research papers, technical documentation
Use Case 3: Cost-Conscious Production
Production systems with budget limits:
cost_tracker = CostTracker(budget=10.0)
retriever = AdaptiveRAGRetriever(
...,
cost_tracker=cost_tracker
)
result = retriever.query_with_fallback(
question="...",
enforce_budget=True
)
Use Case: Production APIs, SaaS applications, high-volume systems
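`enforce_budget=True` means the retriever refuses to spend past the budget instead of silently continuing. The check itself is simple; a sketch (the exception name `BudgetExceededError` and the pre-call estimate are assumptions, not necessarily the library's API):

```python
class BudgetExceededError(RuntimeError):
    """Raised when the next LLM call would exceed the configured budget."""

class BudgetExceededError(RuntimeError):
    pass

def check_budget(total_cost, budget, estimated_next_cost=0.0):
    """Refuse the next call if it would push spending past the budget."""
    if total_cost + estimated_next_cost > budget:
        raise BudgetExceededError(
            f"Budget ${budget:.2f} would be exceeded "
            f"(spent ${total_cost:.4f}, next call ~${estimated_next_cost:.4f})")
```

Checking before each call (rather than after) is what keeps a high-volume system from overshooting its limit by one expensive request.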
Use Case 4: Open-Source Setup
Completely free setup using only open-source components:
# All free, no API keys!
embeddings = create_open_source_embeddings()
vector_store = create_faiss_vector_store(documents, embeddings)
llm = create_huggingface_llm(use_inference_api=True)
Use Case: Personal projects, learning, prototyping, privacy-sensitive applications
Documentation
Loading Documents
Note: The PDF file references in examples (like "annual_report.pdf") are just example metadata values, not actual files. They're used to demonstrate how document metadata works.
In practice, you'd load documents from various sources:
from langchain.docstore.document import Document
from langchain_community.document_loaders import PyPDFLoader, TextLoader
# Option 1: Load from actual PDF files
loader = PyPDFLoader("path/to/your/document.pdf")
documents = loader.load()
# Option 2: Load from text files
loader = TextLoader("path/to/your/document.txt")
documents = loader.load()
# Option 3: Create Document objects manually (as shown in examples)
documents = [
Document(
page_content="Your content here...",
metadata={"source": "your_file.pdf", "page": 1}
)
]
# Option 4: Load from web pages, databases, etc.
# Use any LangChain document loader
The metadata["source"] field is just for tracking where documents came from - it doesn't need to point to an actual file.
Core Components
AdaptiveRAGRetriever
The main retriever class:
retriever = AdaptiveRAGRetriever(
vector_store=vector_store,
llm=llm,
embedding_model=embeddings,
fallback_strategy="query_variations", # Default
max_attempts=3, # Max retry attempts
min_confidence=0.7, # Minimum confidence threshold
cost_tracker=cost_tracker, # Optional cost tracking
metrics_collector=metrics # Optional metrics
)
QueryResult
Result object with metadata:
result = retriever.query_with_fallback(question="...")
# Access properties
result.answer # The answer string
result.source # Source document
result.confidence # Confidence score (0.0-1.0)
result.attempts # Number of attempts made
result.cost # Cost in USD
result.intermediate_steps # List of all attempts (if return_intermediate_steps=True)
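The documented attributes map naturally onto a small dataclass. The sketch below mirrors them for reference; only the field names and meanings come from the docs above, while the dataclass itself (and the name `QueryResultSketch`) is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class QueryResultSketch:
    """Shape of the result object documented above (illustrative)."""
    answer: str                 # the answer string
    source: str                 # source document identifier
    confidence: float           # confidence score, 0.0-1.0
    attempts: int               # number of attempts made
    cost: float                 # cost in USD
    intermediate_steps: list = field(default_factory=list)  # all attempts
```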
CostTracker
Track and manage costs:
cost_tracker = CostTracker(budget=10.0) # $10 budget
# After queries
report = cost_tracker.get_report()
print(f"Total Cost: ${report['total_cost']:.4f}")
print(f"Budget Remaining: ${report['budget_remaining']:.4f}")
MetricsCollector
Track performance metrics:
metrics = MetricsCollector()
# After queries
stats = metrics.get_stats()
print(f"Success Rate: {stats['success_rate']:.2%}")
print(f"Average Confidence: {stats['avg_confidence']:.2f}")
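The `success_rate` and `avg_confidence` figures are simple aggregates over recorded queries. An illustrative collector exposing the same `get_stats()` keys (not the library's implementation):

```python
class SimpleMetrics:
    """Sketch of a metrics collector matching the get_stats() keys above."""

    def __init__(self):
        self.results = []  # list of (succeeded: bool, confidence: float)

    def record(self, succeeded, confidence):
        self.results.append((succeeded, confidence))

    def get_stats(self):
        n = len(self.results)
        if n == 0:
            return {"total_queries": 0, "success_rate": 0.0,
                    "avg_confidence": 0.0}
        return {
            "total_queries": n,
            # Fraction of queries that produced an accepted answer.
            "success_rate": sum(s for s, _ in self.results) / n,
            # Mean confidence across all recorded queries.
            "avg_confidence": sum(c for _, c in self.results) / n,
        }
```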
Integrations
LLM Providers
Open-Source (Free, No API Keys):
- ✅ HuggingFace Inference API - Use HuggingFace models via API (free tier available, easiest!)
- ✅ HuggingFace Transformers - Run HuggingFace models locally (requires transformers & torch)
- ✅ Ollama - Run LLMs locally (llama3, llama2, mistral, etc.)
Paid (Require API Keys):
- ✅ OpenAI - GPT-4, GPT-3.5, GPT-4o-mini
- ✅ Anthropic - Claude 3 (Opus, Sonnet, Haiku)
- ✅ Cohere - Command models
Embeddings
Open-Source (Free, No API Keys):
- ✅ HuggingFace - sentence-transformers models (all-MiniLM-L6-v2, etc.)
- ✅ Ollama - Local embedding models (nomic-embed-text)
Paid (Require API Keys):
- ✅ OpenAI - text-embedding-3-small, text-embedding-3-large
Vector Stores
Open-Source (Free, Local):
- ✅ FAISS - Facebook AI Similarity Search (local, fast)
- ✅ ChromaDB - Open-source embedding database (local)
- ✅ Qdrant - Vector database (can run locally or cloud)
Paid (Cloud Services):
- ✅ Pinecone - Managed vector database (requires API key)
- ✅ Weaviate - Can be self-hosted or cloud
Examples
Production-Grade Examples (Advanced)
- legal_document_analysis.py - Legal contract analysis with ambiguous queries, cross-references, high-stakes decisions
- medical_research_synthesis.py - Medical research synthesis with conflicting studies, evidence levels, source attribution
- financial_risk_analysis.py - Financial risk assessment with regulatory compliance, multi-factor analysis, budget tracking
- multi_domain_synthesis.py - Enterprise knowledge base with cross-domain queries, priority resolution, complex reasoning
Standard Examples
- python_docs_example.py - Python documentation Q&A
- tech_support_example.py - Technical support knowledge base
- complete_example.py - Full feature demonstration
- huggingface_example.py - Machine learning documentation Q&A
- open_source_example.py - Open-source setup example
- paid_llm_example.py - Paid LLM integration
- basic_usage.py - Basic usage example
Quick Setup for Open-Source
Option 1: HuggingFace Inference API (Easiest - No Local Models to Install)
# Install dependencies
pip install ragfallback[huggingface,sentence-transformers,faiss]
# Run HuggingFace example
python examples/huggingface_example.py
Option 2: Ollama (Local)
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3
# Install dependencies
pip install ragfallback[sentence-transformers,faiss]
# Run example
python examples/open_source_example.py
Option 3: Local HuggingFace Models
# Install with transformers support
pip install ragfallback[transformers,sentence-transformers,faiss]
# Run HuggingFace example (choose local mode)
python examples/huggingface_example.py
No API keys needed!
Why ragfallback?
| Feature | LangChain MultiQueryRetriever | ragfallback |
|---|---|---|
| Query Variations | ✅ | ✅ |
| Fallback Strategies | ❌ | ✅ (Multiple strategies) |
| Cost Tracking | ❌ | ✅ |
| Budget Management | ❌ | ✅ |
| Confidence Scoring | ❌ | ✅ |
| Metrics Collection | ❌ | ✅ |
| Framework Agnostic | ❌ | ✅ |
| Open-Source First | ❌ | ✅ |
Advanced Usage
Custom Fallback Strategy
from ragfallback.strategies.base import FallbackStrategy

class MyCustomStrategy(FallbackStrategy):
    def generate_queries(self, original_query, context, attempt, llm):
        # Your custom logic: return the alternative queries to try next
        return [original_query + " expanded"]
retriever = AdaptiveRAGRetriever(
...,
fallback_strategies=[MyCustomStrategy()]
)
Mixing Open-Source and Paid Components
# Paid LLM + Open-source vector store + Open-source embeddings
llm = create_openai_llm(model="gpt-4o-mini") # Paid
embeddings = create_open_source_embeddings() # Free
vector_store = create_faiss_vector_store(documents, embeddings) # Free
Contributing
Contributions are welcome! Please read our Contributing Guidelines before submitting a Pull Request.
Quick Contribution Guide
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
License
MIT License - See LICENSE file for details.
Changelog
See CHANGELOG.md for version history and changes.
Acknowledgments
Built on top of LangChain and inspired by production RAG systems.
Testing
Quick Verification
# 1. Install library
pip install -e .
# 2. Verify installation (tests all core functionality)
python verify_library.py
# 3. Run all examples
python run_all_examples.py
Expected: All 6 verification tests pass ✅
Unit Tests
# Install test dependencies
pip install -r requirements-dev.txt
# Run all tests
pytest tests/ -v
# Run with coverage
pytest --cov=ragfallback --cov-report=html
Test Individual Examples
Simple Examples (No API keys needed):
python examples/python_docs_example.py
python examples/tech_support_example.py
Advanced Examples (Require HuggingFace Inference API - free tier):
python examples/legal_document_analysis.py
python examples/medical_research_synthesis.py
python examples/financial_risk_analysis.py
python examples/multi_domain_synthesis.py
For complete installation and testing guide, see INSTALL_AND_RUN.md.
Made with ❤️ for the RAG community