

rag-playbook

Stop guessing which RAG pattern to use. Compare them with real numbers.


Quick Start · Patterns · Decision Guide · Architecture · CLI Reference


Every RAG tutorial teaches you how to build patterns. None of them tell you which one to actually use.

rag-playbook runs the same query against 8 production-tested RAG patterns and shows you which one wins — with real numbers for quality, latency, and cost.

Quick Start

pip install rag-playbook[openai]
export OPENAI_API_KEY=sk-...

# Compare all patterns on your documents
rag-playbook compare --data ./my_docs/ --query "What is the refund policy?"

rag-playbook compare output

Patterns

| #  | Pattern             | Best For                                   | Latency | Cost |
|----|---------------------|--------------------------------------------|---------|------|
| 01 | Naive               | Simple factual queries                     | ~1s     | $    |
| 02 | Hybrid Search       | Queries with codes, IDs, exact terms       | ~1.1s   | $    |
| 03 | Re-ranking          | When top-K retrieval isn't precise enough  | ~1.4s   | $$   |
| 04 | Parent-Child        | Long documents with clear sections         | ~1s     | $    |
| 05 | Query Decomposition | Complex multi-part questions               | ~2.1s   | $$$  |
| 06 | HyDE                | Short or ambiguous queries                 | ~1.5s   | $$   |
| 07 | Self-Correcting     | When hallucination risk is high            | ~2.8s   | $$$  |
| 08 | Agentic             | When query intent is unclear               | ~3.2s   | $$$$ |
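For intuition on how a pattern like 02 (Hybrid Search) can merge keyword and vector results, here is a minimal reciprocal-rank-fusion sketch in plain Python. This illustrates the general fusion technique, not rag-playbook's actual implementation; the document IDs and rankings below are made up.

```python
# Reciprocal Rank Fusion (RRF): merge ranked lists of document IDs.
# Each document scores sum(1 / (k + rank)) across the lists it appears in,
# so documents ranked well by both retrievers float to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc-7", "doc-2", "doc-9"]    # keyword (BM25) result order
vector_hits = ["doc-2", "doc-5", "doc-7"]  # embedding search result order
fused = rrf([bm25_hits, vector_hits])
print(fused)  # ['doc-2', 'doc-7', 'doc-5', 'doc-9']
```

Documents found by both retrievers (doc-2, doc-7) outrank those found by only one, which is why hybrid search helps on queries mixing exact terms with natural language.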

Which pattern should I use? (decision guide)

How is this different from [X]?

|                  | NirDiamant/RAG_Techniques | FlashRAG         | Ragas        | rag-playbook         |
|------------------|---------------------------|------------------|--------------|----------------------|
| Format           | Jupyter notebooks         | Academic library | Eval metrics | Library + CLI        |
| pip install      | No                        | Complex          | Yes          | Yes (simple)         |
| Benchmarks       | None                      | Academic         | N/A          | Practical comparison |
| "Which to use?"  | No                        | No               | No           | YES                  |
| License          | Non-commercial            | MIT              | Apache-2.0   | MIT                  |

Use as a Library

import asyncio
from rag_playbook import create_pattern, Settings
from rag_playbook.core.embedder import create_embedder
from rag_playbook.core.llm import create_llm
from rag_playbook.core.vector_store import create_vector_store

async def main():
    settings = Settings()  # Reads from .env / environment
    llm = create_llm(settings)
    embedder = create_embedder(settings)
    store = create_vector_store("memory")

    pattern = create_pattern("reranking", llm=llm, embedder=embedder, store=store)

    # Index your documents first (or use the CLI: rag-playbook ingest)
    result = await pattern.query("What is the refund policy?")

    print(result.answer)
    print(f"Cost: ${result.metadata.cost_usd:.4f}")
    print(f"Latency: {result.metadata.latency_ms:.0f}ms")

asyncio.run(main())

See examples/ for more usage patterns.

Installation

# Minimal (in-memory store, OpenAI)
pip install rag-playbook[openai]

# With ChromaDB
pip install rag-playbook[openai,chromadb]

# With pgvector
pip install rag-playbook[openai,pgvector]

# With re-ranking support (Pattern 03)
pip install rag-playbook[openai,chromadb,reranking]

# Everything
pip install rag-playbook[all]

From Source

git clone https://github.com/Aamirofficiall/rag-playbook.git
cd rag-playbook
uv venv --python 3.11 .venv
source .venv/bin/activate
uv pip install -e ".[dev,chromadb,openai]"

CLI Commands

| Command                | Description                                    |
|------------------------|------------------------------------------------|
| rag-playbook compare   | Compare patterns side-by-side on your documents |
| rag-playbook run       | Run a single pattern                           |
| rag-playbook recommend | Get an LLM-powered pattern recommendation      |
| rag-playbook ingest    | Load, chunk, embed, and index documents        |
| rag-playbook bench     | Run full benchmark suite                       |
| rag-playbook patterns  | List all available patterns                    |

Run a single pattern

rag-playbook run reranking --data ./docs/ --query "What are the laptop specs?"

rag-playbook run output

List available patterns

rag-playbook patterns

rag-playbook patterns output

Get a pattern recommendation

rag-playbook recommend --query "What is the refund policy?"

rag-playbook recommend output

Ingest documents

rag-playbook ingest --data ./sample_docs/

rag-playbook ingest output

See CLI Reference for full usage.

Configuration

Environment variables map directly to settings — no prefix needed:

# .env — Core settings
OPENAI_API_KEY=sk-...                  # Required for OpenAI provider
DEFAULT_LLM_PROVIDER=openai            # openai | anthropic
DEFAULT_LLM_MODEL=gpt-4o-mini         # Any model your provider supports
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSION=1536
DEFAULT_TOP_K=5
DEFAULT_CHUNK_SIZE=512
DEFAULT_CHUNK_OVERLAP=50
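DEFAULT_CHUNK_SIZE and DEFAULT_CHUNK_OVERLAP control how documents are split before embedding. As a rough illustration of what those two numbers mean, here is a character-based sliding-window sketch; rag-playbook's actual chunker (core/chunker.py) may count tokens rather than characters and supports multiple strategies.

```python
def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` units; each window starts
    `size - overlap` units after the previous one, so consecutive
    chunks share `overlap` units of context."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = sliding_window_chunks("abcdefghij", size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap exists so a sentence cut at a chunk boundary still appears whole in at least one chunk; larger overlap improves recall at the cost of more embeddings to store and search.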

Using OpenRouter (or any OpenAI-compatible API)

Works with any OpenAI-compatible endpoint — OpenRouter, Azure OpenAI, Ollama, vLLM, etc.:

OPENAI_API_KEY=sk-or-v1-...                     # Your OpenRouter key
OPENAI_BASE_URL=https://openrouter.ai/api/v1    # Custom base URL
DEFAULT_LLM_MODEL=openai/gpt-4o-mini            # OpenRouter model format

Vector store backends

The default in-memory store re-embeds your documents every session. For a persistent backend:

# ChromaDB (pip install rag-playbook[chromadb])
VECTOR_STORE_PROVIDER=chromadb

# pgvector (pip install rag-playbook[pgvector])
VECTOR_STORE_PROVIDER=pgvector
PGVECTOR_URL=postgresql://user:pass@localhost:5432/ragdb

# Qdrant (pip install rag-playbook[qdrant])
VECTOR_STORE_PROVIDER=qdrant
QDRANT_URL=http://localhost:6333

Or pass a Settings object directly in code. See .env.example for all options.

Architecture

Document → Chunk → EmbeddedChunk → RetrievedChunk → RAGResult
              │         │                │               │
          Chunker    Embedder       VectorStore        LLM
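The stage types above come from core/models.py ("immutable pipeline data models"). Their exact fields aren't documented here, but the shape of such a pipeline can be sketched with frozen dataclasses; the field names below are illustrative guesses, not the library's real schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str

@dataclass(frozen=True)
class EmbeddedChunk:
    chunk: Chunk
    vector: tuple[float, ...]  # tuple, not list, to keep the model immutable

@dataclass(frozen=True)
class RetrievedChunk:
    chunk: Chunk
    score: float  # similarity reported by the vector store

@dataclass(frozen=True)
class RAGResult:
    answer: str
    sources: tuple[RetrievedChunk, ...]

# Each stage consumes the previous stage's type and produces the next:
chunk = Chunk(doc_id="policy.md", text="Refunds are issued within 30 days.")
embedded = EmbeddedChunk(chunk=chunk, vector=(0.1, 0.2))
retrieved = RetrievedChunk(chunk=chunk, score=0.93)
result = RAGResult(answer="Refunds within 30 days.", sources=(retrieved,))
```

Freezing the models means a pattern can hand chunks between stages (or cache them) without worrying about mutation, which is what makes the Chunker → Embedder → VectorStore → LLM handoff above safe to compose.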

Design patterns used:

  • Strategy — Swappable LLM and embedding providers
  • Repository — Vector store abstraction
  • Template Method — BaseRAGPattern with overridable pipeline steps
  • Decorator — CachedEmbedder wrapping any embedder with SHA-256 keyed cache
  • Factory — create_pattern(), create_llm(), create_embedder(), etc.
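The Decorator entry above describes CachedEmbedder. Here is a minimal sketch of that idea; the embedder interface and method names are invented for illustration, and the real class lives in core/embedder.py.

```python
import hashlib

class ToyEmbedder:
    """Stand-in for a real embedding client; counts calls so caching is visible."""
    def __init__(self):
        self.calls = 0

    def embed(self, text: str) -> list[float]:
        self.calls += 1
        return [float(len(text))]  # fake 1-dimensional embedding

class CachedEmbedder:
    """Decorator: wraps any embedder and memoizes results keyed by the
    SHA-256 hash of the input text."""
    def __init__(self, inner):
        self.inner = inner
        self._cache: dict[str, list[float]] = {}

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.inner.embed(text)
        return self._cache[key]

inner = ToyEmbedder()
cached = CachedEmbedder(inner)
cached.embed("hello")
cached.embed("hello")  # second call is served from the cache
print(inner.calls)     # 1
```

Because the decorator exposes the same embed() interface it wraps, any pattern can take either a raw or a cached embedder without knowing the difference, which is the point of the pattern.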

See Architecture Guide for details.

Development

make install    # Install with dev dependencies
make test       # Run unit tests
make lint       # Lint with ruff
make format     # Auto-format with ruff
make type-check # Type check with mypy
make check      # Run all checks

See CONTRIBUTING.md for the full guide.

Project Structure

src/rag_playbook/
├── core/
│   ├── llm.py           # LLM client (OpenAI, Anthropic)
│   ├── embedder.py       # Embedding with caching
│   ├── vector_store.py   # Vector store abstraction
│   ├── chunker.py        # Document chunking strategies
│   ├── evaluator.py      # LLM-as-judge evaluation
│   ├── models.py         # Immutable pipeline data models
│   ├── config.py         # Settings via pydantic-settings
│   ├── cost.py           # Per-model cost tracking
│   └── prompts.py        # All prompt templates
├── patterns/
│   ├── base.py           # Template Method base class
│   ├── naive.py          # Pattern 01: Baseline
│   ├── hybrid_search.py  # Pattern 02: BM25 + vector
│   ├── reranking.py      # Pattern 03: LLM reranking
│   ├── parent_child.py   # Pattern 04: Context expansion
│   ├── query_decomposition.py  # Pattern 05: Sub-queries
│   ├── hyde.py           # Pattern 06: Hypothetical docs
│   ├── self_correcting.py     # Pattern 07: Faithfulness check
│   └── agentic.py        # Pattern 08: Tool-calling loop
├── cli/
│   ├── app.py            # Typer CLI root
│   ├── compare_cmd.py    # The killer feature
│   ├── run_cmd.py        # Single pattern execution
│   ├── recommend_cmd.py  # LLM-powered recommendation
│   ├── ingest_cmd.py     # Document ingestion pipeline
│   ├── patterns_cmd.py   # List patterns
│   └── formatters.py     # Rich terminal output
└── utils/
    ├── timer.py          # Timing context manager
    └── tokenizer.py      # tiktoken helpers
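utils/timer.py is described as a timing context manager; the latency numbers in compare output come from timing of this kind. A generic sketch of such a helper (the actual API in rag-playbook may differ):

```python
import time
from contextlib import contextmanager

@contextmanager
def timer():
    """Yield a dict whose 'ms' key holds the elapsed wall time on exit."""
    elapsed = {"ms": 0.0}
    start = time.perf_counter()
    try:
        yield elapsed
    finally:
        # Populated even if the timed block raises.
        elapsed["ms"] = (time.perf_counter() - start) * 1000

with timer() as t:
    time.sleep(0.01)
print(f"{t['ms']:.0f}ms")  # roughly 10ms
```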

Author

Built by Aamir Shahzad — backend engineer building data systems and AI infrastructure.

License

MIT
