Production RAG pipelines without the abstraction tax

These details have not been verified by PyPI

Project links

Project description

rag-kit

Production RAG pipelines without the abstraction tax.

from ragkit import RAGPipeline
from ragkit.embedders import OpenAIEmbedder
from ragkit.generators import GroqGenerator

rag = RAGPipeline(
    embedder=OpenAIEmbedder(api_key="sk-..."),
    generator=GroqGenerator(api_key="gsk_..."),
)
rag.ingest("handbook.pdf")
result = rag.query("What is the refund policy?")
print(result.answer)
# → "Refunds are available within 7 days of purchase. Contact support@..."
print(result.sources)
# → [Chunk(text="Refunds are available...", metadata={"source": "handbook.pdf"})]

No LangChain. No magic. Every line of the pipeline is readable Python you can modify.

Before You Start

What you need

Requirement	Minimum	How to check
Python	3.10+	`python --version`
pip	any	`pip --version`
An LLM API key	Groq (free)	console.groq.com
An embedding API key	OpenAI or free local model	platform.openai.com

Don't have Python? Download it from python.org. Pick any version ≥ 3.10.

What is an API key?

An API key is a password that lets your code talk to an AI service (like Groq or OpenAI). You get one by creating a free account on their website.

Groq — free, no credit card needed. Go to console.groq.com → API Keys → Create. Looks like gsk_abc123...
OpenAI — needs a paid account for embeddings. Go to platform.openai.com → Create new secret key. Looks like sk-abc123...
No money? Use LocalEmbedder instead of OpenAIEmbedder — runs on your own computer, completely free. See Quick Start #3.

Set up your environment (recommended)

# Create a virtual environment so rag-kit doesn't conflict with other packages
python -m venv venv

# Activate it
source venv/bin/activate        # Mac / Linux
venv\Scripts\activate           # Windows

# Install rag-kit with the providers you want
pip install rag-kit[openai,groq]

# Put your API keys in a .env file (never commit this to git)
cp .env.example .env
# Open .env and fill in your keys

What is RAG?

LLMs like GPT-4 and Llama3 are trained on public internet data up to a cutoff date. They know nothing about:

Your company's internal documents
Data created after their training cutoff
Private knowledge bases

RAG (Retrieval-Augmented Generation) solves this by giving the LLM the right context at query time, instead of baking knowledge into model weights.

User asks: "What's our refund policy?"

Without RAG:
  LLM: "I don't have information about your specific refund policy."
  (or worse — it hallucinates a plausible-sounding policy)

With RAG:
  1. Retrieve: find the paragraph about refunds from your policy PDF
  2. Generate: "Here is the relevant section: [paragraph]. Based on this, ..."
  LLM: "Refunds are available within 7 days. Contact support@yourco.com."

This is what every AI assistant with "chat with your docs" capability uses under the hood — Notion AI, GitHub Copilot's context, Cursor, Claude Projects, all of them.

How RAG Works — The Full Pipeline

Understanding this pipeline is more valuable than any certification.

INGESTION (run once per document)
──────────────────────────────────────────────────────────────────
                                                                  
  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
  │    LOADER    │──▶│   CHUNKER    │──▶│   EMBEDDER   │──▶│    STORE     │
  │              │   │              │   │              │   │              │
  │ PDF/TXT/URL  │   │ Split text   │   │ text → vec   │   │ Save vectors │
  │ → Document   │   │ into Chunks  │   │ (numbers)    │   │ to DB/memory │
  └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘

QUERYING (run for every user question)
──────────────────────────────────────────────────────────────────
                                                                  
  User Question                                                   
       │                                                          
       ▼                                                          
  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
  │   EMBEDDER   │──▶│  RETRIEVER   │──▶│  (RERANKER)  │──▶│  GENERATOR  │
  │              │   │              │   │   optional   │   │              │
  │ query → vec  │   │ find similar │   │ LLM re-scores│   │ LLM answers  │
  │              │   │ chunks by    │   │ for precision│   │ using chunks │
  │              │   │ vector dist  │   │              │   │ as context   │
  └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘
                                                                  │
                                                                  ▼
                                                            Answer + Sources

Let's go through each step.

Step 1: Loading

The loader converts any input (PDF, URL, plain text) into a uniform Document object. This abstraction means the rest of the pipeline doesn't care whether your document was a PDF or a webpage.

from ragkit.loaders import load_pdf, load_url, load_text

doc = load_pdf("report.pdf")
# doc.text = "Q1 revenue was $4.2M..."
# doc.source = "report.pdf"

doc = load_url("https://docs.yourapp.com/api")
# doc.text = "API Reference\n\nEndpoints:\n..."
# doc.source = "https://docs.yourapp.com/api"

Step 2: Chunking

Every LLM has a context window — the maximum amount of text it can read at one time. Think of it like working memory: a human can hold a few paragraphs in their head while answering a question, but not an entire book.

Most models allow 8,000–128,000 tokens (roughly 6,000–96,000 words). A 200-page PDF is ~100,000 words. Even if it fit, sending the entire document every time a user asks a question would be very slow and very expensive.

Chunking solves this: instead of sending everything, you only send the 3–5 most relevant pieces.

Chunking splits the document into small, overlapping pieces that can be retrieved and fed to the LLM individually.

Why overlap?

Original text: "...the payment is processed. Refunds take 3-5 business days to appear..."
                                            ↑ chunk boundary without overlap

Chunk 1: "...the payment is processed."
Chunk 2: "Refunds take 3-5 business days to appear..."

Without overlap, "Refunds take 3-5..." has no context for what payment method, what product, what country. With overlap:

Chunk 1: "...the payment is processed."
Chunk 2: "processed. Refunds take 3-5 business days to appear..."
             ↑ tail of chunk 1 gives context

Three chunking strategies:

Strategy	How it splits	Best for
`fixed`	Every N characters, always	Uniform text, simple baseline
`recursive`	Paragraphs → sentences → words → chars	Most documents (default)
`semantic`	Where meaning shifts (needs embedder)	High-precision knowledge bases

from ragkit.chunkers import recursive_chunker, fixed_chunker, semantic_chunker

chunks = recursive_chunker(doc, chunk_size=500, overlap=50)
# chunks[0].text = "First paragraph..."
# chunks[0].metadata = {"source": "report.pdf", "chunk_index": 0, "total_chunks": 42}

How to pick chunk_size:

Too small (< 100 chars): chunks lose context, embeddings become noisy
Too large (> 1000 chars): chunks cover multiple topics, retrieval is imprecise
Sweet spot: 300–700 characters for most documents

Step 3: Embedding

An embedding converts text into a list of numbers (a vector) that represents its meaning. Similar texts produce similar vectors.

"refund policy"   → [0.21, -0.54, 0.88, 0.12, ...]   (1536 numbers)
"money back"      → [0.19, -0.51, 0.85, 0.14, ...]   (similar direction)
"pizza recipes"   → [-0.72, 0.33, -0.41, 0.65, ...]  (different direction)

This is what makes semantic search work. "Refund" and "money back" don't share any words, but their embeddings are close — so a search for "refund policy" will find a chunk that says "money back guarantee."

Keyword search (like SQL LIKE '%refund%') would miss it.

from ragkit.embedders import OpenAIEmbedder, LocalEmbedder

# Cloud: best quality, small cost
embedder = OpenAIEmbedder(api_key="sk-...")
vector = embedder.embed("refund policy")
# → list of 1536 floats

# Local: free, private, slightly lower quality
embedder = LocalEmbedder(model_name="all-MiniLM-L6-v2")
vector = embedder.embed("refund policy")
# → list of 384 floats, runs on your CPU

Choosing an embedding model:

Model	Dims	Cost	Quality	Use when
`text-embedding-3-small`	1536	~$0.00002/1K tokens	Excellent	Default choice
`text-embedding-3-large`	3072	~$0.00013/1K tokens	Best	Legal/medical precision
`all-MiniLM-L6-v2` (local)	384	Free	Good	Privacy, no API key
`BAAI/bge-small-en` (local)	384	Free	Very good	Best free model

Step 4: Vector Store

Once you have vectors, you need to store them so you can search them later.

from ragkit.stores import MemoryStore, SupabaseStore

# Development: fast, in-process, no setup
store = MemoryStore()

# Production: persistent, searchable across restarts, scales to millions
store = SupabaseStore(url="https://xxxx.supabase.co", key="eyJ...")

How vector search works:

The store computes cosine similarity between your query vector and every stored chunk vector. Chunks most similar in direction to the query are returned.

query vector: [0.21, -0.54, 0.88, ...]

chunk A: [0.19, -0.51, 0.85, ...]  → similarity: 0.98  ✓ very similar
chunk B: [0.20, -0.52, 0.87, ...]  → similarity: 0.97  ✓ similar  
chunk C: [-0.72, 0.33, -0.41, ...] → similarity: 0.12  ✗ unrelated

MemoryStore does a linear scan (O(n)) — fine for < 10K chunks.
SupabaseStore uses an HNSW index — sub-10ms at millions of vectors.

What is HNSW?
Hierarchical Navigable Small World. A graph-based index where each node connects to its nearest neighbors. Search navigates the graph instead of scanning every vector. O(log n) instead of O(n). You never need to build or maintain it — pgvector handles it automatically.

Step 5: Retrieval

Given a query vector, return the most relevant chunks.

from ragkit.retrievers import topk_retriever, mmr_retriever

# Simple: return the 5 most similar chunks
chunks = topk_retriever(store, query_vector, top_k=5)

# Advanced: return 5 diverse chunks (avoids redundant results)
chunks = mmr_retriever(store, query_vector, top_k=5, lambda_mult=0.5)

Top-K vs MMR:

Top-K returns the 5 most similar chunks. If your document says "refund" 10 times across different sections, you'll get 5 near-duplicate chunks. The LLM gets confused by repetition and wastes context.

MMR (Maximal Marginal Relevance) picks chunks one at a time, penalizing choices that are too similar to what was already picked. Each selected chunk must contribute new information.

Query: "refund policy"

Top-K results:          MMR results:
1. "Refunds in 7 days"  1. "Refunds in 7 days"     ← most relevant
2. "Refunds in 7 days"  2. "Contact support for..."  ← new info
3. "7 day refund limit" 3. "Cancellations vs refunds" ← new info
4. "7 day refund limit" 4. "Razorpay processes..."   ← new info
5. "Refunds available"  5. "Exceptions to refunds"  ← new info

Use MMR when your documents have repetitive content. Use Top-K otherwise.

Step 6 (Optional): Reranking

Vector similarity measures "are these about the same topic?" — not "does this directly answer the question?"

The reranker reads each chunk and the query, then scores relevance directly. More expensive (1 LLM call per chunk), but significantly more precise.

from ragkit.rerankers import llm_reranker

# Initial retrieval: 8 candidates by vector similarity
candidates = topk_retriever(store, query_vector, top_k=8)

# Rerank: LLM scores each on 1-10 relevance, return top 3
final = llm_reranker(candidates, query="refund policy", llm=generator, top_k=3)

Use reranking when:

Answer accuracy matters more than speed/cost
You're seeing the LLM use slightly-wrong chunks
Your documents have many similar-sounding sections

Skip reranking when:

You're building a high-QPS API (latency will hurt)
Your queries are simple and retrieval quality is already good

Step 7: Generation

Feed the retrieved chunks + the user's question to an LLM.

from ragkit.generators import GroqGenerator

generator = GroqGenerator(api_key="gsk_...")
answer = generator.generate(
    query="What is the refund policy?",
    chunks=retrieved_chunks,
)

The generator formats chunks into a numbered context block and passes it to the LLM with a system prompt that says: "Answer using ONLY the provided context. Never make up information."

This grounding instruction is critical. Without it, the LLM will blend retrieved facts with its training data and hallucinate confidently.

The answer will cite [1], [2], etc. corresponding to the numbered chunks. Show these citations to your users so they can verify the source.

Installation

# Minimal (choose your providers):
pip install rag-kit[groq]       # + Groq LLM
pip install rag-kit[openai]     # + OpenAI embeddings + GPT
pip install rag-kit[anthropic]  # + Claude

# Add-ons:
pip install rag-kit[pdf]        # PDF loading
pip install rag-kit[url]        # URL/webpage loading
pip install rag-kit[supabase]   # Persistent vector store
pip install rag-kit[local]      # Local embeddings (no API key)

# Everything:
pip install rag-kit[all]

Quick Start

1. Basic (in-memory, no persistence)

import os
from ragkit import RAGPipeline
from ragkit.embedders import OpenAIEmbedder
from ragkit.generators import GroqGenerator

rag = RAGPipeline(
    embedder=OpenAIEmbedder(api_key=os.environ["OPENAI_API_KEY"]),
    generator=GroqGenerator(api_key=os.environ["GROQ_API_KEY"]),
)

rag.ingest("company_handbook.pdf")       # PDF
rag.ingest("https://docs.myapp.com")     # URL
rag.ingest_text("Prices: Pro = ₹399/mo") # Raw string

result = rag.query("What are the pricing tiers?")
print(result.answer)

for chunk in result.sources:
    print(f"  Source: {chunk.metadata['source']}")

2. Production (Supabase persistence)

from ragkit import RAGPipeline
from ragkit.embedders import OpenAIEmbedder
from ragkit.generators import GroqGenerator
from ragkit.stores import SupabaseStore

rag = RAGPipeline(
    embedder=OpenAIEmbedder(api_key="sk-..."),
    generator=GroqGenerator(api_key="gsk_..."),
    store=SupabaseStore(url="https://xxxx.supabase.co", key="eyJ..."),
)

First, run the setup SQL in your Supabase SQL editor:

from ragkit.stores.supabase import SETUP_SQL
print(SETUP_SQL)  # copy and run this in Supabase

3. Free (local embeddings, no API key for embedding)

from ragkit import RAGPipeline
from ragkit.embedders import LocalEmbedder
from ragkit.generators import GroqGenerator  # Groq free tier is generous

rag = RAGPipeline(
    embedder=LocalEmbedder("all-MiniLM-L6-v2"),  # runs on your CPU, free
    generator=GroqGenerator(api_key="gsk_..."),   # Groq free tier
)

4. Advanced (MMR + reranking for maximum quality)

from ragkit import RAGPipeline
from ragkit.embedders import OpenAIEmbedder
from ragkit.generators import GroqGenerator
from ragkit.stores import SupabaseStore

rag = RAGPipeline(
    embedder=OpenAIEmbedder(api_key="sk-..."),
    generator=GroqGenerator(api_key="gsk_...", model="llama3-70b-8192"),
    store=SupabaseStore(url="...", key="..."),
    chunker="recursive",
    chunk_size=600,
    chunk_overlap=80,
    retriever="mmr",          # diverse retrieval
    top_k=6,
    reranker=True,            # LLM re-scores for precision
)

API Reference

`RAGPipeline`

RAGPipeline(
    embedder,                # Required: OpenAIEmbedder | LocalEmbedder
    generator,               # Required: GroqGenerator | OpenAIGenerator | AnthropicGenerator
    store=None,              # MemoryStore() by default; pass SupabaseStore for persistence
    chunker="recursive",     # "fixed" | "recursive" | "semantic"
    chunk_size=500,          # characters per chunk
    chunk_overlap=50,        # overlap between consecutive chunks
    retriever="topk",        # "topk" | "mmr"
    top_k=5,                 # number of chunks to retrieve
    min_score=0.0,           # discard chunks below this similarity (0.0–1.0)
    reranker=None,           # True to enable LLM reranking
)

Method	Returns	Description
`.ingest(source)`	`int`	Load, chunk, embed, store a file/URL. Returns chunk count.
`.ingest_text(text, source_label)`	`int`	Same but from a raw string.
`.query(question)`	`QueryResult`	Embed query, retrieve, generate, return answer + sources.

`QueryResult`

result.answer   # str — the LLM's answer, grounded in retrieved chunks
result.sources  # list[Chunk] — the chunks that were used

`Chunk`

chunk.text                         # str — the text content
chunk.metadata["source"]           # str — file path or URL
chunk.metadata["chunk_index"]      # int — position in original document
chunk.metadata["total_chunks"]     # int — total chunks in this document
chunk.metadata["strategy"]         # str — "fixed" | "recursive" | "semantic"

Common Patterns

Show citations in a chat UI

result = rag.query(user_message)

response_parts = [result.answer, "\n\n**Sources:**"]
for i, chunk in enumerate(result.sources, 1):
    source = chunk.metadata.get("source", "unknown")
    preview = chunk.text[:120].strip().replace("\n", " ")
    response_parts.append(f"[{i}] {source}: _{preview}..._")

final_response = "\n".join(response_parts)

Ingest only new documents (avoid duplicates)

already_ingested = {"report_q1.pdf", "handbook.pdf"}

for file in Path("docs").glob("*.pdf"):
    if file.name not in already_ingested:
        n = rag.ingest(str(file))
        print(f"Ingested {file.name}: {n} chunks")

Filter by source document

With SupabaseStore, you can search only within a specific document:

results = store.search(
    query_embedding,
    top_k=5,
    filter={"source": "hr-policy.pdf"},
)

Streaming responses

# GroqGenerator supports streaming
from groq import Groq

client = Groq(api_key="gsk_...")
stream = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[...],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Choosing Your Stack

Need	Pick
Fastest setup	`MemoryStore` + `OpenAIEmbedder` + `GroqGenerator`
Zero API cost	`MemoryStore` + `LocalEmbedder` + `GroqGenerator` (free tier)
Production persistence	`SupabaseStore` + any embedder/generator
Maximum accuracy	`SupabaseStore` + `OpenAIEmbedder` + `mmr` retriever + reranker + GPT-4o
Private / on-premise	`MemoryStore` + `LocalEmbedder` + local Ollama generator

What You've Learned

If you read this far and ran the examples, you understand:

RAG architecture — loader → chunker → embedder → store → retriever → generator
Chunking strategies — fixed, recursive, semantic; why overlap matters
Embeddings — what they are, how cosine similarity works, how to pick a model
Vector search — how HNSW indexing works at scale
Retrieval strategies — Top-K vs MMR, when diversity matters
Reranking — LLM-as-judge, when precision > speed
Generation — how to write grounding prompts, how to show citations

This is the complete knowledge stack behind every "chat with your docs" product, every enterprise knowledge base, and every AI assistant with document understanding. No paid course required.

What's Next

Now that you understand RAG, the natural next steps:

Agents — instead of one retrieval+generation step, let the LLM decide when to retrieve and what to do with the result. See agent-loop.
Memory — give your RAG system episodic memory (remember past conversations) and semantic memory (retrieve relevant facts from prior sessions). See mem-store.
Evals — measure whether your RAG pipeline is actually answering correctly. See eval-bench.

Troubleshooting

These are the errors every beginner hits. Fixes are here so you don't lose an hour to them.

`ModuleNotFoundError: No module named 'ragkit'`

You haven't installed the library yet, or your virtual environment isn't activated.

# Make sure your venv is active first
source venv/bin/activate       # Mac/Linux
venv\Scripts\activate          # Windows

# Then install
pip install rag-kit[openai,groq]

`ModuleNotFoundError: No module named 'openai'` (or `groq`, `supabase`, etc.)

rag-kit has zero mandatory dependencies. You only get the extras you ask for.

pip install rag-kit[openai]     # for OpenAIEmbedder / OpenAIGenerator
pip install rag-kit[groq]       # for GroqGenerator
pip install rag-kit[supabase]   # for SupabaseStore
pip install rag-kit[pdf]        # for load_pdf()
pip install rag-kit[url]        # for load_url()
pip install rag-kit[local]      # for LocalEmbedder
pip install rag-kit[all]        # everything at once

`AuthenticationError` / `401 Unauthorized`

Your API key is wrong or not set.

# Bad — key hardcoded with a typo or expired key
embedder = OpenAIEmbedder(api_key="sk-abc123WRONG")

# Good — read from environment variable
import os
embedder = OpenAIEmbedder(api_key=os.environ["OPENAI_API_KEY"])

Double-check:

You copied the full key (they're long — don't cut it off)
The key is for the right service (OpenAI key ≠ Groq key)

Your .env file is loaded before your script runs:

export OPENAI_API_KEY=sk-...   # Mac/Linux — run this in your terminal first
set OPENAI_API_KEY=sk-...      # Windows CMD
$env:OPENAI_API_KEY="sk-..."   # Windows PowerShell

`SyntaxError` or `TypeError` on Python 3.9 or below

rag-kit uses list[str] and dict | None type hints, which require Python 3.10+.

python --version   # must show 3.10, 3.11, 3.12, or 3.13

# If you're on 3.9 or below, upgrade Python at python.org

`result.answer` is "I couldn't find relevant information"

This means no chunks passed the similarity threshold. Three possible causes:

1. Your document wasn't ingested yet.

rag.ingest("your_file.pdf")   # do this before rag.query()
result = rag.query("your question")

2. The question phrasing is too different from the document language.

Try rephrasing. "What is the money-back guarantee?" retrieves better than "refund" if the document uses the phrase "money-back guarantee."

3. min_score is set too high.

# Default min_score is 0.0 — everything passes through
# If you set it high (e.g. 0.8), lower it while debugging
rag = RAGPipeline(..., min_score=0.0)

`SupabaseStore` not finding results after ingestion

Make sure you ran the setup SQL first. Open your Supabase SQL editor and run:

from ragkit.stores.supabase import SETUP_SQL
print(SETUP_SQL)

Copy the output and paste it into Supabase → SQL Editor → Run. This creates the table, HNSW index, and match_chunks function that the store depends on.

Still stuck?

Open an issue at github.com/iamadhitya1/rag-kit/issues with:

Your Python version (python --version)
The full error message (copy-paste, don't screenshot)
The 5 lines of code that triggered it

License

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Jun 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ragkit_adhitya-0.1.0.tar.gz (39.4 kB view details)

Uploaded Jun 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ragkit_adhitya-0.1.0-py3-none-any.whl (34.4 kB view details)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file ragkit_adhitya-0.1.0.tar.gz.

File metadata

Download URL: ragkit_adhitya-0.1.0.tar.gz
Upload date: Jun 9, 2026
Size: 39.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for ragkit_adhitya-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4254343b8652bc92dbe2c59a99baacb0422a16d7088620c0cd968ffd61b50b5b`
MD5	`43c06691cd01d9d97e9b850b261ca508`
BLAKE2b-256	`54a16dae56abcfebc13a978cff7b25404259684d57e9267716607ba495d3ebf2`

See more details on using hashes here.

File details

Details for the file ragkit_adhitya-0.1.0-py3-none-any.whl.

File metadata

Download URL: ragkit_adhitya-0.1.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 34.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for ragkit_adhitya-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ed6d2506ef067078efc38eecc6213125ac36b9d2a1598d43e0af6714145dcc3f`
MD5	`a02ce2901dbe021f1f8c2077df4cee02`
BLAKE2b-256	`789c7473280fd86c1c4c596490845b072f1e10f532b17d4ae8ba8630723dd5b5`

See more details on using hashes here.

ragkit-adhitya 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

rag-kit

Before You Start

What you need

What is an API key?

Set up your environment (recommended)

What is RAG?

How RAG Works — The Full Pipeline

Step 1: Loading

Step 2: Chunking

Step 3: Embedding

Step 4: Vector Store

Step 5: Retrieval

Step 6 (Optional): Reranking

Step 7: Generation

Installation

Quick Start

1. Basic (in-memory, no persistence)

2. Production (Supabase persistence)

3. Free (local embeddings, no API key for embedding)

4. Advanced (MMR + reranking for maximum quality)

API Reference

RAGPipeline

QueryResult

Chunk

Common Patterns

Show citations in a chat UI

Ingest only new documents (avoid duplicates)

Filter by source document

Streaming responses

Choosing Your Stack

What You've Learned

What's Next

Troubleshooting

ModuleNotFoundError: No module named 'ragkit'

ModuleNotFoundError: No module named 'openai' (or groq, supabase, etc.)

AuthenticationError / 401 Unauthorized

SyntaxError or TypeError on Python 3.9 or below

result.answer is "I couldn't find relevant information"

SupabaseStore not finding results after ingestion

Still stuck?

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`RAGPipeline`

`QueryResult`

`Chunk`

`ModuleNotFoundError: No module named 'ragkit'`

`ModuleNotFoundError: No module named 'openai'` (or `groq`, `supabase`, etc.)

`AuthenticationError` / `401 Unauthorized`

`SyntaxError` or `TypeError` on Python 3.9 or below

`result.answer` is "I couldn't find relevant information"

`SupabaseStore` not finding results after ingestion