Automatic RAG Pattern Optimization Engine
ragit
A Python toolkit for building Retrieval-Augmented Generation (RAG) applications. Ragit provides document loading, chunking, vector search, and LLM integration out of the box, allowing you to build document Q&A systems and code generators with minimal boilerplate.
Table of Contents
- Installation
- Configuration
- Tutorial: Using Ragit
- Tutorial: Platform Integration
- Advanced: Hyperparameter Optimization
- API Reference
- License
Installation
pip install ragit
Ragit requires an Ollama-compatible API for embeddings and LLM inference. You can use:
- A local Ollama instance (https://ollama.ai)
- A cloud-hosted Ollama API
- Any OpenAI-compatible API endpoint
Configuration
Ragit reads configuration from environment variables. Create a .env file in your project root:
# LLM API (cloud or local)
OLLAMA_BASE_URL=https://your-ollama-api.com
OLLAMA_API_KEY=your-api-key
# Embedding API (can be different from LLM)
OLLAMA_EMBEDDING_URL=http://localhost:11434
# Default models
RAGIT_DEFAULT_LLM_MODEL=llama3.1:8b
RAGIT_DEFAULT_EMBEDDING_MODEL=mxbai-embed-large
A common setup is to use a cloud API for LLM inference (faster, more capable models) while running embeddings locally (lower latency, no API costs for indexing).
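As a rough sketch, a script that wants to inspect these settings itself (for example, to log which endpoints are in use) could read them from the process environment. The `read_ragit_env` helper and its fallback values are hypothetical, and the code does not parse the .env file itself; ragit's own loading logic may differ.

```python
import os

# Hypothetical fallback values, used only when a variable is unset.
_DEFAULTS = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "OLLAMA_EMBEDDING_URL": "http://localhost:11434",
    "RAGIT_DEFAULT_LLM_MODEL": "llama3.1:8b",
    "RAGIT_DEFAULT_EMBEDDING_MODEL": "mxbai-embed-large",
}


def read_ragit_env(environ=None):
    """Return the settings from the .env example above as a plain dict."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, default) for name, default in _DEFAULTS.items()}
    # The API key has no sensible default; None means "not configured".
    settings["OLLAMA_API_KEY"] = env.get("OLLAMA_API_KEY")
    return settings
```

Passing a dict instead of relying on `os.environ` keeps the helper easy to test without touching the real environment.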
Tutorial: Using Ragit
This section covers the core functionality of ragit: loading documents, creating a RAG assistant, and querying your knowledge base.
Loading Documents
Ragit provides several functions for loading and chunking documents.
Loading a single file:
from ragit import load_text
doc = load_text("docs/api-reference.md")
print(doc.id) # "api-reference"
print(doc.content) # Full file contents
Loading a directory:
from ragit import load_directory
# Load all markdown files
docs = load_directory("docs/", "*.md")
# Load recursively
docs = load_directory("docs/", "**/*.md", recursive=True)
# Load multiple file types
txt_docs = load_directory("docs/", "*.txt")
rst_docs = load_directory("docs/", "*.rst")
all_docs = txt_docs + rst_docs
Custom chunking:
For fine-grained control over how documents are split:
from ragit import chunk_text, chunk_by_separator, chunk_rst_sections
# Fixed-size chunks with overlap
chunks = chunk_text(
    text,
    chunk_size=512,     # Characters per chunk
    chunk_overlap=50,   # Overlap between chunks
    doc_id="my-doc",
)
# Split by paragraph
chunks = chunk_by_separator(text, separator="\n\n")
# Split RST documents by section headers
chunks = chunk_rst_sections(rst_content, doc_id="tutorial")
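To illustrate what the chunk_size and chunk_overlap parameters mean, here is a minimal sketch of fixed-size chunking with overlap. The `sliding_chunks` function is a hypothetical stand-alone illustration that returns plain strings; it is not ragit's implementation of chunk_text.

```python
def sliding_chunks(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


pieces = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

Each window starts chunk_size - chunk_overlap characters after the previous one, so neighbouring chunks share chunk_overlap characters and text near a boundary appears in both.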
The RAGAssistant Class
The RAGAssistant class is the main interface for RAG operations. It handles document indexing, retrieval, and generation in a single object.
from ragit import RAGAssistant
# Create from a directory
assistant = RAGAssistant("docs/")
# Create from a single file
assistant = RAGAssistant("docs/tutorial.rst")
# Create from Document objects
from ragit import Document
docs = [
    Document(id="intro", content="Introduction to the API..."),
    Document(id="auth", content="Authentication uses JWT tokens..."),
    Document(id="endpoints", content="Available endpoints: /users, /items..."),
]
assistant = RAGAssistant(docs)
Configuration options:
assistant = RAGAssistant(
    "docs/",
    embedding_model="mxbai-embed-large",  # Model for embeddings
    llm_model="llama3.1:70b",             # Model for generation
    chunk_size=512,                       # Characters per chunk
    chunk_overlap=50,                     # Overlap between chunks
)
Asking Questions
The ask() method retrieves relevant context and generates an answer:
assistant = RAGAssistant("docs/")
answer = assistant.ask("How do I authenticate API requests?")
print(answer)
Customizing the query:
answer = assistant.ask(
    "How do I authenticate API requests?",
    top_k=5,           # Number of chunks to retrieve
    temperature=0.3,   # Lower = more focused answers
    system_prompt="You are a technical documentation assistant. "
                  "Answer concisely and include code examples.",
)
Generating Code
The generate_code() method is optimized for producing clean, runnable code:
assistant = RAGAssistant("framework-docs/")
code = assistant.generate_code(
    "Create a REST API endpoint for user registration",
    language="python",
)
print(code)
The output is clean code without markdown formatting. The assistant uses your documentation as context to generate framework-specific, idiomatic code.
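For readers curious how "code without markdown formatting" can be obtained from a model response, here is one possible post-processing step, sketched under assumptions: the `strip_code_fences` helper is hypothetical, and how ragit itself cleans the output is not documented here.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches a whole response wrapped in one fenced block, with an optional language tag.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[\w+-]*[ \t]*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(response):
    """Return the code inside a single fenced block, or the stripped text if unfenced."""
    match = _FENCED.match(response)
    return match.group(1) if match else response.strip()


raw = FENCE + "python\nprint('hi')\n" + FENCE
print(strip_code_fences(raw))  # print('hi')
```

Responses that are not wrapped in a fence pass through with only surrounding whitespace removed.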
Custom Retrieval
For advanced use cases, you can access the retrieval and generation steps separately:
assistant = RAGAssistant("docs/")
# Step 1: Retrieve relevant chunks
results = assistant.retrieve("authentication", top_k=5)
for chunk, score in results:
    print(f"Score: {score:.3f}")
    print(f"Content: {chunk.content[:200]}...")
    print()
# Step 2: Get formatted context string
context = assistant.get_context("authentication", top_k=3)
# Step 3: Generate with custom prompt
prompt = f"""Based on this documentation:
{context}
Write a Python function that validates a JWT token."""
response = assistant.generate(
    prompt,
    system_prompt="You are an expert Python developer.",
    temperature=0.2,
)
Tutorial: Platform Integration
This section shows how to integrate ragit into web applications and other platforms.
Flask Integration
from flask import Flask, request, jsonify
from ragit import RAGAssistant
app = Flask(__name__)
# Initialize once at startup
assistant = RAGAssistant("docs/")
@app.route("/ask", methods=["POST"])
def ask():
    # silent=True yields None instead of an error for a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    question = data.get("question", "")
    if not question:
        return jsonify({"error": "question is required"}), 400
    answer = assistant.ask(question, top_k=3)
    return jsonify({"answer": answer})
@app.route("/search", methods=["GET"])
def search():
    query = request.args.get("q", "")
    # type=int falls back to the default when top_k is not a valid integer
    top_k = request.args.get("top_k", 5, type=int)
    results = assistant.retrieve(query, top_k=top_k)
    return jsonify({
        "results": [
            {"content": chunk.content, "score": score}
            for chunk, score in results
        ]
    })
if __name__ == "__main__":
    app.run(debug=True)
FastAPI Integration
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from ragit import RAGAssistant
app = FastAPI()
# Initialize once at startup
assistant = RAGAssistant("docs/")
class Question(BaseModel):
    question: str
    top_k: int = 3
    temperature: float = 0.7
class Answer(BaseModel):
    answer: str
# Plain def (not async def): FastAPI runs these handlers in a worker thread,
# so the blocking ragit calls do not stall the event loop.
@app.post("/ask", response_model=Answer)
def ask(q: Question):
    if not q.question.strip():
        raise HTTPException(status_code=400, detail="question is required")
    answer = assistant.ask(
        q.question,
        top_k=q.top_k,
        temperature=q.temperature,
    )
    return Answer(answer=answer)
@app.get("/search")
def search(q: str, top_k: int = 5):
    results = assistant.retrieve(q, top_k=top_k)
    return {
        "results": [
            {"content": chunk.content, "score": score}
            for chunk, score in results
        ]
    }
Command-Line Tools
Build CLI tools using argparse or click:
#!/usr/bin/env python3
import argparse
from ragit import RAGAssistant
def main():
    parser = argparse.ArgumentParser(description="Query documentation")
    parser.add_argument("question", help="Question to ask")
    parser.add_argument("--docs", default="docs/", help="Documentation path")
    parser.add_argument("--top-k", type=int, default=3, help="Context chunks")
    args = parser.parse_args()
    assistant = RAGAssistant(args.docs)
    answer = assistant.ask(args.question, top_k=args.top_k)
    print(answer)
if __name__ == "__main__":
    main()
Usage:
python ask.py "How do I configure logging?"
python ask.py "What are the API rate limits?" --docs api-docs/ --top-k 5
Batch Processing
Process multiple questions or generate reports:
from ragit import RAGAssistant
assistant = RAGAssistant("docs/")
questions = [
    "What authentication methods are supported?",
    "How do I handle errors?",
    "What are the rate limits?",
]
# Process questions
results = {}
for question in questions:
    results[question] = assistant.ask(question)
# Generate a report
with open("qa-report.md", "w") as f:
    f.write("# Documentation Q&A Report\n\n")
    for question, answer in results.items():
        f.write(f"## {question}\n\n")
        f.write(f"{answer}\n\n")
Advanced: Hyperparameter Optimization
Ragit includes tools to find the optimal RAG configuration for your specific documents and use case.
from ragit import RagitExperiment, Document, BenchmarkQuestion
# Your documents
documents = [
    Document(id="auth", content="Authentication uses Bearer tokens..."),
    Document(id="api", content="The API supports GET, POST, PUT, DELETE..."),
]
# Benchmark questions with expected answers
benchmark = [
    BenchmarkQuestion(
        question="What authentication method does the API use?",
        ground_truth="The API uses Bearer token authentication.",
    ),
    BenchmarkQuestion(
        question="What HTTP methods are supported?",
        ground_truth="GET, POST, PUT, and DELETE methods are supported.",
    ),
]
# Run optimization
experiment = RagitExperiment(documents, benchmark)
results = experiment.run(max_configs=20)
# Get the best configuration
best = results[0]
print(f"Best config: chunk_size={best.config.chunk_size}, "
      f"chunk_overlap={best.config.chunk_overlap}, "
      f"top_k={best.config.top_k}")
print(f"Score: {best.score:.3f}")
The experiment tests different combinations of chunk sizes, overlaps, and retrieval parameters to find what works best for your content.
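The idea of testing parameter combinations can be sketched as a small grid search. Everything here is an assumption for illustration: the candidate values, the `candidate_configs` and `best_config` helpers, and the scoring function are hypothetical and do not describe how RagitExperiment works internally.

```python
from itertools import islice, product

# Hypothetical search space; ragit's actual candidate values are not listed here.
CHUNK_SIZES = [256, 512, 1024]
CHUNK_OVERLAPS = [0, 50, 100]
TOP_KS = [3, 5]


def candidate_configs(max_configs=20):
    """Return an iterator over at most max_configs valid parameter dicts."""
    combos = (
        {"chunk_size": size, "chunk_overlap": overlap, "top_k": k}
        for size, overlap, k in product(CHUNK_SIZES, CHUNK_OVERLAPS, TOP_KS)
        if overlap < size  # an overlap as large as the chunk would never advance
    )
    return islice(combos, max_configs)


def best_config(configs, score_fn):
    """Score every candidate and return (config, score) for the highest score."""
    scored = [(cfg, score_fn(cfg)) for cfg in configs]
    return max(scored, key=lambda pair: pair[1])
```

In a real experiment, `score_fn` would build an index with the given configuration and compare generated answers against the benchmark's ground truth; here it is left as a parameter so the enumeration logic stays independent of the evaluation.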
Performance Features
Ragit includes several optimizations for production workloads:
Connection Pooling
OllamaProvider uses HTTP connection pooling via requests.Session() for faster sequential requests:
from ragit.providers import OllamaProvider
provider = OllamaProvider()
# All requests reuse the same connection pool
for text in texts:
    provider.embed(text, model="mxbai-embed-large")
# Explicitly close when done (optional, auto-closes on garbage collection)
provider.close()
Async Parallel Embedding
For large batches, use embed_batch_async() with trio for 5-10x faster embedding:
import trio
from ragit.providers import OllamaProvider
provider = OllamaProvider()
async def embed_documents():
    texts = ["doc1...", "doc2...", "doc3...", ...]  # hundreds of texts
    embeddings = await provider.embed_batch_async(
        texts,
        model="mxbai-embed-large",
        max_concurrent=10,  # Adjust based on server capacity
    )
    return embeddings
# Run with trio
results = trio.run(embed_documents)
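What a max_concurrent limit does can be sketched with a semaphore. Note the swap: this example uses the standard library's asyncio rather than trio so it runs without extra dependencies, and `fake_embed` and `embed_all` are hypothetical stand-ins, not ragit's embed_batch_async.

```python
import asyncio


async def fake_embed(text):
    """Hypothetical stand-in for one embedding request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return [float(len(text))]


async def embed_all(texts, max_concurrent=10):
    """Embed every text, with at most max_concurrent requests in flight."""
    limiter = asyncio.Semaphore(max_concurrent)

    async def embed_one(text):
        async with limiter:
            return await fake_embed(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(embed_one(t) for t in texts))


vectors = asyncio.run(embed_all(["a", "bb", "ccc"], max_concurrent=2))
print(vectors)  # [[1.0], [2.0], [3.0]]
```

A larger limit overlaps more requests but puts more simultaneous load on the embedding server, which is why the value should track server capacity.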
Embedding Cache
Repeated embedding calls are cached automatically in an LRU cache that holds up to 2048 entries:
from ragit.providers import OllamaProvider
provider = OllamaProvider(use_cache=True) # Default
# First call hits the API
provider.embed("Hello world", model="mxbai-embed-large")
# Second call returns cached result instantly
provider.embed("Hello world", model="mxbai-embed-large")
# View cache statistics
print(OllamaProvider.embedding_cache_info())
# {'hits': 1, 'misses': 1, 'maxsize': 2048, 'currsize': 1}
# Clear cache if needed
OllamaProvider.clear_embedding_cache()
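The hit and miss counters shown above behave like those of Python's functools.lru_cache. As a minimal sketch of the same idea (whether ragit uses lru_cache internally is an assumption, and `cached_embed` is a hypothetical stand-in for a real embedding call):

```python
from functools import lru_cache

calls = []  # records every simulated API request


@lru_cache(maxsize=2048)
def cached_embed(text, model):
    """Hypothetical embedding call; the body runs only on a cache miss."""
    calls.append((text, model))
    return (float(len(text)),)  # a tuple, so callers cannot mutate the cached value


cached_embed("Hello world", "mxbai-embed-large")  # miss: runs the body
cached_embed("Hello world", "mxbai-embed-large")  # hit: served from the cache
info = cached_embed.cache_info()
print(info.hits, info.misses, info.maxsize, info.currsize)  # 1 1 2048 1
```

The cache key is the full argument tuple, so the same text embedded with a different model counts as a separate entry.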
Pre-normalized Embeddings
Embeddings are normalized to unit length ahead of time, so cosine similarity reduces to a plain dot product. No vector norms are recomputed at query time; each comparison still costs O(d) for d-dimensional embeddings.
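The equivalence between the dot product of unit vectors and cosine similarity can be shown in a few lines. This is an illustrative pure-Python sketch with hypothetical helper names (`normalize`, `dot`, `top_k`), not ragit's vector search code.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (done once, at indexing time)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, indexed, k=3):
    """Rank pre-normalized (label, vector) pairs by similarity to the query."""
    q = normalize(query)
    scored = [(label, dot(q, vec)) for label, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


index = [(name, normalize(vec)) for name, vec in [
    ("auth", [1.0, 0.0]),
    ("endpoints", [0.0, 2.0]),
    ("errors", [3.0, 3.0]),
]]
print([label for label, _ in top_k([2.0, 0.1], index, k=2)])  # ['auth', 'errors']
```

Because the stored vectors are normalized once, only the query needs normalizing at search time.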
API Reference
Document Loading
| Function | Description |
|---|---|
| load_text(path) | Load a single text file as a Document |
| load_directory(path, pattern, recursive=False) | Load files matching a glob pattern |
| chunk_text(text, chunk_size, chunk_overlap, doc_id) | Split text into overlapping chunks |
| chunk_document(doc, chunk_size, chunk_overlap) | Split a Document into chunks |
| chunk_by_separator(text, separator, doc_id) | Split text by a delimiter |
| chunk_rst_sections(text, doc_id) | Split RST by section headers |
RAGAssistant
| Method | Description |
|---|---|
| retrieve(query, top_k=3) | Return list of (Chunk, score) tuples |
| get_context(query, top_k=3) | Return formatted context string |
| generate(prompt, system_prompt, temperature) | Generate text without retrieval |
| ask(question, system_prompt, top_k, temperature) | Retrieve context and generate answer |
| generate_code(request, language, top_k, temperature) | Generate clean code |
Properties
| Property | Description |
|---|---|
| assistant.num_documents | Number of loaded documents |
| assistant.num_chunks | Number of indexed chunks |
| assistant.embedding_model | Current embedding model |
| assistant.llm_model | Current LLM model |
License
Apache-2.0 - RODMENA LIMITED