Retrieval-first, deterministic RAG infrastructure
# 🪲 ScaraFlow
What is Scaraflow?
Scaraflow is retrieval-first infrastructure for deterministic, low-variance, production-grade Retrieval-Augmented Generation (RAG).
Scaraflow is not:
- an agent framework
- a prompt playground
- a chain-orchestration SDK
Scaraflow focuses on one problem only:
Correct, explicit, and scalable retrieval for LLM systems
Why Scaraflow Exists
Most modern RAG frameworks optimize for:
- orchestration flexibility
- feature breadth
- rapid prototyping
Scaraflow optimizes for:
- retrieval correctness
- predictable latency
- streaming readiness
- infrastructure consistency
Scaraflow treats retrieval as infrastructure, not glue code.
Design Principles
- Retrieval before generation
- Explicit contracts over hidden magic
- Deterministic behavior
- Low-variance latency
- Streaming-ready by design
- Same semantics in notebooks, services, and production
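As one illustration of "explicit contracts over hidden magic", the sketch below expresses the embedder interface used in the Quick Start (an object with an `embed(text)` method) as a `typing.Protocol` and validates it before use. The names `EmbedderContract`, `ConstantEmbedder`, and `check_embedder` are hypothetical and are not part of Scaraflow's API; how scara-core actually defines its contracts is not shown here.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class EmbedderContract(Protocol):
    """Hypothetical contract: any object exposing embed(text) -> list of floats."""

    def embed(self, text: str) -> List[float]:
        ...


class ConstantEmbedder:
    """Toy embedder that returns a fixed-size zero vector for any input."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed(self, text: str) -> List[float]:
        return [0.0] * self.dim


def check_embedder(embedder: object, expected_dim: int) -> None:
    """Fail fast if the object lacks embed() or returns the wrong dimension."""
    if not isinstance(embedder, EmbedderContract):
        raise TypeError("embedder must provide an embed(text) method")
    vector = embedder.embed("probe")
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")


# Passes silently: the toy embedder matches the expected dimension.
check_embedder(ConstantEmbedder(384), expected_dim=384)
```

Checking the dimension once at startup surfaces a mismatch between the embedding model and the collection's `vector_dim` before any query runs.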
Architecture Overview
scaraflow/
├── scara-core # strict contracts & invariants
├── scara-index # vector store backends (Qdrant)
├── scara-rag # deterministic RAG engine
├── scara-live # streaming / temporal RAG (planned)
├── scara-graph # graph-based RAG (planned)
└── scara-llm # thin LLM adapters (planned)
Installation
pip install scaraflow
Dependencies
- qdrant-client
- sentence-transformers
- standard scientific Python stack
Quick Start Guide
1. In-Memory Setup (No Docker)
Ideal for testing and prototyping without external infrastructure.
import uuid

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig
from scara_rag.engine import RAGEngine
from scara_rag.policies import RetrievalPolicy

# 1. Set up in-process Qdrant (no server required)
client = QdrantClient(":memory:")
store = QdrantVectorStore(
    QdrantConfig(collection="demo", vector_dim=384),
    client=client,
)

# 2. Set up the embedder
# all-MiniLM-L6-v2 produces 384-dimensional vectors, matching vector_dim above.
model = SentenceTransformer("all-MiniLM-L6-v2")

class Embedder:
    def embed(self, text):
        return model.encode(text).tolist()

# 3. Initialize the RAG engine (with a dummy LLM)
rag = RAGEngine(
    embedder=Embedder(),
    store=store,
    llm=lambda prompt: f"Simulated answer based on:\n{prompt}",
)

# 4. Ingest documents
documents = [
    "Scaraflow is retrieval-first.",
    "It prioritizes deterministic behavior.",
    "Qdrant is the reference backend.",
]
ids = [str(uuid.uuid4()) for _ in documents]
vectors = model.encode(documents).tolist()
store.upsert(
    ids=ids,
    vectors=vectors,
    metadata=[{"text": d} for d in documents],
)

# 5. Query
response = rag.query(
    "What does Scaraflow prioritize?",
    policy=RetrievalPolicy(top_k=2),
)
print(response.answer)
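The ingestion step above uses `uuid.uuid4()`, so every run assigns new random IDs. If reproducible IDs across re-ingestion are wanted, one option (an assumption on my part, not something Scaraflow is stated to require) is to derive them from the document text with `uuid.uuid5`. The namespace value and the `stable_id` helper below are hypothetical.

```python
import uuid

# Hypothetical fixed namespace for this example; any constant UUID would do.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/scaraflow-demo")


def stable_id(text: str) -> str:
    """Derive a reproducible UUID string from the document text."""
    return str(uuid.uuid5(DOC_NAMESPACE, text))


documents = [
    "Scaraflow is retrieval-first.",
    "It prioritizes deterministic behavior.",
    "Qdrant is the reference backend.",
]

# The same text always maps to the same ID, so re-running ingestion
# would overwrite existing points instead of adding duplicates.
ids = [stable_id(d) for d in documents]
```

The trade-off is that two documents with identical text collapse to a single ID; including a source path or chunk index in the hashed string would avoid that.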
2. Production Setup (With Docker)
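Run Qdrant in a container. Mounting a volume at Qdrant's storage path keeps the data when the container is removed.
docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant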
Connect Scaraflow to the local Qdrant instance:
from qdrant_client import QdrantClient
from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig
# Connect to Qdrant on localhost
store = QdrantVectorStore(
    QdrantConfig(
        url="http://localhost:6333",
        collection="prod_v1",
        vector_dim=384,
    )
)
# The rest of the setup (Embedder, RAGEngine) remains the same.
3. Cloud LLMs (OpenAI / Gemini)
Scaraflow is LLM-agnostic. You simply pass a callable that takes a string (prompt) and returns a string (answer).
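Because the contract is only "string in, string out", a test double can stand in for a real model. The sketch below is a minimal example of such a callable; `RecordingLLM` is a hypothetical helper, not part of Scaraflow, and it assumes the engine calls the LLM with a single string argument as described above.

```python
from typing import List


class RecordingLLM:
    """Hypothetical test double: a str -> str callable that records every prompt."""

    def __init__(self, canned_answer: str) -> None:
        self.canned_answer = canned_answer
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        # Keep the prompt so a test can inspect what was sent to the "model".
        self.prompts.append(prompt)
        return self.canned_answer


fake_llm = RecordingLLM("stub answer")
answer = fake_llm("Context: ...\nQuestion: What is Scaraflow?")
print(answer)
```

An instance like `fake_llm` could be passed as the `llm` argument in place of a cloud adapter, which keeps tests offline and lets them inspect the prompt that was built from the retrieved context.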
Using OpenAI
pip install openai
from openai import OpenAI
from scara_rag.engine import RAGEngine

# Named openai_client so it does not shadow the Qdrant `client` from earlier steps.
openai_client = OpenAI(api_key="sk-...")

def openai_adapter(prompt: str) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    # message.content may be None, so fall back to an empty string.
    return response.choices[0].message.content or ""

rag = RAGEngine(
    embedder=Embedder(),  # Defined in previous steps
    store=store,          # Defined in previous steps
    llm=openai_adapter,
)

response = rag.query("How does Scaraflow handle retrieval?")
print(response.answer)
Using Google Gemini
pip install google-generativeai
import google.generativeai as genai
from scara_rag.engine import RAGEngine

genai.configure(api_key="AIza...")

# Named gemini_model so it does not overwrite the SentenceTransformer `model`
# that Embedder.embed() relies on.
gemini_model = genai.GenerativeModel('gemini-pro')

def gemini_adapter(prompt: str) -> str:
    response = gemini_model.generate_content(prompt)
    return response.text

rag = RAGEngine(
    embedder=Embedder(),
    store=store,
    llm=gemini_adapter,
)

response = rag.query("Explain Scaraflow's design principles.")
print(response.answer)
4. Integration with FastAPI
Expose the RAG engine over HTTP with a minimal FastAPI app.
pip install fastapi uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from scara_rag.policies import RetrievalPolicy

app = FastAPI()

# Assume 'rag' is initialized globally as shown in previous steps

class QueryRequest(BaseModel):
    question: str
    top_k: int = 5

@app.post("/rag/query")
def query_rag(request: QueryRequest):
    response = rag.query(
        request.question,
        policy=RetrievalPolicy(top_k=request.top_k),
    )
    return {
        "answer": response.answer,
        "context": [b.content for b in response.context],
        "metadata": response.metadata,
    }

# Run with: uvicorn main:app --reload
Benchmarks
Documents : 10000
Queries : 100
Embedding Time : 6.47s
Indexing Time : 0.34s
Avg Latency : 7.92 ms
P95 Latency : 11.03 ms
Latency Std Dev : 1.24 ms
Benchmarks can be run using:
python testing/benchmarks.py
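The contents of `testing/benchmarks.py` are not reproduced here. As a rough sketch of how the average, P95, and standard deviation figures above could be computed from per-query timings, the following uses only the standard library. The helper names `time_calls` and `summarize_latencies` are hypothetical, and the choice of a nearest-rank percentile and a sample standard deviation is an assumption; the project's script may differ.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_calls(fn: Callable[[str], object], queries: Iterable[str]) -> List[float]:
    """Call fn once per query and return each call's latency in milliseconds."""
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        fn(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize_latencies(samples_ms: List[float]) -> Dict[str, float]:
    """Return the mean, nearest-rank P95, and sample standard deviation."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_ms)
    # Nearest-rank P95: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg_ms": statistics.mean(ordered),
        "p95_ms": ordered[rank - 1],
        "std_ms": statistics.stdev(ordered),
    }


# Example with a stand-in function; a real run would time rag.query instead.
samples = time_calls(lambda q: q.upper(), ["first query", "second query", "third query"])
summary = summarize_latencies(samples)
```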
License
MIT License
Author
Built and maintained by Ganesh (K. S. N. Ganesh).