Retrieval-first, deterministic RAG infrastructure

🪲 ScaraFlow

License: MIT

What is Scaraflow?

Scaraflow is retrieval-first infrastructure for deterministic, low-variance, production-grade Retrieval-Augmented Generation (RAG).

Scaraflow is not:

  • an agent framework
  • a prompt playground
  • a chain-orchestration SDK

Scaraflow focuses on one problem only:

Correct, explicit, and scalable retrieval for LLM systems


Why Scaraflow Exists

Most modern RAG frameworks optimize for:

  • orchestration flexibility
  • feature breadth
  • rapid prototyping

Scaraflow optimizes for:

  • retrieval correctness
  • predictable latency
  • streaming readiness
  • infrastructure consistency

Scaraflow treats retrieval as infrastructure, not glue code.


Design Principles

  • Retrieval before generation
  • Explicit contracts over hidden magic
  • Deterministic behavior
  • Low-variance latency
  • Streaming-ready by design
  • Same semantics in notebooks, services, and production
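One way to picture "deterministic behavior" at the retrieval layer is stable ordering of results. The sketch below is illustrative only: `Hit` and `stable_top_k` are hypothetical names, not part of Scaraflow, and the tie-breaking rule is an assumption about what a deterministic ranking could look like.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit; not a Scaraflow type."""
    doc_id: str
    score: float


def stable_top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Return the k best hits, breaking score ties by doc_id.

    Sorting on (-score, doc_id) yields the same order regardless of how
    the backend happened to order equal-scoring hits.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return sorted(hits, key=lambda h: (-h.score, h.doc_id))[:k]


hits = [Hit("b", 0.9), Hit("a", 0.9), Hit("c", 0.5)]
print([h.doc_id for h in stable_top_k(hits, k=2)])  # → ['a', 'b']
```

With a rule like this, two runs over the same index return the same context in the same order, so the prompt sent to the LLM does not change between runs.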

Architecture Overview

scaraflow/
├── scara-core        # strict contracts & invariants
├── scara-index       # vector store backends (Qdrant)
├── scara-rag         # deterministic RAG engine
├── scara-live        # streaming / temporal RAG (planned)
├── scara-graph       # graph-based RAG (planned)
└── scara-llm         # thin LLM adapters (planned)
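To give a feel for what "strict contracts & invariants" might mean in practice, here is a minimal sketch using `typing.Protocol`. The names `EmbedderContract`, `VectorStoreContract`, and `check_batch` are hypothetical and are not taken from scara-core; the method shapes simply mirror how `embed` and `upsert` are called in the Quick Start Guide.

```python
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class EmbedderContract(Protocol):
    """Hypothetical contract: one text in, one vector out."""

    def embed(self, text: str) -> list[float]: ...


@runtime_checkable
class VectorStoreContract(Protocol):
    """Hypothetical contract mirroring the upsert call in the Quick Start."""

    def upsert(
        self,
        ids: Sequence[str],
        vectors: Sequence[Sequence[float]],
        metadata: Sequence[dict[str, Any]],
    ) -> None: ...


def check_batch(
    ids: Sequence[str],
    vectors: Sequence[Sequence[float]],
    metadata: Sequence[dict[str, Any]],
    vector_dim: int,
) -> None:
    """Fail loudly before writing if the batch violates basic invariants."""
    if not (len(ids) == len(vectors) == len(metadata)):
        raise ValueError("ids, vectors, and metadata must have equal length")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within a batch")
    for i, vector in enumerate(vectors):
        if len(vector) != vector_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vector)}, expected {vector_dim}"
            )


# A well-formed batch passes silently.
check_batch(["a", "b"], [[0.0] * 3, [1.0] * 3], [{}, {}], vector_dim=3)
```

Checking a batch before it reaches the store turns a silent dimension mismatch into an immediate, explicit error.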

Installation

pip install scaraflow

Dependencies

  • qdrant-client
  • sentence-transformers
  • standard scientific Python stack

Quick Start Guide

1. In-Memory Setup (No Docker)

Ideal for testing and prototyping without external infrastructure.

import uuid
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig
from scara_rag.engine import RAGEngine
from scara_rag.policies import RetrievalPolicy

# 1. Setup In-Process Qdrant
client = QdrantClient(":memory:")
store = QdrantVectorStore(
    QdrantConfig(collection="demo", vector_dim=384),
    client=client
)

# 2. Setup Embedder
model = SentenceTransformer("all-MiniLM-L6-v2")

class Embedder:
    def embed(self, text):
        return model.encode(text).tolist()

# 3. Initialize RAG Engine (with dummy LLM)
rag = RAGEngine(
    embedder=Embedder(),
    store=store,
    llm=lambda prompt: f"Simulated answer based on:\n{prompt}",
)

# 4. Ingest Documents
documents = [
    "Scaraflow is retrieval-first.",
    "It prioritizes deterministic behavior.",
    "Qdrant is the reference backend.",
]
ids = [str(uuid.uuid4()) for _ in documents]
vectors = model.encode(documents).tolist()

store.upsert(
    ids=ids,
    vectors=vectors,
    metadata=[{"text": d} for d in documents],
)

# 5. Query
response = rag.query(
    "What does Scaraflow prioritize?",
    policy=RetrievalPolicy(top_k=2),
)

print(response.answer)
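For unit tests where downloading a SentenceTransformer model is undesirable, a deterministic stand-in embedder can be useful. The sketch below is an assumption-laden illustration, not part of Scaraflow: `HashEmbedder` is a hypothetical class that exposes the same `embed(text)` method as the `Embedder` above, and its vectors carry no semantic meaning.

```python
import hashlib
import math


class HashEmbedder:
    """Test-only stand-in: deterministic, but with no semantic meaning."""

    def __init__(self, dim: int = 384) -> None:
        if dim < 1:
            raise ValueError("dim must be at least 1")
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        values: list[float] = []
        counter = 0
        # Chain SHA-256 digests until there are enough bytes for `dim` values.
        while len(values) < self.dim:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            # Map each byte (0..255) into the range [-1.0, 1.0].
            values.extend(b / 127.5 - 1.0 for b in digest)
            counter += 1
        values = values[: self.dim]
        # Normalize to unit length, as cosine-based stores usually expect.
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]


vector = HashEmbedder().embed("Scaraflow is retrieval-first.")
print(len(vector))  # → 384
```

Because the output depends only on the input text, the same document always maps to the same vector, which keeps test runs reproducible. It should not be used to judge retrieval quality.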

2. Production Setup (With Docker)

Run Qdrant in a container for better performance, and mount a volume so the data survives container removal.

docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant

Connect Scaraflow to the local Qdrant instance:

from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig

# Connect to Qdrant on localhost
store = QdrantVectorStore(
    QdrantConfig(
        url="http://localhost:6333",
        collection="prod_v1",
        vector_dim=384,
    )
)
# The rest of the setup (Embedder, RAGEngine) remains the same.

3. Cloud LLMs (OpenAI / Gemini)

Scaraflow is LLM-agnostic. You simply pass a callable that takes a string (prompt) and returns a string (answer).
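Because the LLM is just a `str -> str` callable, cross-cutting behavior can be layered on with ordinary function wrappers. The following is a minimal sketch of that idea; `with_retries` and the `LLM` alias are hypothetical helpers, not Scaraflow APIs, and the backoff values are arbitrary.

```python
import time
from typing import Callable

# The contract described above: prompt in, answer out.
LLM = Callable[[str], str]


def with_retries(llm: LLM, attempts: int = 3, base_delay: float = 0.5) -> LLM:
    """Wrap an LLM callable so failures are retried with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(prompt: str) -> str:
        for attempt in range(attempts):
            try:
                return llm(prompt)
            except Exception:
                # Re-raise once the final attempt has failed.
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2**attempt)
        raise AssertionError("unreachable")

    return wrapped


echo_llm = with_retries(lambda prompt: f"Simulated answer based on:\n{prompt}")
print(echo_llm("ping"))
```

The wrapped function still satisfies the same contract, so it can be passed as `llm=` wherever a plain adapter would be.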

Using OpenAI

pip install openai

from openai import OpenAI
from scara_rag.engine import RAGEngine

client = OpenAI(api_key="sk-...")

def openai_adapter(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

rag = RAGEngine(
    embedder=Embedder(), # Defined in previous steps
    store=store,         # Defined in previous steps
    llm=openai_adapter,
)

response = rag.query("How does Scaraflow handle retrieval?")
print(response.answer)

Using Google Gemini

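pip install google-generativeai

import google.generativeai as genai
from scara_rag.engine import RAGEngine

genai.configure(api_key="AIza...")
# Named gemini_model so it does not shadow the SentenceTransformer `model`
# that the Embedder from the previous steps still relies on.
gemini_model = genai.GenerativeModel('gemini-pro')

def gemini_adapter(prompt: str) -> str:
    response = gemini_model.generate_content(prompt)
    return response.text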

rag = RAGEngine(
    embedder=Embedder(),
    store=store,
    llm=gemini_adapter,
)

response = rag.query("Explain Scaraflow's design principles.")
print(response.answer)

4. Integration with FastAPI

Expose the RAG engine over HTTP with a minimal FastAPI service.

pip install fastapi uvicorn

from fastapi import FastAPI
from pydantic import BaseModel
from scara_rag.policies import RetrievalPolicy

app = FastAPI()

# Assume 'rag' is initialized globally as shown in previous steps

class QueryRequest(BaseModel):
    question: str
    top_k: int = 5

@app.post("/rag/query")
def query_rag(request: QueryRequest):
    response = rag.query(
        request.question,
        policy=RetrievalPolicy(top_k=request.top_k)
    )
    return {
        "answer": response.answer,
        "context": [b.content for b in response.context],
        "metadata": response.metadata
    }

# Run with: uvicorn main:app --reload

Benchmarks

Documents        : 10000
Queries          : 100
Embedding Time   : 6.47s
Indexing Time    : 0.34s
Avg Latency      : 7.92 ms
P95 Latency      : 11.03 ms
Latency Std Dev  : 1.24 ms

Benchmarks can be run using:

python testing/benchmarks.py
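For readers who want to reproduce this kind of latency summary on their own workload, the statistics can be computed with the standard library alone. This is a sketch, not the contents of `testing/benchmarks.py`: the helper names are hypothetical, and the choice of sample standard deviation and the "inclusive" percentile method are assumptions.

```python
import statistics
import time
from typing import Callable


def measure_latencies_ms(run_query: Callable[[], object], n_queries: int) -> list[float]:
    """Time n_queries calls of run_query and return per-call latency in ms."""
    latencies: list[float] = []
    for _ in range(n_queries):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Return average, 95th percentile, and sample standard deviation."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two measurements")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]
    return {
        "avg_ms": statistics.fmean(latencies_ms),
        "p95_ms": p95,
        "std_ms": statistics.stdev(latencies_ms),
    }


# Stand-in workload; replace the lambda with a call such as rag.query(...).
stats = summarize(measure_latencies_ms(lambda: sum(range(1000)), n_queries=100))
print(sorted(stats))  # → ['avg_ms', 'p95_ms', 'std_ms']
```

Reporting P95 and standard deviation alongside the average is what makes variance visible; an average alone can hide occasional slow queries.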

License

MIT License


Author

Built and maintained by Ganesh (K. S. N. Ganesh).
