scaraflow

Retrieval-first, deterministic RAG infrastructure

🪲 ScaraFlow

What is Scaraflow?

Scaraflow is retrieval-first infrastructure for deterministic, low-variance, production-grade Retrieval-Augmented Generation (RAG).

Scaraflow is not:

an agent framework
a prompt playground
a chain-orchestration SDK

Scaraflow focuses on one problem only:

Correct, explicit, and scalable retrieval for LLM systems

Why Scaraflow Exists

Most modern RAG frameworks optimize for:

orchestration flexibility
feature breadth
rapid prototyping

Scaraflow optimizes for:

retrieval correctness
predictable latency
streaming readiness
infrastructure consistency

Scaraflow treats retrieval as infrastructure, not glue code.

Design Principles

Retrieval before generation
Explicit contracts over hidden magic
Deterministic behavior
Low-variance latency
Streaming-ready by design
Same semantics in notebooks, services, and production
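
Two of these principles, explicit contracts and deterministic behavior, can be illustrated with a small standard-library sketch. The names `Embedder`, `Hit`, and `rank` below are hypothetical and are not taken from Scaraflow's source; the point is only that a typed interface plus a tie-breaking sort key gives the same ordering on every run.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical contract: one text in, one vector out."""

    def embed(self, text: str) -> Sequence[float]: ...


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


def rank(hits: Sequence[Hit], top_k: int) -> list[Hit]:
    """Sort by score (descending) and break ties by doc_id for a stable order."""
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    return sorted(hits, key=lambda h: (-h.score, h.doc_id))[:top_k]


hits = [Hit("b", 0.9), Hit("a", 0.9), Hit("c", 0.5)]
print([h.doc_id for h in rank(hits, top_k=2)])  # ['a', 'b']
```

Without the `doc_id` tie-breaker, two hits with equal scores could come back in whatever order the backend happened to return them.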

Architecture Overview

scaraflow/
├── scara-core        # strict contracts & invariants
├── scara-index       # vector store backends (Qdrant)
├── scara-rag         # deterministic RAG engine
├── scara-live        # streaming / temporal RAG (planned)
├── scara-graph       # graph-based RAG (planned)
└── scara-llm         # thin LLM adapters (planned)

Query Flow (Sequence Diagram)

sequenceDiagram
    actor User
    participant RAG as "RAGEngine"
    participant Emb as "Embedder"
    participant VS as "VectorStore (Qdrant)"
    participant RR as "Reranker"
    participant CTX as "Context Assembler"
    participant LLM as "LLM Callable"

    User->>RAG: query(question, policy, filters)
    RAG->>Emb: embed(question)
    Emb-->>RAG: vector
    RAG->>VS: search(vector, k, filters)
    VS-->>RAG: raw_results
    RAG->>RR: rerank(question, raw_results)
    RR-->>RAG: ranked_results
    RAG->>CTX: assemble_context(ranked_results, policy)
    CTX-->>RAG: context blocks
    alt not answerable
        RAG-->>User: "I don't know."
    else answerable
        RAG->>RAG: build prompt
        RAG->>LLM: llm(prompt)
        LLM-->>RAG: answer
        RAG-->>User: RAGResponse(answer, context, raw_results, prompt, metadata)
    end
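
The control flow in the diagram can be approximated with a standard-library sketch. Everything here is a stand-in: the hashed bag-of-words `embed`, the brute-force `search`, the `MIN_SCORE` threshold, and the `Response` type are assumptions for illustration, and the reranking step is omitted for brevity. It is not Scaraflow's implementation.

```python
import math
import zlib
from dataclasses import dataclass

DIM = 64
MIN_SCORE = 0.3  # hypothetical answerability threshold


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalized (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(store, vector, k):
    """Brute-force dot-product search; ties are broken by text for a stable order."""
    scored = [(sum(a * b for a, b in zip(vector, vec)), text) for text, vec in store]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


@dataclass
class Response:
    answer: str
    context: list[str]


def query(question, store, llm, top_k=2):
    results = search(store, embed(question), top_k)
    context = [text for score, text in results if score >= MIN_SCORE]
    if not context:  # the "not answerable" branch of the diagram
        return Response("I don't know.", [])
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return Response(llm(prompt), context)


docs = ["Scaraflow is retrieval-first.", "Qdrant is the reference backend."]
store = [(d, embed(d)) for d in docs]
print(query("Qdrant is the reference backend.", store, llm=lambda p: "stub answer").answer)
print(query("anything", [], llm=lambda p: "stub answer").answer)
```

The LLM is only called on the answerable branch, so an empty or irrelevant retrieval never reaches generation.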

Installation

pip install scaraflow

User guide:

docs/USER_GUIDE.md

Dependencies

qdrant-client
sentence-transformers
standard scientific Python stack

Quick Start Guide

1. In-Memory Setup (No Docker)

Ideal for testing and prototyping without external infrastructure.

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig
from scara_rag.engine import RAGEngine
from scara_rag.policies import RetrievalPolicy

# 1. Setup In-Process Qdrant
client = QdrantClient(":memory:")
store = QdrantVectorStore(
    QdrantConfig(collection="demo", vector_dim=384),
    client=client
)

# 2. Setup Embedder
model = SentenceTransformer("all-MiniLM-L6-v2")

class Embedder:
    def embed(self, text):
        return model.encode(text).tolist()

# 3. Initialize RAG Engine (with dummy LLM)
rag = RAGEngine(
    embedder=Embedder(),
    store=store,
    llm=lambda prompt: f"Simulated answer based on:\n{prompt}",
)

# 4. Ingest Documents
documents = [
    "Scaraflow is retrieval-first.",
    "It prioritizes deterministic behavior.",
    "Qdrant is the reference backend.",
]
vectors = model.encode(documents).tolist()

store.upsert(
    vectors=vectors,
    metadata=[{"text": d} for d in documents],
)

# 5. Query
response = rag.query(
    "What does Scaraflow prioritize?",
    policy=RetrievalPolicy(top_k=2),
)

print(response.answer)
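
For unit tests where downloading a sentence-transformers model is undesirable, a deterministic stand-in embedder may be enough. The sketch below assumes only the `embed(text)` method shape used by the `Embedder` class above; `HashEmbedder` is a hypothetical name, and its vectors carry no semantic meaning, so it is suitable for wiring and plumbing tests only.

```python
import hashlib
import math


class HashEmbedder:
    """Deterministic stand-in embedder for tests (vectors have no semantic meaning)."""

    def __init__(self, dim: int = 384):
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        # Chain SHA-256 digests of (counter, text) until there are `dim` bytes.
        raw = b""
        counter = 0
        while len(raw) < self.dim:
            raw += hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            counter += 1
        # Map each byte into [-0.5, 0.5] and L2-normalize the result.
        values = [b / 255.0 - 0.5 for b in raw[: self.dim]]
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]


embedder = HashEmbedder()
vector = embedder.embed("Scaraflow is retrieval-first.")
print(len(vector))  # 384
```

The default dimension of 384 matches the `vector_dim=384` used in the configuration above.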

2. Production Setup (With Docker)

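Run Qdrant in a container for persistence and performance. Mount a host directory at /qdrant/storage so the data survives container removal.

docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant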

Connect Scaraflow to the local Qdrant instance:

from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig

# Connect to Qdrant on localhost
store = QdrantVectorStore(
    QdrantConfig(
        url="http://localhost:6333",
        collection="prod_v1",
        vector_dim=384,
    )
)
# The rest of the setup (Embedder, RAGEngine) remains the same.

3. Cloud LLMs (OpenAI / Gemini)

Scaraflow is LLM-agnostic. Pass any callable that takes a prompt string and returns an answer string.
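
Because the contract is just "string in, string out", it can be made explicit with a type alias and a thin guard. This is a sketch under that assumption; `LLMCallable` and `checked` are hypothetical helpers, not part of Scaraflow.

```python
from typing import Callable

LLMCallable = Callable[[str], str]  # hypothetical alias for the contract described above


def checked(llm: LLMCallable) -> LLMCallable:
    """Wrap an LLM callable so a non-string result fails loudly."""

    def wrapper(prompt: str) -> str:
        answer = llm(prompt)
        if not isinstance(answer, str):
            raise TypeError(f"LLM returned {type(answer).__name__}, expected str")
        return answer

    return wrapper


echo = checked(lambda prompt: prompt.upper())
print(echo("hello"))  # HELLO
```

A guard like this is useful with provider SDKs whose response text can be `None`, since the failure then surfaces at the adapter boundary.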

Using OpenAI

pip install openai

from openai import OpenAI
from scara_rag.engine import RAGEngine

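# The client reads OPENAI_API_KEY from the environment when api_key is omitted.
client = OpenAI()

def openai_adapter(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    # message.content is Optional; return an empty string rather than None.
    return response.choices[0].message.content or ""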

rag = RAGEngine(
    embedder=Embedder(), # Defined in previous steps
    store=store,         # Defined in previous steps
    llm=openai_adapter,
)

response = rag.query("How does Scaraflow handle retrieval?")
print(response.answer)

Using Google Gemini

pip install google-generativeai

import google.generativeai as genai
from scara_rag.engine import RAGEngine

genai.configure(api_key="AIza...")
model = genai.GenerativeModel('gemini-pro')

def gemini_adapter(prompt: str) -> str:
    response = model.generate_content(prompt)
    return response.text

rag = RAGEngine(
    embedder=Embedder(),
    store=store,
    llm=gemini_adapter,
)

response = rag.query("Explain Scaraflow's design principles.")
print(response.answer)

4. Integration with FastAPI

Expose the engine over HTTP with a minimal FastAPI service.

pip install fastapi uvicorn

from fastapi import FastAPI
from pydantic import BaseModel
from scara_rag.policies import RetrievalPolicy

app = FastAPI()

# Assume 'rag' is initialized globally as shown in previous steps

class QueryRequest(BaseModel):
    question: str
    top_k: int = 5

@app.post("/rag/query")
def query_rag(request: QueryRequest):
    response = rag.query(
        request.question,
        policy=RetrievalPolicy(top_k=request.top_k)
    )
    return {
        "answer": response.answer,
        "context": [b.content for b in response.context],
        "metadata": response.metadata
    }

# Run with: uvicorn main:app --reload
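
A client for this endpoint needs nothing beyond the standard library. The sketch below assumes the service is reachable at `http://localhost:8000` (uvicorn's default port) and uses the `/rag/query` path and the `question` / `top_k` fields from the handler above; `build_query_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str, top_k: int = 5) -> urllib.request.Request:
    """Build a POST request for the /rag/query endpoint defined above."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/rag/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, top_k: int = 5) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    request = build_query_request(base_url, question, top_k)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


req = build_query_request("http://localhost:8000", "What does Scaraflow prioritize?", top_k=2)
print(req.get_method(), req.full_url)  # POST http://localhost:8000/rag/query
```

With the server running, `ask("http://localhost:8000", "...")` should return a dict with the `answer`, `context`, and `metadata` keys produced by the handler.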

Benchmarks

Latest run (2026-02-09, in-memory Qdrant, all-MiniLM-L6-v2, 10k docs, 100 queries):

Documents        : 10000
Queries          : 100
Embedding Time   : 4.07s
Indexing Time    : 0.36s
Avg Latency      : 15.06 ms
P95 Latency      : 17.75 ms
Latency Std Dev  : 2.71 ms
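
For readers who want to compute the same three latency figures from their own measurements, a minimal sketch follows. It assumes a nearest-rank P95 and the sample standard deviation; the benchmark script may define these differently, and `latency_summary` is a hypothetical helper.

```python
import statistics


def latency_summary(samples_ms):
    """Return mean, nearest-rank P95, and sample standard deviation in ms."""
    n = len(samples_ms)
    if n < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_ms)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return {
        "avg_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
        "std_ms": statistics.stdev(ordered),
    }


summary = latency_summary([float(i) for i in range(1, 101)])
print(summary["avg_ms"], summary["p95_ms"])  # 50.5 95.0
```

A small gap between the average and P95, together with a low standard deviation, is what "low-variance latency" means in practice.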

Benchmarks can be run using:

python testing/benchmarks.py

License

MIT License

Author

Built and maintained by Ganesh (K. S. N. Ganesh).

Release history

0.1.9 (this version): Mar 24, 2026
0.1.8: Jan 10, 2026
0.1.7: Jan 10, 2026
0.1.6: Jan 7, 2026
0.1.5: Jan 7, 2026
0.1.4: Jan 7, 2026
0.1.3: Jan 6, 2026
0.1.2: Jan 6, 2026
0.1.1: Jan 6, 2026
0.1.0: Jan 6, 2026

Download files

Source Distribution

scaraflow-0.1.9.tar.gz (20.1 kB)

Uploaded Mar 24, 2026 Source

Built Distribution

scaraflow-0.1.9-py3-none-any.whl (23.0 kB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file scaraflow-0.1.9.tar.gz.

File metadata

Download URL: scaraflow-0.1.9.tar.gz
Upload date: Mar 24, 2026
Size: 20.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for scaraflow-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`1e5a87557e931ebb2e93a407755cdc89cb56153b9036db9625b4570f727c8c14`
MD5	`0be8bc509e7560ff730eb39addc941a0`
BLAKE2b-256	`e5f511f40f38e5869c4e57e1284daea9d8fd2cd259537aa9717936b2c2c33f95`

File details

Details for the file scaraflow-0.1.9-py3-none-any.whl.

File metadata

Download URL: scaraflow-0.1.9-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 23.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for scaraflow-0.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`657915270cbaca99489c0b39f390bd00128d808f2b6f56ba5fcf017c969c6ed0`
MD5	`7303f9a238e368325f7274739e4c8119`
BLAKE2b-256	`600add0b54e7e04c8e81b268287bc232e01bd1f58afc8f99cdca5ca400e989ad`
