Retrieval-first, deterministic RAG infrastructure

Scaraflow

Scaraflow is retrieval-first RAG infrastructure for deterministic, production-grade Retrieval-Augmented Generation.

Scaraflow is not an agent framework, not a prompt playground, and not a demo SDK.
It focuses on one thing and does it rigorously:

Correct, explicit, and scalable retrieval for LLM systems


Why Scaraflow

Most RAG frameworks prioritize orchestration and abstraction.
Scaraflow prioritizes retrieval correctness, predictability, and streaming readiness.

Design principles

  • Retrieval before generation
  • Explicit contracts over magic
  • Deterministic behavior
  • Low-variance latency
  • Streaming-ready by design
  • Infrastructure consistency across dev, notebooks, and production
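
To make "deterministic behavior" concrete, here is a minimal sketch of one ingredient: ranking that does not depend on the order in which hits arrive. It is an illustration only, not Scaraflow's implementation; `Hit` and `deterministic_top_k` are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit: a document id and its similarity score."""
    doc_id: int
    score: float


def deterministic_top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Return the k best hits, ordered by score (descending), then doc_id (ascending)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sorting on (-score, doc_id) makes the result independent of input order,
    # so ties between equal scores always resolve the same way.
    return sorted(hits, key=lambda h: (-h.score, h.doc_id))[:k]


hits = [Hit(7, 0.82), Hit(3, 0.91), Hit(5, 0.82), Hit(1, 0.40)]
top = deterministic_top_k(hits, k=3)
print([h.doc_id for h in top])  # [3, 5, 7]
```

Without the secondary key, two documents with the same score could swap places between runs, which in turn changes the context handed to the LLM.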

Architecture Overview

scaraflow/
├── scara-core        # strict contracts & invariants
├── scara-index       # vector store backends (Qdrant)
├── scara-rag         # deterministic RAG engine
├── scara-live        # streaming / temporal RAG (planned)
├── scara-graph       # graph-based RAG (planned)
└── scara-llm         # thin LLM adapters

Installation

pip install scaraflow

Scaraflow requires a real vector database.
The recommended backend is Qdrant.


Quick Start

Scaraflow supports three official setups.


Option 1 — Docker (Local Qdrant)

Start a local Qdrant instance:

docker run -p 6333:6333 qdrant/qdrant

Then index and query from Python:

from sentence_transformers import SentenceTransformer
from scaraflow.scara_index.qdrant_store import QdrantVectorStore
from scaraflow.scara_index.config import QdrantConfig
from scaraflow.scara_rag.engine import RAGEngine
from scaraflow.scara_rag.policies import RetrievalPolicy

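model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors


class MiniLMEmbedder:
    """Minimal embedder exposing embed(text) -> list[float]."""

    def embed(self, text):
        return model.encode(text).tolist()


embedder = MiniLMEmbedder()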

store = QdrantVectorStore(
    QdrantConfig(
        url="http://localhost:6333",
        collection="quickstart",
        vector_dim=384,
    )
)

rag = RAGEngine(
    embedder=embedder,
    store=store,
    llm=lambda prompt: "Demo answer",
)

texts = [
    "Scaraflow is a retrieval-first RAG system.",
    "Qdrant provides Rust-based HNSW indexing.",
]

vectors = model.encode(texts).tolist()

store.upsert(
    ids=[0, 1],
    vectors=vectors,
    metadata=[{"src": "quickstart"} for _ in texts],
)

response = rag.query(
    "What is Scaraflow?",
    policy=RetrievalPolicy(top_k=2),
)

print(response.answer)
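
The quick start passes an object with an `embed` method to `RAGEngine`. As a rough sketch of how that object could be made more explicit, the adapter below wraps any text-to-vector function and checks the output length against the collection's dimension. `CallableEmbedder` and `fake_encode` are hypothetical names, not part of Scaraflow, and the only assumption about the engine is the one the quick start already makes: that it calls `embedder.embed(text)`.

```python
from typing import Callable, Sequence


class CallableEmbedder:
    """Wraps any text -> vector function behind an explicit embed() method."""

    def __init__(self, encode: Callable[[str], Sequence[float]], vector_dim: int):
        self._encode = encode
        self._vector_dim = vector_dim

    def embed(self, text: str) -> list[float]:
        vector = [float(x) for x in self._encode(text)]
        # Fail early if the model output does not match the collection's dimension.
        if len(vector) != self._vector_dim:
            raise ValueError(
                f"expected {self._vector_dim} dimensions, got {len(vector)}"
            )
        return vector


# Stand-in encoder so the sketch runs without downloading a model.
def fake_encode(text: str) -> list[float]:
    return [float(len(text)), 0.0, 1.0]


embedder = CallableEmbedder(fake_encode, vector_dim=3)
print(embedder.embed("hello"))  # [5.0, 0.0, 1.0]
```

With the quick start's model, this would presumably be constructed as `CallableEmbedder(model.encode, vector_dim=384)`, so that a mismatch between the model and `QdrantConfig(vector_dim=...)` surfaces before any vectors are written.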

Option 2 — No Docker (In-Process Qdrant)

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from scaraflow.scara_index.qdrant_store import QdrantVectorStore
from scaraflow.scara_index.config import QdrantConfig
from scaraflow.scara_rag.engine import RAGEngine

client = QdrantClient(path="./qdrant_data")

store = QdrantVectorStore(
    QdrantConfig(
        collection="local_demo",
        vector_dim=384,
    ),
    client=client,
)

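model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors


class MiniLMEmbedder:
    """Minimal embedder exposing embed(text) -> list[float]."""

    def embed(self, text):
        return model.encode(text).tolist()


embedder = MiniLMEmbedder()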

rag = RAGEngine(
    embedder=embedder,
    store=store,
    llm=lambda _: "Demo answer",
)

store.upsert(
    ids=[0],
    vectors=[model.encode("Scaraflow works without Docker").tolist()],
    metadata=[{"mode": "local"}],
)

print(rag.query("How does Scaraflow run locally?").answer)

Option 3 — Qdrant Cloud / Remote Qdrant

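from scaraflow.scara_index.qdrant_store import QdrantVectorStore
from scaraflow.scara_index.config import QdrantConfig

store = QdrantVectorStore(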
    QdrantConfig(
        url="https://YOUR_QDRANT_ENDPOINT",
        collection="prod_collection",
        vector_dim=384,
    )
)
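
For a remote deployment, the endpoint is usually better kept out of source code. The sketch below reads the three settings used above from environment variables and validates the URL. The variable names (`QDRANT_URL`, `QDRANT_COLLECTION`, `QDRANT_VECTOR_DIM`) and the helper `qdrant_settings_from_env` are assumptions made for this example; Scaraflow does not define them.

```python
import os
from urllib.parse import urlparse


def qdrant_settings_from_env(env=None) -> dict:
    """Build keyword arguments for QdrantConfig from environment variables."""
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL", "")  # hypothetical variable name
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("QDRANT_URL must be a full http(s) URL")
    return {
        "url": url,
        "collection": env.get("QDRANT_COLLECTION", "prod_collection"),
        "vector_dim": int(env.get("QDRANT_VECTOR_DIM", "384")),
    }


# A plain dict stands in for os.environ so the sketch runs anywhere.
settings = qdrant_settings_from_env(
    {"QDRANT_URL": "https://qdrant.example.com:6333"}
)
print(settings)
```

The resulting dictionary uses only the fields shown in the snippet above, so it could be passed on as `QdrantConfig(**settings)`.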

Benchmarks

Scaraflow includes reproducible benchmarks measuring:

  • embedding time
  • indexing time
  • query latency (avg / p95)
  • variance

Example (CPU, 10k docs):

Embedding time: ~3.5s
Index time:     ~2.1s
Avg latency:    ~17ms
P95 latency:    ~20ms
Std dev:        low
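
For readers who want to compute the same kind of summary on their own workload, here is a minimal sketch using only the standard library. It is not the project's benchmark harness; `summarize_latencies` and `time_calls` are hypothetical helpers, and the percentile uses the nearest-rank definition, which is one of several common choices.

```python
import statistics
import time


def summarize_latencies(samples_ms):
    """Return average, nearest-rank p95, and sample standard deviation (ms)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
        "std_ms": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


def time_calls(fn, queries):
    """Measure wall-clock latency of fn(query) in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        fn(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Synthetic samples for illustration only, not measured results.
print(summarize_latencies([16.0, 17.0, 17.0, 18.0, 20.0]))
```

In practice, `time_calls` would be given something like `lambda q: rag.query(q)` and a list of representative queries, and its output passed to `summarize_latencies`.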

License

MIT License


Author

Built and maintained by Ganesh (K. S. N. Ganesh)
Focus: retrieval systems, streaming RAG, and infrastructure-grade AI tooling.
