Retrieval-first, deterministic RAG infrastructure

Scaraflow

Scaraflow is retrieval-first infrastructure for deterministic, production-grade Retrieval-Augmented Generation (RAG).

Scaraflow is not an agent framework, not a prompt playground, and not a demo SDK.
It focuses on one thing and does it rigorously:

Correct, explicit, and scalable retrieval for LLM systems


Why Scaraflow

Most RAG frameworks prioritize orchestration and abstraction.
Scaraflow prioritizes retrieval correctness, predictability, and streaming readiness.

Design principles

  • Retrieval before generation
  • Explicit contracts over magic
  • Deterministic behavior
  • Low-variance latency
  • Streaming-ready by design
  • Infrastructure consistency across dev, notebooks, and production
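
As an illustration of the "deterministic behavior" principle, the sketch below ranks scored hits with an explicit tie-break on document id, so hits with equal scores always come back in the same order. The `Hit` type and `top_k` function are hypothetical names chosen for this example; they are not part of Scaraflow's API and do not describe how its engine is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical retrieval hit: a document id and its similarity score."""
    doc_id: int
    score: float


def top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Return the k best hits, highest score first.

    Ties on score are broken by ascending doc_id, so the result does not
    depend on the order in which the backend returned the hits.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return sorted(hits, key=lambda h: (-h.score, h.doc_id))[:k]


hits = [Hit(7, 0.82), Hit(3, 0.91), Hit(5, 0.82), Hit(1, 0.40)]
ranked = top_k(hits, k=3)
print([h.doc_id for h in ranked])  # [3, 5, 7]
```

Sorting on a composite key keeps the ordering rule in one visible place, which is the kind of explicitness the principles above ask for.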

Architecture Overview

scaraflow/
├── scara-core        # strict contracts & invariants
├── scara-index       # vector store backends (Qdrant)
├── scara-rag         # deterministic RAG engine
├── scara-live        # streaming / temporal RAG (planned)
├── scara-graph       # graph-based RAG (planned)
└── scara-llm         # thin LLM adapters

Installation

pip install scaraflow

Scaraflow requires a real vector database.
The recommended backend is Qdrant.


Quick Start

Scaraflow supports three official setups.


Option 1 — Docker (Local Qdrant)

Start a local Qdrant instance:

docker run -p 6333:6333 qdrant/qdrant

Then index two documents and run a query:

from sentence_transformers import SentenceTransformer
from scaraflow.scara_index.qdrant_store import QdrantVectorStore
from scaraflow.scara_index.config import QdrantConfig
from scaraflow.scara_rag.engine import RAGEngine
from scaraflow.scara_rag.policies import RetrievalPolicy

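# Load the embedding model (all-MiniLM-L6-v2 produces 384-dimensional vectors).
model = SentenceTransformer("all-MiniLM-L6-v2")


class Embedder:
    """Minimal adapter exposing embed(), backed by the model above."""

    def embed(self, text):
        return model.encode(text).tolist()


embedder = Embedder()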

store = QdrantVectorStore(
    QdrantConfig(
        url="http://localhost:6333",
        collection="quickstart",
        vector_dim=384,
    )
)

rag = RAGEngine(
    embedder=embedder,
    store=store,
    llm=lambda prompt: "Demo answer",
)

texts = [
    "Scaraflow is a retrieval-first RAG system.",
    "Qdrant provides Rust-based HNSW indexing.",
]

vectors = model.encode(texts).tolist()

store.upsert(
    ids=[0, 1],
    vectors=vectors,
    metadata=[{"src": "quickstart"} for _ in texts],
)

response = rag.query(
    "What is Scaraflow?",
    policy=RetrievalPolicy(top_k=2),
)

print(response.answer)
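
The example above passes the engine an object with an `embed` method. A minimal sketch of making that contract explicit with `typing.Protocol` follows. The `Embedder` protocol and the `HashEmbedder` stand-in are hypothetical names inferred from the quick start, not Scaraflow's published interfaces, and the hash-based vectors carry no semantic meaning.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Assumed contract: turn one text into a fixed-length vector."""

    def embed(self, text: str) -> list[float]: ...


class HashEmbedder:
    """Toy stand-in for a real model: deterministic, not semantic."""

    def __init__(self, dim: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, which bounds the vector length.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dim` bytes to floats in [0, 1].
        return [b / 255 for b in digest[: self.dim]]


embedder = HashEmbedder(dim=8)
assert isinstance(embedder, Embedder)  # structural check: has an embed() method
vector = embedder.embed("What is Scaraflow?")
print(len(vector))  # 8
```

A stand-in like this can be useful for wiring tests that should not download a model, provided the store's `vector_dim` is set to match.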

Option 2 — No Docker (In-Process Qdrant)

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from scaraflow.scara_index.qdrant_store import QdrantVectorStore
from scaraflow.scara_index.config import QdrantConfig
from scaraflow.scara_rag.engine import RAGEngine

client = QdrantClient(path="./qdrant_data")

store = QdrantVectorStore(
    QdrantConfig(
        collection="local_demo",
        vector_dim=384,
    ),
    client=client,
)

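# Load the embedding model (all-MiniLM-L6-v2 produces 384-dimensional vectors).
model = SentenceTransformer("all-MiniLM-L6-v2")


class Embedder:
    """Minimal adapter exposing embed(), backed by the model above."""

    def embed(self, text):
        return model.encode(text).tolist()


embedder = Embedder()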

rag = RAGEngine(
    embedder=embedder,
    store=store,
    llm=lambda _: "Demo answer",
)

store.upsert(
    ids=[0],
    vectors=[model.encode("Scaraflow works without Docker").tolist()],
    metadata=[{"mode": "local"}],
)

print(rag.query("How does Scaraflow run locally?").answer)

Option 3 — Qdrant Cloud / Remote Qdrant

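from scaraflow.scara_index.qdrant_store import QdrantVectorStore
from scaraflow.scara_index.config import QdrantConfig

# Point the store at a remote Qdrant endpoint; replace the placeholder URL.
store = QdrantVectorStore(
    QdrantConfig(
        url="https://YOUR_QDRANT_ENDPOINT",
        collection="prod_collection",
        vector_dim=384,
    )
)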

Benchmarks

Scaraflow includes reproducible benchmarks measuring:

  • embedding time
  • indexing time
  • query latency (avg / p95)
  • variance

Example (CPU, 10k docs):

Embedding time: ~3.5s
Index time:     ~2.1s
Avg latency:    ~17ms
P95 latency:    ~20ms
Std dev:        low
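
For readers who want to collect the same kind of figures on their own data, here is a minimal sketch of timing queries and summarizing them as average, p95, and standard deviation. It is not Scaraflow's benchmark harness: `time_queries` and `summarize_latencies` are hypothetical helpers, the p95 uses the nearest-rank method, and the lambda is a placeholder workload standing in for a real query call.

```python
import statistics
import time


def summarize_latencies(samples_ms: list[float]) -> dict[str, float]:
    """Return mean, nearest-rank p95, and sample standard deviation in ms."""
    n = len(samples_ms)
    if n < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_ms)
    # Nearest-rank p95: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "avg_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
        "std_ms": statistics.stdev(ordered),
    }


def time_queries(run_query, queries: list[str]) -> list[float]:
    """Time each call to run_query (any callable taking a query string)."""
    samples = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Placeholder workload; swap the lambda for a real query call.
samples = time_queries(lambda q: q.upper(), ["what is scaraflow?"] * 100)
print(summarize_latencies(samples))
```

Reporting the standard deviation as a number alongside the average and p95 makes a low-variance claim directly comparable across runs.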

License

MIT License


Author

Built and maintained by Ganesh (K. S. N. Ganesh)
Focus: retrieval systems, streaming RAG, and infrastructure-grade AI tooling.
