Retrieval-first, deterministic RAG infrastructure

Scaraflow

Scaraflow is retrieval-first RAG infrastructure designed for deterministic, production-grade Retrieval-Augmented Generation.

Scaraflow is not an agent framework, not a prompt playground, and not a demo SDK.
It focuses on one thing and does it rigorously:

Correct, explicit, and scalable retrieval for LLM systems


Why Scaraflow

Most RAG frameworks prioritize orchestration and abstraction.
Scaraflow prioritizes retrieval correctness, predictability, and streaming readiness.

Design principles

  • Retrieval before generation
  • Explicit contracts over magic
  • Deterministic behavior
  • Low-variance latency
  • Streaming-ready by design
  • Infrastructure consistency across dev, notebooks, and production
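
As one illustration of "explicit contracts over magic", the embedder interface that the Quick Start examples rely on (an object exposing `embed(text)`) could be written down as a `typing.Protocol`. This is a minimal sketch, not the contract defined in scara-core: the names `Embedder` and `FixedEmbedder` are hypothetical and exist only for this example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical contract: anything with embed(text) -> list of floats."""

    def embed(self, text: str) -> list[float]: ...


class FixedEmbedder:
    """Toy embedder that returns the same zero vector for every input."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        return [0.0] * self.dim


embedder = FixedEmbedder(dim=4)

# runtime_checkable protocols only verify that the method exists,
# not its signature or return type.
print(isinstance(embedder, Embedder))
print(isinstance(object(), Embedder))
```

Because the runtime check is structural and shallow, a static type checker is still needed to catch a wrong signature or return type.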

Architecture Overview

scaraflow/
├── scara-core        # strict contracts & invariants
├── scara-index       # vector store backends (Qdrant)
├── scara-rag         # deterministic RAG engine
├── scara-live        # streaming / temporal RAG (planned)
├── scara-graph       # graph-based RAG (planned)
└── scara-llm         # thin LLM adapters

Installation

pip install scaraflow

Scaraflow requires a real vector database.
The recommended backend is Qdrant.
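
Since a running Qdrant instance is a hard requirement, it may help to confirm that one is reachable before constructing a store. The sketch below uses only the standard library and Qdrant's REST endpoint `GET /collections`. The helper name `qdrant_is_reachable` is hypothetical and is not part of Scaraflow.

```python
import urllib.request


def qdrant_is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/collections answers with HTTP 200."""
    url = base_url.rstrip("/") + "/collections"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, DNS failures, timeouts, and HTTP
        # errors (URLError and HTTPError are both OSError subclasses).
        return False


print(qdrant_is_reachable("http://localhost:6333"))
```

An instance protected by an API key will reject this unauthenticated request, so the helper reports `False` there even though the server is up. It is only meant for an unsecured local instance.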


Quick Start

Scaraflow supports three official setups.


Option 1 — Docker (Local Qdrant)

docker run -p 6333:6333 qdrant/qdrant

from sentence_transformers import SentenceTransformer
from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig
from scara_rag.engine import RAGEngine
from scara_rag.policies import RetrievalPolicy

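model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional embeddings


class Embedder:
    """Minimal adapter: embed(text) returns the embedding as a plain list."""

    def embed(self, text):
        return model.encode(text).tolist()


embedder = Embedder()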

store = QdrantVectorStore(
    QdrantConfig(
        url="http://localhost:6333",
        collection="quickstart",
        vector_dim=384,
    )
)

rag = RAGEngine(
    embedder=embedder,
    store=store,
    llm=lambda prompt: "Demo answer",
)

texts = [
    "Scaraflow is a retrieval-first RAG system.",
    "Qdrant provides Rust-based HNSW indexing.",
]

vectors = model.encode(texts).tolist()

store.upsert(
    ids=[0, 1],
    vectors=vectors,
    metadata=[{"src": "quickstart"} for _ in texts],
)

response = rag.query(
    "What is Scaraflow?",
    policy=RetrievalPolicy(top_k=2),
)

print(response.answer)
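
The `vector_dim=384` setting above has to match the embedding model: `all-MiniLM-L6-v2` produces 384-dimensional vectors. A small guard before `store.upsert(...)` can surface a mismatch early. This is a sketch only: `check_vectors` is a hypothetical helper, and whether Scaraflow performs a similar check internally is not stated here.

```python
def check_vectors(vectors: list[list[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from expected_dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dim}"
            )


# Matching dimensions: passes silently.
check_vectors([[0.1] * 384, [0.2] * 384], expected_dim=384)

# Mismatched dimensions: the second vector is rejected.
try:
    check_vectors([[0.1] * 384, [0.2] * 768], expected_dim=384)
except ValueError as err:
    print(err)
```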

Option 2 — No Docker (In-Process Qdrant)

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig
from scara_rag.engine import RAGEngine

# Local mode: Qdrant runs inside this process and persists to ./qdrant_data.
client = QdrantClient(path="./qdrant_data")

store = QdrantVectorStore(
    QdrantConfig(
        collection="local_demo",
        vector_dim=384,
    ),
    client=client,
)

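model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional embeddings


class Embedder:
    """Minimal adapter: embed(text) returns the embedding as a plain list."""

    def embed(self, text):
        return model.encode(text).tolist()


embedder = Embedder()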

rag = RAGEngine(
    embedder=embedder,
    store=store,
    llm=lambda _: "Demo answer",
)

store.upsert(
    ids=[0],
    vectors=[model.encode("Scaraflow works without Docker").tolist()],
    metadata=[{"mode": "local"}],
)

print(rag.query("How does Scaraflow run locally?").answer)
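
The examples above use hand-picked integer IDs. For repeatable ingestion, IDs can instead be derived from the content, so that re-indexing the same text overwrites the same point rather than creating a duplicate. Qdrant accepts unsigned integers and UUIDs as point IDs. Whether `QdrantVectorStore.upsert` passes UUID strings through unchanged is an assumption, and `stable_id` and `NAMESPACE` are hypothetical names.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID works,
# as long as it never changes between runs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "scaraflow.example")


def stable_id(text: str) -> str:
    """Derive a version-5 UUID from the text, so equal text gives an equal ID."""
    return str(uuid.uuid5(NAMESPACE, text))


first = stable_id("Scaraflow works without Docker")
second = stable_id("Scaraflow works without Docker")
other = stable_id("Qdrant provides Rust-based HNSW indexing.")

print(first == second)  # same text, same ID
print(first == other)   # different text, different ID
```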

Option 3 — Qdrant Cloud / Remote Qdrant

from scara_index.qdrant_store import QdrantVectorStore
from scara_index.config import QdrantConfig

store = QdrantVectorStore(
    QdrantConfig(
        url="https://YOUR_QDRANT_ENDPOINT",
        collection="prod_collection",
        vector_dim=384,
    )
)

Benchmarks

Scaraflow includes reproducible benchmarks measuring:

  • embedding time
  • indexing time
  • query latency (avg / p95)
  • variance

Example (CPU, 10k docs):

Embedding time: ~3.5s
Index time:     ~2.1s
Avg latency:    ~17ms
P95 latency:    ~20ms
Std dev:        low
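
The latency figures listed above (average, p95, spread) can be computed from raw timings with the standard library alone. The following is a minimal sketch and not the project's benchmark harness: `measure_latencies` and `summarize` are hypothetical names, and the p95 uses the nearest-rank definition.

```python
import statistics
import time


def measure_latencies(fn, n_runs: int = 100) -> list[float]:
    """Call fn() n_runs times and return each call's wall-clock time in ms."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Return mean, nearest-rank 95th percentile, and population std dev."""
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
        "std_ms": statistics.pstdev(ordered),
    }


# Stand-in workload; in practice fn would wrap a call such as rag.query(...).
stats = summarize(measure_latencies(lambda: sum(range(1000)), n_runs=50))
print(stats)
```

Reporting the standard deviation as a number in milliseconds, rather than as a qualitative label, makes runs comparable across machines.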

License

MIT License


Author

Built and maintained by Ganesh (K. S. N. Ganesh)
Focus: retrieval systems, streaming RAG, and infrastructure-grade AI tooling.
