rag-starter

RAG boilerplate with vector DB adapters.

rag-starter is a lightweight Python starter for retrieval-augmented generation workflows. It gives you a small but clean foundation for:

  • chunking documents.
  • generating embeddings.
  • indexing vectors through adapter classes.
  • retrieving relevant context for prompts.
  • swapping vector backends without rewriting your pipeline.

This repo is intentionally minimal. It is designed as a starter, not a full framework.

Features

  • Small, readable Python package structure.
  • Adapter interface for vector stores.
  • In-memory adapter included for local development and tests.
  • Optional adapter stubs for Chroma, Qdrant, and Pinecone.
  • Simple hashing embedder for demos and bootstrapping.
  • Retriever and RAG pipeline helpers.
  • Example script and tests.
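
To give a feel for what a simple hashing embedder can look like, here is a minimal sketch of the hashing trick using only the standard library. It is not the package's HashingEmbedder; the function name `hashing_embed`, the tokenization, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import math
import re


def hashing_embed(text: str, dimensions: int = 64) -> list[float]:
    """Map text to a fixed-size, L2-normalized vector via the hashing trick.

    Hypothetical helper for illustration; not part of rag_starter.
    """
    vector = [0.0] * dimensions
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first four bytes pick the bucket, the fifth picks the sign.
        bucket = int.from_bytes(digest[:4], "big") % dimensions
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [v / norm for v in vector] if norm else vector
```

Vectors built this way are deterministic and need no model download, which suits demos and tests, but they capture token overlap rather than meaning.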

Project structure

rag-starter/
├── examples/
│   └── basic_usage.py
├── src/
│   └── rag_starter/
│       ├── adapters/
│       │   ├── base.py
│       │   ├── chroma.py
│       │   ├── inmemory.py
│       │   ├── pinecone.py
│       │   └── qdrant.py
│       ├── chunking.py
│       ├── document.py
│       ├── embedder.py
│       ├── pipeline.py
│       ├── prompts.py
│       ├── retriever.py
│       └── utils.py
├── tests/
│   ├── test_chunking.py
│   └── test_pipeline.py
├── pyproject.toml
└── README.md

Install

pip install -e .

Optional extras:

pip install -e ".[chroma]"
pip install -e ".[qdrant]"
pip install -e ".[pinecone]"

Quick start

from rag_starter.adapters.inmemory import InMemoryVectorStore
from rag_starter.chunking import chunk_text
from rag_starter.document import Document, Chunk
from rag_starter.embedder import HashingEmbedder
from rag_starter.pipeline import RAGPipeline

# A source document with free-form metadata.
source = Document(
    id="doc-1",
    text="RAG combines retrieval with generation. Vector databases help store embeddings.",
    metadata={"title": "RAG Notes"},
)

# Split the text and wrap each piece in a Chunk tied to its parent document.
chunks = [
    Chunk(id=f"chunk-{i}", document_id=source.id, text=text, metadata=source.metadata)
    for i, text in enumerate(chunk_text(source.text, chunk_size=60, overlap=10), start=1)
]

# Wire the demo embedder and the in-memory store into a pipeline.
embedder = HashingEmbedder(dimensions=64)
store = InMemoryVectorStore()
pipeline = RAGPipeline(store=store, embedder=embedder)

# Index the chunks, then retrieve the two best matches for a query.
pipeline.index_chunks(chunks)
result = pipeline.retrieve("What helps store embeddings?", top_k=2)

# Print each match with its similarity score.
for item in result.matches:
    print(item.score, item.chunk.text)
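
The chunk_size and overlap arguments in the quick start suggest a sliding window over the text. The sketch below shows one way such a window can work, assuming sizes are measured in characters. Whether chunk_text counts characters, words, or tokens is not stated here, and `chunk_by_chars` is a hypothetical name rather than a function of the package.

```python
def chunk_by_chars(text: str, chunk_size: int = 60, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters.

    Hypothetical helper for illustration; not the package's chunk_text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With chunk_size=4 and overlap=1, the string "abcdefghij" becomes "abcd", "defg", "ghij": each chunk repeats the last character of the previous one, so text cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.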

Adapter model

All vector database backends follow the same interface defined in VectorStoreAdapter.

Core methods:

  • upsert(items)
  • query(vector, top_k, filters=None)
  • delete(ids)
  • clear()
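
As a rough illustration of how these four methods fit together, the sketch below declares them on an abstract base class and implements them with a plain dictionary. Only the method names and the query parameters come from the list above. The `VectorItem` record, the return type of query, the equality-based filter semantics, and the cosine scoring are assumptions, and the real VectorStoreAdapter may differ.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional, Sequence


@dataclass
class VectorItem:
    """Hypothetical record type: an id, its embedding, and free-form metadata."""
    id: str
    vector: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)


class VectorStoreSketch(ABC):
    """Assumed shape of the four core methods; the real signatures may differ."""

    @abstractmethod
    def upsert(self, items: Sequence[VectorItem]) -> None: ...

    @abstractmethod
    def query(self, vector: Sequence[float], top_k: int,
              filters: Optional[dict[str, Any]] = None) -> list[tuple[float, VectorItem]]: ...

    @abstractmethod
    def delete(self, ids: Sequence[str]) -> None: ...

    @abstractmethod
    def clear(self) -> None: ...


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DictVectorStore(VectorStoreSketch):
    """Dictionary-backed store that scans every item on each query."""

    def __init__(self) -> None:
        self._items: dict[str, VectorItem] = {}

    def upsert(self, items):
        for item in items:
            self._items[item.id] = item  # an existing id is overwritten

    def query(self, vector, top_k, filters=None):
        # Keep items whose metadata equals every key/value pair in filters.
        candidates = [
            item for item in self._items.values()
            if not filters or all(item.metadata.get(k) == v for k, v in filters.items())
        ]
        scored = [(cosine(vector, item.vector), item) for item in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

    def delete(self, ids):
        for item_id in ids:
            self._items.pop(item_id, None)  # unknown ids are ignored

    def clear(self):
        self._items.clear()
```

Because calling code only touches these four methods, a pipeline written against the base class can move from a dictionary to a hosted backend without changes.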

The included InMemoryVectorStore is useful for:

  • local development.
  • tests.
  • learning the architecture.
  • quickly bootstrapping a prototype.

The optional adapters are intentionally thin wrappers so you can extend them to fit your preferred backend configuration.
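
To make "thin wrapper" concrete, here is a sketch of an adapter that only translates the four core methods into calls on a backend client. `FakeClient` and its put, search, remove, and drop methods are invented for this example so that it runs without any external service. They do not correspond to the Chroma, Qdrant, or Pinecone SDKs, and `ThinAdapter` is not one of the package's adapters.

```python
from typing import Any, Optional, Sequence


class FakeClient:
    """Hypothetical stand-in for a vector database SDK client."""

    def __init__(self) -> None:
        self.collections: dict[str, dict[str, dict[str, Any]]] = {}

    def put(self, collection: str, records: Sequence[dict[str, Any]]) -> None:
        bucket = self.collections.setdefault(collection, {})
        for record in records:
            bucket[record["id"]] = record

    def search(self, collection: str, vector: Sequence[float], limit: int,
               where: Optional[dict[str, Any]] = None) -> list[dict[str, Any]]:
        rows = list(self.collections.get(collection, {}).values())
        if where:
            rows = [r for r in rows
                    if all(r["metadata"].get(k) == v for k, v in where.items())]
        # Rank by dot product, highest first.
        rows.sort(key=lambda r: sum(a * b for a, b in zip(vector, r["vector"])),
                  reverse=True)
        return rows[:limit]

    def remove(self, collection: str, ids: Sequence[str]) -> None:
        bucket = self.collections.get(collection, {})
        for record_id in ids:
            bucket.pop(record_id, None)

    def drop(self, collection: str) -> None:
        self.collections.pop(collection, None)


class ThinAdapter:
    """Maps upsert/query/delete/clear onto the client; holds no state of its own."""

    def __init__(self, client: FakeClient, collection: str = "chunks") -> None:
        self._client = client
        self._collection = collection

    def upsert(self, items: Sequence[dict[str, Any]]) -> None:
        self._client.put(self._collection, list(items))

    def query(self, vector: Sequence[float], top_k: int,
              filters: Optional[dict[str, Any]] = None) -> list[dict[str, Any]]:
        return self._client.search(self._collection, vector, limit=top_k, where=filters)

    def delete(self, ids: Sequence[str]) -> None:
        self._client.remove(self._collection, list(ids))

    def clear(self) -> None:
        self._client.drop(self._collection)
```

Keeping the adapter this small means backend-specific settings such as collection names, index parameters, or authentication can be added in one place without touching the rest of the pipeline.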

What this starter does not try to do

This starter does not include:

  • model serving.
  • background ingestion workers.
  • file loaders for every format.
  • advanced ranking pipelines.
  • production auth and tenancy layers.

These concerns are highly project-specific and are better layered on top once your retrieval path is clear.

Development

Run tests:

python -m unittest discover -s tests -v

License

MIT
