rag-starter
RAG boilerplate with vector DB adapters.
rag-starter is a lightweight Python starter for retrieval-augmented generation workflows. It gives you a small but clean foundation for:
- chunking documents.
- generating embeddings.
- indexing vectors through adapter classes.
- retrieving relevant context for prompts.
- swapping vector backends without rewriting your pipeline.
This repo is intentionally minimal. It is designed as a starter, not a full framework.
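To make the chunking step concrete, here is a minimal sketch of fixed-size character chunking with overlap. The function name `sketch_chunk_text` is hypothetical and is not the package's `chunk_text`; it only assumes the same `chunk_size` and `overlap` parameters that appear in the quick start, and the real implementation may split differently (for example on word or sentence boundaries).

```python
def sketch_chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters.

    Hypothetical helper for illustration; not the package's chunk_text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


text = "RAG combines retrieval with generation. Vector databases help store embeddings."
pieces = sketch_chunk_text(text, chunk_size=60, overlap=10)
for piece in pieces:
    print(repr(piece))
```

Overlap means the tail of one chunk is repeated at the head of the next, which helps when a relevant sentence straddles a chunk boundary.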
Features
- Small, readable Python package structure.
- Adapter interface for vector stores.
- In-memory adapter included for local development and tests.
- Optional adapter stubs for Chroma, Qdrant, and Pinecone.
- Simple hashing embedder for demos and bootstrapping.
- Retriever and RAG pipeline helpers.
- Example script and tests.
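As a rough illustration of what a hashing embedder can look like, the sketch below uses the feature-hashing trick: each token is hashed into one of `dimensions` buckets and the resulting count vector is L2-normalized. The function `sketch_hashing_embed` is hypothetical; the package's `HashingEmbedder` may tokenize, hash, or normalize differently. Vectors like this carry no semantic meaning, so they suit demos and tests rather than real retrieval quality.

```python
import hashlib
import math
import re


def sketch_hashing_embed(text: str, dimensions: int = 64) -> list[float]:
    """Map text to a fixed-size vector by hashing tokens into buckets.

    Hypothetical helper for illustration; not the package's HashingEmbedder.
    """
    vector = [0.0] * dimensions
    for token in re.findall(r"\w+", text.lower()):
        # hashlib gives a stable hash across runs, unlike the built-in hash().
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % dimensions
        vector[bucket] += 1.0

    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return vector  # empty input: return the zero vector unchanged
    return [v / norm for v in vector]


vec = sketch_hashing_embed("Vector databases help store embeddings.", dimensions=64)
print(len(vec))
```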
Project structure
rag-starter/
├── examples/
│   └── basic_usage.py
├── src/
│   └── rag_starter/
│       ├── adapters/
│       │   ├── base.py
│       │   ├── chroma.py
│       │   ├── inmemory.py
│       │   ├── pinecone.py
│       │   └── qdrant.py
│       ├── chunking.py
│       ├── document.py
│       ├── embedder.py
│       ├── pipeline.py
│       ├── prompts.py
│       ├── retriever.py
│       └── utils.py
├── tests/
│   ├── test_chunking.py
│   └── test_pipeline.py
├── pyproject.toml
└── README.md
Install
pip install -e .
Optional extras:
pip install -e ".[chroma]"
pip install -e ".[qdrant]"
pip install -e ".[pinecone]"
Quick start
from rag_starter.adapters.inmemory import InMemoryVectorStore
from rag_starter.chunking import chunk_text
from rag_starter.document import Document, Chunk
from rag_starter.embedder import HashingEmbedder
from rag_starter.pipeline import RAGPipeline

# The source document to index.
source = Document(
    id="doc-1",
    text="RAG combines retrieval with generation. Vector databases help store embeddings.",
    metadata={"title": "RAG Notes"},
)

# Split the text into overlapping pieces and wrap each one in a Chunk.
chunks = [
    Chunk(id=f"chunk-{i}", document_id=source.id, text=text, metadata=source.metadata)
    for i, text in enumerate(chunk_text(source.text, chunk_size=60, overlap=10), start=1)
]

# Wire the embedder and the vector store into a pipeline, then index the chunks.
embedder = HashingEmbedder(dimensions=64)
store = InMemoryVectorStore()
pipeline = RAGPipeline(store=store, embedder=embedder)
pipeline.index_chunks(chunks)

# Retrieve the two best matches for a query and print them with their scores.
result = pipeline.retrieve("What helps store embeddings?", top_k=2)
for item in result.matches:
    print(item.score, item.chunk.text)
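Once matches are retrieved, the usual next step is to fold their text into a prompt for a language model. The sketch below shows one way to do that. The function `sketch_build_prompt` and its template wording are assumptions made for illustration; they are not taken from the package's `prompts.py`, whose helpers may look different.

```python
def sketch_build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a grounded prompt from a question and retrieved context strings.

    Hypothetical helper for illustration; not the package's prompt API.
    """
    # Number each context so the model (and the reader) can refer back to it.
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = sketch_build_prompt(
    "What helps store embeddings?",
    [
        "Vector databases help store embeddings.",
        "RAG combines retrieval with generation.",
    ],
)
print(prompt)
```

With the quick start above, the context list would typically be built from `item.chunk.text` for each retrieved match.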
Adapter model
All vector database backends follow the same interface defined in VectorStoreAdapter.
Core methods:
- upsert(items)
- query(vector, top_k, filters=None)
- delete(ids)
- clear()
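The four methods above could be expressed as an abstract base class along the following lines. This is a sketch under assumptions: the class name `SketchVectorStoreAdapter`, the type hints, and the dictionary-shaped items are all hypothetical, and the real `VectorStoreAdapter` in `adapters/base.py` may define its signatures and return types differently.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence


class SketchVectorStoreAdapter(ABC):
    """Hypothetical stand-in for VectorStoreAdapter; real signatures may differ."""

    @abstractmethod
    def upsert(self, items: Sequence[dict[str, Any]]) -> None:
        """Insert new items or overwrite items that share an id."""

    @abstractmethod
    def query(
        self,
        vector: Sequence[float],
        top_k: int,
        filters: Optional[dict[str, Any]] = None,
    ) -> list[dict[str, Any]]:
        """Return up to top_k items closest to vector, optionally filtered."""

    @abstractmethod
    def delete(self, ids: Sequence[str]) -> None:
        """Remove the items with the given ids."""

    @abstractmethod
    def clear(self) -> None:
        """Remove every item from the store."""


class IncompleteStore(SketchVectorStoreAdapter):
    """Implements only clear(), so Python refuses to instantiate it."""

    def clear(self) -> None:
        pass


try:
    IncompleteStore()
except TypeError as exc:
    print("rejected:", exc)
```

An abstract base class makes a missing method fail at construction time rather than midway through indexing, which is one reason to prefer it over plain duck typing for backend adapters.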
The included InMemoryVectorStore is useful for:
- local development.
- tests.
- learning the architecture.
- quickly bootstrapping a prototype.
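To show why an in-memory store is convenient for these cases, here is a minimal sketch of one that keeps vectors in a dictionary and ranks them by cosine similarity. The class `SketchInMemoryStore`, the `(id, vector, metadata)` tuple format, and the exact-match metadata filter are assumptions for illustration; the package's `InMemoryVectorStore` may store and score items differently.

```python
import math
from typing import Any, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SketchInMemoryStore:
    """Hypothetical dictionary-backed store; not the package's InMemoryVectorStore."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[list[float], dict[str, Any]]] = {}

    def upsert(self, items: list[tuple[str, list[float], dict[str, Any]]]) -> None:
        # Items with an existing id are overwritten.
        for item_id, vector, metadata in items:
            self._items[item_id] = (list(vector), dict(metadata))

    def query(
        self,
        vector: list[float],
        top_k: int,
        filters: Optional[dict[str, Any]] = None,
    ) -> list[tuple[str, float]]:
        scored = []
        for item_id, (stored, metadata) in self._items.items():
            # Skip items whose metadata does not match every filter exactly.
            if filters and any(metadata.get(k) != v for k, v in filters.items()):
                continue
            scored.append((item_id, cosine(vector, stored)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def delete(self, ids: list[str]) -> None:
        for item_id in ids:
            self._items.pop(item_id, None)

    def clear(self) -> None:
        self._items.clear()


store = SketchInMemoryStore()
store.upsert([
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.0, 1.0], {"lang": "en"}),
    ("c", [1.0, 1.0], {"lang": "de"}),
])
top = store.query([1.0, 0.1], top_k=2)
english_only = store.query([1.0, 0.1], top_k=2, filters={"lang": "en"})
print(top)
print(english_only)
```

A linear scan like this costs O(n) per query, which is fine for tests and prototypes but is the main reason to move to a dedicated vector backend as the collection grows.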
The optional adapters are intentionally thin wrappers so you can extend them to fit your preferred backend configuration.
What this starter does not try to do
This starter does not include:
- model serving.
- background ingestion workers.
- file loaders for every format.
- advanced ranking pipelines.
- production auth and tenancy layers.
Those are highly project-specific and are better layered on once your retrieval path is clear.
Development
Run tests:
python -m unittest discover -s tests -v
License
MIT
File details
Details for the file rag_starter-0.1.0.tar.gz.
File metadata
- Download URL: rag_starter-0.1.0.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f253a1f3d019ba9725097df6df667f3e5d5795674f4bcec5472d9893f63bcd7 |
| MD5 | 073d1b9748008fe0395f690d6745ec38 |
| BLAKE2b-256 | a527534ad3dc4725f3a98872de7c24af057ed3cf754cf891b507679781604fff |
File details
Details for the file rag_starter-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rag_starter-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7bc90aba8f4aa47632ecd408d2899571c044ddfc1a430d70e9bba7460bfb0bd |
| MD5 | 8c6762e8ba67347a4940744083771fcd |
| BLAKE2b-256 | 1eecaeb7cea8407927603f4ec9c88c2c2e76a7fa0526bd494943fc5fb39ba304 |