rag-bridge-kit
rag-bridge-kit is a plug-and-play Retrieval Augmented Generation pipeline library for Python.
Load, chunk, embed, store, retrieve, and generate — all in one clean API.
Why rag-bridge-kit?
- Zero config — works out of the box with sensible defaults.
- Modular — swap any component (loader, chunker, embedder, store, generator).
- Lightweight — no heavy dependencies by default.
- Production-ready — batch embedding, error handling, type hints everywhere.
- Extensible — bring your own components by extending base classes.
Install
From PyPI:
pip install rag-bridge-kit
From a local clone of the repository (editable install):
pip install -e .
With OpenAI support:
pip install -e ".[openai]"
With PDF support:
pip install -e ".[pdf]"
With ChromaDB (persistent vector store):
pip install -e ".[chromadb]"
With local sentence-transformers (no API key needed):
pip install -e ".[sentence-transformers]"
Install everything:
pip install -e ".[all]"
For development:
pip install -e ".[dev,all]"
Quick Start
from rag_bridge_kit import RAGPipeline
pipeline = RAGPipeline()
# Ingest documents
pipeline.ingest_texts([
    "Python is a high-level programming language.",
    "Machine learning is a subset of AI.",
    "RAG combines retrieval with generation.",
])
# Query
result = pipeline.query("What is RAG?")
print(result.answer)
print(f"Chunks retrieved: {len(result.retrieved_chunks)}")
Load from Files
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import TextLoader
pipeline = RAGPipeline(loader=TextLoader("docs/"))
stats = pipeline.ingest()
print(f"Ingested {stats.documents_loaded} docs, {stats.chunks_stored} chunks")
result = pipeline.query("What is the refund policy?")
print(result.answer)
Load PDFs
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import PDFLoader
pipeline = RAGPipeline(loader=PDFLoader("reports/"))
pipeline.ingest()
result = pipeline.query("What were Q4 earnings?")
Load CSVs
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import CSVLoader
pipeline = RAGPipeline(
    loader=CSVLoader("faq.csv", content_columns=["question", "answer"])
)
pipeline.ingest()
result = pipeline.query("How do I reset my password?")
Load Markdown (split by headings)
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import MarkdownLoader
pipeline = RAGPipeline(
    loader=MarkdownLoader("docs/", split_by_heading=True, heading_level=2)
)
pipeline.ingest()
result = pipeline.query("How to install?")
Choose Your Chunking Strategy
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.chunkers import FixedChunker, SentenceChunker, RecursiveChunker
# Fixed-size character chunks
pipeline = RAGPipeline(chunker=FixedChunker(chunk_size=512, chunk_overlap=64))
# Sentence-based chunks
pipeline = RAGPipeline(chunker=SentenceChunker(max_chunk_size=512, sentence_overlap=1))
# Recursive splitting (like LangChain)
pipeline = RAGPipeline(chunker=RecursiveChunker(chunk_size=512, chunk_overlap=64))
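To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is an illustration only: `fixed_chunks` is a hypothetical helper, not the library's `FixedChunker`, whose internals may differ.

```python
def fixed_chunks(text, chunk_size=512, chunk_overlap=64):
    """Split text into character windows; neighbours share `chunk_overlap` characters.

    Hypothetical helper for illustration; not part of rag-bridge-kit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 100  # 1000 characters
parts = fixed_chunks(sample, chunk_size=512, chunk_overlap=64)
print([len(p) for p in parts])          # chunk lengths
print(parts[0][-64:] == parts[1][:64])  # adjacent chunks share the overlap
```

With these settings each window starts 448 characters after the previous one, so a larger overlap means more chunks (and more embedding calls) for the same text.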
Use OpenAI Embeddings + Generation
import os
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.embedders import OpenAIEmbedder
from rag_bridge_kit.generators import OpenAIGenerator
api_key = os.environ["OPENAI_API_KEY"]
pipeline = RAGPipeline(
    embedder=OpenAIEmbedder(api_key=api_key),
    generator=OpenAIGenerator(api_key=api_key, model="gpt-4o-mini"),
)
pipeline.ingest_texts(["Your documents here..."])
result = pipeline.query("Your question here?")
print(result.answer)
Use Local Embeddings (SentenceTransformers)
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.embedders import SentenceTransformerEmbedder
pipeline = RAGPipeline(
    embedder=SentenceTransformerEmbedder(model_name="all-MiniLM-L6-v2"),
)
pipeline.ingest_texts(["Your documents..."])
result = pipeline.query("Your question?")
Persistent Storage with ChromaDB
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.stores import ChromaStore
pipeline = RAGPipeline(
    store=ChromaStore(collection_name="my-docs", persist_directory="./chroma_db"),
)
# Data persists across restarts!
pipeline.ingest_texts(["Important document content..."])
Retrieve Without Generating
from rag_bridge_kit import RAGPipeline
pipeline = RAGPipeline()
pipeline.ingest_texts(["Doc 1...", "Doc 2..."])
# Just get the relevant chunks
chunks = pipeline.retrieve("search query", top_k=3)
for chunk in chunks:
    print(f"Score: {chunk.score:.4f} | {chunk.content[:80]}...")
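The README does not say which metric produces `chunk.score`. As a rough picture of what top-k retrieval does, the sketch below assumes cosine similarity over embedding vectors; `cosine` and `top_k_chunks` are hypothetical names, and the `threshold` argument only mirrors the idea of a minimum similarity score.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, top_k=3, threshold=0.0):
    """Return (index, score) pairs for the best-scoring chunks, highest first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy 2-D vectors standing in for real embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k_chunks([1.0, 0.0], vectors, top_k=2):
    print(f"Score: {score:.4f} | chunk {index}")
```

A real vector store would typically use an index rather than scanning every chunk, but the ranking idea is the same.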
Architecture
┌─────────────────────────────────────────────────────────┐
│                       RAGPipeline                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  INGEST:  Loader → Chunker → Embedder → Store           │
│                                                         │
│  QUERY:   Embedder → Store (search) → Generator         │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  Loaders:    TextLoader, PDFLoader, CSVLoader,          │
│              MarkdownLoader                             │
│                                                         │
│  Chunkers:   FixedChunker, SentenceChunker,             │
│              RecursiveChunker                           │
│                                                         │
│  Embedders:  DefaultEmbedder, OpenAIEmbedder,           │
│              SentenceTransformerEmbedder                │
│                                                         │
│  Stores:     MemoryStore, ChromaStore                   │
│                                                         │
│  Generators: DefaultGenerator, OpenAIGenerator          │
└─────────────────────────────────────────────────────────┘
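The two flows in the diagram can be sketched with plain Python stand-ins. Everything below (`ToyPipeline`, the word-count embedder, the list-backed store, the echoing generator) is hypothetical and written for illustration; it does not reproduce rag-bridge-kit's classes or their signatures.

```python
from collections import Counter


def embed(text):
    """Toy embedder: lowercase bag of words with basic punctuation stripped."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def overlap(a, b):
    """Toy similarity: dot product of two word-count vectors."""
    return sum(count * b[word] for word, count in a.items())


class ToyPipeline:
    """Hypothetical stand-in for the flow in the diagram; not the library's class."""

    def __init__(self, chunker=None, embedder=None, generator=None):
        # Each stage is a plain callable, so any of them can be swapped.
        self.chunker = chunker or (lambda text: [text])
        self.embedder = embedder or embed
        self.generator = generator or (
            lambda question, context: f"{question} -> {' '.join(context)}"
        )
        self.store = []  # in-memory list of (vector, chunk) pairs

    def ingest_texts(self, texts):
        # INGEST: chunker -> embedder -> store (a loader would supply `texts`)
        for text in texts:
            for chunk in self.chunker(text):
                self.store.append((self.embedder(chunk), chunk))
        return len(self.store)

    def query(self, question, top_k=1):
        # QUERY: embedder -> store search -> generator
        q = self.embedder(question)
        ranked = sorted(self.store, key=lambda item: overlap(q, item[0]), reverse=True)
        return self.generator(question, [chunk for _, chunk in ranked[:top_k]])


pipeline = ToyPipeline()
pipeline.ingest_texts([
    "Python is a high-level programming language.",
    "RAG combines retrieval with generation.",
])
print(pipeline.query("How does RAG use retrieval?"))
```

Because each stage is passed in as a callable, replacing one (for example `ToyPipeline(chunker=lambda text: text.split(". "))`) leaves the rest of the flow untouched, which is the same idea as swapping a loader, chunker, embedder, store, or generator in the real pipeline.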
CLI
rag-bridge-kit info
rag-bridge-kit ingest ./docs --glob "*.txt"
rag-bridge-kit query ./docs -q "What is RAG?" --top-k 3
Environment Variables
| Variable | Default | Description |
|---|---|---|
| RAGKIT_CHUNK_SIZE | 512 | Default chunk size |
| RAGKIT_CHUNK_OVERLAP | 64 | Default chunk overlap |
| RAGKIT_TOP_K | 5 | Default number of results |
| RAGKIT_SIMILARITY_THRESHOLD | 0.0 | Minimum similarity score |
| RAGKIT_EMBEDDING_BATCH_SIZE | 64 | Batch size for embeddings |
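The variables and defaults in the table can be read as in the sketch below. How rag-bridge-kit itself parses them is not documented here, so `Settings` and `load_settings` are hypothetical names used only to illustrate the variable names, types, and defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the RAGKIT_* values; not the library's config class."""
    chunk_size: int
    chunk_overlap: int
    top_k: int
    similarity_threshold: float
    embedding_batch_size: int


def load_settings(env=None):
    """Read RAGKIT_* variables, falling back to the defaults from the table."""
    env = os.environ if env is None else env
    return Settings(
        chunk_size=int(env.get("RAGKIT_CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("RAGKIT_CHUNK_OVERLAP", "64")),
        top_k=int(env.get("RAGKIT_TOP_K", "5")),
        similarity_threshold=float(env.get("RAGKIT_SIMILARITY_THRESHOLD", "0.0")),
        embedding_batch_size=int(env.get("RAGKIT_EMBEDDING_BATCH_SIZE", "64")),
    )


print(load_settings({}))                     # all defaults
print(load_settings({"RAGKIT_TOP_K": "3"}))  # one override
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test without touching the real environment.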
Run Tests
pip install -e ".[dev]"
python -m pytest
Publish to PyPI
python -m build
twine upload dist/*
License
MIT