rag-bridge-kit

rag-bridge-kit is a plug-and-play Retrieval-Augmented Generation (RAG) pipeline library for Python.

It covers loading, chunking, embedding, storing, retrieving, and generating behind a single API.

Why rag-bridge-kit?

  • Zero config — works out of the box with sensible defaults.
  • Modular — swap any component (loader, chunker, embedder, store, generator).
  • Lightweight — no heavy dependencies by default.
  • Production-ready — batch embedding, error handling, type hints everywhere.
  • Extensible — bring your own components by extending base classes.

Install

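From PyPI:

pip install rag-bridge-kit

From a local clone of the repository (editable install):

pip install -e .

The optional extras below are shown for an editable install. When installing from PyPI, replace . with the package name, for example pip install "rag-bridge-kit[openai]".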

With OpenAI support:

pip install -e ".[openai]"

With PDF support:

pip install -e ".[pdf]"

With ChromaDB (persistent vector store):

pip install -e ".[chromadb]"

With local sentence-transformers (no API key needed):

pip install -e ".[sentence-transformers]"

Install everything:

pip install -e ".[all]"

For development:

pip install -e ".[dev,all]"

Quick Start

from rag_bridge_kit import RAGPipeline

pipeline = RAGPipeline()

# Ingest documents
pipeline.ingest_texts([
    "Python is a high-level programming language.",
    "Machine learning is a subset of AI.",
    "RAG combines retrieval with generation.",
])

# Query
result = pipeline.query("What is RAG?")
print(result.answer)
print(f"Chunks retrieved: {len(result.retrieved_chunks)}")

Load from Files

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import TextLoader

pipeline = RAGPipeline(loader=TextLoader("docs/"))
stats = pipeline.ingest()
print(f"Ingested {stats.documents_loaded} docs, {stats.chunks_stored} chunks")

result = pipeline.query("What is the refund policy?")
print(result.answer)

Load PDFs

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import PDFLoader

pipeline = RAGPipeline(loader=PDFLoader("reports/"))
pipeline.ingest()
result = pipeline.query("What were Q4 earnings?")

Load CSVs

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import CSVLoader

pipeline = RAGPipeline(
    loader=CSVLoader("faq.csv", content_columns=["question", "answer"])
)
pipeline.ingest()
result = pipeline.query("How do I reset my password?")
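
To make the role of content_columns concrete, here is a minimal standard-library sketch of turning each CSV row into one text document. It is not CSVLoader's actual implementation: the helper name rows_to_texts and the "column: value" line format are assumptions made purely for illustration.

```python
import csv
import io


def rows_to_texts(csv_text: str, content_columns: list[str]) -> list[str]:
    """Turn each CSV row into one text document built from the chosen columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    texts = []
    for row in reader:
        # Assumed format: one "column: value" line per selected, non-empty column.
        parts = [f"{col}: {row[col]}" for col in content_columns if row.get(col)]
        if parts:
            texts.append("\n".join(parts))
    return texts


sample = (
    "question,answer,owner\n"
    "How do I reset my password?,Use the reset link.,support\n"
)
docs = rows_to_texts(sample, ["question", "answer"])
print(docs[0])
```

Columns that are not listed (owner in this sample) are left out of the text, which keeps unrelated fields from diluting the embedding.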

Load Markdown (split by headings)

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.loaders import MarkdownLoader

pipeline = RAGPipeline(
    loader=MarkdownLoader("docs/", split_by_heading=True, heading_level=2)
)
pipeline.ingest()
result = pipeline.query("How to install?")
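
The idea behind split_by_heading and heading_level can be sketched with a regular expression. This is an assumption about the behavior, not MarkdownLoader's real code: the function below is hypothetical, treats any text before the first matching heading as its own section, and does not account for headings inside fenced code blocks.

```python
import re


def split_by_heading(markdown: str, heading_level: int = 2) -> list[str]:
    """Split Markdown text into sections that each start at a heading of the given level."""
    # With heading_level=2 this matches lines such as "## Install" but not "### Extras".
    pattern = re.compile(rf"^#{{{heading_level}}}[ \t]+.+$", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    if not starts:
        return [markdown.strip()] if markdown.strip() else []
    sections = []
    # Text before the first matching heading becomes its own section.
    preamble = markdown[: starts[0]].strip()
    if preamble:
        sections.append(preamble)
    for begin, end in zip(starts, starts[1:] + [len(markdown)]):
        sections.append(markdown[begin:end].strip())
    return sections


sample = (
    "# Guide\n\nIntro.\n\n"
    "## Install\n\npip install it.\n\n### Extras\n\nMore.\n\n"
    "## Usage\n\nRun it.\n"
)
sections = split_by_heading(sample, heading_level=2)
for section in sections:
    print(repr(section.splitlines()[0]))
```

Deeper headings stay inside their parent section, so each resulting document keeps its sub-sections together.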

Choose Your Chunking Strategy

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.chunkers import FixedChunker, SentenceChunker, RecursiveChunker

# Fixed-size character chunks
pipeline = RAGPipeline(chunker=FixedChunker(chunk_size=512, chunk_overlap=64))

# Sentence-based chunks
pipeline = RAGPipeline(chunker=SentenceChunker(max_chunk_size=512, sentence_overlap=1))

# Recursive splitting (similar in spirit to LangChain's RecursiveCharacterTextSplitter)
pipeline = RAGPipeline(chunker=RecursiveChunker(chunk_size=512, chunk_overlap=64))
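
For readers unfamiliar with the chunk_size and chunk_overlap parameters, the sketch below shows one common interpretation: a sliding character window that advances by chunk_size minus chunk_overlap. Whether FixedChunker behaves exactly this way is an assumption; the function fixed_chunks is a hypothetical stand-in, not part of the library.

```python
def fixed_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into character windows where neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(fixed_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.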

Use OpenAI Embeddings + Generation

import os
from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.embedders import OpenAIEmbedder
from rag_bridge_kit.generators import OpenAIGenerator

api_key = os.environ["OPENAI_API_KEY"]

pipeline = RAGPipeline(
    embedder=OpenAIEmbedder(api_key=api_key),
    generator=OpenAIGenerator(api_key=api_key, model="gpt-4o-mini"),
)

pipeline.ingest_texts(["Your documents here..."])
result = pipeline.query("Your question here?")
print(result.answer)

Use Local Embeddings (SentenceTransformers)

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.embedders import SentenceTransformerEmbedder

pipeline = RAGPipeline(
    embedder=SentenceTransformerEmbedder(model_name="all-MiniLM-L6-v2"),
)

pipeline.ingest_texts(["Your documents..."])
result = pipeline.query("Your question?")

Persistent Storage with ChromaDB

from rag_bridge_kit import RAGPipeline
from rag_bridge_kit.stores import ChromaStore

pipeline = RAGPipeline(
    store=ChromaStore(collection_name="my-docs", persist_directory="./chroma_db"),
)

# Data persists across restarts!
pipeline.ingest_texts(["Important document content..."])

Retrieve Without Generating

from rag_bridge_kit import RAGPipeline

pipeline = RAGPipeline()
pipeline.ingest_texts(["Doc 1...", "Doc 2..."])

# Just get the relevant chunks
chunks = pipeline.retrieve("search query", top_k=3)
for chunk in chunks:
    print(f"Score: {chunk.score:.4f} | {chunk.content[:80]}...")
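
The source does not say which similarity metric produces chunk.score. As an illustration only, the sketch below assumes cosine similarity and shows how a top_k search with a minimum-score threshold could work over an in-memory list. The names cosine and top_k_search are hypothetical and not part of the library.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_search(query_vec, stored, k=3, threshold=0.0):
    """stored is a list of (content, vector) pairs; returns (score, content), best first."""
    scored = [(cosine(query_vec, vec), content) for content, vec in stored]
    # Drop results below the minimum similarity before ranking.
    scored = [item for item in scored if item[0] >= threshold]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


stored = [
    ("Doc about cats", [1.0, 0.0]),
    ("Doc about dogs", [0.0, 1.0]),
    ("Doc about pets", [1.0, 1.0]),
]
results = top_k_search([1.0, 0.0], stored, k=2)
for score, content in results:
    print(f"Score: {score:.4f} | {content}")
```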

Architecture

┌─────────────────────────────────────────────────────────┐
│                     RAGPipeline                         │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  INGEST:   Loader → Chunker → Embedder → Store          │
│                                                         │
│  QUERY:    Embedder → Store (search) → Generator        │
│                                                         │
├─────────────────────────────────────────────────────────┤
│  Loaders:    TextLoader, PDFLoader, CSVLoader,          │
│              MarkdownLoader                             │
│                                                         │
│  Chunkers:   FixedChunker, SentenceChunker,             │
│              RecursiveChunker                           │
│                                                         │
│  Embedders:  DefaultEmbedder, OpenAIEmbedder,           │
│              SentenceTransformerEmbedder                │
│                                                         │
│  Stores:     MemoryStore, ChromaStore                   │
│                                                         │
│  Generators: DefaultGenerator, OpenAIGenerator          │
└─────────────────────────────────────────────────────────┘
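
The two flows in the diagram can be traced end to end with toy components. The sketch below is not the library's code: ToyPipeline, the word-count "embedder", and the echoing "generator" are all stand-ins invented here to show how the stages hand data to each other.

```python
from collections import Counter


def chunk(text: str) -> list[str]:
    # Toy chunker: one chunk per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]


def embed(text: str) -> Counter:
    # Toy embedder: lowercase word counts stand in for a real vector.
    return Counter(text.lower().replace("?", "").split())


def similarity(a: Counter, b: Counter) -> int:
    # Toy score: number of word occurrences the two texts share.
    return sum((a & b).values())


class ToyPipeline:
    def __init__(self):
        self.store = []  # list of (chunk_text, embedding) pairs

    def ingest_texts(self, texts: list[str]) -> int:
        # INGEST: (Loader) -> Chunker -> Embedder -> Store
        for text in texts:
            for piece in chunk(text):
                self.store.append((piece, embed(piece)))
        return len(self.store)

    def query(self, question: str, top_k: int = 1) -> str:
        # QUERY: Embedder -> Store (search) -> Generator
        q = embed(question)
        ranked = sorted(self.store, key=lambda item: similarity(q, item[1]), reverse=True)
        context = [piece for piece, _ in ranked[:top_k]]
        # Toy generator: echo the retrieved context instead of calling an LLM.
        return "Based on: " + " | ".join(context)


pipeline = ToyPipeline()
pipeline.ingest_texts(["Python is a language. RAG combines retrieval with generation."])
answer = pipeline.query("How does RAG use retrieval?")
print(answer)
```

Note that the same embedder is used on both paths, which is why swapping the embedder normally means re-ingesting the stored documents.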

CLI

rag-bridge-kit info
rag-bridge-kit ingest ./docs --glob "*.txt"
rag-bridge-kit query ./docs -q "What is RAG?" --top-k 3

Environment Variables

Variable                      Default  Description
RAGKIT_CHUNK_SIZE             512      Default chunk size
RAGKIT_CHUNK_OVERLAP          64       Default chunk overlap
RAGKIT_TOP_K                  5        Default number of results
RAGKIT_SIMILARITY_THRESHOLD   0.0      Minimum similarity score
RAGKIT_EMBEDDING_BATCH_SIZE   64       Batch size for embeddings
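
As a rough illustration of how such variables are typically consumed, the sketch below reads each one with the default from the table. When and how rag-bridge-kit itself parses these values is not stated in this document, so the helper env_value and the settings dictionary are assumptions, not the library's configuration code.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def env_value(name: str, default: T, cast: Callable[[str], T]) -> T:
    """Return the environment variable converted with cast, or default if unset or empty."""
    raw = os.environ.get(name, "").strip()
    return cast(raw) if raw else default


# Defaults mirror the table above.
settings = {
    "chunk_size": env_value("RAGKIT_CHUNK_SIZE", 512, int),
    "chunk_overlap": env_value("RAGKIT_CHUNK_OVERLAP", 64, int),
    "top_k": env_value("RAGKIT_TOP_K", 5, int),
    "similarity_threshold": env_value("RAGKIT_SIMILARITY_THRESHOLD", 0.0, float),
    "embedding_batch_size": env_value("RAGKIT_EMBEDDING_BATCH_SIZE", 64, int),
}
print(settings)
```

Setting a variable in the shell before starting Python, for example RAGKIT_TOP_K=10, would then change the corresponding entry without any code edits.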

Run Tests

pip install -e ".[dev]"
python -m pytest

Publish to PyPI

python -m build
twine upload dist/*

License

MIT
