RagBucket
Portable Executable RAG Artifacts for Python
Build once. Load anywhere. Query forever.
╔══════════════════════════════════════════════════════════════╗
║ RAG systems were never meant to be locked to infrastructure ║
║ RagBucket sets them free. ║
╚══════════════════════════════════════════════════════════════╝
◈ The Problem
Every major ML format is portable by default:
model.pt · model.onnx · model.gguf · model.h5
You save them, share them, deploy them anywhere.
RAG systems can't do any of that.
A typical RAG pipeline is a fragile web of moving parts:
❌ Vector databases tied to specific infrastructure
❌ Embedding pipelines that must be rebuilt from scratch
❌ Chunking configs scattered across codebases
❌ Provider-specific integrations with zero portability
❌ Metadata that lives nowhere and everywhere at once
Every time you switch environments — laptop to server, dev to prod, team to team — you rebuild the whole thing. That's broken.
RagBucket fixes this.
It packages your entire RAG pipeline — vectors, chunks, config, and runtime metadata — into a single portable .rag artifact. Like a model checkpoint, but for retrieval intelligence.
◈ Introducing .rag
A .rag artifact is a self-contained, executable unit of retrieval intelligence. It is not a config file. It is not a directory. It is a complete, ready-to-run retrieval system.
| What it stores | How it stores it |
|---|---|
| Semantic embeddings | via Sentence Transformers |
| Vector index | via FAISS |
| Chunked knowledge | via LangChain splitters |
| Retrieval configuration | embedded in manifest |
| Runtime metadata | versioned artifact manifest |
Build it once.
Drop it anywhere.
Query it with one line of code.
◈ Full Architecture
◈ Installation
# Using uv (recommended)
uv add ragbucket
# Using pip
pip install ragbucket
Lightweight by default. Local embedding dependencies are only pulled in when you set embedding_provider="local". Cloud providers add nothing to your base install.
◈ Quickstart
Step 1 — Build a Portable .rag Artifact
from ragbucket import RagBuilder, RagConfig
import os
from dotenv import load_dotenv
load_dotenv()
config = RagConfig(
    # ── Embedding Provider ────────────────────────────────────
    embedding_provider = "cohere",
    embedding_model = "embed-english-v3.0",
    embedding_api_key = os.getenv("COHERE_API_KEY"),
    # ── Chunking ──────────────────────────────────────────────
    chunk_size = 512,
    chunk_overlap = 50,
    # ── Retrieval ─────────────────────────────────────────────
    top_k = 3,
)
builder = RagBuilder(config=config)
builder.build(
    doc_path = "docs/",
    op_path = "artifacts/demo.rag",
)
This generates a single portable artifact:
artifacts/
└── demo.rag ← your entire RAG pipeline, packaged
Containing:
demo.rag
├── vectors.faiss ← semantic vector index
├── chunks.json ← chunked document memory
└── manifest.json ← embedding config + metadata
Build once. Query anywhere.
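The chunk_size and chunk_overlap settings in Step 1 control how documents are split before they are embedded. To make the two numbers concrete, here is a simplified fixed-window splitter. It is only a sketch: chunk_text is a hypothetical helper, it measures sizes in characters (an assumption, since units are not stated here), and it is not the LangChain splitter RagBucket uses, which also tries to break on natural separators.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


example = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example)  # ['abcd', 'defg', 'ghij']
```

A larger overlap keeps more shared context between neighbouring chunks, at the cost of more chunks to embed and store.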
Step 2 — Load and Query the Artifact
from ragbucket import RagRuntime
import os
from dotenv import load_dotenv
load_dotenv()
rag = RagRuntime(
    # ── RAG Artifact ──────────────────────────────────────────
    rag_path = "artifacts/demo.rag",
    # ── Generation Provider ───────────────────────────────────
    provider = "groq",
    api_key = os.getenv("GROQ_API_KEY"),
    model = "llama-3.1-8b-instant",
    # ── Embedding Provider Key ────────────────────────────────
    embedding_api_key = os.getenv("COHERE_API_KEY"),
    # ── System Prompt ─────────────────────────────────────────
    system_prompt = "You are a helpful assistant. Keep answers short and crisp.",
)
response = rag.ask("What are Anik's AI/ML skills?")
print(response)
That's it. No vector DB to spin up. No pipeline to reconstruct. Just load and ask.
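Conceptually, a retrieval-augmented query has three steps: embed the question, find the top_k closest chunks, and hand those chunks to the generation model together with the system prompt. The sketch below walks through the first two steps and the prompt assembly using toy stand-ins. The bag-of-words embed function replaces a real embedding model, the brute-force search replaces FAISS, and every function name is hypothetical. It illustrates the general flow, not RagBucket's internals.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int) -> list[str]:
    """Brute-force nearest-neighbour search, standing in for a FAISS lookup."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(system_prompt: str, question: str, context: list[str]) -> str:
    """Combine the system prompt, retrieved chunks, and question into one prompt."""
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context, start=1))
    return f"{system_prompt}\n\nContext:\n{numbered}\n\nQuestion: {question}"


chunks = [
    "python and faiss skills",
    "favorite food is pizza",
    "experience with python machine learning",
]
question = "python skills"
context = retrieve(question, chunks, top_k=2)
prompt = build_prompt("Keep answers short and crisp.", question, context)
print(prompt)
```

The resulting prompt string is what would be sent to the generation provider; swapping that provider does not change anything in the retrieval half.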
◈ Multi-Provider Runtime
RagBucket cleanly separates retrieval from generation — meaning you can mix and match embedding providers with generation providers freely.
Generation Providers
| Provider | Example Model |
|---|---|
| groq | llama-3.1-8b-instant |
| openai | gpt-4o-mini |
| gemini | gemini-1.5-flash |
| anthropic | claude-3-haiku-20240307 |
import os
from ragbucket import RagRuntime
# Swap providers without touching anything else
rag = RagRuntime(
    rag_path = "demo.rag",
    provider = "anthropic",
    api_key = os.getenv("ANTHROPIC_API_KEY"),
    model = "claude-3-haiku-20240307",
    embedding_api_key = os.getenv("COHERE_API_KEY"),
)
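Since only the generation arguments change between providers, one way to keep switching tidy is to build the keyword arguments from a small lookup table. This is a sketch, not part of RagBucket: runtime_kwargs is a hypothetical helper, the environment variable name for gemini is an assumption, and the other variable names are taken from the examples in this README. The artifact is assumed to have been built with Cohere embeddings, as in the Quickstart.

```python
import os

# provider name -> (API key environment variable, example model from the table above)
GENERATION_PROVIDERS = {
    "groq": ("GROQ_API_KEY", "llama-3.1-8b-instant"),
    "openai": ("OPENAI_API_KEY", "gpt-4o-mini"),
    "gemini": ("GEMINI_API_KEY", "gemini-1.5-flash"),  # env var name is an assumption
    "anthropic": ("ANTHROPIC_API_KEY", "claude-3-haiku-20240307"),
}


def runtime_kwargs(provider: str, rag_path: str) -> dict:
    """Build keyword arguments in the shape used by the RagRuntime examples."""
    if provider not in GENERATION_PROVIDERS:
        known = ", ".join(sorted(GENERATION_PROVIDERS))
        raise ValueError(f"unknown provider {provider!r}; expected one of: {known}")
    env_var, model = GENERATION_PROVIDERS[provider]
    return {
        "rag_path": rag_path,
        "provider": provider,
        "api_key": os.getenv(env_var),  # None if the variable is not set
        "model": model,
        "embedding_api_key": os.getenv("COHERE_API_KEY"),
    }


kwargs = runtime_kwargs("anthropic", "demo.rag")
print(kwargs["provider"], kwargs["model"])
```

With ragbucket installed, the dictionary could be passed along as RagRuntime(**kwargs), so changing providers becomes a one-word change.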
Embedding Providers
| Provider | Example Model |
|---|---|
| local | BAAI/bge-small-en-v1.5 |
| cohere | embed-english-v3.0 |
| openai | text-embedding-3-small |
| gemini | models/embedding-001 |
| voyage | voyage-large-2 |
import os
from ragbucket import RagConfig
# Use any embedding provider at build time
config = RagConfig(
    embedding_provider = "openai",
    embedding_model = "text-embedding-3-small",
    embedding_api_key = os.getenv("OPENAI_API_KEY"),
)
◈ Dynamic Retrieval Configuration
Every stage of the retrieval pipeline is configurable. Sane defaults are always applied automatically.
from ragbucket import RagConfig
config = RagConfig(
    # Embedding system
    embedding_provider = "local",
    embedding_model = "sentence-transformers/all-MiniLM-L6-v2",
    # Chunking
    chunk_size = 1024,
    chunk_overlap = 100,
    # Retrieval
    top_k = 5,
)
All missing values are filled using framework defaults. Nothing breaks if you leave something out.
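The "defaults plus validation" behaviour can be pictured with a small standard-library stand-in. This is a sketch under stated assumptions: RetrievalSettings is a hypothetical class, it uses a dataclass rather than the Pydantic model RagBucket relies on, and the default values shown are illustrative, not the framework's actual defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalSettings:
    """Stand-in for a validated config; default values are illustrative only."""

    embedding_provider: str = "local"
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    chunk_size: int = 512
    chunk_overlap: int = 50
    top_k: int = 3

    def __post_init__(self) -> None:
        # Reject combinations that could never produce a usable index.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")


# Only the chunking values are given; everything else falls back to a default.
settings = RetrievalSettings(chunk_size=1024, chunk_overlap=100)
print(settings)
```

Validating at construction time means a bad value such as an overlap larger than the chunk size fails immediately, before any documents are embedded.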
◈ What a .rag Artifact Contains
demo.rag
│
├── vectors.faiss ← FAISS vector index (semantic search backbone)
├── chunks.json ← document chunks with source metadata
└── manifest.json ← embedding config, top_k, model info, version
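The artifact is self-describing: the manifest records the embedding configuration, top_k, and version, so whoever receives a .rag file needs no external config and no vector database to query it. The only things supplied at load time are credentials: an API key for the embedding provider when the artifact was built with a cloud one, and a key for whichever generation provider you choose.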
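Because the Technology Stack section lists Python's zipfile for artifact packaging, a .rag file can presumably be inspected with the standard library alone. The sketch below writes a stand-in archive with the three member names shown above and reads it back. The helper names and the manifest fields are hypothetical, and the vectors.faiss member is an empty placeholder, so treat the real schema as something to check against an artifact you have built yourself.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def write_standin_artifact(path: Path) -> None:
    """Write a toy archive with the three member names listed above."""
    manifest = {  # hypothetical fields, for illustration only
        "embedding_provider": "cohere",
        "embedding_model": "embed-english-v3.0",
        "top_k": 3,
    }
    chunks = [{"text": "RagBucket packages retrieval state.", "source": "docs/intro.md"}]
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest))
        zf.writestr("chunks.json", json.dumps(chunks))
        zf.writestr("vectors.faiss", b"")  # placeholder, not a real FAISS index


def inspect_artifact(path: Path) -> tuple[list[str], dict]:
    """Return the sorted member names and the parsed manifest."""
    with zipfile.ZipFile(path) as zf:
        names = sorted(zf.namelist())
        manifest = json.loads(zf.read("manifest.json"))
    return names, manifest


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "demo.rag"
    write_standin_artifact(artifact)
    names, manifest = inspect_artifact(artifact)

print(names)
print(manifest["embedding_model"])
```

Reading the manifest first is a cheap way to find out which embedding provider key an artifact will need before loading it.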
◈ Technology Stack
| Component | Technology |
|---|---|
| Embeddings | Sentence Transformers |
| Vector Search | FAISS |
| Chunking | LangChain Text Splitters |
| Artifact Packaging | Python zipfile |
| Config Validation | Pydantic |
| Runtime | Pure Python |
◈ Philosophy
RAG systems should be as portable as model files. Not as fragile as microservice stacks.
RagBucket treats RAG systems as portable intelligence artifacts — not fragile infrastructure pipelines. This cleanly separates two concerns that have no business being coupled:
Retrieval memory → what you built → lives in the .rag file
Language generation → how you query it → any provider, any environment
Your retrieval knowledge travels with your code. Swap generation providers without rebuilding anything. Share a .rag file like you'd share a model checkpoint.
The result: reusable semantic memory that is fully decoupled from infrastructure.
◈ Links
| Resource | URL |
|---|---|
| Website | ragbucket.vercel.app |
| PyPI | pypi.org/project/ragbucket |
| GitHub | github.com/anikchand461/ragbucket |
◈ License
MIT License — see LICENSE for details.
◈ RagBucket
The portable runtime layer for Retrieval-Augmented Generation systems.
Built by Anik Chand · ragbucket.vercel.app
File details
Details for the file ragbucket-0.2.6.tar.gz.
File metadata
- Download URL: ragbucket-0.2.6.tar.gz
- Upload date:
- Size: 3.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd151443b4a2d23bd166d3364507ab4b66f4dd79c34862181184998e9095a608 |
| MD5 | ef01d740df7c96394dc77f0094e7bdc0 |
| BLAKE2b-256 | f74f02dc80d0f5c5fb338eefb2c6c9a8a6285ec0338e5a8cf9a49aea8bf0031f |
File details
Details for the file ragbucket-0.2.6-py3-none-any.whl.
File metadata
- Download URL: ragbucket-0.2.6-py3-none-any.whl
- Upload date:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0daddebc94ec7c0876f4821f92baa2c83581bcb3887b84bffc5ee03afb38172 |
| MD5 | 1be901df7b523b71fe042cf832502a9e |
| BLAKE2b-256 | 4df131530aa41c7bac23ff78e40dc3b1def58da853881d63e280a9013a5a3cc1 |