Lightweight embedded document store with TF-IDF search
RAGdb
RAGdb is a lightweight, embedded multimodal database for Retrieval-Augmented Generation (RAG) systems. It stores extracted text, metadata, and searchable vectors — all inside a single SQLite file.
⚡ RAGdb is the world’s first lightweight, SQLite-based, embedded multimodal RAG index with zero heavy dependencies and no servers.
No servers. No GPU. No vector database. Just:
pip install ragdb
…and you have a full local RAG index.
🌟 Why RAGdb?
- Embedded & portable: everything inside a single .ragdb SQLite file
- Multimodal: supports text, PDFs, Word, CSV, JSON, Excel, images, audio, video
- Fast local search using TF-IDF + cosine similarity
- Zero heavy ML dependencies (no PyTorch, no Transformers)
- No file storage: RAGdb stores extracted content, not raw files
- Natural-language ready: plug into GPT, Claude, Llama, or any LLM
- Works fully offline
- Small footprint: ideal for laptops, VMs, containers, or edge devices
RAGdb is the only open-source project that provides a full multimodal RAG index in a single file without requiring a server or vector database.
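To make "TF-IDF + cosine similarity" concrete, here is a minimal, dependency-free sketch of that ranking technique. It is not RAGdb's internal code: the tokenizer, the smoothed IDF formula, and the names `tokenize`, `build_index`, `tfidf_vector`, `cosine`, and `search` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; terms missing from idf are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: terms present in every document keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, top_k=3):
    """Return up to top_k (score, doc_index) pairs, best match first."""
    idf, vectors = build_index(docs)
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:top_k]
```

Because the match is purely lexical, a query only scores against documents that share at least one term with it.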
✨ What RAGdb Stores
RAGdb is a search index, not a backup system. It does not store your original file bytes.
For each ingested file, RAGdb stores:
- extracted text (where applicable)
- a TF-IDF vector
- metadata as JSON
- a short human-friendly preview
- absolute file path
Your actual files remain on disk or cloud — RAGdb holds only the RAG-ready representation.
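As an illustration of the per-file record listed above, the following sketch stores one row per ingested file in SQLite using only the standard library. The table name `documents`, its columns, the JSON-encoded vector, and the 200-character preview are hypothetical; RAGdb's actual schema may differ.

```python
import json
import sqlite3

# Hypothetical schema: one row per ingested file, no raw file bytes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    path       TEXT PRIMARY KEY,   -- absolute file path
    media_type TEXT NOT NULL,
    text       TEXT,               -- extracted text, where applicable
    vector     TEXT NOT NULL,      -- TF-IDF weights serialized as JSON
    metadata   TEXT NOT NULL,      -- metadata as JSON
    preview    TEXT                -- short human-friendly preview
)
"""


def open_index(db_path):
    """Open (or create) the index file and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def upsert_document(conn, path, media_type, text, vector, metadata):
    """Insert or replace the record for one file."""
    preview = (text or "")[:200]
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?, ?, ?)",
        (path, media_type, text, json.dumps(vector), json.dumps(metadata), preview),
    )
    conn.commit()


# Illustrative values only; an in-memory database keeps the example side-effect free.
conn = open_index(":memory:")
upsert_document(conn, "/data/report.txt", "text", "Quarterly tax changes ...",
                {"tax": 0.7, "changes": 0.3}, {"size": 1234})
row = conn.execute("SELECT media_type, preview FROM documents").fetchone()
```

Keying the row on the absolute path means re-ingesting a file overwrites its earlier record instead of duplicating it.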
📂 Supported Formats
📝 Full text extraction
.txt, .pdf (via PyPDF2), .docx, .json, .csv, .xls, .xlsx
🖼 Images
.png, .jpg, .jpeg, .webp, .bmp, .gif
Stored data:
- size, width, height, mode
- OCR text (requires Tesseract installed)
- human-friendly preview
Version 0.2.0 will include a built-in OCR engine (no Tesseract required).
🔊 Audio (metadata-only)
.wav, .mp3, .ogg, .flac, .m4a
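"Metadata-only" means recording properties of the file and not its contents. A sketch for .wav files using Python's standard `wave` module follows; the function name `wav_metadata` and the returned keys are assumptions, and the other listed audio formats would need a different reader.

```python
import wave


def wav_metadata(path):
    """Read basic properties of a WAV file without decoding its samples."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }
```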
🎥 Video (metadata-only)
.mp4, .mov, .mkv, .avi, .webm
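The format lists above imply a dispatch on file extension. A minimal sketch of such a lookup is shown here; the extensions come from the lists above, while the media-type labels and the `classify` function are hypothetical names, not RAGdb's API.

```python
from pathlib import Path

# Extensions taken from the Supported Formats lists; the labels are illustrative.
MEDIA_TYPES = {
    "text": {".txt", ".pdf", ".docx", ".json", ".csv", ".xls", ".xlsx"},
    "image": {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"},
    "audio": {".wav", ".mp3", ".ogg", ".flac", ".m4a"},
    "video": {".mp4", ".mov", ".mkv", ".avi", ".webm"},
}


def classify(path):
    """Return the media type for a path, or None if the extension is unsupported."""
    suffix = Path(path).suffix.lower()
    for media_type, extensions in MEDIA_TYPES.items():
        if suffix in extensions:
            return media_type
    return None
```

Lowercasing the suffix means `Report.PDF` and `report.pdf` are treated the same way.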
🚀 Installation
pip install ragdb
Or from source:
pip install -e .
Dependencies:
- numpy
- pillow
- PyPDF2
- pandas + openpyxl
- python-docx
- pytesseract (optional OCR)
- fastapi + uvicorn (optional API server)
🧠 Basic Usage
from ragdb import RAGdb

# Create or load database
db = RAGdb("knowledge.ragdb")

# Ingest an entire folder
db.ingest_folder("docs")

# Search your RAG database
results = db.search("machine learning tax changes")
for path, score, media_type, preview in results:
    print(f"{score:.4f} {media_type} {path}")
    print(" ", preview)
🤖 Using With an LLM (Natural Language RAG)
RAGdb handles retrieval. The LLM handles reasoning.
Typical pattern:
- User asks a natural-language question
- Query RAGdb for top-N relevant pieces
- Feed results into GPT/Claude/Llama
- Generate an answer grounded in retrieved context
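This gives you natural-language question answering without bundling heavy ML models. Retrieval itself stays keyword-based (TF-IDF), so the LLM supplies the semantic understanding.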
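The pattern above can be sketched as a small prompt-building step. It assumes search results arrive as `(path, score, media_type, preview)` tuples, as in the Basic Usage loop; `build_prompt`, the prompt wording, and the sample data are hypothetical, and the call to the LLM itself is left out.

```python
def build_prompt(question, results, top_n=3):
    """Format the top-N retrieved previews as context for an LLM prompt."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)[:top_n]
    context = "\n".join(
        f"[{i}] {path} ({media_type}): {preview}"
        for i, (path, _score, media_type, preview) in enumerate(ranked, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Stand-in results; in practice these would come from db.search(question).
sample_results = [
    ("/docs/ml.txt", 0.17, "text", "Intro to machine learning ..."),
    ("/docs/tax.pdf", 0.42, "text", "Summary of tax changes ..."),
]
prompt = build_prompt("What changed in tax law?", sample_results, top_n=1)
```

The resulting string can be sent to whichever LLM client you already use. Numbering the snippets lets the model cite its sources by index.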
🌐 Optional: FastAPI Server
Expose your .ragdb file over HTTP:
export RAGDB_PATH=/path/to/file.ragdb
export RAGDB_API_KEY=secret-token
uvicorn ragdb.server:create_app --factory --reload
REST endpoints:
- POST /ingest
- GET /documents
- GET /search
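A client call against the search endpoint might look like the sketch below, which only builds the request and does not send it. The README does not document request parameters or how the API key is passed, so the query parameter names `q` and `top_k` and the `X-API-Key` header are assumptions; check the running server's generated API docs for the real names.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_search_request(query, api_key, top_k=5):
    """Build (but do not send) a GET /search request.

    Assumptions: the parameter names `q` and `top_k` and the
    `X-API-Key` header are hypothetical, not confirmed by the README.
    """
    params = urlencode({"q": query, "top_k": top_k})
    return Request(
        f"{BASE_URL}/search?{params}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


req = build_search_request("tax changes", "secret-token")
# To send it while the server is running: urllib.request.urlopen(req)
```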
📌 Notes on Word Files
Old .doc files are not supported.
Save them as .docx before ingestion.
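Before ingesting a folder, it can help to list any legacy .doc files that still need converting. A small sketch using only the standard library follows; `find_legacy_doc_files` is a hypothetical helper, not part of RAGdb.

```python
from pathlib import Path


def find_legacy_doc_files(folder):
    """Return .doc files under folder (searched recursively), sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".doc"
    )
```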
📄 License
RAGdb is released under the MIT License.
💡 Coming Soon (v0.2.0)
- Built-in tiny OCR (no Tesseract required)
- Media extension (audio/video transcription, CLIP embeddings)
- Cloud embedding helpers
- Optional semantic search layer
File details
Details for the file ragdb-0.1.2.tar.gz.
File metadata
- Download URL: ragdb-0.1.2.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a9a6ab85fbbaf1ac286788b1f78c9a3e9df58a1a6addc07745befa45d95b220 |
| MD5 | 972ce6452145d0e76136d307a073f18c |
| BLAKE2b-256 | 84937ac88107def71d20dfa393fbf850a7d49811f5ac3f63facea1dbe09230c8 |
File details
Details for the file ragdb-0.1.2-py3-none-any.whl.
File metadata
- Download URL: ragdb-0.1.2-py3-none-any.whl
- Upload date:
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 638d042b33f588b53284980f759eda803e8651c5afabca09667b07b6f89e6086 |
| MD5 | 0e71d653df87379b41dfced6e66646fc |
| BLAKE2b-256 | f9c630e832d30b07bb899f4763eba92341b9931f26286a267cf5747317487009 |