Lightweight embedded document store with TF-IDF search

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
- Python :: 3 :: Only
Topic
- Database
- Scientific/Engineering :: Information Analysis

Reason this release was yanked:

Not usable code

Project description

RAGdb

RAGdb is a lightweight, embedded multimodal database for Retrieval-Augmented Generation (RAG) systems. It stores extracted text, metadata, and searchable vectors — all inside a single SQLite file.

⚡ RAGdb is the world’s first lightweight, SQLite-based, embedded multimodal RAG index with zero heavy dependencies and no servers.

No servers. No GPU. No vector database. Just:

pip install ragdb

…and you have a full local RAG index.

🌟 Why RAGdb?

Embedded & portable — everything inside a single .ragdb SQLite file
Multimodal: extracts text from plain text, PDFs, Word, CSV, JSON, and Excel; indexes images (metadata plus optional OCR); and records metadata only for audio and video
Fast local search using TF-IDF + cosine similarity
Zero heavy ML dependencies (no PyTorch, no Transformers)
No file storage — RAGdb stores extracted content, not raw files
Natural-language ready — plug into GPT, Claude, Llama, or any LLM
Works fully offline
Small footprint — ideal for laptops, VMs, containers, or edge devices

RAGdb is the only open-source project that provides a full multimodal RAG index in a single file without requiring a server or vector database.
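
To make the "TF-IDF + cosine similarity" search concrete, here is a minimal standard-library sketch of the general technique. It is not RAGdb's actual implementation: the tokenizer, the smoothed IDF formula, and every function name below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; terms outside the corpus are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: (count / total) * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=3):
    """Return up to top_k (score, doc_index) pairs, best match first."""
    docs_tokens = [tokenize(doc) for doc in docs]
    idf = build_idf(docs_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    query_vector = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vector, vec), i) for i, vec in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Because scoring is purely lexical, a document that shares no terms with the query scores exactly zero, which is why pairing this kind of index with an LLM is useful.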

✨ What RAGdb Stores

RAGdb is a search index, not a backup system. It does not store your original file bytes.

For each ingested file, RAGdb stores:

extracted text (where applicable)
a TF-IDF vector
metadata as JSON
a short human-friendly preview
absolute file path

Your actual files remain on disk or cloud — RAGdb holds only the RAG-ready representation.
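
As a rough picture of what such a record could look like in SQLite, the sketch below stores the five items listed above in one table. The table name, column names, and the choice to serialize the vector as JSON are all hypothetical; RAGdb's real schema is not documented here and may differ.

```python
import json
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a single-file index with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            path       TEXT PRIMARY KEY,  -- absolute file path
            media_type TEXT NOT NULL,     -- e.g. text, image, audio, video
            text       TEXT,              -- extracted text, if any
            vector     TEXT NOT NULL,     -- TF-IDF vector serialized as JSON
            metadata   TEXT NOT NULL,     -- metadata serialized as JSON
            preview    TEXT               -- short human-friendly preview
        )
        """
    )
    return conn


def upsert_document(conn, path, media_type, text, vector, metadata, preview):
    """Insert a record, replacing any earlier record for the same path."""
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?, ?, ?)",
        (path, media_type, text, json.dumps(vector), json.dumps(metadata), preview),
    )
    conn.commit()


def load_document(conn, path):
    """Return the stored record as a dict, or None if the path is unknown."""
    row = conn.execute(
        "SELECT path, media_type, text, vector, metadata, preview "
        "FROM documents WHERE path = ?",
        (path,),
    ).fetchone()
    if row is None:
        return None
    return {
        "path": row[0],
        "media_type": row[1],
        "text": row[2],
        "vector": json.loads(row[3]),
        "metadata": json.loads(row[4]),
        "preview": row[5],
    }
```

Note that no column holds the original file bytes, matching the "search index, not a backup system" design described above.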

📂 Supported Formats

📝 Full text extraction

.txt
.pdf (via PyPDF2)
.docx
.json
.csv
.xls, .xlsx

🖼 Images

.png, .jpg, .jpeg, .webp, .bmp, .gif

Stored data:

size, width, height, mode
OCR text (requires Tesseract to be installed)
human-friendly preview

Version 0.2.0 will include a built-in OCR engine (no Tesseract required).

🔊 Audio (metadata-only)

.wav, .mp3, .ogg, .flac, .m4a
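
For a sense of what "metadata-only" can mean for audio, the sketch below reads basic properties of a WAV file with Python's standard `wave` module. This is an illustration only: which fields RAGdb records, and how it handles the other audio formats, are not stated here, and the helper names are hypothetical.

```python
import io
import wave


def wav_metadata(source):
    """Return basic metadata for a WAV file (path string or binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": rate,
            "frames": frames,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def make_silent_wav(seconds=1, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer
```

Calling `wav_metadata(make_silent_wav(seconds=2))` yields a dictionary that could be stored as the JSON metadata for the file; no audio content is transcribed or indexed.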

🎥 Video (metadata-only)

.mp4, .mov, .mkv, .avi, .webm
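
The format lists above amount to a lookup from file extension to handling strategy. A minimal sketch of that lookup follows; the extensions are taken from this page, while the category labels and the `classify` helper are hypothetical names, not part of RAGdb's API.

```python
from pathlib import Path

# Extensions copied from the Supported Formats lists; labels are illustrative.
MEDIA_TYPES = {
    "text": {".txt", ".pdf", ".docx", ".json", ".csv", ".xls", ".xlsx"},
    "image": {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"},
    "audio": {".wav", ".mp3", ".ogg", ".flac", ".m4a"},
    "video": {".mp4", ".mov", ".mkv", ".avi", ".webm"},
}


def classify(path):
    """Return the media category for a path, or None if the extension is unsupported."""
    suffix = Path(path).suffix.lower()
    for media_type, extensions in MEDIA_TYPES.items():
        if suffix in extensions:
            return media_type
    return None
```

A check like this can be run over a folder before ingestion to spot files that would be skipped, such as legacy `.doc` documents.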

🚀 Installation

pip install ragdb

Or from source:

pip install -e .

Dependencies:

numpy
pillow
PyPDF2
pandas + openpyxl
python-docx
pytesseract (optional OCR)
fastapi + uvicorn (optional API server)

🧠 Basic Usage

from ragdb import RAGdb

# Create or load database
db = RAGdb("knowledge.ragdb")

# Ingest an entire folder
db.ingest_folder("docs")

# Search your RAG database
results = db.search("machine learning tax changes")

for path, score, media_type, preview in results:
    print(f"{score:.4f}  {media_type}  {path}")
    print("   ", preview)

🤖 Using With an LLM (Natural Language RAG)

RAGdb handles retrieval. The LLM handles reasoning.

Typical pattern:

User asks a natural-language question
Query RAGdb for top-N relevant pieces
Feed results into GPT/Claude/Llama
Generate an answer grounded in retrieved context

This gives natural-language answers without bundling heavy ML models: retrieval stays lexical (TF-IDF), and the LLM supplies the language understanding.
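
The retrieval-then-generate pattern above can be sketched as a prompt-building step. The function below assumes search results shaped like the `(path, score, media_type, preview)` tuples shown in Basic Usage; the function name, prompt wording, and `max_chars` budget are illustrative assumptions, and the call to the LLM itself is left out.

```python
def build_prompt(question, results, max_chars=2000):
    """Assemble a grounded prompt from (path, score, media_type, preview) tuples."""
    context_entries = []
    used = 0
    for index, (path, score, media_type, preview) in enumerate(results, start=1):
        entry = f"[{index}] {path} ({media_type}, score {score:.4f})\n{preview}"
        # Stop adding context once the character budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_entries.append(entry)
        used += len(entry)
    context = "\n\n".join(context_entries) if context_entries else "(no matching documents)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

The returned string can be sent to whichever model you use. Numbering the entries makes it easy to ask the model to cite which retrieved document supports each claim.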

🌐 Optional: FastAPI Server

Expose your .ragdb file over HTTP:

export RAGDB_PATH=/path/to/file.ragdb
export RAGDB_API_KEY=secret-token

uvicorn ragdb.server:create_app --factory --reload

REST endpoints:

POST /ingest
GET /documents
GET /search

📌 Notes on Word Files

Old .doc files are not supported. Save them as .docx before ingestion.

📄 License

RAGdb is released under the MIT License.

💡 Coming Soon (v0.2.0)

Built-in tiny OCR (no Tesseract required)
Media extension (audio/video transcription, CLIP embeddings)
Cloud embedding helpers
Optional semantic search layer


Release history

1.0.6

Nov 20, 2025

1.0.5

Nov 20, 2025

1.0.4

Nov 20, 2025

1.0.3

Nov 20, 2025

0.1.2

Nov 17, 2025

This version

0.1.1 yanked

Nov 17, 2025

Reason this release was yanked:

Not usable code

0.1.0 yanked

Nov 17, 2025

Reason this release was yanked:

Incorrect and unusable code

ragdb 0.1.1
