
Lightweight embedded document store with TF-IDF search


RAGdb

RAGdb is a lightweight, embedded multimodal database for Retrieval-Augmented Generation (RAG) systems. It stores extracted text, metadata, and searchable vectors — all inside a single SQLite file.

RAGdb is a lightweight, SQLite-based, embedded multimodal RAG index with no heavy dependencies and no server process.

No servers. No GPU. No vector database. Just:

pip install ragdb

…and you have a full local RAG index.


🌟 Why RAGdb?

  • Embedded & portable — everything inside a single .ragdb SQLite file
  • Multimodal — supports text, PDFs, Word, CSV, JSON, Excel, images, audio, video
  • Fast local search using TF-IDF + cosine similarity
  • Zero heavy ML dependencies (no PyTorch, no Transformers)
  • No file storage — RAGdb stores extracted content, not raw files
  • Natural-language ready — plug into GPT, Claude, Llama, or any LLM
  • Works fully offline
  • Small footprint — ideal for laptops, VMs, containers, or edge devices
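To make "TF-IDF + cosine similarity" concrete, here is a minimal, standard-library-only sketch of the technique. It is not RAGdb's actual code. The tokenizer, the term-frequency normalization (count divided by document length), and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 (the same form scikit-learn uses by default) are all assumptions chosen for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only (an assumed, very simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    # Document frequency: in how many documents each term appears at least once.
    n = len(docs)
    df: Counter[str] = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF: rarer terms get larger weights, and no term gets weight 0.
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Sparse vector: term -> (term frequency * IDF); unknown terms are dropped.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    # Vectorize every document and the query with the same IDF table, then rank.
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    idf = build_idf(list(tokenized.values()))
    vectors = {name: tfidf_vector(toks, idf) for name, toks in tokenized.items()}
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(name, cosine(query_vec, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Because the scoring is lexical, a document only ranks above zero if it shares at least one term with the query.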

RAGdb provides a full multimodal RAG index in a single file, without requiring a server or a separate vector database.


✨ What RAGdb Stores

RAGdb is a search index, not a backup system. It does not store your original file bytes.

For each ingested file, RAGdb stores:

  • extracted text (where applicable)
  • a TF-IDF vector
  • metadata as JSON
  • a short human-friendly preview
  • absolute file path

Your original files stay where they are (on local disk or in cloud storage); RAGdb holds only their RAG-ready representation.
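Since a .ragdb file is an SQLite database, the per-file record described above maps naturally onto one table row. The sketch below is hypothetical: the table name `documents`, the column names, the JSON encoding of the vector, and the 120-character preview length are illustrative assumptions, not RAGdb's real schema.

```python
import json
import os
import sqlite3

# Hypothetical schema: one row per ingested file, keyed by its absolute path.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    path       TEXT PRIMARY KEY,  -- absolute path; the file bytes are not copied
    media_type TEXT NOT NULL,
    content    TEXT,              -- extracted text, if any
    vector     TEXT,              -- TF-IDF weights serialized as JSON
    metadata   TEXT,              -- metadata serialized as JSON
    preview    TEXT               -- short human-readable snippet
)
"""


def open_index(filename: str = ":memory:") -> sqlite3.Connection:
    # Any SQLite filename works here, e.g. "knowledge.ragdb".
    conn = sqlite3.connect(filename)
    conn.execute(SCHEMA)
    return conn


def make_preview(text: str, limit: int = 120) -> str:
    # Collapse whitespace and truncate to a short, display-friendly snippet.
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."


def upsert_document(conn, path, media_type, text, vector, metadata):
    # INSERT OR REPLACE keeps one row per path when a file is re-ingested.
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?, ?, ?)",
        (
            os.path.abspath(path),
            media_type,
            text,
            json.dumps(vector),
            json.dumps(metadata),
            make_preview(text),
        ),
    )
    conn.commit()
```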


📂 Supported Formats

📝 Full text extraction

  • .txt
  • .pdf (via PyPDF2)
  • .docx
  • .json
  • .csv
  • .xls, .xlsx
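A minimal sketch of per-extension text extraction for the formats above, assuming one extractor per suffix. This is not RAGdb's code: the function name `extract_text` is hypothetical, CSV is read with the standard `csv` module here instead of pandas, and the third-party imports are deferred so each library is only needed for its own format.

```python
import csv
import json
from pathlib import Path


def extract_text(path: str) -> str:
    """Return plain text for a supported document (hypothetical helper)."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext == ".txt":
        return p.read_text(encoding="utf-8", errors="replace")
    if ext == ".json":
        # Re-serialize so nested keys and values become searchable text.
        return json.dumps(json.loads(p.read_text(encoding="utf-8")), ensure_ascii=False, indent=1)
    if ext == ".csv":
        with p.open(newline="", encoding="utf-8") as fh:
            return "\n".join(", ".join(row) for row in csv.reader(fh))
    if ext == ".pdf":
        from PyPDF2 import PdfReader  # deferred import: only PDFs need PyPDF2

        reader = PdfReader(str(p))
        # extract_text() can return None/empty for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if ext == ".docx":
        import docx  # provided by the python-docx package

        return "\n".join(par.text for par in docx.Document(str(p)).paragraphs)
    if ext in (".xls", ".xlsx"):
        import pandas as pd  # .xlsx uses openpyxl; .xls needs an engine such as xlrd

        sheets = pd.read_excel(p, sheet_name=None)  # dict: sheet name -> DataFrame
        return "\n".join(df.to_csv(index=False) for df in sheets.values())
    raise ValueError(f"no text extractor for {ext!r}")
```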

🖼 Images

  • .png, .jpg, .jpeg, .webp, .bmp, .gif

Stored data:

  • size, width, height, mode
  • OCR text (requires the Tesseract executable to be installed)
  • human-friendly preview

Version 0.2.0 will include a built-in OCR engine (no Tesseract required).
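The image record above can be sketched with Pillow and, optionally, pytesseract. This is an illustration, not RAGdb's implementation: `image_record` is a hypothetical name, "size" is assumed to mean the file size in bytes, and the preview format is invented for the example.

```python
import os

from PIL import Image


def image_record(path: str, ocr: bool = False) -> dict:
    """Build a metadata/OCR record for one image (hypothetical helper)."""
    with Image.open(path) as img:
        meta = {
            "size": os.path.getsize(path),  # assumption: file size in bytes
            "width": img.width,
            "height": img.height,
            "mode": img.mode,  # e.g. "RGB", "RGBA", "L"
        }
        text = ""
        if ocr:
            import pytesseract  # wrapper only; the tesseract binary must be on PATH

            text = pytesseract.image_to_string(img)
    # Human-friendly preview: dimensions plus the start of any OCR text.
    preview = f"{meta['width']}x{meta['height']} {meta['mode']} image"
    if text.strip():
        preview += ": " + " ".join(text.split())[:80]
    return {"metadata": meta, "text": text, "preview": preview}
```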

🔊 Audio (metadata-only)

  • .wav, .mp3, .ogg, .flac, .m4a

🎥 Video (metadata-only)

  • .mp4, .mov, .mkv, .avi, .webm
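Deciding how to handle a file starts with mapping its extension to a media type. The sketch below uses exactly the extensions listed above; the category names ("text", "image", "audio", "video") and the function `detect_media_type` are hypothetical and may not match the `media_type` strings RAGdb actually returns.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the Supported Formats list; category names are assumptions.
MEDIA_TYPES = {
    "text": {".txt", ".pdf", ".docx", ".json", ".csv", ".xls", ".xlsx"},
    "image": {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"},
    "audio": {".wav", ".mp3", ".ogg", ".flac", ".m4a"},  # metadata-only
    "video": {".mp4", ".mov", ".mkv", ".avi", ".webm"},  # metadata-only
}

# Invert to a flat lookup table: extension -> media type.
_BY_EXTENSION = {ext: kind for kind, exts in MEDIA_TYPES.items() for ext in exts}


def detect_media_type(path: str) -> Optional[str]:
    """Return the media type for a path, or None if it is unsupported."""
    return _BY_EXTENSION.get(Path(path).suffix.lower())
```

Unsupported files (including legacy .doc) come back as None, so an ingestion loop can skip them instead of failing.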

🚀 Installation

pip install ragdb

Or, for an editable install from a local checkout of the source:

pip install -e .

Dependencies:

  • numpy
  • pillow
  • PyPDF2
  • pandas + openpyxl
  • python-docx
  • pytesseract (optional OCR)
  • fastapi + uvicorn (optional API server)
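Because OCR and the API server are optional, it can help to check what is available before relying on them. This sketch uses only the standard library; the function name is hypothetical. Note that pytesseract is just a wrapper, so OCR also needs the `tesseract` executable on PATH, which the check covers separately.

```python
from __future__ import annotations

import importlib.util
import shutil

# Optional Python packages from the dependency list, mapped to the feature they enable.
OPTIONAL_MODULES = {"pytesseract": "OCR", "fastapi": "API server", "uvicorn": "API server"}


def check_optional_dependencies() -> dict[str, bool]:
    """Report which optional packages (and the Tesseract binary) are present."""
    status = {mod: importlib.util.find_spec(mod) is not None for mod in OPTIONAL_MODULES}
    # pytesseract alone is not enough: the tesseract program must also be installed.
    status["tesseract (binary)"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_optional_dependencies().items():
        print(f"{name:20} {'found' if ok else 'missing'}")
```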

🧠 Basic Usage

from ragdb import RAGdb

# Create or load database
db = RAGdb("knowledge.ragdb")

# Ingest an entire folder
db.ingest_folder("docs")

# Search your RAG database
results = db.search("machine learning tax changes")

for path, score, media_type, preview in results:
    print(f"{score:.4f}  {media_type}  {path}")
    print("   ", preview)

🤖 Using With an LLM (Natural Language RAG)

RAGdb handles retrieval. The LLM handles reasoning.

Typical pattern:

  1. User asks a natural-language question
  2. Query RAGdb for top-N relevant pieces
  3. Feed results into GPT/Claude/Llama
  4. Generate an answer grounded in retrieved context

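This combination gives semantic-looking behavior without embedding heavy ML models: retrieval itself is lexical (TF-IDF matches shared terms, not meaning), and the LLM supplies the language understanding when it reads the retrieved context.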
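Steps 2 to 4 can be sketched as prompt assembly over the `(path, score, media_type, preview)` tuples that `db.search` yields in the Basic Usage example. `build_prompt` is a hypothetical helper, the prompt wording is an assumption, and the actual LLM call is left to whichever client you use.

```python
def build_prompt(question, results, top_n=3, max_chars=500):
    """Assemble a grounded prompt from search results (hypothetical helper).

    `results` is assumed to be an iterable of (path, score, media_type, preview)
    tuples, as in the Basic Usage loop.
    """
    blocks = []
    for i, (path, score, media_type, preview) in enumerate(list(results)[:top_n], start=1):
        # Number each source so the model can cite it as [1], [2], ...
        blocks.append(f"[{i}] {path} ({media_type}, score {score:.2f})\n{preview[:max_chars]}")
    context = "\n\n".join(blocks) if blocks else "(no matching documents)"
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Typical use would be `prompt = build_prompt(question, db.search(question))`, with `prompt` then sent to GPT, Claude, Llama, or another model. Only the short previews are used here, since those are what the search results shown above expose.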


🌐 Optional: FastAPI Server

Expose your .ragdb file over HTTP:

export RAGDB_PATH=/path/to/file.ragdb
export RAGDB_API_KEY=secret-token

uvicorn ragdb.server:create_app --factory --reload

REST endpoints:

  • POST /ingest
  • GET /documents
  • GET /search
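A standard-library client for `GET /search` might look like the sketch below. The query parameter name `q` and the `X-API-Key` header are assumptions (the page does not say how the query or `RAGDB_API_KEY` are passed), and the base URL is simply uvicorn's default host and port.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def search_request(query: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Assumption: the query goes in a "q" parameter and the key in an X-API-Key header.
    url = f"{base_url}/search?{urlencode({'q': query})}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def search(query: str, api_key: str) -> object:
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(search_request(query, api_key), timeout=10) as resp:
        return json.load(resp)
```

Check the server's actual parameter and header names (for example via FastAPI's auto-generated `/docs` page) before relying on this.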

📌 Notes on Word Files

Legacy binary .doc files are not supported. Save them as .docx before ingestion.
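Before ingesting a folder, a quick scan can list the .doc files that need converting. This helper is hypothetical; one common way to do the conversion itself is LibreOffice's `soffice --headless --convert-to docx file.doc`.

```python
from pathlib import Path


def find_legacy_doc_files(folder) -> list:
    """List .doc files (any letter case) under `folder`, recursively."""
    # Path.suffix of "report.docx" is ".docx", so .docx files are not matched.
    return sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and p.suffix.lower() == ".doc"
    )
```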


📄 License

RAGdb is released under the MIT License.


💡 Coming Soon (v0.2.0)

  • Built-in tiny OCR (no Tesseract required)
  • Media extension (audio/video transcription, CLIP embeddings)
  • Cloud embedding helpers
  • Optional semantic search layer

Download files

Source Distribution

ragdb-0.1.2.tar.gz (14.3 kB)

Built Distribution

ragdb-0.1.2-py3-none-any.whl (13.2 kB)

File details

Details for the file ragdb-0.1.2.tar.gz.

File metadata

  • Download URL: ragdb-0.1.2.tar.gz
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for ragdb-0.1.2.tar.gz
Algorithm Hash digest
SHA256 7a9a6ab85fbbaf1ac286788b1f78c9a3e9df58a1a6addc07745befa45d95b220
MD5 972ce6452145d0e76136d307a073f18c
BLAKE2b-256 84937ac88107def71d20dfa393fbf850a7d49811f5ac3f63facea1dbe09230c8

File details

Details for the file ragdb-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: ragdb-0.1.2-py3-none-any.whl
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for ragdb-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 638d042b33f588b53284980f759eda803e8651c5afabca09667b07b6f89e6086
MD5 0e71d653df87379b41dfced6e66646fc
BLAKE2b-256 f9c630e832d30b07bb899f4763eba92341b9931f26286a267cf5747317487009
