
Complete on-device RAG chatbot library: no API keys needed, and only 2 lines of code to integrate it into an application.


Tiny RAG AI

A fully local RAG chatbot library. No API keys, no external servers, just 2 lines of code.
Runs entirely on-device in ~500MB of memory using the Qwen2.5-0.5B model.

What it is

  • Wraps the Qwen2.5-0.5B-Instruct-GGUF model.
  • Runs inference in about 330 MB of memory on-device.
  • Skips a heavyweight RAG pipeline: you pass your documents folder straight to index() as a parameter.

Why use it

  • Minimal setup and small footprint.
  • Focus on your app logic instead of infrastructure.

Installation

Step 1 — Install llama-cpp-python (pre-built, no compilation needed)

Pick the version that matches your hardware:

# CPU only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
 
# CUDA 12.1 (NVIDIA GPU)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
 
# Metal (macOS Apple Silicon)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
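If you script environment setup, the hardware choice above can be automated. A minimal sketch, assuming Apple Silicon Macs should use the metal wheels and that the CUDA tag (such as cu121) is supplied by you, since reliably detecting the installed CUDA version is out of scope here. The helper names are illustrative, not part of either package.

```python
import platform
from typing import Optional

# Base URL of the pre-built wheel indexes used in the commands above.
BASE_URL = "https://abetlen.github.io/llama-cpp-python/whl"


def wheel_variant(system: str, machine: str, cuda: Optional[str] = None) -> str:
    """Return 'metal' on Apple Silicon, the given CUDA tag if any, otherwise 'cpu'."""
    if system == "Darwin" and machine == "arm64":
        return "metal"
    if cuda:  # e.g. "cu121" for CUDA 12.1; supplied by the caller, not detected
        return cuda
    return "cpu"


def pip_command(cuda: Optional[str] = None) -> str:
    """Build the pip install command for the machine this script runs on."""
    variant = wheel_variant(platform.system(), platform.machine(), cuda)
    return f"pip install llama-cpp-python --extra-index-url {BASE_URL}/{variant}"


if __name__ == "__main__":
    print(pip_command())
```

On a Linux or Windows machine with an NVIDIA GPU and CUDA 12.1, pip_command("cu121") produces the CUDA command shown above.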

Step 2 — Install tiny-rag-ai

pip install tiny-rag-ai

Quick Start

import tiny_rag_ai
 
tiny_rag_ai.index("./docs")
 
answer = tiny_rag_ai.chat("What is your return policy?", use_case="customer support bot")
print(answer)

index() only needs to run once. After that, the FAISS index is saved to disk (in ./tiny_ai_data unless you pass a different save_path) and reloaded automatically on the next run.
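The run-once behaviour described above is a standard build-or-load cache. The sketch below shows that pattern with the standard library only; the file name chunks.json and the build callback are hypothetical stand-ins, since the library's actual on-disk format (a FAISS index plus chunks) is not documented here.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_or_build(save_path: str, build: Callable[[], List[str]]) -> List[str]:
    """Return chunks cached under save_path, running build() only when no cache exists."""
    cache_file = Path(save_path) / "chunks.json"  # hypothetical file name
    if cache_file.exists():
        # Later runs: reload the saved result instead of re-indexing.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    chunks = build()  # first run: do the expensive work once
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(chunks), encoding="utf-8")
    return chunks
```

One consequence of this pattern: the cache does not notice edited documents, so in this sketch the saved file has to be deleted (or a new save path used) to pick up changes.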


Framework Examples

Flask

from flask import Flask, jsonify, request
import tiny_rag_ai

app = Flask(__name__)
tiny_rag_ai.index("./docs")  # index your documents once at startup

@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    return jsonify({"answer": tiny_rag_ai.chat(data["message"], use_case="support bot")})

FastAPI

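from fastapi import FastAPI
from pydantic import BaseModel
import tiny_rag_ai
app = FastAPI()
tiny_rag_ai.index("./docs")  # index your documents once at startup

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    return {"answer": tiny_rag_ai.chat(req.message, use_case="support bot")}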
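Both handlers above expect a JSON body with a message field and return {"answer": ...}. A minimal standard-library client for such an endpoint might look like this; the helper names build_chat_request and ask, and the local URL in the usage comment, are illustrative assumptions.

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str) -> urllib.request.Request:
    """Build a POST /chat request carrying {"message": ...} as JSON."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, message: str, timeout: float = 120.0) -> str:
    """Send the question and return the "answer" field of the JSON response."""
    # A generous timeout, since on-device CPU inference can take a while.
    with urllib.request.urlopen(build_chat_request(base_url, message), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]


# Usage, assuming one of the apps above is running locally on port 8000:
# print(ask("http://127.0.0.1:8000", "What is your return policy?"))
```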

Django

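Call tiny_rag_ai.index() in your AppConfig.ready(), then call tiny_rag_ai.chat() from a view as usual:

from django.http import JsonResponse
import tiny_rag_ai

def chat(request):
    answer = tiny_rag_ai.chat(request.POST["message"], use_case="support bot")
    return JsonResponse({"answer": answer})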
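Django's documentation notes that AppConfig.ready() can, in some corner cases such as tests, be called more than once, and recommends making such code idempotent. One way to keep an expensive startup step like index() from running twice within a process is a run-once wrapper. run_once is a hypothetical helper, not part of tiny-rag-ai, and it does not coordinate across processes: each worker of a multi-process server still runs the step once.

```python
import threading
from typing import Callable


def run_once(fn: Callable[[], None]) -> Callable[[], None]:
    """Wrap fn so that only the first successful call runs it, even under concurrency."""
    lock = threading.Lock()
    done = False

    def wrapper() -> None:
        nonlocal done
        with lock:  # other threads wait here until the first call finishes
            if not done:
                fn()  # if this raises, done stays False and a later call retries
                done = True

    return wrapper


# Hypothetical usage: define the wrapper at module level, call it from ready().
# ensure_index = run_once(lambda: tiny_rag_ai.index("./docs"))
# class ChatConfig(AppConfig):
#     def ready(self):
#         ensure_index()
```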

Deploying to Render (or any cloud server)

Set this environment variable to persist downloaded models across deploys:

TINY_AI_CACHE_DIR=/data/models

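Mount a persistent disk of at least 1 GB at /data. If the FAISS index should also survive redeploys, pass a save_path on the same disk (for example save_path="/data/tiny_ai_data").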
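The cache directory can be resolved the usual way, from the environment with a fallback, and checked for space before the first model download. A sketch, assuming a hypothetical fallback of ./models (the library's real default is not stated) and treating the 1 GB minimum as required free space on the disk.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

MIN_FREE_BYTES = 10**9  # the 1 GB minimum from the deployment note
FALLBACK_DIR = "./models"  # hypothetical; the library's real default is not documented


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Use TINY_AI_CACHE_DIR when set, otherwise fall back to FALLBACK_DIR."""
    env = os.environ if env is None else env
    return Path(env.get("TINY_AI_CACHE_DIR", FALLBACK_DIR))


def has_enough_space(path: Path, minimum: int = MIN_FREE_BYTES) -> bool:
    """Check free space on the disk that holds path, or its nearest existing parent."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free >= minimum


if __name__ == "__main__":
    cache_dir = resolve_cache_dir()
    print(cache_dir, "ok" if has_enough_space(cache_dir) else "less than 1 GB free")
```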


API Reference

tiny_rag_ai.index(folder_path, save_path="./tiny_ai_data", n_ctx=2048, threads=8)

  • folder_path (required): path to your documents folder (PDF/TXT files)
  • save_path (default ./tiny_ai_data): where the FAISS index and chunks are saved
  • n_ctx (default 2048): context window size for the LLM, in tokens
  • threads (default 8): number of CPU threads used for inference
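index() reads PDF and TXT files from folder_path and stores chunks alongside the FAISS index. The exact discovery and chunking rules are not documented, so the following only illustrates the general technique: it assumes a recursive folder walk and fixed-size character windows with overlap (size=500 and overlap=50 are arbitrary values). PDF text extraction, which the stack handles with PyMuPDF, is left out to keep the sketch standard-library only.

```python
from pathlib import Path
from typing import Iterator, List

SUPPORTED_SUFFIXES = {".txt", ".pdf"}  # the file types listed for folder_path


def find_documents(folder_path: str) -> List[Path]:
    """Collect supported files under folder_path (the recursive walk is an assumption)."""
    return sorted(
        p for p in Path(folder_path).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> Iterator[str]:
    """Yield fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start:start + size]
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.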

tiny_rag_ai.chat(query, use_case)

  • query (required): the user's question
  • use_case (required): describes the bot's role, e.g. "customer support bot"
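use_case describes the bot's role, which suggests it ends up in the prompt given to the model. The real prompt template is not documented; the sketch below is a hypothetical example of how a role, retrieved chunks and the question are commonly combined, using a character limit as a crude stand-in for the n_ctx token budget.

```python
from typing import List


def build_prompt(query: str, use_case: str, chunks: List[str], max_chars: int = 4000) -> str:
    """Combine a role line, retrieved context and the question into one prompt string."""
    context: List[str] = []
    used = 0
    for chunk in chunks:  # assumed sorted best match first
        if used + len(chunk) > max_chars:
            break  # crude character budget standing in for the n_ctx token limit
        context.append(chunk)
        used += len(chunk)
    joined = "\n---\n".join(context)
    return (
        f"You are a {use_case}. Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )
```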

Stack

  • LLM: Qwen2.5 0.5B via llama-cpp-python
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • Vector store: FAISS
  • PDF loading: PyMuPDF
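Retrieval in this stack works by embedding the question and finding the nearest stored chunk embeddings: all-MiniLM-L6-v2 produces 384-dimensional vectors and FAISS performs the search. When vectors are L2-normalized, ranking by inner product (what a FAISS IndexFlatIP computes) is the same as ranking by cosine similarity. The brute-force sketch below shows that computation on toy 3-dimensional vectors; which FAISS index type tiny-rag-ai actually uses is not stated.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to query."""
    q = normalize(query)
    scores = [(i, sum(a * b for a, b in zip(q, normalize(v)))) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

FAISS does the same ranking with optimized (and optionally approximate) search, which matters once there are many thousands of chunks.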

License

MIT



Download files


Source Distribution

tiny_rag_ai-0.1.2.tar.gz (6.6 kB)

Uploaded Source

Built Distribution


tiny_rag_ai-0.1.2-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file tiny_rag_ai-0.1.2.tar.gz.

File metadata

  • Download URL: tiny_rag_ai-0.1.2.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for tiny_rag_ai-0.1.2.tar.gz

  • SHA256: 4af759ecf9c235e59c058db448a26e881c64506e4cea4d372764bc5b4ef441f9
  • MD5: ac331c74dd5990cf143904fa8086a091
  • BLAKE2b-256: 82e2be7abf457ea96b4197195ac6158d506472648f88be25f7de195cceef3f07


File details

Details for the file tiny_rag_ai-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tiny_rag_ai-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for tiny_rag_ai-0.1.2-py3-none-any.whl

  • SHA256: be77da6563a66904e5a9ea6cd702e33a2b53c22b8c4e8d607cf84d8c6d90f796
  • MD5: cf0f2d99ebe7efa5719fb2f9bd86a784
  • BLAKE2b-256: 89417aec32b599a6e438cde15f785d294e3f1813a1d6315304836776f45aab87
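To check a downloaded file against the digests published for release 0.1.2, hash it locally and compare. A minimal sketch with hashlib; the two SHA256 values are copied from the hash listings for this release, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for tiny-rag-ai 0.1.2.
EXPECTED_SHA256 = {
    "tiny_rag_ai-0.1.2.tar.gz": "4af759ecf9c235e59c058db448a26e881c64506e4cea4d372764bc5b4ef441f9",
    "tiny_rag_ai-0.1.2-py3-none-any.whl": "be77da6563a66904e5a9ea6cd702e33a2b53c22b8c4e8d607cf84d8c6d90f796",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check at install time: a requirements line such as tiny-rag-ai==0.1.2 --hash=sha256:<digest>, installed with pip install --require-hashes -r requirements.txt, rejects files whose digest does not match. In that mode every requirement, dependencies included, must be pinned with a hash.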

