Complete In-Device RAG chatbot library, no API keys needed.With 2 lines of implementation in applications.

Project description

Tiny RAG AI

A fully local RAG chatbot library. No API keys, no external servers, just 2 lines of code.
Runs entirely on-device in ~500MB of memory using the Qwen2.5-0.5B model.

What it is

Wraps the Qwen2.5-0.5B-Instruct-GGUF model.
Runs inference in about 330 MB of memory on-device.
Avoids heavy RAG pipelines by accepting documents directly through its parameters.

Why use it

Minimal setup and small footprint.
Focus on your app logic instead of infrastructure.

Installation

Step 1 — Install llama-cpp-python (pre-built, no compilation needed)

Pick the version that matches your hardware:

# CPU only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
 
# CUDA 12.1 (NVIDIA GPU)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
 
# Metal (macOS Apple Silicon)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal

Step 2 — Install tiny-rag-ai

pip install tiny-rag-ai

Quick Start

import tiny_rag_ai
 
tiny_rag_ai.index("./docs")
 
answer = tiny_rag_ai.chat("What is your return policy?", use_case="customer support bot")
print(answer)

index() only needs to run once. After that, the FAISS index is saved to disk and reloaded automatically on the next run.

Framework Examples

Flask

import tiny_rag_ai
tiny_rag_ai.index("./docs")
 
@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    return jsonify({"answer": tiny_rag_ai.chat(data["message"], use_case="support bot")})

FastAPI

import tiny_rag_ai
tiny_rag_ai.index("./docs")
 
@app.post("/chat")
def chat(req: ChatRequest):
    return {"answer": tiny_rag_ai.chat(req.message, use_case="support bot")}

Django

# Call index() in AppConfig.ready(), then use tiny_rag_ai.chat() in your view as normal.
import tiny_rag_ai
answer = tiny_rag_ai.chat(request.POST["message"], use_case="support bot")

Deploying to Render (or any cloud server)

Set this environment variable to persist downloaded models across deploys:

TINY_AI_CACHE_DIR=/data/models

Mount a persistent disk at /data with a minimum of 1GB.

API Reference

`tiny_rag_ai.index(folder_path, save_path, n_ctx, threads)`

Parameter	Default	Description
`folder_path`	required	Path to your documents folder (PDF/TXT)
`save_path`	`./tiny_ai_data`	Where to save the FAISS index and chunks
`n_ctx`	`2048`	Context window size for the LLM
`threads`	`8`	Number of CPU threads for inference

`tiny_rag_ai.chat(query, use_case)`

Parameter	Default	Description
`query`	required	The user's question
`use_case`	required	Describes the bot's role e.g. `"customer support bot"`

Stack

LLM: Qwen2.5 0.5B via llama-cpp-python
Embeddings: sentence-transformers (all-MiniLM-L6-v2)
Vector store: FAISS
PDF loading: PyMuPDF

License

MIT

Project details

Release history Release notifications | RSS feed

This version

0.1.2

Mar 28, 2026

0.1.1

Mar 28, 2026

0.1.0

Mar 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tiny_rag_ai-0.1.2.tar.gz (6.6 kB view details)

Uploaded Mar 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

tiny_rag_ai-0.1.2-py3-none-any.whl (6.9 kB view details)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file tiny_rag_ai-0.1.2.tar.gz.

File metadata

Download URL: tiny_rag_ai-0.1.2.tar.gz
Upload date: Mar 28, 2026
Size: 6.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for tiny_rag_ai-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`4af759ecf9c235e59c058db448a26e881c64506e4cea4d372764bc5b4ef441f9`
MD5	`ac331c74dd5990cf143904fa8086a091`
BLAKE2b-256	`82e2be7abf457ea96b4197195ac6158d506472648f88be25f7de195cceef3f07`

See more details on using hashes here.

File details

Details for the file tiny_rag_ai-0.1.2-py3-none-any.whl.

File metadata

Download URL: tiny_rag_ai-0.1.2-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 6.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for tiny_rag_ai-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`be77da6563a66904e5a9ea6cd702e33a2b53c22b8c4e8d607cf84d8c6d90f796`
MD5	`cf0f2d99ebe7efa5719fb2f9bd86a784`
BLAKE2b-256	`89417aec32b599a6e438cde15f785d294e3f1813a1d6315304836776f45aab87`

See more details on using hashes here.

tiny-rag-ai 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Tiny RAG AI

What it is

Why use it

Installation

Step 1 — Install llama-cpp-python (pre-built, no compilation needed)

Step 2 — Install tiny-rag-ai

Quick Start

Framework Examples

Flask

FastAPI

Django

Deploying to Render (or any cloud server)

API Reference

tiny_rag_ai.index(folder_path, save_path, n_ctx, threads)

tiny_rag_ai.chat(query, use_case)

Stack

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`tiny_rag_ai.index(folder_path, save_path, n_ctx, threads)`

`tiny_rag_ai.chat(query, use_case)`