Complete In-Device RAG chatbot library, no API keys needed.With 2 lines of implementation in applications.
Project description
Tiny RAG AI
A fully local RAG chatbot library. No API keys, no external servers, just 2 lines of code.
Runs entirely on-device in ~500MB of memory using the Qwen2.5-0.5B model.
What it is
- Wraps the Qwen2.5-0.5B-Instruct-GGUF model.
- Runs inference in about 330 MB of memory on-device.
- Avoids heavy RAG pipelines by accepting documents directly through its parameters.
Why use it
- Minimal setup and small footprint.
- Focus on your app logic instead of infrastructure.
Installation
Step 1 — Install llama-cpp-python (pre-built, no compilation needed)
Pick the version that matches your hardware:
# CPU only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
# CUDA 12.1 (NVIDIA GPU)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
# Metal (macOS Apple Silicon)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
Step 2 — Install tiny-rag-ai
pip install tiny-rag-ai
Quick Start
import tiny_rag_ai
tiny_rag_ai.index("./docs")
answer = tiny_rag_ai.chat("What is your return policy?", use_case="customer support bot")
print(answer)
index() only needs to run once. After that, the FAISS index is saved to disk and reloaded automatically on the next run.
Framework Examples
Flask
import tiny_rag_ai
tiny_rag_ai.index("./docs")
@app.route("/chat", methods=["POST"])
def chat():
data = request.get_json()
return jsonify({"answer": tiny_rag_ai.chat(data["message"], use_case="support bot")})
FastAPI
import tiny_rag_ai
tiny_rag_ai.index("./docs")
@app.post("/chat")
def chat(req: ChatRequest):
return {"answer": tiny_rag_ai.chat(req.message, use_case="support bot")}
Django
# Call index() in AppConfig.ready(), then use tiny_rag_ai.chat() in your view as normal.
import tiny_rag_ai
answer = tiny_rag_ai.chat(request.POST["message"], use_case="support bot")
Deploying to Render (or any cloud server)
Set this environment variable to persist downloaded models across deploys:
TINY_AI_CACHE_DIR=/data/models
Mount a persistent disk at /data with a minimum of 1GB.
API Reference
tiny_rag_ai.index(folder_path, save_path, n_ctx, threads)
| Parameter | Default | Description |
|---|---|---|
folder_path |
required | Path to your documents folder (PDF/TXT) |
save_path |
./tiny_ai_data |
Where to save the FAISS index and chunks |
n_ctx |
2048 |
Context window size for the LLM |
threads |
8 |
Number of CPU threads for inference |
tiny_rag_ai.chat(query, use_case)
| Parameter | Default | Description |
|---|---|---|
query |
required | The user's question |
use_case |
required | Describes the bot's role e.g. "customer support bot" |
Stack
- LLM: Qwen2.5 0.5B via llama-cpp-python
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- Vector store: FAISS
- PDF loading: PyMuPDF
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file tiny_rag_ai-0.1.2.tar.gz.
File metadata
- Download URL: tiny_rag_ai-0.1.2.tar.gz
- Upload date:
- Size: 6.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4af759ecf9c235e59c058db448a26e881c64506e4cea4d372764bc5b4ef441f9
|
|
| MD5 |
ac331c74dd5990cf143904fa8086a091
|
|
| BLAKE2b-256 |
82e2be7abf457ea96b4197195ac6158d506472648f88be25f7de195cceef3f07
|
File details
Details for the file tiny_rag_ai-0.1.2-py3-none-any.whl.
File metadata
- Download URL: tiny_rag_ai-0.1.2-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
be77da6563a66904e5a9ea6cd702e33a2b53c22b8c4e8d607cf84d8c6d90f796
|
|
| MD5 |
cf0f2d99ebe7efa5719fb2f9bd86a784
|
|
| BLAKE2b-256 |
89417aec32b599a6e438cde15f785d294e3f1813a1d6315304836776f45aab87
|