My own RAG library built on turbovec
TurboRag-ahx47
TurboRag is a fully offline, low‑CPU, low‑RAM RAG (Retrieval Augmented Generation) engine.
This package (turborag-ahx47) is a custom build that leverages:
- TurboVec – quantized (Q4) vector index (8× smaller than float32, faster than FAISS)
- llama-cpp-python – runs all models as Q4_K_M GGUF files on CPU
- Optional REST API – via FastAPI (not included in this core library)
- Multi‑language SDKs – Python only for now
Note: The original project name turborag was already taken on PyPI. This is the official library under the name turborag-ahx47.
Features
- No GPU required, no internet at runtime – everything runs offline on CPU.
- Tiny memory footprint – Gemma Embedding 300M (≈150 MB) + Qwen 0.5B (≈300 MB) as Q4_K_M GGUF files.
- TurboVec Q4 index – 8× compression, fast brute‑force search.
- Built‑in SQLite document store – metadata and chunk storage.
- Easy‑to‑use Python API – add documents, ask questions, get answers with sources.
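The 8× figure follows from storing 4 bits per dimension instead of a 32-bit float. The sketch below illustrates that arithmetic with a simple per-vector min/max scalar quantizer and a brute-force dot-product search in NumPy. It is not TurboVec's actual algorithm or API; the names quantize_q4, dequantize_q4, and brute_force_search are made up for this illustration, and an even vector dimension is assumed.

```python
import numpy as np


def quantize_q4(vectors: np.ndarray):
    """Quantize float32 rows to 4-bit codes, two codes packed per byte."""
    lo = vectors.min(axis=1, keepdims=True)
    hi = vectors.max(axis=1, keepdims=True)
    # Guard against constant rows, where hi == lo would give a zero scale.
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0).astype(np.float32)
    codes = np.clip(np.rint((vectors - lo) / scale), 0, 15).astype(np.uint8)
    packed = (codes[:, 0::2] << 4) | codes[:, 1::2]  # assumes an even dimension
    return packed, lo.astype(np.float32), scale


def dequantize_q4(packed, lo, scale):
    """Unpack the 4-bit codes and map them back to approximate floats."""
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = packed >> 4
    codes[:, 1::2] = packed & 0x0F
    return codes.astype(np.float32) * scale + lo


def brute_force_search(query, packed, lo, scale, k=3):
    """Score every stored vector by dot product and return the top-k row ids."""
    scores = dequantize_q4(packed, lo, scale) @ query
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 768)).astype(np.float32)
packed, lo, scale = quantize_q4(vectors)
print(vectors.nbytes / packed.nbytes)  # 8.0 (code bytes only; lo/scale add a small overhead)
```

The ratio counts only the packed codes. The per-vector offset and scale add 8 bytes per row here, which is why real compression is slightly below 8×.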
Installation
Prerequisites
- Python 3.10 or higher
- Rust (only if you want to build TurboVec from source – not required for normal use)
Install from PyPI
pip install turborag-ahx47
This will automatically install the required dependencies, including turbovec, numpy, and llama-cpp-python.
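To confirm that the package and the dependencies listed above were installed, one option is to query the installed distributions with the standard library. This is a minimal sketch; installed_version is a helper name chosen for this example, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names taken from the installation notes above.
for name in ("turborag-ahx47", "turbovec", "numpy", "llama-cpp-python"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```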
Optional: Build TurboVec from source (advanced)
If you need a custom version of TurboVec, you can build it manually:
git clone https://github.com/RyanCodrai/turbovec.git
cd turbovec/turbovec-python
pip install maturin
maturin develop --release
But for most users, the pre‑built turbovec wheel is sufficient.
Quick Start
1. Download required models
You need two GGUF models:
- Embedding model: embeddinggemma-300m-q4_k_m.gguf (≈150 MB). Download from: Hugging Face
- LLM model (e.g., Qwen 0.5B): qwen-0.5b-q4_k_m.gguf (≈300 MB). Download from your preferred source.
Place them in a folder, e.g., models/.
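Before creating the engine, it can help to check that both files are where you expect them. The sketch below uses only the standard library; missing_models is a hypothetical helper written for this example, and the folder and file names are the ones suggested above.

```python
from pathlib import Path


def missing_models(model_dir, filenames):
    """Return the expected model files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in filenames if not (root / name).is_file()]


expected = ["embeddinggemma-300m-q4_k_m.gguf", "qwen-0.5b-q4_k_m.gguf"]
missing = missing_models("models", expected)
if missing:
    print("Missing model files:", ", ".join(missing))
```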
2. Use the library
from turborag import TurboRag
# Create RAG instance
rag = TurboRag.create(
    embed_model="models/embeddinggemma-300m-q4_k_m.gguf",
    llm_model="models/qwen-0.5b-q4_k_m.gguf",
)
# Add a document
rag.add_document("Paris is the capital of France.")
# Ask a question
answer, sources = rag.ask("What is the capital of France?")
print(answer)   # e.g. "Paris" (exact wording depends on the model)
print(sources)  # List of source chunks
API Reference
TurboRag.create(embed_model, llm_model, **kwargs)
Class method to instantiate the RAG engine.
| Parameter | Type | Description |
|---|---|---|
| embed_model | str | Path to the embedding GGUF file (Gemma 300M). |
| llm_model | str | Path to the LLM GGUF file (e.g., Qwen 0.5B). |
| chunk_size | int | (optional) Chunk size for splitting documents, default 512. |
| chunk_overlap | int | (optional) Overlap between chunks, default 50. |
Returns: TurboRag instance.
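To show how chunk_size and chunk_overlap interact, here is a minimal sliding-window splitter. It assumes sizes are counted in characters; the README does not say whether the library counts characters or tokens, and split_into_chunks is an illustrative name rather than a library function.

```python
def split_into_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_into_chunks("a" * 1000)
print([len(c) for c in chunks])  # [512, 512, 76]
```

With the defaults, each window starts 462 characters after the previous one, so a larger overlap produces more chunks for the same text.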
rag.add_document(text, metadata=None)
Adds a document to the index.
| Parameter | Type | Description |
|---|---|---|
| text | str | Document content. |
| metadata | dict | (optional) Additional metadata. |
rag.ask(question, k=5)
Asks a question and retrieves an answer.
| Parameter | Type | Description |
|---|---|---|
| question | str | User query. |
| k | int | Number of chunks to retrieve (default 5). |
Returns: (answer, sources) where answer is a string and sources is a list of chunk texts.
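Conceptually, a RAG engine retrieves the top k chunks and hands them to the LLM together with the question. How TurboRag builds its prompt is not documented here, so the following is only a generic sketch of that step; build_prompt and the prompt wording are assumptions made for this example.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and place them ahead of the question."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
))
```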
rag.search(query, k=5)
Performs a pure vector search without generation.
| Parameter | Type | Description |
|---|---|---|
| query | str | Search query. |
| k | int | Number of results. |
Returns: List of tuples (chunk_text, score, metadata).
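The tuples can be unpacked and post-processed directly. The sketch below filters hand-written sample tuples of the documented shape by a minimum score. It assumes a higher score means a closer match, which the README does not state; filter_hits and the sample values are invented for illustration.

```python
def filter_hits(hits, min_score):
    """Keep hits whose score is at least min_score, best first."""
    kept = [hit for hit in hits if hit[1] >= min_score]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hand-written stand-ins for the (chunk_text, score, metadata) tuples.
hits = [
    ("Berlin is the capital of Germany.", 0.41, {"source": "europe.txt"}),
    ("Paris is the capital of France.", 0.92, {"source": "europe.txt"}),
    ("Madrid is the capital of Spain.", 0.38, None),
]
for text, score, metadata in filter_hits(hits, min_score=0.4):
    print(f"{score:.2f}  {text}  {metadata}")
```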
Advanced Usage
Using a custom document store
from turborag import TurboRag
from turborag.store import SQLiteDocStore
store = SQLiteDocStore("my_docs.db")
rag = TurboRag.create(
    embed_model="models/embeddinggemma-300m-q4_k_m.gguf",
    llm_model="models/qwen-0.5b-q4_k_m.gguf",
    doc_store=store,
)
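For readers curious what a SQLite-backed chunk store involves, here is a toy version built on the standard sqlite3 module. It is not the library's SQLiteDocStore, and its schema and method names (TinyDocStore, add, get) are assumptions for illustration; the interface TurboRag expects from doc_store is not documented here.

```python
import json
import sqlite3


class TinyDocStore:
    """Toy chunk store; not the library's SQLiteDocStore."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL, metadata TEXT)"
        )

    def add(self, text, metadata=None):
        """Insert one chunk and return its row id (usable as a vector id)."""
        cur = self.conn.execute(
            "INSERT INTO chunks (text, metadata) VALUES (?, ?)",
            (text, json.dumps(metadata) if metadata is not None else None),
        )
        self.conn.commit()
        return cur.lastrowid

    def get(self, chunk_id):
        """Return (text, metadata) for a chunk id, or None if it is unknown."""
        row = self.conn.execute(
            "SELECT text, metadata FROM chunks WHERE id = ?", (chunk_id,)
        ).fetchone()
        if row is None:
            return None
        text, raw = row
        return text, (json.loads(raw) if raw is not None else None)


store = TinyDocStore()
chunk_id = store.add("Paris is the capital of France.", {"lang": "en"})
print(store.get(chunk_id))
```

Metadata is serialized as JSON text so that arbitrary dictionaries fit in a single column.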
Batching documents
docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
rag.add_documents(docs)  # list of strings
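For a large corpus, you may prefer to feed documents in fixed-size groups rather than one huge list. The helper below is a generic sketch: batched is defined here, not provided by the library, and whether add_documents benefits from a particular batch size is not something the README states.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


docs = [f"Document number {i}." for i in range(10)]
for batch in batched(docs, batch_size=4):
    print(len(batch))  # in real use: rag.add_documents(batch)
```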
Changing the LLM at runtime
rag.set_llm_model("models/deepseek-1.3b-q4_k_m.gguf")
Dependencies
- turbovec (quantized vector index)
- llama-cpp-python (GGUF inference)
- numpy (vector operations)
- sqlite3 (built‑in, for docstore)
Troubleshooting
| Issue | Solution |
|---|---|
| ImportError: cannot import name 'TurboRag' | Make sure you have installed the package correctly. |
| OSError: Llama model not found | Provide the correct absolute or relative path to the GGUF file. |
| turbovec.IdMapIndex not found | Reinstall turbovec with pip install --upgrade turbovec. |
| High RAM usage | Reduce chunk_size or use a smaller LLM. |
License
This project is licensed under the MIT License.
Links
- PyPI package: turborag-ahx47
- Source code: GitHub (replace with your actual repo URL)
- Report issues: Issue tracker
Acknowledgements
- TurboVec – efficient quantized vector search
- llama.cpp – GGUF model inference
- Gemma embedding model
Download files
File details
Details for the file turborag_ahx47-0.1.1.tar.gz.
File metadata
- Download URL: turborag_ahx47-0.1.1.tar.gz
- Upload date:
- Size: 19.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c52dbe9d063a3b80299b9f5ac8687a814299d84a1e3d4f61ef0f67b95c8758c8 |
| MD5 | 999955c468d3999544053ee37f5f40fa |
| BLAKE2b-256 | 31131860824d464de988c0d2d88cd84e92b773181208fdffb9eb1305d3e46406 |
File details
Details for the file turborag_ahx47-0.1.1-py3-none-any.whl.
File metadata
- Download URL: turborag_ahx47-0.1.1-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1cd27612e2f3a5146c5b220152cab2761630fdea81768b3a15d5ad7f8f75f18 |
| MD5 | bfa48f83dc7d7f72392f672077826d21 |
| BLAKE2b-256 | 76dc1c8777d3390d1ca15ecfbb5be1f3d2e4af59ba5f84f4db057bae2cfd7e59 |