My own RAG library built on turbovec

TurboRag-ahx47

TurboRag is a fully offline, low‑CPU, low‑RAM RAG (Retrieval Augmented Generation) engine.
This package (turborag-ahx47) is a custom build that leverages:

TurboVec – quantized (Q4) vector index (8× smaller than float32, faster than FAISS)
llama-cpp-python – runs all models as Q4_K_M GGUF files on CPU
Optional REST API – via FastAPI (not included in this core library)
SDKs – Python only for now; other languages are not yet available

Note: The original project name turborag was already taken on PyPI, so the official library is published as turborag-ahx47. The Python import name is still turborag.

Features

No GPU required, no internet at runtime – everything runs offline on CPU.
Tiny memory footprint – EmbeddingGemma 300M (≈150 MB) + Qwen 0.5B (≈300 MB), both as Q4_K_M GGUF files.
TurboVec Q4 index – 8× compression, fast brute‑force search.
Built‑in SQLite document store – metadata and chunk storage.
Easy‑to‑use Python API – add documents, ask questions, get answers with sources.
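
The 8× figure follows from storing 4 bits per vector component instead of 32. The sketch below is only that arithmetic: the corpus size and the 768-dimensional embedding width are assumptions chosen for illustration, and the calculation ignores any per-block scale factors, IDs, or headers that a real TurboVec index may store.

```python
def index_size_bytes(num_vectors: int, dim: int, bits_per_value: int) -> int:
    """Raw storage for the vector payload only (no IDs, scales, or headers)."""
    total_bits = num_vectors * dim * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed workload: 100,000 chunks embedded as 768-dimensional vectors.
NUM_VECTORS = 100_000
DIM = 768

float32_bytes = index_size_bytes(NUM_VECTORS, DIM, 32)
q4_bytes = index_size_bytes(NUM_VECTORS, DIM, 4)

print(f"float32: {float32_bytes / 1e6:.1f} MB")
print(f"Q4:      {q4_bytes / 1e6:.1f} MB")
print(f"ratio:   {float32_bytes / q4_bytes:.0f}x")
```

Under these assumptions the payload shrinks from roughly 307 MB to roughly 38 MB, which is why the index fits comfortably in RAM next to the two models.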

Installation

Prerequisites

Python 3.10 or higher
Rust (only if you want to build TurboVec from source – not required for normal use)

Install from PyPI

```bash
pip install turborag-ahx47
```

This will automatically install the required dependencies, including turbovec, numpy, and llama-cpp-python.
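
To confirm that the install pulled in everything, a small check like the following may help. It uses only the standard library's `importlib.metadata`; the distribution names are the ones listed in this README, and the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as given in this README.
for name in ("turborag-ahx47", "turbovec", "numpy", "llama-cpp-python"):
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points at the package to reinstall in the active environment.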

Optional: Build TurboVec from source (advanced)

If you need a custom version of TurboVec, you can build it manually:

```bash
git clone https://github.com/RyanCodrai/turbovec.git
cd turbovec/turbovec-python
pip install maturin
maturin develop --release
```

For most users, the pre‑built turbovec wheel is sufficient.

Quick Start

1. Download required models

You need two GGUF models:

Embedding model: embeddinggemma-300m-q4_k_m.gguf (≈150 MB)
Download from: Hugging Face
LLM model (e.g., Qwen 0.5B): qwen-0.5b-q4_k_m.gguf (≈300 MB)
Download from your preferred source.

Place them in a folder, e.g., models/.
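
Since a wrong path is one of the more common setup mistakes, it can be worth checking that both files are where you expect before loading them. This is a minimal sketch using `pathlib`; the helper `missing_files` is hypothetical and not part of the library, and the paths assume the `models/` folder and file names suggested above.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing regular file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


model_paths = [
    "models/embeddinggemma-300m-q4_k_m.gguf",
    "models/qwen-0.5b-q4_k_m.gguf",
]

missing = missing_files(model_paths)
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```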

2. Use the library

```python
from turborag import TurboRag

# Create RAG instance
rag = TurboRag.create(
    embed_model="models/embeddinggemma-300m-q4_k_m.gguf",
    llm_model="models/qwen-0.5b-q4_k_m.gguf",
)

# Add a document
rag.add_document("Paris is the capital of France.")

# Ask a question
answer, sources = rag.ask("What is the capital of France?")
print(answer)  # "Paris"
print(sources)  # List of source chunks
```

API Reference

`TurboRag.create(embed_model, llm_model, **kwargs)`

Class method to instantiate the RAG engine.

| Parameter | Type | Description |
|---|---|---|
| `embed_model` | `str` | Path to the embedding GGUF file (Gemma 300M). |
| `llm_model` | `str` | Path to the LLM GGUF file (e.g., Qwen 0.5B). |
| `chunk_size` | `int` | (optional) Chunk size for splitting documents, default 512. |
| `chunk_overlap` | `int` | (optional) Overlap between chunks, default 50. |

Returns: TurboRag instance.
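
To make `chunk_size` and `chunk_overlap` concrete, here is a sketch of sliding-window chunking. It assumes sizes are measured in characters and that each chunk starts `chunk_size - chunk_overlap` characters after the previous one; this README does not document the library's actual splitter, so treat `chunk_text` as an illustration rather than the real implementation.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []

    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.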

`rag.add_document(text, metadata=None)`

Adds a document to the index.

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | Document content. |
| `metadata` | `dict` | (optional) Additional metadata. |

`rag.ask(question, k=5)`

Retrieves the `k` most relevant chunks for the question and generates an answer from them.

| Parameter | Type | Description |
|---|---|---|
| `question` | `str` | User query. |
| `k` | `int` | Number of chunks to retrieve (default 5). |

Returns: (answer, sources) where answer is a string and sources is a list of chunk texts.
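
As a rough picture of what happens between retrieval and generation, the retrieved chunks are typically placed into a prompt ahead of the question. The template below is an assumption for illustration only; the prompt that TurboRag actually sends to the LLM is not documented here, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a context-grounded prompt from retrieved chunks (illustrative template)."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
)
print(prompt)
```

Numbering the chunks makes it easy to map an answer back to the entries in `sources`.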

`rag.search(query, k=5)`

Performs a pure vector search without generation.

| Parameter | Type | Description |
|---|---|---|
| `query` | `str` | Search query. |
| `k` | `int` | Number of results (default 5). |

Returns: List of tuples (chunk_text, score, metadata).
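
Conceptually, a brute-force search scores the query vector against every stored vector and keeps the top `k`. The pure-Python sketch below shows that idea with cosine similarity on unquantized toy vectors; TurboVec works on quantized vectors in native code, so this is a model of the technique and not its implementation. The function names and the three-dimensional vectors are made up for the example.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query_vec, entries, k=5):
    """Score every (chunk_text, vector, metadata) entry and return the k best
    as (chunk_text, score, metadata) tuples, highest score first."""
    scored = ((text, cosine(query_vec, vec), meta) for text, vec, meta in entries)
    return heapq.nlargest(k, scored, key=lambda item: item[1])


entries = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0], {"doc": 1}),
    ("Berlin is the capital of Germany.", [0.1, 0.9, 0.0], {"doc": 2}),
    ("Madrid is the capital of Spain.", [0.0, 0.1, 0.9], {"doc": 3}),
]

results = brute_force_search([1.0, 0.0, 0.0], entries, k=2)
for text, score, meta in results:
    print(f"{score:.3f}  {text}  {meta}")
```

The result shape mirrors the `(chunk_text, score, metadata)` tuples described above, so the same loop works on the output of `rag.search`.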

Advanced Usage

Using a custom document store

```python
from turborag import TurboRag
from turborag.store import SQLiteDocStore

store = SQLiteDocStore("my_docs.db")
rag = TurboRag.create(
    embed_model="models/embeddinggemma-300m-q4_k_m.gguf",
    llm_model="models/qwen-0.5b-q4_k_m.gguf",
    doc_store=store,
)
```
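
For a sense of what a SQLite-backed chunk store involves, here is a minimal stand-in built on the standard `sqlite3` module. The class name `TinyChunkStore`, its methods, and the table schema are all hypothetical; the real `SQLiteDocStore` interface and schema are not documented in this README and may differ.

```python
import json
import sqlite3


class TinyChunkStore:
    """Hypothetical stand-in for illustration; not the library's SQLiteDocStore."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "text TEXT NOT NULL, "
            "metadata TEXT NOT NULL)"
        )

    def add(self, text, metadata=None):
        """Store one chunk with JSON-encoded metadata and return its row id."""
        cur = self.conn.execute(
            "INSERT INTO chunks (text, metadata) VALUES (?, ?)",
            (text, json.dumps(metadata or {})),
        )
        self.conn.commit()
        return cur.lastrowid

    def get(self, chunk_id):
        """Return (text, metadata) for a row id, or None if it does not exist."""
        row = self.conn.execute(
            "SELECT text, metadata FROM chunks WHERE id = ?", (chunk_id,)
        ).fetchone()
        if row is None:
            return None
        return row[0], json.loads(row[1])


store = TinyChunkStore()
chunk_id = store.add("Paris is the capital of France.", {"source": "geo.txt"})
print(store.get(chunk_id))
```

Keeping the row id as the link between the vector index and the text is a common design, since the index then only needs to store integers alongside the vectors.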

Batching documents

```python
docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
rag.add_documents(docs)  # list of strings
```
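
In practice the list of strings often comes from files on disk. The helper below is a sketch using `pathlib`; `load_text_files`, the `corpus` folder name, and the `*.txt` pattern are assumptions for the example and not part of the library.

```python
from pathlib import Path


def load_text_files(folder, pattern="*.txt"):
    """Read every matching file in `folder` (non-recursive), sorted by file name."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob(pattern))]


# Example usage with an existing TurboRag instance named `rag`:
#   docs = load_text_files("corpus")
#   rag.add_documents(docs)
```

Sorting by file name keeps the ingestion order stable between runs, which makes results easier to reproduce.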

Changing the LLM at runtime

```python
rag.set_llm_model("models/deepseek-1.3b-q4_k_m.gguf")
```

Dependencies

turbovec (quantized vector index)
llama-cpp-python (GGUF inference)
numpy (vector operations)
sqlite3 (part of the Python standard library, used for the document store)

Troubleshooting

| Issue | Solution |
|---|---|
| `ImportError: cannot import name 'TurboRag'` | Check that `turborag-ahx47` is installed in the active environment and that you import from `turborag` (the import name), not `turborag-ahx47`. |
| `OSError: Llama model not found` | Provide the correct absolute or relative path to the GGUF file. |
| `turbovec.IdMapIndex not found` | Reinstall `turbovec` with `pip install --upgrade turbovec`. |
| High RAM usage | Reduce `chunk_size` or use a smaller LLM. |

License

This project is licensed under the MIT License.

Acknowledgements

TurboVec – efficient quantized vector search
llama.cpp – GGUF model inference
Gemma embedding model

Project details

Version 0.1.1, released Jun 7, 2026.
