
My own RAG library built on turbovec


TurboRag-ahx47

TurboRag is a fully offline, low‑CPU, low‑RAM RAG (Retrieval Augmented Generation) engine.
This package (turborag-ahx47) is a custom build that leverages:

  • TurboVec – quantized (Q4) vector index (8× smaller than float32, faster than FAISS)
  • llama-cpp-python – runs all models as Q4_K_M GGUF files on CPU
  • Optional REST API – can be added with FastAPI (not included in this core library)
  • SDKs – Python only for now

Note: The original project name turborag was already taken on PyPI. This is the official library under the name turborag-ahx47.


Features

  • No GPU required, no internet at runtime – everything runs offline on CPU.
  • Tiny memory footprint – EmbeddingGemma 300M (≈150 MB) + Qwen 0.5B (≈300 MB), both as Q4_K_M GGUF files.
  • TurboVec Q4 index – 8× compression, fast brute‑force search.
  • Built‑in SQLite document store – metadata and chunk storage.
  • Easy‑to‑use Python API – add documents, ask questions, get answers with sources.
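
The 8× compression figure follows from storing 4 bits per value instead of 32. The sketch below works through that arithmetic for raw vector storage only; the corpus size and the 768-dimensional embedding width are assumptions for illustration, and a real index adds some overhead (IDs, scale factors) that is not modelled here.

```python
def index_size_bytes(num_vectors: int, dim: int, bits_per_value: int) -> int:
    """Raw storage for the embeddings alone, ignoring any index overhead."""
    total_bits = num_vectors * dim * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


NUM_VECTORS = 100_000  # hypothetical corpus size (number of chunks)
DIM = 768              # assumed embedding width; check your embedding model

float32_bytes = index_size_bytes(NUM_VECTORS, DIM, 32)
q4_bytes = index_size_bytes(NUM_VECTORS, DIM, 4)

print(f"float32: {float32_bytes / 1e6:.1f} MB")
print(f"Q4:      {q4_bytes / 1e6:.1f} MB")
print(f"ratio:   {float32_bytes / q4_bytes:.0f}x")
```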

Installation

Prerequisites

  • Python 3.10 or higher
  • Rust (only if you want to build TurboVec from source – not required for normal use)

Install from PyPI

pip install turborag-ahx47

This will automatically install the required dependencies, including turbovec, numpy, and llama-cpp-python.
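
To confirm that the package and its dependencies ended up in the active environment, a small check with the standard library's importlib.metadata can help. This is only a convenience sketch; the helper name installed_version is made up for this example, and the distribution names are the ones listed above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("turborag-ahx47", "turbovec", "numpy", "llama-cpp-python"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```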

Optional: Build TurboVec from source (advanced)

If you need a custom version of TurboVec, you can build it manually:

git clone https://github.com/RyanCodrai/turbovec.git
cd turbovec/turbovec-python
pip install maturin
maturin develop --release

But for most users, the pre‑built turbovec wheel is sufficient.


Quick Start

1. Download required models

You need two GGUF models:

  • Embedding model: embeddinggemma-300m-q4_k_m.gguf (≈150 MB)
    Download from: Hugging Face
  • LLM model (e.g., Qwen 0.5B): qwen-0.5b-q4_k_m.gguf (≈300 MB)
    Download from your preferred source.

Place them in a folder, e.g., models/.
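
Before creating the engine, it can be worth checking that both files are where you expect them. The following is a minimal sketch using pathlib; missing_models is a hypothetical helper, not part of the library, and the file names are the ones suggested above.

```python
from pathlib import Path


def missing_models(model_dir, filenames):
    """Return the file names that are not present as regular files in model_dir."""
    base = Path(model_dir)
    return [name for name in filenames if not (base / name).is_file()]


REQUIRED = [
    "embeddinggemma-300m-q4_k_m.gguf",
    "qwen-0.5b-q4_k_m.gguf",
]

missing = missing_models("models", REQUIRED)
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```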

2. Use the library

from turborag import TurboRag

# Create RAG instance
rag = TurboRag.create(
    embed_model="models/embeddinggemma-300m-q4_k_m.gguf",
    llm_model="models/qwen-0.5b-q4_k_m.gguf",
)

# Add a document
rag.add_document("Paris is the capital of France.")

# Ask a question
answer, sources = rag.ask("What is the capital of France?")
print(answer)  # e.g. "Paris" (exact wording depends on the LLM)
print(sources)  # List of source chunks

API Reference

TurboRag.create(embed_model, llm_model, **kwargs)

Class method to instantiate the RAG engine.

Parameter | Type | Description
embed_model | str | Path to the embedding GGUF file (Gemma 300M).
llm_model | str | Path to the LLM GGUF file (e.g., Qwen 0.5B).
chunk_size | int (optional) | Chunk size for splitting documents, default 512.
chunk_overlap | int (optional) | Overlap between chunks, default 50.

Returns: TurboRag instance.
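
To illustrate how chunk_size and chunk_overlap interact, here is a sketch of a sliding-window splitter. It is an assumption about the general technique, not the library's actual splitter: it counts characters, whereas TurboRag may count tokens or split on sentence boundaries. The function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text("x" * 1200)
print([len(c) for c in chunks])
```

With the defaults, each window starts 462 characters after the previous one, so consecutive chunks share 50 characters of context.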

rag.add_document(text, metadata=None)

Adds a document to the index.

Parameter | Type | Description
text | str | Document content.
metadata | dict (optional) | Additional metadata.

rag.ask(question, k=5)

Asks a question and retrieves an answer.

Parameter | Type | Description
question | str | User query.
k | int | Number of chunks to retrieve (default 5).

Returns: (answer, sources) where answer is a string and sources is a list of chunk texts.
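
Conceptually, a RAG question step retrieves the top k chunks and passes them to the LLM together with the question. The sketch below shows one common way to assemble such a prompt. The template and the build_prompt helper are assumptions for illustration; the prompt TurboRag actually sends to the model is not documented here.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into a single prompt string."""
    # Number the chunks so an answer can be traced back to its sources.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
)
print(prompt)
```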

rag.search(query, k=5)

Performs a pure vector search without generation.

Parameter | Type | Description
query | str | Search query.
k | int | Number of results (default 5).

Returns: List of tuples (chunk_text, score, metadata).
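
The result shape can be illustrated with a tiny brute-force search over unquantized vectors. This is a plain-Python sketch of the general idea (score every stored vector, keep the best k), not TurboVec's implementation; the toy 3-dimensional vectors, the use of cosine similarity, and the function names are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query_vec, entries, k=5):
    """entries is a list of (chunk_text, vector, metadata); returns the top-k
    (chunk_text, score, metadata) tuples, highest score first."""
    scored = [(text, cosine(query_vec, vec), meta) for text, vec, meta in entries]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
entries = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0], {"doc": 1}),
    ("Berlin is the capital of Germany.", [0.1, 0.9, 0.0], {"doc": 2}),
    ("Madrid is the capital of Spain.", [0.0, 0.1, 0.9], {"doc": 3}),
]

results = brute_force_search([1.0, 0.0, 0.0], entries, k=2)
for text, score, meta in results:
    print(f"{score:.3f}  {text}  {meta}")
```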


Advanced Usage

Using a custom document store

from turborag import TurboRag
from turborag.store import SQLiteDocStore

store = SQLiteDocStore("my_docs.db")
rag = TurboRag.create(
    embed_model="models/embeddinggemma-300m-q4_k_m.gguf",
    llm_model="models/qwen-0.5b-q4_k_m.gguf",
    doc_store=store,
)
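
For orientation, the sketch below shows what an SQLite-backed chunk store can look like using only the standard library. The table layout and the put_chunk and get_chunk helpers are hypothetical; the actual schema and interface of SQLiteDocStore may differ.

```python
import json
import sqlite3

# In-memory database for the example; a real store would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    "id INTEGER PRIMARY KEY, text TEXT NOT NULL, metadata TEXT)"
)


def put_chunk(conn, text, metadata=None):
    """Insert a chunk with JSON-encoded metadata and return its row id."""
    cur = conn.execute(
        "INSERT INTO chunks (text, metadata) VALUES (?, ?)",
        (text, json.dumps(metadata or {})),
    )
    conn.commit()
    return cur.lastrowid


def get_chunk(conn, chunk_id):
    """Return (text, metadata) for a chunk id, or None if it does not exist."""
    row = conn.execute(
        "SELECT text, metadata FROM chunks WHERE id = ?", (chunk_id,)
    ).fetchone()
    if row is None:
        return None
    return row[0], json.loads(row[1])


chunk_id = put_chunk(conn, "Paris is the capital of France.", {"source": "geo.txt"})
print(get_chunk(conn, chunk_id))
```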

Batching documents

docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
rag.add_documents(docs)  # list of strings

Changing the LLM at runtime

rag.set_llm_model("models/deepseek-1.3b-q4_k_m.gguf")

Dependencies

  • turbovec (quantized vector index)
  • llama-cpp-python (GGUF inference)
  • numpy (vector operations)
  • sqlite3 (built‑in, for docstore)

Troubleshooting

Issue | Solution
ImportError: cannot import name 'TurboRag' | Check that turborag-ahx47 is installed in the active environment (pip install turborag-ahx47) and that you import from turborag.
OSError: Llama model not found | Provide the correct absolute or relative path to the GGUF file.
turbovec.IdMapIndex not found | Upgrade turbovec with pip install --upgrade turbovec.
High RAM usage | Reduce chunk_size or use a smaller LLM.

License

This project is licensed under the MIT License.


