
local-rag


Ask questions about your documents using local LLMs — no cloud, no API keys, your data stays on your machine.

$ rag add research-paper.pdf annual-report.docx notes.md
Loading research-paper.pdf…
  128 chunks → embedding with nomic-embed-text…
  Added 128 new chunks
Loading annual-report.docx…
  94 chunks → embedding with nomic-embed-text…
  Added 94 new chunks

$ rag ask "What were the main revenue drivers in Q3?"

╭─ Answer ────────────────────────────────────────────────────────╮
│ Based on the annual report, the main revenue drivers in Q3      │
│ were cloud services (+34% YoY) and professional services…      │
│ [Source: annual-report.docx]                                    │
╰─────────────────────────────────────────────────────────────────╯

Features

  • Local-first — Ollama for embeddings + chat, ChromaDB for vector storage
  • Multiple formats — PDF, DOCX, Markdown, plain text, RST
  • Persistent store — add documents once, query forever
  • Source filtering — restrict questions to specific files
  • Smart chunking — overlapping word-based chunks for better context
  • Rich terminal UI — markdown-rendered answers, source tables
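
The "smart chunking" feature can be sketched as a simple overlapping word-window split. This is a minimal illustration, not the package's actual implementation; the defaults mirror the CLI's --chunk-size and --chunk-overlap options:

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into overlapping word-based chunks.

    Each chunk holds up to `chunk_size` words; consecutive chunks share
    `chunk_overlap` words so context spanning a boundary is not lost.
    """
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, each 512-word chunk repeats the last 64 words of its predecessor, so a sentence falling on a chunk boundary still appears intact in at least one chunk.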

Requirements

  • Python ≥ 3.10
  • Ollama running locally
  • Embedding model: ollama pull nomic-embed-text
  • Chat model: ollama pull mistral

Installation

pip install local-rag

Or from source:

git clone https://github.com/dennisreichenberg/local-rag
cd local-rag
pip install -e ".[dev]"

Quick Start

# 1. Start Ollama (if not already running)
ollama serve

# 2. Pull required models
ollama pull nomic-embed-text
ollama pull mistral

# 3. Add documents
rag add report.pdf notes.md

# 4. Ask questions
rag ask "Summarize the key points"
rag ask "What are the risks mentioned?" --show-sources

Commands

rag add <files...>

Ingest one or more documents into the vector store. Supports .pdf, .docx, .txt, .md, .rst.

rag add report.pdf notes.md docs/
rag add report.pdf --embed-model nomic-embed-text --chunk-size 256

rag ask <question>

Ask a question. Retrieves the most relevant chunks from the store and sends them as context to the chat model.

rag ask "What is the conclusion?"
rag ask "Explain the architecture" --chat-model llama3 --top-k 8
rag ask "What risks are mentioned?" --source report.pdf --show-sources

rag list

Show all ingested documents.

rag remove <source>

Remove a document (and all its chunks) from the store. Supports partial path matching.

rag remove report.pdf

rag clear

Remove everything from the store.

Options

Command  Option           Default                 Description
add      --embed-model    nomic-embed-text        Ollama embedding model
add      --chunk-size     512                     Words per chunk
add      --chunk-overlap  64                      Overlap between chunks
ask      --chat-model     mistral                 Ollama chat model
ask      --embed-model    nomic-embed-text        Ollama embedding model
ask      --top-k          5                       Chunks to retrieve
ask      --source         (none)                  Filter by source file
ask      --show-sources   (off)                   Show retrieved chunks
All      --host           http://localhost:11434  Ollama base URL

How it works

Document → Chunking → Ollama Embeddings → ChromaDB
                                              ↓
Question → Ollama Embeddings → Vector Search → Top-K Chunks → Ollama LLM → Answer
  1. Ingestion: documents are split into overlapping chunks, embedded via Ollama (nomic-embed-text), and stored in a local ChromaDB database (~/.local/share/local-rag/)
  2. Retrieval: your question is embedded, then the closest chunks are retrieved via cosine similarity
  3. Generation: the retrieved chunks + question are sent to an Ollama chat model, which answers strictly from the provided context
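
Step 2 boils down to scoring every stored chunk embedding against the question embedding by cosine similarity and keeping the top-k. A dependency-free sketch with toy vectors (in the real pipeline, ChromaDB performs this search):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_chunks(question_emb, chunk_embs, chunks, k=5):
    """Rank chunks by similarity to the question embedding; return the best k."""
    scored = sorted(
        zip(chunks, chunk_embs),
        key=lambda pair: cosine_similarity(question_emb, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]
```

Cosine similarity compares direction rather than magnitude, which is why it works well for embeddings: two chunks about the same topic point the same way in embedding space regardless of their length.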

Data storage

All data is stored locally at ~/.local/share/local-rag/chroma/. No data leaves your machine.

To move or back up your store, copy that directory.
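
For a scripted backup, a plain file copy with the standard library is enough. A sketch (the default `src` mirrors the storage path above; `backup_store` is a hypothetical helper, not part of the package):

```python
import shutil
from pathlib import Path


def backup_store(dest, src=Path.home() / ".local/share/local-rag/chroma") -> Path:
    """Copy the local-rag ChromaDB directory into `dest` and return the copy's path."""
    target = Path(dest) / "chroma"
    # copytree creates missing parent directories; dirs_exist_ok allows re-running
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target
```

Restoring is the same copy in reverse: put the `chroma/` directory back under `~/.local/share/local-rag/` before running any `rag` command.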

License

MIT — see LICENSE
