Local-first general-purpose RAG with an MCP server, CLI, and web console
localbrain
Local-first general-purpose RAG — point it at folders/files, index them, and search by meaning through an MCP server (for Claude Code etc.), a CLI, and a web console. Everything runs on your machine; generation is done by your MCP client (e.g. Claude), so localbrain only needs a small embedding model — no local LLM, no Ollama daemon required.
- 🔎 Semantic search + Cross-Encoder reranking
- 🧩 MCP tools (search, add_path, reindex, query_insights, …)
- 🖥️ Web console: source management · manual indexing (live progress) · search test · model swap
- ♻️ Incremental indexing (only changed files), swappable embedding model
- 📈 Query clustering insights (FAQs & knowledge gaps) — a self-improving loop
- 🔒 Fully local; pluggable providers (fastembed ONNX / sentence-transformers / Ollama)
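Incremental indexing means only changed files get re-embedded. The README does not say how changes are detected. The sketch below assumes one common approach: keep a SHA-256 content hash for each path and compare it on the next run. file_digest, changed_files and the seen mapping are hypothetical names, not localbrain's API.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(paths: list[Path], seen: dict[str, str]) -> list[Path]:
    """Return the paths whose content differs from the digest recorded in `seen`.

    `seen` (hypothetical store) maps path strings to digests from the previous
    run. It is updated in place, so a second call skips unchanged files.
    """
    changed = []
    for p in paths:
        digest = file_digest(p)
        if seen.get(str(p)) != digest:
            changed.append(p)
            seen[str(p)] = digest
    return changed
```

This sketch detects only new and edited files. Deleted files would need a separate pass that removes paths in seen that no longer exist.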
Install
The package is published as localbrain-rag on PyPI; the command and import name stay localbrain.
Default (CPU, no extra setup)
pip install localbrain-rag
Uses fastembed (ONNX, multilingual e5) — works on CPU with no PyTorch. Good enough to start.
Best quality (GPU + bge-m3) — recommended
1. Install a CUDA build of PyTorch matching your GPU (example: CUDA 12.6):
pip install torch --index-url https://download.pytorch.org/whl/cu126
2. Install localbrain with sentence-transformers:
pip install "localbrain-rag[st]"
3. Point the config at bge-m3 (see Configuration). Models auto-download on first use.
No NVIDIA GPU? Skip step 1: pip install "localbrain-rag[st]" pulls in PyTorch as a dependency and still works on CPU (slower).
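If you are unsure which install path you ended up with, checking whether the sentence_transformers package is importable tells you whether the [st] extra is present. The sketch below uses importlib.util.find_spec for that. pick_provider is a hypothetical helper. The value "sentence-transformers" matches the Configuration example, but "fastembed" as a provider value is an assumption, since the README does not show it.

```python
import importlib.util


def is_installed(module: str) -> bool:
    """True if the top-level `module` can be found, without importing it."""
    return importlib.util.find_spec(module) is not None


def pick_provider(installed=is_installed) -> str:
    """Suggest an embedding provider based on which backend packages exist.

    "sentence-transformers" matches the README config; "fastembed" as a
    provider string is an assumption.
    """
    if installed("sentence_transformers"):
        return "sentence-transformers"
    if installed("fastembed"):
        return "fastembed"
    raise RuntimeError("no embedding backend found; run: pip install localbrain-rag")
```

The installed parameter is injectable, so the selection logic can be tested without either backend present.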
Quick start
# CLI
localbrain add-source "C:\Users\me\notes" --globs "*.md,*.txt"
localbrain index
localbrain search "what did we decide about delivery delays"
localbrain insights # FAQ clusters + knowledge gaps
localbrain stats
localbrain --version
# Web console → http://127.0.0.1:8765
localbrain-web
# MCP server (stdio) — register with Claude Code
localbrain-mcp
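localbrain insights reports FAQ clusters and knowledge gaps, but the README does not describe the algorithm. As an illustration only, here is a greedy sketch under assumed rules:

- A query joins the first cluster whose seed embedding has cosine similarity at or above a threshold.
- A cluster with at least min_size queries counts as a FAQ.
- A FAQ is a knowledge gap when even its best search score stayed low, which is the "≈0 none" end of the rerank scale.

All names and thresholds here are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    centroid: list[float]  # embedding of the first query, kept fixed for simplicity
    queries: list[str] = field(default_factory=list)
    best_scores: list[float] = field(default_factory=list)


def cluster_queries(log, sim_threshold=0.85):
    """Greedy single-pass clustering of (query, embedding, best_score) tuples."""
    clusters: list[Cluster] = []
    for query, emb, score in log:
        home = next((c for c in clusters if cosine(c.centroid, emb) >= sim_threshold), None)
        if home is None:
            home = Cluster(centroid=emb)
            clusters.append(home)
        home.queries.append(query)
        home.best_scores.append(score)
    return clusters


def insights(log, min_size=2, gap_score=0.3):
    """Return (faqs, gaps): frequent clusters, and the frequent ones answered poorly."""
    faqs, gaps = [], []
    for c in cluster_queries(log):
        if len(c.queries) < min_size:
            continue
        faqs.append(c.queries)
        if max(c.best_scores) < gap_score:
            gaps.append(c.queries)
    return faqs, gaps
```

A gap here means "asked repeatedly, never answered well", which is the signal for adding documents to a source.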
Configuration
Config lives at ~/.localbrain/config.json (override the dir with LOCALBRAIN_HOME).
Data (SQLite + Chroma vectors, with one collection per embedding model) also lives under ~/.localbrain.
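A minimal sketch of that lookup, assuming LOCALBRAIN_HOME replaces the whole directory (so the file becomes $LOCALBRAIN_HOME/config.json) and that a missing file simply means "use defaults". localbrain_home and load_config are hypothetical helper names.

```python
import json
import os
from pathlib import Path


def localbrain_home(env=None) -> Path:
    """Config/data directory: $LOCALBRAIN_HOME if set, else ~/.localbrain."""
    env = os.environ if env is None else env
    override = env.get("LOCALBRAIN_HOME")
    return Path(override).expanduser() if override else Path.home() / ".localbrain"


def load_config(env=None) -> dict:
    """Read config.json from that directory; return {} if it does not exist yet."""
    path = localbrain_home(env) / "config.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Passing env explicitly keeps the function testable without touching the real environment.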
{
"embedding": { "provider": "sentence-transformers", "model": "BAAI/bge-m3", "fp16": false },
"chunk": { "size": 1000, "overlap": 150 },
"rerank": { "enabled": true, "provider": "cross-encoder",
"model": "BAAI/bge-reranker-v2-m3", "candidate_k": 30, "fp16": false },
"search_k": 5
}
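The chunk settings control how documents are split before embedding. The README does not state the unit. Assuming size and overlap are counted in characters, a sliding window that advances by size - overlap looks like the sketch below. chunk_text is a hypothetical helper, and localbrain's real splitter may also respect sentence or paragraph boundaries.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 150) -> list[str]:
    """Split text into windows of `size` chars; neighbours share `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, each chunk starts 850 characters after the previous one. A 2,000-character file therefore yields chunks of 1000, 1000 and 300 characters, and the 150-character overlap keeps a sentence that straddles a boundary intact in at least one chunk.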
- Swap models freely: change embedding.model, then run localbrain index --rebuild (the chunk text is kept, so it re-embeds without re-reading files). Each model uses its own vector collection (cosine distance).
- fp16: true roughly halves VRAM use and speeds up inference on GPU (ignored on CPU). Handy for ~6 GB cards.
- Reranking improves accuracy; scores become Cross-Encoder relevance (≈0.8+ strong match, ≈0 none).
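With reranking enabled, search is two-stage: the vector store returns the candidate_k nearest chunks by cosine distance, a Cross-Encoder rescores each (query, chunk) pair, and the top search_k are returned. The sketch below shows that flow with plain lists. score_pair is a stand-in for a real cross-encoder, such as a sentence-transformers CrossEncoder. The word-overlap scorer in the test is purely illustrative.

```python
import math
from typing import Callable


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def search(query: str,
           query_vec: list[float],
           docs: list[tuple[str, list[float]]],
           score_pair: Callable[[str, str], float],
           candidate_k: int = 30,
           search_k: int = 5) -> list[tuple[str, float]]:
    """Two-stage retrieval: nearest `candidate_k` by vector, then rerank to `search_k`."""
    # Stage 1: cheap vector search narrows the corpus to candidate_k chunks.
    candidates = sorted(docs, key=lambda d: cosine_distance(query_vec, d[1]))[:candidate_k]
    # Stage 2: the slower, more accurate pair scorer reorders only those candidates.
    rescored = [(text, score_pair(query, text)) for text, _ in candidates]
    rescored.sort(key=lambda t: t[1], reverse=True)
    return rescored[:search_k]
```

The reranker can only reorder what stage 1 retrieved, so candidate_k bounds recall. A larger candidate_k gives the Cross-Encoder more to choose from, at the cost of more pair scoring per query.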
Models & first run
The first search or index run downloads the models from Hugging Face into the HF cache (HF_HOME):
bge-m3 (~2 GB) plus bge-reranker-v2-m3 (~2 GB). Later runs load them from the cache and work offline.
The fastembed default models are much smaller.
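To find where those downloads land, huggingface_hub documents this lookup order: HF_HUB_CACHE if set, otherwise $HF_HOME/hub. HF_HOME itself defaults to $XDG_CACHE_HOME/huggingface, or ~/.cache/huggingface when XDG_CACHE_HOME is unset. The helper below mirrors that order with the standard library and ignores the deprecated HUGGINGFACE_HUB_CACHE variable. hf_hub_cache is a hypothetical name. fastembed may keep its models in a separate cache location.

```python
import os
from pathlib import Path


def hf_hub_cache(env=None) -> Path:
    """Resolve the Hugging Face hub cache directory (HF_HUB_CACHE > HF_HOME > XDG > ~/.cache)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"]).expanduser()
    else:
        xdg = env.get("XDG_CACHE_HOME")
        base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
        hf_home = base / "huggingface"
    return hf_home / "hub"
```

Downloaded repos appear inside that directory as folders such as models--BAAI--bge-m3. Deleting one forces a fresh download on the next run.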
⚠️ One process owns writes
The web server and CLI share the same on-disk vector store. ChromaDB does not reflect writes made by another process while a server is running. So:
- Index from the web console (Indexing tab), or
- stop localbrain-web → run localbrain index → restart the server.
Don't run localbrain index while localbrain-web is up — the running server won't see the new docs.
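A cheap guard against that mistake is to check whether anything is listening on the console's address before indexing from the CLI. This is a sketch, not a localbrain feature. It assumes the default 127.0.0.1:8765 from the Quick start, and an open port only suggests that localbrain-web is running; it does not prove it.

```python
import socket
import sys


def port_in_use(host: str = "127.0.0.1", port: int = 8765, timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds, i.e. something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def guard_cli_index() -> None:
    """Exit early instead of indexing while the web console may own the store."""
    if port_in_use():
        sys.exit("localbrain-web appears to be running; index from the web console "
                 "or stop the server first.")
```

A wrapper script would call guard_cli_index() before invoking localbrain index.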
Docker (optional, server scenario)
A container only sees mounted volumes, so the "browse & index any local folder" workflow is limited; use Docker to serve a mounted documents folder instead. GPU works via the NVIDIA Container Toolkit (on Windows: Docker Desktop + WSL2). See Dockerfile / docker-compose.yml:
DOCS_DIR=/path/to/docs docker compose up --build # http://localhost:8765 ; add /docs as a source
Architecture
core/ pure library (single-responsibility modules: ingest, embed, rerank, store, search, insights)
services/ orchestration (indexing / search / insights / model)
adapters/ thin entry points: cli · mcp_server · web (all share core via context.py)
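The layering above can be illustrated with a toy example:

- Core functions are pure.
- A service composes them.
- Each adapter only translates input and output through a shared context object.

Every name below (AppContext, SearchService, cli_search) is hypothetical and only mirrors the roles the directory names describe. It is not localbrain's code, and keyword_score stands in for the real embed and rerank steps.

```python
from dataclasses import dataclass, field


# core/: pure functions, no I/O or global state.
def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


def keyword_score(query: str, doc: str) -> float:
    q = tokenize(query)
    return len(q & tokenize(doc)) / len(q) if q else 0.0


# context.py: the one object every adapter builds and shares.
@dataclass
class AppContext:
    docs: list[str] = field(default_factory=list)
    search_k: int = 5


# services/: orchestration over core, independent of CLI/MCP/web.
class SearchService:
    def __init__(self, ctx: AppContext):
        self.ctx = ctx

    def search(self, query: str) -> list[tuple[str, float]]:
        scored = [(d, keyword_score(query, d)) for d in self.ctx.docs]
        scored = [s for s in scored if s[1] > 0]
        scored.sort(key=lambda s: s[1], reverse=True)
        return scored[: self.ctx.search_k]


# adapters/: thin entry points that only format input and output.
def cli_search(ctx: AppContext, query: str) -> str:
    hits = SearchService(ctx).search(query)
    return "\n".join(f"{score:.2f}  {doc}" for doc, score in hits)
```

Because adapters hold no logic, an MCP tool or web handler would call the same SearchService and differ only in how it formats the result.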
License
MIT — see LICENSE. Design notes in docs/spec/.
Download files
File details
Details for the file localbrain_rag-0.1.0.tar.gz.
File metadata
- Download URL: localbrain_rag-0.1.0.tar.gz
- Size: 30.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d31d3e555eb0dcbde239e2df5f6056f6bd2ec880c23347b56c1c3b98c4143b7 |
| MD5 | 18c70a4e91067de73217691fa2f40cd8 |
| BLAKE2b-256 | 8c265fcf83129f268d1cd90f8ad1764e3517c547d21f53cb13ab4a8123e71539 |
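To check a downloaded file against the SHA256 digest above, hashlib from the standard library is enough. The filename and digest in EXPECTED are copied from this page. sha256_of and verify_sha256 are illustrative helpers.

```python
import hashlib
from pathlib import Path

EXPECTED = {
    "localbrain_rag-0.1.0.tar.gz":
        "1d31d3e555eb0dcbde239e2df5f6056f6bd2ec880c23347b56c1c3b98c4143b7",
}


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large downloads need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: verify_sha256(Path("localbrain_rag-0.1.0.tar.gz"), EXPECTED["localbrain_rag-0.1.0.tar.gz"]). pip can enforce the same check at install time in hash-checking mode, using a requirements line with --hash=sha256:<digest>.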
Provenance
The following attestation bundles were made for localbrain_rag-0.1.0.tar.gz:
- Publisher: release.yml on sinwoo0225/Localbrain
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: localbrain_rag-0.1.0.tar.gz
- Subject digest: 1d31d3e555eb0dcbde239e2df5f6056f6bd2ec880c23347b56c1c3b98c4143b7
- Sigstore transparency entry: 1673958271
- Permalink: sinwoo0225/Localbrain@7f81484fd40e38dd0c6ab251bb409ad82987083a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/sinwoo0225
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7f81484fd40e38dd0c6ab251bb409ad82987083a
- Trigger Event: push
File details
Details for the file localbrain_rag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: localbrain_rag-0.1.0-py3-none-any.whl
- Size: 38.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | a9c6ba4ccffb428fd63e3584a293cfa1a0037baf23dc3405efa64bda04497711 |
| MD5 | be14106baa067f93bbbe96e90cf2bcf5 |
| BLAKE2b-256 | 979da2b8a34cc5b8c3b1e78516b23b30a556496bab16a2a60b8fa02603d28a89 |
Provenance
The following attestation bundles were made for localbrain_rag-0.1.0-py3-none-any.whl:
- Publisher: release.yml on sinwoo0225/Localbrain
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: localbrain_rag-0.1.0-py3-none-any.whl
- Subject digest: a9c6ba4ccffb428fd63e3584a293cfa1a0037baf23dc3405efa64bda04497711
- Sigstore transparency entry: 1673958285
- Permalink: sinwoo0225/Localbrain@7f81484fd40e38dd0c6ab251bb409ad82987083a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/sinwoo0225
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7f81484fd40e38dd0c6ab251bb409ad82987083a
- Trigger Event: push