Local-first general-purpose RAG with an MCP server, CLI, and web console

localbrain

Local-first general-purpose RAG — point it at folders/files, index them, and search by meaning through an MCP server (for Claude Code etc.), a CLI, and a web console. Everything runs on your machine; generation is done by your MCP client (e.g. Claude), so localbrain only needs a small embedding model — no local LLM, no Ollama daemon required.

  • 🔎 Semantic search + Cross-Encoder reranking
  • 🧩 MCP tools (search, add_path, reindex, query_insights, …)
  • 🖥️ Web console: source management · manual indexing (live progress) · search test · model swap
  • ♻️ Incremental indexing (only changed files), swappable embedding model
  • 📈 Query clustering insights (FAQs & knowledge gaps) — a self-improving loop
  • 🔒 Fully local; pluggable providers (fastembed ONNX / sentence-transformers / Ollama)

Install

The package is published on PyPI as localbrain-rag; the CLI command and the Python import name are both localbrain.

Default (CPU, no extra setup)

pip install localbrain-rag

Uses fastembed (ONNX, multilingual e5) — works on CPU with no PyTorch. Good enough to start.

Best quality (GPU + bge-m3) — recommended

  1. Install a CUDA build of PyTorch matching your GPU (example: CUDA 12.6):
    pip install torch --index-url https://download.pytorch.org/whl/cu126
    
  2. Install localbrain with sentence-transformers:
    pip install "localbrain-rag[st]"
    
  3. Point the config at bge-m3 (see Configuration). Models auto-download on first use.

No NVIDIA GPU? Skip step 1 — pip install "localbrain-rag[st]" installs a CPU PyTorch and still works (slower).
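To see which of the two install paths your environment currently matches, a small check like the one below may help. It is only a sketch: torch_backend is a hypothetical helper name, not part of localbrain, and it relies on nothing beyond the standard library and PyTorch's own torch.cuda.is_available().

```python
import importlib.util


def torch_backend() -> str:
    """Return 'missing', 'cpu', or 'cuda' for the local PyTorch install."""
    if importlib.util.find_spec("torch") is None:
        # Default install: fastembed runs on ONNX and needs no PyTorch.
        return "missing"
    import torch  # imported lazily so the check also works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"
```

A result of "cuda" suggests step 1 worked; "cpu" means the [st] extra is installed but will run on the CPU.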

Quick start

# CLI
localbrain add-source "C:\Users\me\notes" --globs "*.md,*.txt"
localbrain index
localbrain search "what did we decide about delivery delays"
localbrain insights          # FAQ clusters + knowledge gaps
localbrain stats
localbrain --version

# Web console  →  http://127.0.0.1:8765
localbrain-web

# MCP server (stdio) — register with Claude Code
localbrain-mcp
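One way to register the stdio server is a project-level .mcp.json file, which Claude Code reads for MCP server definitions. The sketch below writes such an entry. The server name "localbrain" and the helper register_localbrain are illustrative choices, and it assumes localbrain-mcp is on your PATH and needs no arguments.

```python
import json
from pathlib import Path


def register_localbrain(project_dir: Path, name: str = "localbrain") -> Path:
    """Add a stdio entry for localbrain-mcp to <project_dir>/.mcp.json."""
    path = project_dir / ".mcp.json"
    # Keep any servers that are already registered in the file.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Assumption: the server takes no arguments and is found on PATH.
    servers[name] = {"command": "localbrain-mcp", "args": []}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling register_localbrain(Path.cwd()) from the project root would create or update the file in place.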

Configuration

Config lives at ~/.localbrain/config.json (override the directory with LOCALBRAIN_HOME). Data (SQLite, Chroma vectors, and one vector collection per embedding model) also lives under ~/.localbrain.

{
  "embedding": { "provider": "sentence-transformers", "model": "BAAI/bge-m3", "fp16": false },
  "chunk": { "size": 1000, "overlap": 150 },
  "rerank": { "enabled": true, "provider": "cross-encoder",
              "model": "BAAI/bge-reranker-v2-m3", "candidate_k": 30, "fp16": false },
  "search_k": 5
}
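For scripted setups, the config file can be located and edited with a few lines of standard-library Python. This is a sketch under two assumptions: LOCALBRAIN_HOME is an environment variable that replaces ~/.localbrain, and the file is plain JSON shaped like the sample above. The helper names are hypothetical and not part of the localbrain API.

```python
import json
import os
from pathlib import Path


def localbrain_home() -> Path:
    """Resolve the data directory, honouring LOCALBRAIN_HOME if it is set."""
    # Assumption: LOCALBRAIN_HOME is an environment variable.
    override = os.environ.get("LOCALBRAIN_HOME")
    return Path(override).expanduser() if override else Path.home() / ".localbrain"


def load_config() -> dict:
    """Return the parsed config.json, or an empty dict if the file is absent."""
    path = localbrain_home() / "config.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def set_embedding(provider: str, model: str) -> Path:
    """Update only the embedding provider and model, then write the file back."""
    config = load_config()
    embedding = config.setdefault("embedding", {})
    embedding["provider"] = provider
    embedding["model"] = model
    path = localbrain_home() / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, set_embedding("sentence-transformers", "BAAI/bge-m3") leaves every other key untouched.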

  • Swap models freely: change embedding.model, then run localbrain index --rebuild (the stored text is kept, so it re-embeds without re-reading files). Each model uses its own vector collection (cosine distance).
  • fp16: true halves VRAM and speeds up inference on GPU (ignored on CPU). Handy for ~6 GB cards.
  • Reranking improves accuracy. With it enabled, result scores are Cross-Encoder relevance scores (≈0.8+ is a strong match, ≈0 is no match).
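How candidate_k and search_k plausibly interact can be pictured as a two-stage pipeline: a cheap similarity pass keeps candidate_k chunks, then a more expensive scorer reorders them and the top search_k are returned. The sketch below illustrates that general technique with toy word-overlap scorers. It is not localbrain's code, and every function name in it is hypothetical.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def overlap(query: str, text: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of words."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def coverage(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: share of query words found in text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def two_stage_search(
    query: str,
    chunks: List[str],
    first_pass: Scorer = overlap,
    reranker: Scorer = coverage,
    candidate_k: int = 30,
    search_k: int = 5,
) -> List[Tuple[float, str]]:
    # Stage 1: keep the candidate_k best chunks by the cheap score.
    candidates = sorted(chunks, key=lambda c: first_pass(query, c), reverse=True)
    candidates = candidates[:candidate_k]
    # Stage 2: rescore only those candidates and return the top search_k.
    rescored = [(reranker(query, c), c) for c in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored[:search_k]
```

The returned score comes from the second stage, which is why enabling reranking changes the scale of the scores you see.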

Models & first run

The first search or index run downloads the models from Hugging Face into the HF cache (HF_HOME): bge-m3 (~2 GB) and bge-reranker-v2-m3 (~2 GB). Later runs use the cached copies and work offline. The default fastembed models are much smaller.
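Before going offline, it can be useful to confirm that a model is already in the Hugging Face cache. The sketch below checks for the hub's models--org--name folder layout under HF_HUB_CACHE or HF_HOME/hub. It assumes the model was downloaded through the standard Hugging Face hub cache, ignores XDG_CACHE_HOME for brevity, and cannot tell whether a download completed. The helper names are hypothetical.

```python
import os
from pathlib import Path


def hf_hub_cache() -> Path:
    """Locate the hub cache: HF_HUB_CACHE, else HF_HOME/hub, else the default."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit)
    hf_home = os.environ.get("HF_HOME")
    base = Path(hf_home) if hf_home else Path.home() / ".cache" / "huggingface"
    return base / "hub"


def is_cached(repo_id: str) -> bool:
    """True if at least one snapshot folder exists for repo_id."""
    repo_dir = hf_hub_cache() / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

For instance, is_cached("BAAI/bge-m3") and is_cached("BAAI/bge-reranker-v2-m3") cover the two models named above.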

⚠️ One process owns writes

The web server and CLI share the same on-disk vector store. ChromaDB does not reflect writes made by another process while a server is running. So:

  • Index from the web console (Indexing tab), or
  • stop localbrain-web → run localbrain index → restart the server.

Don't run localbrain index while localbrain-web is up — the running server won't see the new docs.
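A wrapper script can guard against this by checking whether something is already listening on the web console's port before it runs the CLI indexer. The sketch below assumes the console uses the default address shown in Quick start (127.0.0.1:8765). Any other service on that port would give a false positive, and web_console_running is a hypothetical helper, not a localbrain function.

```python
import socket


def web_console_running(host: str = "127.0.0.1", port: int = 8765,
                        timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If it returns True, index from the web console's Indexing tab or stop the server first.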

Docker (optional, server scenario)

A container only sees mounted volumes, so the "browse & index any local folder" UX is limited — use Docker to serve a mounted documents folder. GPU works via NVIDIA Container Toolkit (Windows: Docker Desktop + WSL2). See Dockerfile / docker-compose.yml:

DOCS_DIR=/path/to/docs docker compose up --build   # http://localhost:8765 ; add /docs as a source

Architecture

core/        pure library (single-responsibility modules: ingest, embed, rerank, store, search, insights)
services/    orchestration (indexing / search / insights / model)
adapters/    thin entry points: cli · mcp_server · web   (all share core via context.py)

License

MIT — see LICENSE. Design notes in docs/spec/.

Download files

Source Distribution

localbrain_rag-0.1.1.tar.gz (32.7 kB)

Uploaded Source

Built Distribution

localbrain_rag-0.1.1-py3-none-any.whl (41.1 kB)

Uploaded Python 3

File details

Details for the file localbrain_rag-0.1.1.tar.gz.

File metadata

  • Download URL: localbrain_rag-0.1.1.tar.gz
  • Size: 32.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for localbrain_rag-0.1.1.tar.gz
Algorithm Hash digest
SHA256 07b1336d9e04094dfa34e18081110631519f82dfb85bdabb8de15027d4786b9a
MD5 38cfc9e6302b6d7db8e75828f6446930
BLAKE2b-256 3b1862a3f356d16163fc82f699c71bb7c9f5a68b46fd33a2321149f523677b1a

Provenance

The following attestation bundles were made for localbrain_rag-0.1.1.tar.gz:

Publisher: release.yml on sinwoo0225/Localbrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file localbrain_rag-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: localbrain_rag-0.1.1-py3-none-any.whl
  • Size: 41.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for localbrain_rag-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 90ca2d3cd5591514df17d11fb0a83b1edec4b3aaa9c6d02c2f51411979a327e9
MD5 c2642579ecab667b2dc137f8fc5282e1
BLAKE2b-256 3b27396527f4636a4b001ba7e2b192328c75a70004693029abe45d61cb0b06ff

Provenance

The following attestation bundles were made for localbrain_rag-0.1.1-py3-none-any.whl:

Publisher: release.yml on sinwoo0225/Localbrain
