
Local private RAG pipeline with GUI — LangChain + Ollama + ChromaDB + NFS

Reason this release was yanked:

Security: TLS private key included in package by mistake

Project description

ZettaBrain RAG

Local private RAG pipeline — your documents, your hardware, zero cloud.

Chat with your documents using a fully local AI. No API keys. No data leaving your machine. Runs on your own server or laptop with a secure HTTPS web GUI.


Quick Install

curl -fsSL https://zettabrain.app/install.sh | sudo bash

Alternative mirror:

curl -fsSL https://install.zettabrain.io | sudo bash

What the installer does:

  • Detects your OS (Ubuntu, Debian, Amazon Linux, RHEL, Fedora)
  • Installs Python 3.9+ and system dependencies
  • Installs zettabrain-rag via pipx (isolated, no virtualenv management needed)
  • Installs and starts Ollama
  • Pulls the nomic-embed-text embedding model (~275 MB)

Install via pipx (developers)

# Install pipx if you don't have it
sudo apt install -y pipx     # Ubuntu / Debian
brew install pipx            # macOS
pipx ensurepath              # make sure ~/.local/bin is on your PATH

# Install ZettaBrain
pipx install zettabrain-rag

# Verify
zettabrain --version

First-time setup

1. Run setup wizard

sudo zettabrain-setup

Configures storage (Local / NFS / SMB), selects an LLM model based on your hardware, and enables HTTPS.

2. Launch the web GUI

zettabrain-server

Open https://local.zettabrain.app:7860 in your browser — trusted HTTPS, fully private.

3. Or use the CLI chat

zettabrain-chat

Commands

Command                                   Description
sudo zettabrain-setup                     Storage wizard + model selection + TLS cert
zettabrain-server                         Launch secure HTTPS web GUI (port 7860)
zettabrain-chat                           Interactive RAG chat in the terminal
zettabrain-chat --rebuild                 Rebuild vector store then start chat
zettabrain-chat --debug                   Show retrieved chunks on every query
zettabrain-ingest                         Ingest documents into the vector store
zettabrain-ingest --folder /path          Ingest a specific folder
zettabrain-ingest --file /path/doc.pdf    Ingest a single file
zettabrain-ingest --stats                 Show what is in the vector store
zettabrain-ingest --clear                 Wipe the vector store
zettabrain-status                         Show install paths, cert info, and store statistics
sudo zettabrain-storage add               Add a new storage source after initial setup
zettabrain-storage list                   List configured storage sources

CLI chat commands

While inside zettabrain-chat:

Type            Action
Any question    Query your documents
sources         Show which document chunks were used
timing          Show retrieve / generate time for all queries this session
debug on        Show retrieved chunks on every query
debug off       Hide debug output
quit            Exit

System requirements

          Minimum                     Recommended
RAM       8 GB                        16 GB
CPU       4 cores / 2.5 GHz           8 cores / 3.0 GHz
Disk      20 GB free                  50 GB free
OS        Ubuntu 22.04 / Debian 12    Ubuntu 22.04 LTS
Python    3.9                         3.11+

Why 8 GB minimum: llama3.1:8b (Q4) needs ~5 GB in RAM, plus ~2 GB for OS + Python + ChromaDB. Below 8 GB you will hit swap and responses can take 5+ minutes.
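
That floor can be sanity-checked with back-of-the-envelope arithmetic. The figures below are illustrative approximations (Q4 quantization stores roughly 4.5 bits per weight once metadata is included), not measured values:

```python
def model_ram_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate resident size of a quantized model's weights, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

weights = model_ram_gb(8)   # llama3.1:8b at ~4.5 bits/weight ≈ 4.5 GB
overhead = 2.0              # OS + Python + ChromaDB, per the estimate above
total = weights + overhead

print(f"weights ≈ {weights:.1f} GB, total ≈ {total:.1f} GB")
# → weights ≈ 4.5 GB, total ≈ 6.5 GB
```

About 6.5 GB of steady-state use leaves little headroom on an 8 GB machine once the KV cache and browser tabs are added, which is why anything smaller swaps.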


GPU & model selection

Ollama auto-detects your GPU on install — NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal). No configuration needed beyond having the correct drivers installed.

sudo zettabrain-setup detects your hardware and presents a model menu:

Hardware detected: NVIDIA GeForce RTX 3080 (10GB VRAM)
Recommended model: llama3.1:8b  (10GB VRAM detected: balanced quality/speed)

  Available models:
    1) llama3.2:3b    — fastest (~2GB)        good for quick Q&A
    2) llama3.1:8b    — balanced (~5GB)       recommended for most   ← default
    3) mistral:7b     — fast (~4GB)           strong reasoning
    4) llama3.1:13b   — better (~8GB)         needs 12GB+ VRAM/RAM
    5) qwen2.5:14b    — excellent (~9GB)      needs 16GB+ VRAM/RAM
    6) qwen2.5:32b    — best quality (~20GB)  needs 24GB+ VRAM/RAM
    7) Custom

You can also switch models at any time by editing /opt/zettabrain/src/zettabrain.env:

ZETTABRAIN_LLM_MODEL=qwen2.5:14b

Then restart the server: zettabrain-server

Performance reference

Approximate response time for a 300-token answer ("What is cloud computing?"):

Hardware                   Model         Tokens/sec    Response time
4-core CPU, 8 GB RAM       llama3.2:3b   8–15 t/s      20–40 s
8-core CPU, 16 GB RAM      llama3.1:8b   5–12 t/s      25–60 s
NVIDIA RTX 3060 (8 GB)     llama3.1:8b   60–90 t/s     3–5 s
NVIDIA RTX 3080 (10 GB)    llama3.1:8b   80–120 t/s    2–4 s
Apple M2 (16 GB)           llama3.1:8b   30–50 t/s     6–10 s
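
The response-time column is essentially answer length divided by decode throughput (retrieval and prompt-processing time add a little on top, hence the ranges). Plain arithmetic, no project code involved:

```python
def response_time_s(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate `tokens` at a given decode throughput."""
    return tokens / tokens_per_sec

# 300-token answer on a 4-core CPU at 8–15 t/s (first table row):
worst = response_time_s(300, 8)    # 37.5 s
best = response_time_s(300, 15)    # 20.0 s
print(f"{best:.0f}–{worst:.0f} s")
# → 20–38 s, in line with the 20–40 s row
```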

The web UI and CLI both show per-query timing: retrieve time, generate time, and delta vs previous query.


Retrieval pipeline

ZettaBrain uses a hybrid retrieval approach for accuracy:

  1. Adaptive chunking — chunk size tuned per document type (PDF / DOCX / TXT) and text density
  2. MMR semantic search — Maximum Marginal Relevance via ChromaDB (diversity + relevance)
  3. BM25 keyword search — exact term matching on the same corpus
  4. Merge & deduplicate — semantic results ranked first, duplicates removed by content hash
  5. Cross-encoder re-ranking — FlashRank (ms-marco-MiniLM-L-12-v2) picks the best chunks before sending to the LLM
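
Step 4 above — merging the semantic and keyword result lists while dropping duplicates by content hash — can be sketched in plain Python. This is a simplified illustration: the real pipeline operates on LangChain Document objects, and the names here are hypothetical:

```python
import hashlib

def merge_dedupe(semantic: list[str], keyword: list[str]) -> list[str]:
    """Merge two ranked chunk lists, semantic results first, dropping any
    later chunk whose content hash has already been seen."""
    seen: set[str] = set()
    merged: list[str] = []
    for chunk in semantic + keyword:   # semantic results keep their rank
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            merged.append(chunk)
    return merged

mmr_hits  = ["cloud intro", "pricing model"]
bm25_hits = ["pricing model", "sla terms"]
print(merge_dedupe(mmr_hits, bm25_hits))
# → ['cloud intro', 'pricing model', 'sla terms']
```

Hashing the chunk text (rather than comparing IDs) catches the common case where MMR and BM25 surface the same passage from different index entries.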

Supported document formats

.pdf .txt .md .docx


Configuration

All settings can be overridden via environment variables or /opt/zettabrain/src/zettabrain.env:

Variable                    Default                                       Description
ZETTABRAIN_DOCS             /opt/zettabrain/data                          Documents folder
ZETTABRAIN_CHROMA           /opt/zettabrain/src/zettabrain_vectorstore    ChromaDB path
ZETTABRAIN_LLM_MODEL        llama3.1:8b                                   Ollama LLM model
ZETTABRAIN_EMBED_MODEL      nomic-embed-text                              Ollama embedding model
ZETTABRAIN_CHUNK_SIZE       1000 (PDF) / 800 (TXT)                        Chunk size (adaptive)
ZETTABRAIN_CHUNK_OVERLAP    150 (PDF) / 100 (TXT)                         Chunk overlap (adaptive)
OLLAMA_HOST                 http://localhost:11434                        Ollama API endpoint
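
ZETTABRAIN_CHUNK_SIZE and ZETTABRAIN_CHUNK_OVERLAP interact the way any sliding-window splitter does: each chunk starts size − overlap characters after the previous one. A minimal character-level sketch (the actual splitter is format-aware and more sophisticated; this only illustrates the two parameters):

```python
def split_chunks(text: str, size: int = 1000, overlap: int = 150) -> list[str]:
    """Sliding-window split: consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2000
chunks = split_chunks(doc, size=1000, overlap=150)
print(len(chunks), [len(c) for c in chunks])
# → 3 [1000, 1000, 300]
```

A larger overlap reduces the chance a sentence is cut across a chunk boundary, at the cost of more chunks (and more embeddings) per document.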

Diagnostics

# Full status — version, certs, vector store stats
zettabrain-status

# Verify ChromaDB is working
python3 /opt/zettabrain/src/01_chromadb_setup.py

# Verify embedding model is working
python3 /opt/zettabrain/src/02_embeddings_test.py

# Check Ollama is running
curl http://localhost:11434

# List downloaded models
ollama list

# View server logs
journalctl -u zettabrain -f
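
The curl check above can also be scripted for health monitoring. A small stdlib-only sketch that returns True when something answers on the Ollama port (Ollama replies "Ollama is running" at its root endpoint; the default URL and timeout here are assumptions you may want to adjust):

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

With the daemon stopped, `ollama_reachable()` returns False immediately on connection refused rather than raising, which makes it convenient in a startup script.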

Uninstall

If installed via pipx:

pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain

If installed via the one-line installer:

sudo systemctl disable --now zettabrain 2>/dev/null || true
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain /var/log/zettabrain-install.log

Contributors

@olajide-zettabrain Creator & maintainer

License

MIT — © ZettaBrain


Download files

Source Distribution

zettabrain_rag-0.4.6.tar.gz (54.5 kB)

Built Distribution

zettabrain_rag-0.4.6-py3-none-any.whl (57.5 kB)

File details

Details for the file zettabrain_rag-0.4.6.tar.gz.

File metadata

  • Download URL: zettabrain_rag-0.4.6.tar.gz
  • Upload date:
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for zettabrain_rag-0.4.6.tar.gz

Algorithm      Hash digest
SHA256         3999f7cc1fc2d9223075f6530eb114ebfff27c1a52e2254ff6b70d89f98d8f5f
MD5            8b3cdab6ca322c38da0e4c68e79edab6
BLAKE2b-256    934ab06940b84a0b2f8a066a94708a95f4e95efcfd9c67ad9f5089a9b7bc6120

File details

Details for the file zettabrain_rag-0.4.6-py3-none-any.whl.

File metadata

  • Download URL: zettabrain_rag-0.4.6-py3-none-any.whl
  • Upload date:
  • Size: 57.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for zettabrain_rag-0.4.6-py3-none-any.whl

Algorithm      Hash digest
SHA256         630c8d08a1226d9d30dc7f062f4cdadfdb9b48638de72856849d50a21300798b
MD5            9ee51713ae164963152cc33aea6673e1
BLAKE2b-256    c5f81e72fe25b1bee7341f00385e87dc3a1a9b05e85d569cf2906edd08897f86
