Private AI document assistant — local RAG pipeline with web GUI. Zero cloud. Supports local, NFS, SMB and object storage.

ZettaBrain RAG

Private AI document assistant — your documents, your hardware, zero cloud.

Chat with your documents using a fully local AI. No API keys. No data leaving your machine. Runs on your own server or laptop with a secure HTTPS web GUI. Supports local disk, NFS, SMB and object storage.


Quick Install

curl -fsSL https://zettabrain.app/install.sh | sudo bash

Alternative mirror:

curl -fsSL https://install.zettabrain.io | sudo bash

What the installer does:

  • Detects your OS (Ubuntu, Debian, Amazon Linux, RHEL, Fedora)
  • Installs Python 3.9+ and system dependencies
  • Installs zettabrain-rag via pipx (isolated, no virtualenv management needed)
  • Installs and starts Ollama
  • Pulls the nomic-embed-text embedding model (~275 MB)

Install via pipx (developers)

# Install pipx if you don't have it
sudo apt install -y pipx     # Ubuntu / Debian
brew install pipx            # macOS
pipx ensurepath              # make pipx-installed commands available on PATH

# Install ZettaBrain
pipx install zettabrain-rag

# Verify
zettabrain --version

First-time setup

1. Run setup wizard

sudo zettabrain-setup

Configures storage (Local / NFS / SMB), selects an LLM model based on your hardware, and enables HTTPS.

2. Launch the web GUI

zettabrain-server

Open https://local.zettabrain.app:7860 in your browser — trusted HTTPS, fully private.

3. Or use the CLI chat

zettabrain-chat

Commands

Command                                   Description
sudo zettabrain-setup                     Storage wizard + model selection + TLS cert
zettabrain-server                         Launch secure HTTPS web GUI (port 7860)
zettabrain-chat                           Interactive RAG chat in the terminal
zettabrain-chat --rebuild                 Rebuild vector store then start chat
zettabrain-chat --debug                   Show retrieved chunks on every query
zettabrain-ingest                         Ingest documents into the vector store
zettabrain-ingest --folder /path          Ingest a specific folder
zettabrain-ingest --file /path/doc.pdf    Ingest a single file
zettabrain-ingest --stats                 Show what is in the vector store
zettabrain-ingest --clear                 Wipe the vector store
zettabrain-status                         Show install paths, cert info, and store statistics
sudo zettabrain-storage add               Add a new storage source after initial setup
zettabrain-storage list                   List configured storage sources

CLI chat commands

While inside zettabrain-chat:

Type            Action
Any question    Query your documents
sources         Show which document chunks were used
timing          Show retrieve / generate time for all queries this session
debug on        Show retrieved chunks on every query
debug off       Hide debug output
quit            Exit

System requirements

          Minimum              Recommended
RAM       4 GB                 8 GB (CPU) · 16 GB+ (GPU)
CPU       4 cores / 2.5 GHz    8 cores / 3.0 GHz
Disk      10 GB free           40 GB free
OS        See below            See below
Python    3.9                  3.11+

Supported operating systems

Platform                                    Versions
Ubuntu                                      20.04, 22.04, 24.04
Debian                                      11, 12
Amazon Linux                                2, 2023
RHEL / CentOS Stream / Rocky / AlmaLinux    8, 9
Fedora                                      38+
Linux Mint / Pop!_OS                        Current releases
macOS                                       12 Monterey+ (via pipx install)
Windows                                     10 / 11 via WSL2, or pipx install for Python components

RAM depends on the model: qwen3:0.6b runs on 2 GB; phi4:3.8b (the CPU default) needs ~6 GB; GPU models from mistral:7b upward need 8–24 GB of VRAM. See the Performance reference table below for per-model requirements.
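
As a rough illustration of how these RAM figures could drive a choice, the sketch below picks the most demanding CPU model whose stated requirement fits in the available memory. The name `pick_cpu_model` and the threshold table are assumptions assembled from the figures quoted in this README; they are not the setup wizard's actual logic, which may weigh other factors.

```python
from typing import Optional

# Hypothetical thresholds, copied from the RAM figures quoted in this README.
# They are NOT taken from the zettabrain-setup source.
CPU_MODEL_MIN_RAM_GB = [
    ("qwen3:0.6b", 2),
    ("phi4:3.8b", 6),
    ("mistral:7b", 12),
    ("llama3.1:8b", 16),
]


def pick_cpu_model(ram_gb: float) -> Optional[str]:
    """Return the most demanding listed model that fits in ram_gb, or None."""
    best = None
    for name, min_ram in CPU_MODEL_MIN_RAM_GB:
        # The list is ordered by requirement, so the last match wins.
        if ram_gb >= min_ram:
            best = name
    return best
```

For example, `pick_cpu_model(8)` returns `"phi4:3.8b"`, and `pick_cpu_model(1)` returns `None` because no listed model fits.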


GPU & model selection

Ollama auto-detects your GPU on install — NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal). No configuration needed beyond having the correct drivers installed.

sudo zettabrain-setup detects your hardware and shows the right menu for your machine.

CPU-only (no GPU detected):

Hardware detected: CPU only
Recommended model: phi4:3.8b  (CPU-only: best reasoning for RAG without GPU)

  Available models (optimised for CPU):
    1) qwen3:0.6b      — instant  (~500MB)   quick lookups and routing
    2) gemma3:1b       — very fast (~815MB)  structured explanations
    3) tinyllama:1.1b  — very fast (~638MB)  basic Q&A, coherent chat
    4) phi4:3.8b       — moderate (~2.5GB)   best reasoning for RAG    ← recommended
    5) llama3.2:3b     — moderate (~2GB)     general purpose
    6) mistral:7b      — slow     (~4GB)     strong instruction (needs 12GB+ RAM)
    7) llama3.1:8b     — slow     (~5GB)     balanced quality (needs 16GB+ RAM)
    8) openhermes:7b   — slow     (~4GB)     best formatted RAG (needs 12GB+ RAM)
    9) Custom

GPU detected:

Hardware detected: NVIDIA GeForce RTX 3080 (10GB VRAM)
Recommended model: llama3.1:8b  (10GB VRAM: balanced quality/speed)

  Available models:
    1) phi4:3.8b         — fast on GPU    (~2.5GB)  best reasoning per GB
    2) mistral:7b        — fast on GPU    (~4GB)    strong instruction following
    3) openhermes:7b     — fast on GPU    (~4GB)    best formatted RAG responses
    4) llama3.1:8b       — fast on GPU    (~5GB)    balanced quality for most
    5) mistral-nemo:12b  — moderate       (~7GB)    better reasoning  (needs 8GB+ VRAM)
    6) qwen2.5:14b       — moderate       (~9GB)    excellent quality (needs 10GB+ VRAM)
    7) qwen2.5:32b       — slower         (~20GB)   best quality      (needs 24GB+ VRAM)
    8) Custom

You can switch model at any time by editing /opt/zettabrain/src/zettabrain.env:

ZETTABRAIN_LLM_MODEL=qwen2.5:14b

Then restart the server: zettabrain-server
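
If you prefer to script the change, the sketch below rewrites one KEY=VALUE line in an env file. It assumes the file consists of plain KEY=VALUE lines like the one shown above; `set_env_value` is a hypothetical helper, not part of ZettaBrain.

```python
from pathlib import Path


def set_env_value(env_path: str, key: str, value: str) -> None:
    """Set key=value in a KEY=VALUE file, replacing the line if it exists."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Match only an assignment to exactly this key; commented-out
        # lines such as "# KEY=..." do not match and are left untouched.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

A call such as `set_env_value("/opt/zettabrain/src/zettabrain.env", "ZETTABRAIN_LLM_MODEL", "qwen2.5:14b")` would need write permission on that file, which likely means running as root.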

Performance reference

Timings for a real compliance query against a 10-document financial services corpus:

"What is the pre-clearance process for personal securities trades and how long does approval last?"

Model                             Min RAM    Retrieve    Generate     Total
qwen3:0.6b                        2 GB       ~1 s        15–40 s      ~1 min
phi4-mini                         6 GB       ~1 s        120–300 s    ~2–5 min
llama3.2:3b                       6 GB       ~1 s        90–180 s     ~2–3 min
llama3.1:8b (CPU)                 16 GB      ~1 s        200–400 s    ~4–7 min
mistral:7b (GPU 5–8 GB VRAM)      8 GB       ~1 s        5–12 s       ~6–13 s
llama3.1:8b (GPU 8–10 GB VRAM)    10 GB      ~1 s        3–7 s        ~4–8 s
qwen2.5:14b (GPU 16 GB VRAM)      20 GB      ~1 s        4–10 s       ~5–11 s
Apple M2 / M3 (16 GB unified)     16 GB      ~1 s        10–20 s      ~11–21 s

Retrieve covers: query embedding + ChromaDB MMR search + BM25 keyword search + FlashRank re-ranking.
Generate depends on model size and hardware. A GPU cuts generate time by roughly 30–60× compared with CPU-only inference.

The web UI shows per-query timing after every response: ⚡ 938ms retrieve · 🤖 6.3s generate.
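
Per-stage timings like these can be collected by wrapping each stage with a monotonic clock. The sketch below is an assumption about how such a measurement might look, not ZettaBrain's own code; `timed` and `format_timing` are placeholder names, and the output string only imitates the line shown above (without the icons).

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def format_timing(retrieve_s: float, generate_s: float) -> str:
    """Render retrieve time in milliseconds and generate time in seconds."""
    return f"{retrieve_s * 1000:.0f}ms retrieve · {generate_s:.1f}s generate"
```

In use, each stage would be wrapped in turn, for example `chunks, t_retrieve = timed(retrieve, query)`, where `retrieve` stands in for whatever function performs the search.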


Retrieval pipeline

ZettaBrain uses a hybrid retrieval approach for accuracy:

  1. Adaptive chunking — chunk size tuned per document type (PDF / DOCX / TXT) and text density
  2. MMR semantic search — Maximal Marginal Relevance via ChromaDB (diversity + relevance)
  3. BM25 keyword search — exact term matching on the same corpus
  4. Merge & deduplicate — semantic results ranked first, duplicates removed by content hash
  5. Cross-encoder re-ranking — FlashRank (ms-marco-MiniLM-L-12-v2) picks the best chunks before sending to the LLM
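
Step 4 can be sketched as follows. This is a minimal illustration, assuming each retriever returns a list of chunk strings in rank order. Using SHA-256 over whitespace-normalised text as the content hash is an assumption, and `content_hash` and `merge_results` are hypothetical names rather than ZettaBrain's own functions.

```python
import hashlib
from typing import List


def content_hash(text: str) -> str:
    """Hash chunk text after collapsing whitespace (an assumed normalisation)."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def merge_results(semantic: List[str], keyword: List[str]) -> List[str]:
    """Keep semantic hits first, then keyword hits, dropping repeated content."""
    seen = set()
    merged = []
    for chunk in semantic + keyword:
        digest = content_hash(chunk)
        if digest not in seen:
            seen.add(digest)
            merged.append(chunk)
    return merged
```

Because the semantic list is walked first, a chunk found by both retrievers keeps its semantic rank, and the keyword copy is discarded. The merged list would then go to the re-ranker.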

Supported document formats

.pdf .txt .md .docx


Sample Test Data

Not ready to use your own documents yet? Download ready-made test datasets to evaluate ZettaBrain against realistic enterprise content.

Available datasets

Industry              Documents        Organisation (fictional)
Financial Services    10 DOCX files    Apex Financial Group: trading policy, AML/KYC procedures, insider trading, risk framework, employee handbook
Healthcare            10 DOCX files    Riverside Medical Center: HIPAA privacy & security, medication protocols, emergency response codes, clinical documentation

Download

File                               Size      Link
Financial Services documents       ~90 KB    zettabrain-financial-test-docs.zip
Healthcare documents               ~91 KB    zettabrain-healthcare-test-docs.zip
Test prompts guide (40 prompts)    ~7 KB     RAG_Test_Prompts_Guide.md

The prompts guide includes 20 industry-specific prompts per dataset, cross-document summary prompts, and adversarial prompts that verify ZettaBrain correctly declines to answer questions not present in the documents.

Quick start with sample data

# Download and unzip the financial services dataset
curl -LO https://zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
unzip zettabrain-financial-test-docs.zip -d ~/zettabrain-test

# Point ZettaBrain at the folder and ingest
zettabrain-ingest --folder ~/zettabrain-test/financial

# Start chatting
zettabrain-chat

Open the web GUI at https://local.zettabrain.app:7860 and paste prompts from the guide directly into the chat.

Sample prompts from the guide

Financial Services — Apex Financial Group

  • "What is the pre-clearance process for personal securities trades and how long does approval last?"
  • "When do I need to file a Suspicious Activity Report and what is the deadline for filing?"
  • "What is the maximum hotel rate I can expense in New York City?"
  • "What happens when a risk event has a financial impact of over $10 million — who needs to be notified and how quickly?"

Healthcare — Riverside Medical Center

  • "What should I do if I suspect a PHI breach — who do I contact and what is the notification timeline?"
  • "Which medications require an independent double-check by a second nurse before administration?"
  • "A patient received the wrong medication — what are the steps I need to take to report it?"
  • "What are the emergency response codes and what action should staff take for each?"

The full guide includes 20 prompts per dataset plus cross-document and adversarial prompts.


Configuration

All settings can be overridden via environment variables or /opt/zettabrain/src/zettabrain.env:

Variable                    Default                                       Description
ZETTABRAIN_DOCS             /opt/zettabrain/data                          Documents folder
ZETTABRAIN_CHROMA           /opt/zettabrain/src/zettabrain_vectorstore    ChromaDB path
ZETTABRAIN_LLM_MODEL        llama3.1:8b                                   Ollama LLM model
ZETTABRAIN_EMBED_MODEL      nomic-embed-text                              Ollama embedding model
ZETTABRAIN_CHUNK_SIZE       1000 (PDF) / 800 (TXT)                        Chunk size (adaptive)
ZETTABRAIN_CHUNK_OVERLAP    150 (PDF) / 100 (TXT)                         Chunk overlap (adaptive)
OLLAMA_HOST                 http://localhost:11434                        Ollama API endpoint
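
The lookup order implied by this table can be sketched as: use the environment variable if it is set, otherwise fall back to the documented default. The `DEFAULTS` dictionary and `get_setting` helper are illustrative assumptions, not ZettaBrain's actual loader. The two chunk settings are left out because their defaults vary by document type.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the configuration table in this README.
DEFAULTS = {
    "ZETTABRAIN_DOCS": "/opt/zettabrain/data",
    "ZETTABRAIN_CHROMA": "/opt/zettabrain/src/zettabrain_vectorstore",
    "ZETTABRAIN_LLM_MODEL": "llama3.1:8b",
    "ZETTABRAIN_EMBED_MODEL": "nomic-embed-text",
    "OLLAMA_HOST": "http://localhost:11434",
}


def get_setting(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the override from env if set and non-empty, else the default."""
    env = os.environ if env is None else env
    return env.get(name) or DEFAULTS[name]
```

Passing `env` explicitly makes the helper easy to test: `get_setting("ZETTABRAIN_LLM_MODEL", {})` returns `"llama3.1:8b"`.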

Diagnostics

# Full status — version, certs, vector store stats
zettabrain-status

# Verify ChromaDB is working
python3 /opt/zettabrain/src/01_chromadb_setup.py

# Verify embedding model is working
python3 /opt/zettabrain/src/02_embeddings_test.py

# Check Ollama is running
curl http://localhost:11434

# List downloaded models
ollama list

# View server logs
journalctl -u zettabrain -f
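
The Ollama check can also be done from Python with only the standard library. This sketch assumes the Ollama root endpoint answers a plain GET with HTTP 200 when the service is up; `ollama_is_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if a GET to base_url succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

If you have overridden `OLLAMA_HOST`, pass that value as `base_url` instead of relying on the default.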

Uninstall

Installed via pipx

pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain

Installed via the one-line installer

sudo systemctl disable --now zettabrain 2>/dev/null || true
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain /var/log/zettabrain-install.log

Contributors

@zettabrain Creator & maintainer

License

MIT — © ZettaBrain
