Private AI document assistant — local RAG pipeline with web GUI. Zero cloud. Supports local, NFS, SMB and object storage.
ZettaBrain RAG
Private AI document assistant — your documents, your hardware, zero cloud.
Chat with your documents using a fully local AI. No API keys. No data leaving your machine. Runs on your own server or laptop with a secure HTTPS web GUI. Supports local disk, NFS, SMB and object storage.
Quick Install
curl -fsSL https://zettabrain.app/install.sh | sudo bash
Alternative mirror:
curl -fsSL https://install.zettabrain.io | sudo bash
What the installer does:
- Detects your OS (Ubuntu, Debian, Amazon Linux, RHEL, Fedora)
- Installs Python 3.9+ and system dependencies
- Installs zettabrain-rag via pipx (isolated, no virtualenv management needed)
- Installs and starts Ollama
- Pulls the nomic-embed-text embedding model (~275 MB)
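The installer is a shell script whose internals aren't shown here. As a rough illustration of the OS-detection step, the sketch below parses /etc/os-release (the standard freedesktop file on the listed distributions) and maps its ID field to a package manager. The function names and the ID-to-package-manager mapping are hypothetical; they are not taken from install.sh.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=VALUE lines from /etc/os-release, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info

def package_manager(info: dict) -> str:
    """Hypothetical mapping from os-release ID / VERSION_ID to a package manager."""
    distro = info.get("ID", "")
    if distro in ("ubuntu", "debian", "linuxmint", "pop"):
        return "apt"
    if distro == "amzn":
        # Amazon Linux 2 ships yum; Amazon Linux 2023 ships dnf.
        return "yum" if info.get("VERSION_ID") == "2" else "dnf"
    if distro in ("fedora", "rhel", "centos", "rocky", "almalinux"):
        return "dnf"
    raise RuntimeError(f"Unsupported distribution: {distro or 'unknown'}")
```

On a Linux host, `package_manager(parse_os_release(open("/etc/os-release").read()))` would return "apt" on Ubuntu or Debian.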
Install via pipx (developers)
# Install pipx if you don't have it
sudo apt install -y pipx # Ubuntu / Debian
brew install pipx # macOS
# Install ZettaBrain
pipx install zettabrain-rag
# Verify
zettabrain --version
First-time setup
1. Run setup wizard
sudo zettabrain-setup
Configures storage (Local / NFS / SMB), selects an LLM model based on your hardware, and enables HTTPS.
2. Launch the web GUI
zettabrain-server
Open https://local.zettabrain.app:7860 in your browser — trusted HTTPS, fully private.
3. Or use the CLI chat
zettabrain-chat
Commands
| Command | Description |
|---|---|
| sudo zettabrain-setup | Storage wizard + model selection + TLS cert |
| zettabrain-server | Launch secure HTTPS web GUI (port 7860) |
| zettabrain-chat | Interactive RAG chat in the terminal |
| zettabrain-chat --rebuild | Rebuild vector store then start chat |
| zettabrain-chat --debug | Show retrieved chunks on every query |
| zettabrain-ingest | Ingest documents into the vector store |
| zettabrain-ingest --folder /path | Ingest a specific folder |
| zettabrain-ingest --file /path/doc.pdf | Ingest a single file |
| zettabrain-ingest --stats | Show what is in the vector store |
| zettabrain-ingest --clear | Wipe the vector store |
| zettabrain-status | Show install paths, cert info, and store statistics |
| sudo zettabrain-storage add | Add a new storage source after initial setup |
| zettabrain-storage list | List configured storage sources |
CLI chat commands
While inside zettabrain-chat:
| Type | Action |
|---|---|
| Any question | Query your documents |
| sources | Show which document chunks were used |
| timing | Show retrieve / generate time for all queries this session |
| debug on | Show retrieved chunks on every query |
| debug off | Hide debug output |
| quit | Exit |
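The source of zettabrain-chat isn't shown here, but the command table maps naturally onto a small dispatcher that separates built-in commands from questions for the RAG pipeline. A hypothetical sketch: ChatSession and handle_input are illustrative names, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    debug: bool = False
    timings: list = field(default_factory=list)       # (retrieve_s, generate_s) per query
    last_sources: list = field(default_factory=list)  # chunk labels from the last answer

def handle_input(line: str, session: ChatSession):
    """Classify one line of input as ("exit" | "info" | "query", payload)."""
    text = line.strip()
    command = text.lower()
    if not text:
        return ("info", "")
    if command == "quit":
        return ("exit", None)
    if command in ("debug on", "debug off"):
        session.debug = command == "debug on"
        return ("info", f"debug {'on' if session.debug else 'off'}")
    if command == "sources":
        return ("info", "\n".join(session.last_sources) or "no sources yet")
    if command == "timing":
        lines = [f"#{i}: retrieve {r:.2f}s, generate {g:.2f}s"
                 for i, (r, g) in enumerate(session.timings, start=1)]
        return ("info", "\n".join(lines) or "no queries yet")
    # Anything else is treated as a question about the documents.
    return ("query", text)
```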
System requirements
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB (CPU) · 16 GB+ (GPU) |
| CPU | 4 cores / 2.5 GHz | 8 cores / 3.0 GHz |
| Disk | 10 GB free | 40 GB free |
| OS | See below | See below |
| Python | 3.9 | 3.11+ |
Supported operating systems
| Platform | Versions |
|---|---|
| Ubuntu | 20.04, 22.04, 24.04 |
| Debian | 11, 12 |
| Amazon Linux | 2, 2023 |
| RHEL / CentOS Stream / Rocky / AlmaLinux | 8, 9 |
| Fedora | 38+ |
| Linux Mint / Pop!_OS | Current releases |
| macOS | 12 Monterey+ (via pipx install) |
| Windows | 10 / 11 via WSL2, or pipx install for Python components |
RAM requirements depend on the model: qwen3:0.6b runs in 2 GB; phi4:3.8b (the CPU default) needs ~6 GB; GPU models from mistral:7b upward need 8–24 GB of VRAM. See the performance table below for per-model requirements.
GPU & model selection
Ollama auto-detects your GPU on install — NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal). No configuration needed beyond having the correct drivers installed.
sudo zettabrain-setup detects your hardware and shows a model menu tailored to your machine.
CPU-only (no GPU detected):
Hardware detected: CPU only
Recommended model: phi4:3.8b (CPU-only: best reasoning for RAG without GPU)
Available models (optimised for CPU):
1) qwen3:0.6b — instant (~500MB) quick lookups and routing
2) gemma3:1b — very fast (~815MB) structured explanations
3) tinyllama:1.1b — very fast (~638MB) basic Q&A, coherent chat
4) phi4:3.8b — moderate (~2.5GB) best reasoning for RAG ← recommended
5) llama3.2:3b — moderate (~2GB) general purpose
6) mistral:7b — slow (~4GB) strong instruction (needs 12GB+ RAM)
7) llama3.1:8b — slow (~5GB) balanced quality (needs 16GB+ RAM)
8) openhermes:7b — slow (~4GB) best formatted RAG (needs 12GB+ RAM)
9) Custom
GPU detected:
Hardware detected: NVIDIA GeForce RTX 3080 (10GB VRAM)
Recommended model: llama3.1:8b (10GB VRAM: balanced quality/speed)
Available models:
1) phi4:3.8b — fast on GPU (~2.5GB) best reasoning per GB
2) mistral:7b — fast on GPU (~4GB) strong instruction following
3) openhermes:7b — fast on GPU (~4GB) best formatted RAG responses
4) llama3.1:8b — fast on GPU (~5GB) balanced quality for most
5) mistral-nemo:12b — moderate (~7GB) better reasoning (needs 8GB+ VRAM)
6) qwen2.5:14b — moderate (~9GB) excellent quality (needs 10GB+ VRAM)
7) qwen2.5:32b — slower (~20GB) best quality (needs 24GB+ VRAM)
8) Custom
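The wizard's exact selection rules aren't published here. A minimal sketch of hardware-based recommendation, using only the RAM and VRAM figures quoted in this README (the performance table's GPU tiers and the 2 GB / ~6 GB CPU figures), could look like the following. The function name and cut-offs are assumptions; with 10 GB of VRAM it happens to agree with the RTX 3080 example above.

```python
from typing import Optional

def recommend_model(ram_gb: float, vram_gb: Optional[float] = None) -> str:
    """Hypothetical recommender built from the RAM/VRAM figures in this README."""
    if vram_gb:  # a GPU was detected
        if vram_gb >= 24:
            return "qwen2.5:32b"
        if vram_gb >= 16:
            return "qwen2.5:14b"
        if vram_gb >= 8:
            return "llama3.1:8b"
        if vram_gb >= 5:
            return "mistral:7b"
    # CPU-only, or a GPU too small for the tiers above: fall back to RAM.
    if ram_gb >= 6:
        return "phi4:3.8b"
    return "qwen3:0.6b"
```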
You can switch models at any time by editing /opt/zettabrain/src/zettabrain.env:
ZETTABRAIN_LLM_MODEL=qwen2.5:14b
Then restart the server: zettabrain-server
Performance reference
Timings for a real compliance query against a 10-document financial services corpus:
"What is the pre-clearance process for personal securities trades and how long does approval last?"
| Model | Min RAM | Retrieve | Generate | Total |
|---|---|---|---|---|
| qwen3:0.6b | 2 GB | ~1 s | 15–40 s | ~1 min |
| phi4-mini | 6 GB | ~1 s | 120–300 s | ~2–5 min |
| llama3.2:3b | 6 GB | ~1 s | 90–180 s | ~2–3 min |
| llama3.1:8b (CPU) | 16 GB | ~1 s | 200–400 s | ~4–7 min |
| mistral:7b (GPU 5–8 GB VRAM) | 8 GB | ~1 s | 5–12 s | ~6–13 s |
| llama3.1:8b (GPU 8–10 GB VRAM) | 10 GB | ~1 s | 3–7 s | ~4–8 s |
| qwen2.5:14b (GPU 16 GB VRAM) | 20 GB | ~1 s | 4–10 s | ~5–11 s |
| Apple M2 / M3 (16 GB unified) | 16 GB | ~1 s | 10–20 s | ~11–21 s |
Retrieve covers: query embedding + ChromaDB MMR search + BM25 keyword search + FlashRank re-ranking.
Generate time depends on model size and hardware; a GPU typically cuts it by 30–60× compared with running the same model on CPU.
The web UI shows per-query timing after every response: ⚡ 938ms retrieve · 🤖 6.3s generate.
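Per-stage timings like these can be collected with a monotonic clock around each stage. The sketch below is an assumption about how such a readout might be produced, not ZettaBrain's code; timed and format_timing are hypothetical helpers that mimic the format shown above.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def format_timing(retrieve_s: float, generate_s: float) -> str:
    """Render retrieve time in milliseconds and generate time in seconds."""
    return f"⚡ {retrieve_s * 1000:.0f}ms retrieve · 🤖 {generate_s:.1f}s generate"
```

For example, `chunks, t_retrieve = timed(retrieve, question)` wraps a (hypothetical) retrieve function, and `format_timing(0.938, 6.3)` yields the readout quoted above.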
Retrieval pipeline
ZettaBrain uses a hybrid retrieval approach for accuracy:
- Adaptive chunking — chunk size tuned per document type (PDF / DOCX / TXT) and text density
- MMR semantic search — Maximal Marginal Relevance via ChromaDB (diversity + relevance)
- BM25 keyword search — exact term matching on the same corpus
- Merge & deduplicate — semantic results ranked first, duplicates removed by content hash
- Cross-encoder re-ranking — FlashRank (ms-marco-MiniLM-L-12-v2) picks the best chunks before sending to the LLM
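To make the MMR, BM25 and merge stages concrete, here is a dependency-free Python sketch. It illustrates the techniques under stated assumptions and is not ZettaBrain's implementation: ChromaDB's MMR and whatever BM25 library ZettaBrain uses may differ in tokenisation and defaults (lambda_mult=0.5, k1=1.5 and b=0.75 are common defaults, assumed here), and SHA-256 as the content hash is also an assumption. FlashRank re-ranking is omitted because it needs model weights.

```python
import hashlib
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Maximal Marginal Relevance: greedily pick k indices relevant to the query
    but dissimilar to those already picked. lambda_mult=1.0 is pure relevance."""
    selected, candidates = [], list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc for the query (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    if not tokenized:
        return []
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Non-negative IDF variant, as used by Lucene.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

def merge_dedup(semantic_hits, keyword_hits):
    """Semantic hits first, then keyword hits; drop repeats by SHA-256 of the text."""
    seen, merged = set(), []
    for text in semantic_hits + keyword_hits:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            merged.append(text)
    return merged
```

With two identical chunks and one unrelated chunk, pure relevance (lambda_mult=1.0) returns both duplicates, while a lower lambda_mult swaps the second duplicate for the more diverse chunk.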
Supported document formats
.pdf .txt .md .docx
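Adaptive chunking is described above only at a high level. Below is a minimal character-window chunker that applies the per-type defaults from the Configuration table (1000/150 for PDF, 800/100 for TXT). Falling back to the TXT values for .md and .docx is an assumption, the text-density adjustment is not modelled, and chunk_params and chunk_text are hypothetical names.

```python
import os

# (chunk_size, chunk_overlap) in characters, from the Configuration table defaults.
CHUNK_DEFAULTS = {
    ".pdf": (1000, 150),
    ".txt": (800, 100),
}

def chunk_params(filename: str):
    """Pick size/overlap by extension; other formats fall back to TXT (assumption)."""
    ext = os.path.splitext(filename)[1].lower()
    return CHUNK_DEFAULTS.get(ext, CHUNK_DEFAULTS[".txt"])

def chunk_text(text: str, size: int, overlap: int):
    """Split text into windows of `size` chars, each sharing `overlap` chars with the next."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```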
Sample Test Data
Not ready to use your own documents yet? Download ready-made test datasets to evaluate ZettaBrain against realistic enterprise content.
Available datasets
| Industry | Documents | Organisation (fictional) |
|---|---|---|
| Financial Services | 10 DOCX files | Apex Financial Group — trading policy, AML/KYC procedures, insider trading, risk framework, employee handbook |
| Healthcare | 10 DOCX files | Riverside Medical Center — HIPAA privacy & security, medication protocols, emergency response codes, clinical documentation |
Download
| File | Size | Link |
|---|---|---|
| Financial Services documents | ~90 KB | zettabrain-financial-test-docs.zip |
| Healthcare documents | ~91 KB | zettabrain-healthcare-test-docs.zip |
| Test prompts guide (40 prompts) | ~7 KB | RAG_Test_Prompts_Guide.md |
The prompts guide includes 20 industry-specific prompts per dataset, cross-document summary prompts, and adversarial prompts that verify ZettaBrain correctly declines to answer questions not present in the documents.
Quick start with sample data
# Download and unzip the financial services dataset
curl -LO https://zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
unzip zettabrain-financial-test-docs.zip -d ~/zettabrain-test
# Point ZettaBrain at the folder and ingest
zettabrain-ingest --folder ~/zettabrain-test/financial
# Start chatting
zettabrain-chat
Open the web GUI at https://local.zettabrain.app:7860 and paste prompts from the guide directly into the chat.
Sample prompts from the guide
Financial Services — Apex Financial Group
- "What is the pre-clearance process for personal securities trades and how long does approval last?"
- "When do I need to file a Suspicious Activity Report and what is the deadline for filing?"
- "What is the maximum hotel rate I can expense in New York City?"
- "What happens when a risk event has a financial impact of over $10 million — who needs to be notified and how quickly?"
Healthcare — Riverside Medical Center
- "What should I do if I suspect a PHI breach — who do I contact and what is the notification timeline?"
- "Which medications require an independent double-check by a second nurse before administration?"
- "A patient received the wrong medication — what are the steps I need to take to report it?"
- "What are the emergency response codes and what action should staff take for each?"
The full guide includes 20 prompts per dataset plus cross-document and adversarial prompts.
Configuration
All settings can be overridden via environment variables or /opt/zettabrain/src/zettabrain.env:
| Variable | Default | Description |
|---|---|---|
| ZETTABRAIN_DOCS | /opt/zettabrain/data | Documents folder |
| ZETTABRAIN_CHROMA | /opt/zettabrain/src/zettabrain_vectorstore | ChromaDB path |
| ZETTABRAIN_LLM_MODEL | llama3.1:8b | Ollama LLM model |
| ZETTABRAIN_EMBED_MODEL | nomic-embed-text | Ollama embedding model |
| ZETTABRAIN_CHUNK_SIZE | 1000 (PDF) / 800 (TXT) | Chunk size (adaptive) |
| ZETTABRAIN_CHUNK_OVERLAP | 150 (PDF) / 100 (TXT) | Chunk overlap (adaptive) |
| OLLAMA_HOST | http://localhost:11434 | Ollama API endpoint |
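The README doesn't say which source wins when a setting appears both in zettabrain.env and in the environment. The sketch below assumes the usual convention (built-in defaults, then the .env file, then real environment variables) and treats any ZETTABRAIN_* or OLLAMA_HOST variable as a setting. load_config and parse_env_file are illustrative names, not part of the package; the adaptive chunk defaults are left out because they vary by file type.

```python
import os
from pathlib import Path

DEFAULTS = {
    "ZETTABRAIN_DOCS": "/opt/zettabrain/data",
    "ZETTABRAIN_CHROMA": "/opt/zettabrain/src/zettabrain_vectorstore",
    "ZETTABRAIN_LLM_MODEL": "llama3.1:8b",
    "ZETTABRAIN_EMBED_MODEL": "nomic-embed-text",
    "OLLAMA_HOST": "http://localhost:11434",
}

def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments, stripping quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def load_config(env_file="/opt/zettabrain/src/zettabrain.env", environ=None) -> dict:
    """Defaults < .env file < environment variables (precedence is an assumption)."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    path = Path(env_file)
    if path.is_file():
        config.update(parse_env_file(path.read_text()))
    for key, value in environ.items():
        if key.startswith("ZETTABRAIN_") or key == "OLLAMA_HOST":
            config[key] = value
    return config
```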
Diagnostics
# Full status — version, certs, vector store stats
zettabrain-status
# Verify ChromaDB is working
python3 /opt/zettabrain/src/01_chromadb_setup.py
# Verify embedding model is working
python3 /opt/zettabrain/src/02_embeddings_test.py
# Check Ollama is running
curl http://localhost:11434
# List downloaded models
ollama list
# View server logs
journalctl -u zettabrain -f
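The curl and ollama list checks above can also be scripted. This sketch queries Ollama's GET /api/tags endpoint, which lists locally pulled models, using only the standard library. It assumes OLLAMA_HOST includes the http:// scheme, as in the Configuration table default; the helper names are hypothetical.

```python
import json
import os
import urllib.request

def parse_model_names(payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]

def list_ollama_models(host=None, timeout=5.0) -> list:
    """Return the names of models pulled into the Ollama instance at host."""
    host = host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

For example, checking whether any returned name starts with "nomic-embed-text" confirms the embedding model was pulled; a connection error means Ollama is not reachable at that host.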
Uninstall
If installed with pipx:
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain
If installed with the one-line installer:
sudo systemctl disable --now zettabrain 2>/dev/null || true
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain /var/log/zettabrain-install.log
Contributors
| @zettabrain | Creator & maintainer |
License
MIT — © ZettaBrain
Download files
File details
Details for the file zettabrain_rag-0.5.26.tar.gz.
File metadata
- Download URL: zettabrain_rag-0.5.26.tar.gz
- Upload date:
- Size: 67.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a12c13c8ff2d9101b21354229494d34b775ca43655d33fa2622968e568196c67 |
| MD5 | ec1cb2243daa3c93dcb14169ce4d72ea |
| BLAKE2b-256 | 0324543aec121f9a61deccaf2e8bc8f55ce5432db41d7321f3b2b3217d19a08b |
File details
Details for the file zettabrain_rag-0.5.26-py3-none-any.whl.
File metadata
- Download URL: zettabrain_rag-0.5.26-py3-none-any.whl
- Upload date:
- Size: 68.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3180a45b738077276ec6cba8b9c5ec71634e87d845c672919ab306fd08f75cf |
| MD5 | d22cb8199f2cc4451c740ff36d2d808f |
| BLAKE2b-256 | 696b612b37e3a6d6c53732620d7f67881b65c5a3f04a99183ab59474c253c394 |