Private AI document assistant — local RAG pipeline with web GUI. Zero cloud. Supports local, NFS, SMB and object storage.
ZettaBrain RAG
Chat with your documents using a fully local AI pipeline — no API keys, no cloud, no data leaving your machine.
ZettaBrain is a self-hosted RAG (Retrieval-Augmented Generation) assistant. Point it at a folder of documents and ask questions in plain language — through a web GUI or the terminal. It runs entirely on your own hardware using Ollama for inference and ChromaDB for vector storage. Supports PDF, DOCX, TXT, and Markdown.
Contents
- Quick Install
- First-time Setup
- Commands
- Models
- Retrieval Pipeline
- System Requirements
- Sample Test Data
- Configuration
- Diagnostics
- Uninstall
Quick Install
curl -fsSL https://zettabrain.app/install.sh | sudo bash
The installer detects your OS, installs Python 3.9+, pipx, and Ollama, then pulls the nomic-embed-text embedding model. Supported on Ubuntu, Debian, Amazon Linux, RHEL, Fedora, Rocky Linux, AlmaLinux, macOS, and Windows (WSL2).
Developers — install via pipx:
pipx install zettabrain-rag
First-time Setup
1. Run the setup wizard
sudo zettabrain-setup
The wizard configures your document storage (local disk, NFS, SMB, or S3), selects an LLM matched to your hardware, and sets up TLS.
2. Launch the web GUI
zettabrain-server
The wizard prints the exact URL at the end of setup:
| TLS option | URL |
|---|---|
| Caddy (Let's Encrypt) | https://your-domain.com:7860 |
| Self-signed | https://<machine-ip>:7860 (accept the one-time browser warning) |
| HTTP only | http://<machine-ip>:7860 |
3. Or chat in the terminal
zettabrain-chat
Commands
| Command | Description |
|---|---|
| `sudo zettabrain-setup` | Storage wizard, model selection, TLS setup |
| `zettabrain-server` | Launch the HTTPS web GUI (port 7860) |
| `zettabrain-chat` | Interactive RAG chat in the terminal |
| `zettabrain-ingest` | Ingest documents into the vector store |
| `zettabrain-ingest --folder /path` | Ingest a specific folder |
| `zettabrain-ingest --file /path/doc.pdf` | Ingest a single file |
| `zettabrain-ingest --stats` | Show vector store contents |
| `zettabrain-ingest --clear` | Wipe the vector store |
| `zettabrain-status` | Show version, cert info, and store stats |
| `sudo zettabrain-storage add` | Add a storage source after initial setup |
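For scripted ingestion, the `zettabrain-ingest` flags listed above can be assembled from Python. This is a minimal sketch: `build_ingest_command` and `run_ingest` are hypothetical helper names and not part of ZettaBrain; only the command name and its flags come from the table. Whether the flags may be combined in one call is not documented here, so treat combinations as an assumption.

```python
import subprocess
from typing import List, Optional


def build_ingest_command(folder: Optional[str] = None,
                         file: Optional[str] = None,
                         stats: bool = False,
                         clear: bool = False) -> List[str]:
    """Assemble a zettabrain-ingest argument list from the documented flags."""
    cmd = ["zettabrain-ingest"]
    if folder:
        cmd += ["--folder", folder]
    if file:
        cmd += ["--file", file]
    if stats:
        cmd.append("--stats")
    if clear:
        cmd.append("--clear")
    return cmd


def run_ingest(**kwargs) -> int:
    """Run the command and return its exit code (needs ZettaBrain on PATH)."""
    return subprocess.run(build_ingest_command(**kwargs), check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.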
Inside zettabrain-chat:
| Command | Action |
|---|---|
| (any question) | Query your documents |
| `sources` | Show which chunks were retrieved |
| `timing` | Show retrieve / generate times for this session |
| `debug on` / `debug off` | Toggle chunk-level debug output |
| `quit` | Exit |
Models
`sudo zettabrain-setup` detects your hardware and recommends the model that best fits it. You can also select any Ollama model from the menu or enter a custom name.
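To illustrate how a hardware-based recommendation could work, here is a hypothetical heuristic built from the figures in the CPU-only and GPU tables of this section. It is not the wizard's actual logic: `recommend_model` is an invented name, the VRAM figures are the approximate values from the GPU table, and the 6 GB cut-off for `phi4-mini` is an assumption taken from the performance reference.

```python
from typing import Optional

# (model, approximate VRAM in GB) copied from the GPU table, smallest first.
GPU_MODELS = [
    ("phi4-mini", 2.5), ("mistral:7b", 4), ("openhermes", 4),
    ("llama3.1:8b", 5), ("mistral-nemo:12b", 7),
    ("qwen2.5:14b", 9), ("qwen2.5:32b", 20),
]


def recommend_model(ram_gb: float, vram_gb: Optional[float] = None) -> str:
    """Hypothetical rule: largest GPU model that fits in VRAM, else a CPU pick by RAM."""
    if vram_gb:
        fitting = [name for name, need in GPU_MODELS if need <= vram_gb]
        if fitting:
            return fitting[-1]
    # CPU-only path. The 16 GB and 12 GB thresholds come from the CPU table
    # notes; the 6 GB threshold is an assumption.
    if ram_gb >= 16:
        return "llama3.1:8b"
    if ram_gb >= 12:
        return "mistral:7b"
    if ram_gb >= 6:
        return "phi4-mini"
    return "qwen3:0.6b"
```

A real selector would also weigh speed against quality; this sketch only checks whether a model fits in memory.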
CPU-only
| Model | Size | Speed | Best for |
|---|---|---|---|
| `qwen3:0.6b` | ~500 MB | Instant | Quick lookups, routing |
| `gemma3:1b` | ~815 MB | Very fast | Structured explanations |
| `tinyllama:1.1b` | ~638 MB | Very fast | Basic Q&A |
| `phi4-mini` ⭐ | ~2.5 GB | Moderate | Best RAG reasoning on CPU |
| `llama3.2:3b` | ~2 GB | Moderate | General purpose |
| `mistral:7b` | ~4 GB | Slow | Strong instruction following (needs 12 GB+ RAM) |
| `llama3.1:8b` | ~5 GB | Slow | Balanced quality (needs 16 GB+ RAM) |
GPU
| Model | VRAM | Speed | Best for |
|---|---|---|---|
| `phi4-mini` | ~2.5 GB | Fast | Best reasoning per GB |
| `mistral:7b` | ~4 GB | Fast | Strong instruction following |
| `openhermes` | ~4 GB | Fast | Formatted RAG responses |
| `llama3.1:8b` | ~5 GB | Fast | Balanced quality |
| `mistral-nemo:12b` | ~7 GB | Moderate | Better reasoning |
| `qwen2.5:14b` | ~9 GB | Moderate | Excellent quality |
| `qwen2.5:32b` | ~20 GB | Slower | Best quality |
Switch models at any time by editing `/opt/zettabrain/src/zettabrain.env`:
ZETTABRAIN_LLM_MODEL=qwen2.5:14b
Then restart the server: `zettabrain-server`
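If you switch models often, the edit can be scripted. The sketch below assumes the env file is a plain list of `KEY=VALUE` lines, as in the example above; `set_env_value` is a hypothetical helper, not a ZettaBrain API. Writing under `/opt/zettabrain` will typically need root privileges.

```python
from pathlib import Path


def set_env_value(env_path: str, key: str, value: str) -> None:
    """Set KEY=value in a simple env file, replacing an existing entry or appending."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    updated, found = [], False
    for line in lines:
        if line.strip().startswith(f"{key}="):
            updated.append(f"{key}={value}")  # overwrite the existing entry
            found = True
        else:
            updated.append(line)              # keep unrelated lines untouched
    if not found:
        updated.append(f"{key}={value}")
    path.write_text("\n".join(updated) + "\n")
```

For example, `set_env_value("/opt/zettabrain/src/zettabrain.env", "ZETTABRAIN_LLM_MODEL", "qwen2.5:14b")` mirrors the manual edit shown above; the server still has to be restarted afterwards.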
Performance reference — compliance query against a 10-document financial corpus:
| Model | RAM | Retrieve | Generate | Total |
|---|---|---|---|---|
| `qwen3:0.6b` (CPU) | 2 GB | ~1 s | 15–40 s | ~1 min |
| `phi4-mini` (CPU) | 6 GB | ~1 s | 120–300 s | 2–5 min |
| `llama3.2:3b` (CPU) | 6 GB | ~1 s | 90–180 s | 2–3 min |
| `llama3.1:8b` (CPU) | 16 GB | ~1 s | 200–400 s | 4–7 min |
| `mistral:7b` (GPU) | 8 GB | ~1 s | 5–12 s | 6–13 s |
| `llama3.1:8b` (GPU) | 10 GB | ~1 s | 3–7 s | 4–8 s |
| `qwen2.5:14b` (GPU) | 20 GB | ~1 s | 4–10 s | 5–11 s |
| Apple M2/M3 16 GB | 16 GB | ~1 s | 10–20 s | 11–21 s |
Retrieval Pipeline
ZettaBrain uses a five-stage hybrid retrieval pipeline:
- Adaptive chunking: chunk size tuned per document type and text density
- MMR semantic search: Maximal Marginal Relevance via ChromaDB (diversity + relevance)
- BM25 keyword search: exact-term matching on the same corpus
- Merge & deduplicate: semantic results ranked first, duplicates removed by content hash
- Cross-encoder re-ranking: FlashRank (`ms-marco-MiniLM-L-12-v2`) selects the best chunks
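Two of these stages are easy to show in plain Python. The sketch below illustrates the MMR selection rule and the merge-and-deduplicate step; it is not ZettaBrain's code. ZettaBrain runs MMR through ChromaDB and re-ranks with FlashRank, neither of which appears here. The function names, the `lam` trade-off parameter, and the choice of SHA-256 as the content hash are all assumptions made for illustration.

```python
import hashlib
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec: Sequence[float], doc_vecs: List[Sequence[float]],
               k: int, lam: float = 0.5) -> List[int]:
    """Pick k document indices, trading relevance (lam) against redundancy (1 - lam)."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def merge_dedupe(semantic: List[str], keyword: List[str]) -> List[str]:
    """Semantic hits first, then keyword hits; skip chunks whose hash was already seen."""
    seen, merged = set(), []
    for chunk in semantic + keyword:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            merged.append(chunk)
    return merged
```

With a low `lam`, a near-duplicate of an already-selected chunk is passed over in favour of a less similar one, which is the diversity effect the MMR stage is meant to provide.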
Supported formats: .pdf · .docx · .txt · .md
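To preview which files in a folder have a supported extension before ingesting, a small filter like the following can help. It is a sketch under two assumptions not stated in this README: that extensions are matched case-insensitively and that folders are scanned recursively. `is_supported` and `iter_supported` are hypothetical names.

```python
from pathlib import Path
from typing import Iterator

# Extensions taken from the supported-formats line above.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def is_supported(path: str) -> bool:
    """True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def iter_supported(folder: str) -> Iterator[Path]:
    """Yield supported files under folder, in sorted order."""
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file() and is_supported(str(p)):
            yield p
```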
System Requirements
| | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB (CPU) · 16 GB+ (GPU) |
| CPU | 4 cores / 2.5 GHz | 8 cores / 3.0 GHz |
| Disk | 10 GB free | 40 GB free |
| Python | 3.9 | 3.11+ |
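A quick pre-install check against the minimum column can be written with the standard library alone. This is a sketch, not the installer's own check: `check_minimums` is a hypothetical name, `os.cpu_count()` reports logical CPUs rather than physical cores, and RAM is left out because the standard library has no portable way to read it.

```python
import os
import shutil
import sys

# Minimums from the table above.
MIN_PYTHON = (3, 9)
MIN_CORES = 4
MIN_FREE_DISK_GB = 10


def check_minimums(path: str = "/") -> dict:
    """Return a pass/fail flag per requirement for the disk that holds `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "cpu_cores": (os.cpu_count() or 0) >= MIN_CORES,
        "disk": free_gb >= MIN_FREE_DISK_GB,
    }
```

Calling `check_minimums("/opt")` on a Linux host would test the volume that holds the default install path.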
Supported platforms
| Platform | Versions |
|---|---|
| Ubuntu | 20.04, 22.04, 24.04 |
| Debian | 11, 12 |
| Amazon Linux | 2, 2023 |
| RHEL / CentOS Stream / Rocky / AlmaLinux | 8, 9 |
| Fedora | 38+ |
| Linux Mint / Pop!_OS | Current releases |
| macOS | 12 Monterey+ (via pipx install) |
| Windows | 10 / 11 via WSL2 |
GPU is optional. Ollama auto-detects NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal).
Sample Test Data
Not ready to use your own documents? Download realistic enterprise datasets to evaluate ZettaBrain immediately.
| Dataset | Size | Contents |
|---|---|---|
| Financial Services | 10 DOCX · ~90 KB | Apex Financial Group — trading policy, AML/KYC, insider trading, risk framework |
| Healthcare | 10 DOCX · ~91 KB | Riverside Medical Center — HIPAA, medication protocols, emergency codes |
| Test prompts guide | 40 prompts · ~7 KB | 20 per dataset + cross-document + adversarial |
curl -LO https://zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
unzip zettabrain-financial-test-docs.zip -d ~/zettabrain-test
zettabrain-ingest --folder ~/zettabrain-test/financial
zettabrain-chat
Sample prompts (financial)
- "What is the pre-clearance process for personal securities trades and how long does approval last?"
- "When do I need to file a Suspicious Activity Report and what is the deadline?"
- "What is the maximum hotel rate I can expense in New York City?"
Sample prompts (healthcare)
- "What should I do if I suspect a PHI breach — who do I contact and what is the timeline?"
- "Which medications require an independent double-check before administration?"
- "What are the emergency response codes and what action should staff take for each?"
Configuration
Settings can be supplied as environment variables or in `/opt/zettabrain/src/zettabrain.env`:
| Variable | Default | Description |
|---|---|---|
| `ZETTABRAIN_DOCS` | `/opt/zettabrain/data` | Documents folder |
| `ZETTABRAIN_CHROMA` | `/opt/zettabrain/src/zettabrain_vectorstore` | ChromaDB path |
| `ZETTABRAIN_LLM_MODEL` | `phi4-mini` | Ollama LLM model |
| `ZETTABRAIN_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model |
| `ZETTABRAIN_CHUNK_SIZE` | 1000 (PDF) / 800 (TXT) | Chunk size |
| `ZETTABRAIN_CHUNK_OVERLAP` | 150 (PDF) / 100 (TXT) | Chunk overlap |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama API endpoint |
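The sketch below shows one way these variables and defaults could be read, and what chunk size and overlap mean in practice. It is illustrative only: `Settings`, `load_settings`, and `chunk_text` are hypothetical names, and measuring chunks in characters is an assumption, since this README does not state the unit. ZettaBrain's own chunking is adaptive and will differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Settings:
    docs: str
    chroma: str
    llm_model: str
    embed_model: str
    ollama_host: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        docs=env.get("ZETTABRAIN_DOCS", "/opt/zettabrain/data"),
        chroma=env.get("ZETTABRAIN_CHROMA", "/opt/zettabrain/src/zettabrain_vectorstore"),
        llm_model=env.get("ZETTABRAIN_LLM_MODEL", "phi4-mini"),
        embed_model=env.get("ZETTABRAIN_EMBED_MODEL", "nomic-embed-text"),
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
    )


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Fixed-size windows; consecutive windows share `overlap` characters (assumed unit)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.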
Diagnostics
zettabrain-status # version, certs, store stats
curl http://localhost:11434 # check Ollama is running
ollama list # list downloaded models
journalctl -u zettabrain -f # stream server logs (Linux)
tail -f /opt/zettabrain/logs/server.log # stream server logs (macOS)
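The Ollama check can also be done from Python, for example in a health probe. This is a minimal sketch using only the standard library; `ollama_is_up` is a hypothetical helper, and it treats any HTTP 200 from the endpoint root as "running", which mirrors the `curl` check above.

```python
import urllib.error
import urllib.request


def ollama_is_up(host: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Pass the value of `OLLAMA_HOST` as `host` if Ollama runs on a different machine or port.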
Uninstall
sudo systemctl disable --now zettabrain 2>/dev/null || true   # stop the service first (Linux)
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain                                    # also deletes the default documents folder and vector store
License
MIT © ZettaBrain