Private AI document assistant — local RAG pipeline with web GUI. Zero cloud. Supports local, NFS, SMB and object storage.


ZettaBrain RAG

Chat with your documents using a fully local AI pipeline — no API keys, no cloud, no data leaving your machine.

[Demo: ZettaBrain install, setup, ingest, chat]

ZettaBrain is a self-hosted RAG (Retrieval-Augmented Generation) assistant. Point it at a folder of documents and ask questions in plain language — through a web GUI or the terminal. It runs entirely on your own hardware using Ollama for inference and ChromaDB for vector storage. Supports PDF, DOCX, TXT, and Markdown.

Quick Install
First-time Setup
Commands
Models
Retrieval Pipeline
System Requirements
Sample Test Data
Configuration
Diagnostics
Uninstall

Quick Install

curl -fsSL https://zettabrain.app/install.sh | sudo bash

The installer detects your OS, installs Python 3.9+, pipx, and Ollama, then pulls the nomic-embed-text embedding model. Supported on Ubuntu, Debian, Amazon Linux, RHEL, Fedora, Rocky Linux, AlmaLinux, macOS, and Windows (WSL2).

Developers — install via pipx:

pipx install zettabrain-rag

First-time Setup

1. Run the setup wizard

sudo zettabrain-setup

Configures your document storage (local disk, NFS, SMB, or S3), selects an LLM matched to your hardware, and sets up TLS (Caddy with Let's Encrypt, a self-signed certificate, or plain HTTP).

2. Launch the web GUI

zettabrain-server

The wizard prints the exact URL at the end of setup:

TLS option	URL
Caddy (Let's Encrypt)	`https://your-domain.com:7860`
Self-signed	`https://<machine-ip>:7860` (accept the one-time browser warning)
HTTP only	`http://<machine-ip>:7860`

3. Or chat in the terminal

zettabrain-chat

Commands

Command	Description
`sudo zettabrain-setup`	Storage wizard, model selection, TLS setup
`zettabrain-server`	Launch the web GUI on port 7860 (HTTPS unless you chose HTTP only)
`zettabrain-chat`	Interactive RAG chat in the terminal
`zettabrain-ingest`	Ingest documents into the vector store
`zettabrain-ingest --folder /path`	Ingest a specific folder
`zettabrain-ingest --file /path/doc.pdf`	Ingest a single file
`zettabrain-ingest --stats`	Show vector store contents
`zettabrain-ingest --clear`	Wipe the vector store
`zettabrain-status`	Show version, cert info, and store stats
`sudo zettabrain-storage add`	Add a storage source after initial setup

Inside zettabrain-chat:

Command	Action
(any question)	Query your documents
`sources`	Show which chunks were retrieved
`timing`	Show retrieve / generate times for this session
`debug on` / `debug off`	Toggle chunk-level debug output
`quit`	Exit

Models

sudo zettabrain-setup detects your hardware and recommends a model to match. You can also pick any Ollama model from the menu or enter a custom model name.

CPU-only

Model	Size	Speed	Best for
`qwen3:0.6b`	~500 MB	Instant	Quick lookups, routing
`gemma3:1b`	~815 MB	Very fast	Structured explanations
`tinyllama:1.1b`	~638 MB	Very fast	Basic Q&A
`phi4-mini` ⭐	~2.5 GB	Moderate	Best RAG reasoning on CPU
`llama3.2:3b`	~2 GB	Moderate	General purpose
`mistral:7b`	~4 GB	Slow	Strong instruction following (needs 12 GB+ RAM)
`llama3.1:8b`	~5 GB	Slow	Balanced quality (needs 16 GB+ RAM)

GPU

Model	VRAM	Speed	Best for
`phi4-mini`	~2.5 GB	Fast	Best reasoning per GB
`mistral:7b`	~4 GB	Fast	Strong instruction following
`openhermes`	~4 GB	Fast	Formatted RAG responses
`llama3.1:8b`	~5 GB	Fast	Balanced quality
`mistral-nemo:12b`	~7 GB	Moderate	Better reasoning
`qwen2.5:14b`	~9 GB	Moderate	Excellent quality
`qwen2.5:32b`	~20 GB	Slower	Best quality

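Switch models at any time by editing /opt/zettabrain/src/zettabrain.env:

ZETTABRAIN_LLM_MODEL=qwen2.5:14b

If the model is not already downloaded (check with ollama list), pull it first: ollama pull qwen2.5:14b

Then restart the server so the change takes effect: zettabrain-server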
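If you prefer to script the change, the sketch below rewrites a single KEY=VALUE line in place. It assumes zettabrain.env is a plain dotenv-style file with one assignment per line, as in the example above; `set_env_value` is a hypothetical helper, not part of ZettaBrain.

```python
from pathlib import Path


def set_env_value(path: Path, key: str, value: str) -> None:
    """Set KEY=VALUE in a simple dotenv-style file, appending it if absent.

    Assumes one KEY=VALUE pair per line; comments and other keys are kept as-is.
    """
    lines = path.read_text().splitlines() if path.exists() else []
    prefix = f"{key}="  # includes "=" so KEY_OTHER=... is not matched
    updated = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = f"{key}={value}"  # replace the existing assignment
            updated = True
    if not updated:
        lines.append(f"{key}={value}")  # key not present yet: add it at the end
    path.write_text("\n".join(lines) + "\n")
```

For example, `set_env_value(Path("/opt/zettabrain/src/zettabrain.env"), "ZETTABRAIN_LLM_MODEL", "qwen2.5:14b")`, run with enough privileges to write under /opt (for instance via sudo).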

Performance reference — compliance query against a 10-document financial corpus:

Model	RAM	Retrieve	Generate	Total
`qwen3:0.6b` (CPU)	2 GB	~1 s	15–40 s	<1 min
`phi4-mini` (CPU)	6 GB	~1 s	120–300 s	2–5 min
`llama3.2:3b` (CPU)	6 GB	~1 s	90–180 s	2–3 min
`llama3.1:8b` (CPU)	16 GB	~1 s	200–400 s	4–7 min
`mistral:7b` (GPU)	8 GB	~1 s	5–12 s	6–13 s
`llama3.1:8b` (GPU)	10 GB	~1 s	3–7 s	4–8 s
`qwen2.5:14b` (GPU)	20 GB	~1 s	4–10 s	5–11 s
Apple M2/M3 16 GB	16 GB	~1 s	10–20 s	11–21 s

Retrieval Pipeline

ZettaBrain uses a five-stage hybrid retrieval pipeline:

1. Adaptive chunking: chunk size tuned per document type and text density
2. MMR semantic search: Maximal Marginal Relevance via ChromaDB (balances relevance and diversity)
3. BM25 keyword search: exact-term matching on the same corpus
4. Merge & deduplicate: semantic results ranked first, duplicates removed by content hash
5. Cross-encoder re-ranking: FlashRank (ms-marco-MiniLM-L-12-v2) selects the best chunks
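To make the MMR stage concrete, here is a minimal pure-Python sketch of Maximal Marginal Relevance over toy embedding vectors. It is not ZettaBrain's code: ChromaDB performs this selection in the real pipeline, and the `lam` trade-off parameter and its 0.5 default are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int, lam: float = 0.5) -> List[int]:
    """Pick up to k doc indices, trading relevance to the query against redundancy.

    lam=1.0 ranks purely by relevance; lam=0.0 purely by diversity.
    """
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))

    def score(i: int) -> float:
        # Penalise a candidate by its similarity to the closest already-selected chunk.
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the earlier candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a query of `[1.0, 0.0]` and candidates `[[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]`, `mmr(..., k=2, lam=0.3)` returns `[0, 2]` and skips the exact duplicate, whereas `lam=1.0` returns `[0, 1]`.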

Supported formats: .pdf · .docx · .txt · .md
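The merge-and-deduplicate stage of the pipeline can be sketched as follows: semantic (MMR) hits keep their order at the front, BM25 hits are appended, and any chunk whose content hash has already been seen is dropped. Normalising whitespace before hashing and using SHA-256 are assumptions made for this example; ZettaBrain's actual hashing may differ.

```python
import hashlib
from typing import Iterable, List


def content_hash(text: str) -> str:
    # Collapse runs of whitespace so copies that differ only in spacing collide.
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def merge_and_dedupe(semantic: Iterable[str], keyword: Iterable[str]) -> List[str]:
    """Semantic hits first, then BM25 hits, skipping chunks whose hash was already seen."""
    seen = set()
    merged: List[str] = []
    for chunk in list(semantic) + list(keyword):
        h = content_hash(chunk)
        if h not in seen:
            seen.add(h)
            merged.append(chunk)
    return merged
```

The merged list would then go to the cross-encoder re-ranker, which decides the final order.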

System Requirements

Resource	Minimum	Recommended
RAM	4 GB	8 GB (CPU) · 16 GB+ (GPU)
CPU	4 cores / 2.5 GHz	8 cores / 3.0 GHz
Disk	10 GB free	40 GB free
Python	3.9	3.11+

Supported platforms

Platform	Versions
Ubuntu	20.04, 22.04, 24.04
Debian	11, 12
Amazon Linux	2, 2023
RHEL / CentOS Stream / Rocky / AlmaLinux	8, 9
Fedora	38+
Linux Mint / Pop!_OS	Current releases
macOS	12 Monterey+ (via `pipx install`)
Windows	10 / 11 via WSL2

GPU is optional. Ollama auto-detects NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal).

Sample Test Data

Not ready to use your own documents? Download realistic enterprise datasets to evaluate ZettaBrain immediately.

Dataset	Documents	Organisation
Financial Services	10 DOCX · ~90 KB	Apex Financial Group — trading policy, AML/KYC, insider trading, risk framework
Healthcare	10 DOCX · ~91 KB	Riverside Medical Center — HIPAA, medication protocols, emergency codes
Test prompts guide	40 prompts · ~7 KB	20 per dataset + cross-document + adversarial

curl -LO https://zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
unzip zettabrain-financial-test-docs.zip -d ~/zettabrain-test
zettabrain-ingest --folder ~/zettabrain-test/financial
zettabrain-chat

Sample prompts (financial)

"What is the pre-clearance process for personal securities trades and how long does approval last?"
"When do I need to file a Suspicious Activity Report and what is the deadline?"
"What is the maximum hotel rate I can expense in New York City?"

Sample prompts (healthcare)

"What should I do if I suspect a PHI breach — who do I contact and what is the timeline?"
"Which medications require an independent double-check before administration?"
"What are the emergency response codes and what action should staff take for each?"

Configuration

All settings can be set via environment variables or /opt/zettabrain/src/zettabrain.env:

Variable	Default	Description
`ZETTABRAIN_DOCS`	`/opt/zettabrain/data`	Documents folder
`ZETTABRAIN_CHROMA`	`/opt/zettabrain/src/zettabrain_vectorstore`	ChromaDB path
`ZETTABRAIN_LLM_MODEL`	`phi4-mini`	Ollama LLM model
`ZETTABRAIN_EMBED_MODEL`	`nomic-embed-text`	Ollama embedding model
`ZETTABRAIN_CHUNK_SIZE`	`1000` (PDF) / `800` (TXT)	Chunk size
`ZETTABRAIN_CHUNK_OVERLAP`	`150` (PDF) / `100` (TXT)	Chunk overlap
`OLLAMA_HOST`	`http://localhost:11434`	Ollama API endpoint
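A minimal sketch of how the chunk settings above might be resolved and applied. It assumes the environment variable, when set, overrides the per-type default for every file type, and that types without a documented default (such as .docx and .md) fall back to the TXT values. The fixed-size character window is a simplification: ZettaBrain's adaptive chunking also accounts for text density, which this sketch does not model. `chunk_settings` and `chunk_text` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# (chunk_size, overlap) defaults from the configuration table.
DEFAULTS: Dict[str, Tuple[int, int]] = {".pdf": (1000, 150), ".txt": (800, 100)}


def chunk_settings(ext: str) -> Tuple[int, int]:
    """Return (chunk_size, overlap); environment variables override the per-type default."""
    size, overlap = DEFAULTS.get(ext.lower(), DEFAULTS[".txt"])  # assumed fallback
    size = int(os.environ.get("ZETTABRAIN_CHUNK_SIZE", size))
    overlap = int(os.environ.get("ZETTABRAIN_CHUNK_OVERLAP", overlap))
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than the chunk size")
    return size, overlap


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than the chunk size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

For a 2,000-character PDF with the defaults, this yields windows starting at 0, 850 and 1,700, so neighbouring chunks share 150 characters of context.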

Diagnostics

zettabrain-status                              # version, certs, store stats
curl http://localhost:11434                    # check Ollama is running
ollama list                                    # list downloaded models
journalctl -u zettabrain -f                   # stream server logs (Linux)
tail -f /opt/zettabrain/logs/server.log       # stream server logs (macOS)
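The two Ollama checks can also be scripted with only the Python standard library. Ollama's root endpoint answers HTTP 200 when the server is up, and `GET /api/tags` returns the locally downloaded models as JSON (the same list `ollama list` prints). Reading `OLLAMA_HOST` mirrors the configuration table; the function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import List


def ollama_host() -> str:
    # Same variable and default as in the configuration table.
    return os.environ.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")


def ollama_is_up(host: str, timeout: float = 2.0) -> bool:
    """True if the Ollama root endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(host + "/", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, refused connections and timeouts are all OSError subclasses
        return False


def model_names(tags_json: str) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def installed_models(host: str, timeout: float = 2.0) -> List[str]:
    """Fetch /api/tags and return the names of locally downloaded models."""
    with urllib.request.urlopen(host + "/api/tags", timeout=timeout) as resp:
        return model_names(resp.read().decode("utf-8"))
```

For example, `print(ollama_is_up(ollama_host()), installed_models(ollama_host()))` on a machine where Ollama is running.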

Uninstall

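sudo systemctl disable --now zettabrain 2>/dev/null || true   # stop and disable the service, if present
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain                                   # with default paths, also deletes your documents and vector store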

License

MIT © ZettaBrain


This version: 0.5.31 (May 21, 2026)
