Local-first second brain RAG system — chat with your own data
MindVault
A local-first second brain that turns your AI conversation exports, Obsidian notes, and documents into a searchable, conversational memory system.
Everything runs on your machine. Your notes and conversations stay local; only web search queries (automatic or via /web) are sent out.
What it does
- Ingests Claude, ChatGPT, and other AI conversation exports, Obsidian vaults, PDFs, and plain text files
- Indexes content into a multi-layer memory system (raw → compressed → structured → linked)
- Remembers entities, decisions, and goals extracted from every chat
- Retrieves using hybrid scoring — summaries first, raw text only when needed
- Chats interactively with six reasoning modes powered by a council of AI voices
- Searches the web automatically when your memory doesn't have a confident answer
- Saves sessions — resume any previous conversation exactly where you left off
Quick start
Prerequisites
- Python 3.11+
- Ollama with two models:
ollama pull nomic-embed-text # vector search
ollama pull llama3.2 # chat and summarization
ollama serve # start Ollama if not running
Install
git clone https://github.com/calebthecm/MindVault
cd MindVault
python -m venv .venv && source .venv/bin/activate
pip install -e . # installs deps + registers the mindvault CLI
Or without the CLI shortcut:
pip install -r requirements.txt
First run
mindvault setup # or: python -m mindvault setup
Add your data
Drop your AI export folder into the project directory so it can be auto-discovered. PDFs and .txt/.md files can live anywhere; point the ingester at their folder explicitly (for example, mindvault ingest ./folder/).
| Provider | How to export |
|---|---|
| Claude | claude.ai → Settings → Export Data (folder starting with data-) |
| ChatGPT | chatgpt.com → Settings → Data Controls → Export Data |
Index and chat
mindvault ingest # index everything
mindvault chat # start talking to your brain
Running MindVault
Three equivalent ways to run it — use whichever you prefer:
# After pip install -e . (recommended)
mindvault
mindvault chat
mindvault ingest
# As a Python module (no install needed)
python -m mindvault
python -m mindvault chat
python -m mindvault ingest
# Legacy script (still works)
python mindvault.py
python mindvault.py chat
python mindvault.py ingest
Commands
mindvault chat (default)
mindvault chat                  interactive REPL
mindvault chat --resume         resume last session
mindvault chat --resume <id>    resume specific session
mindvault ingest                auto-discover and index all exports
mindvault ingest ./folder/      index a specific folder
mindvault ingest --force        re-index even if already processed
mindvault notes                 regenerate Obsidian notes
mindvault setup                 first-run configuration wizard
mindvault stats                 show index and session statistics
mindvault sessions              list resumable sessions
mindvault consolidate           merge near-duplicate memories
During a chat session:
Shift+Tab          cycle reasoning mode
/help              show all commands
/web <query>       search the web (DuckDuckGo, no API key needed)
/search <term>     search memory without LLM; shows scored results
/note <text>       quick-capture a note (indexed on next ingest)
/forget <topic>    suppress matching chunks from future retrieval
/mode [name]       show or switch mode (CHAT, PLAN, DECIDE, DEBATE, REFLECT, EXPLORE)
/sources           show which memories were used in the last answer
/remember <fact>   save a fact to this session
/private           toggle private vault inclusion
/resume            interactive session picker
/clear             clear conversation history
/quit, /exit       end session (compresses and saves automatically)
Web search
MindVault searches the web automatically when memory confidence is low, or on demand:
/web what is the current price of ETH?
/web latest news on local SEO in 2025
Uses DuckDuckGo — no API key, no Docker, no setup. Configure in config.py:
WEB_SEARCH_AUTO_THRESHOLD = 0.45 # auto-search when best memory score is below this
WEB_SEARCH_MAX_RESULTS = 5 # results to include in context
Set WEB_SEARCH_AUTO_THRESHOLD = 0 to disable auto-search.
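The automatic trigger can be pictured as a simple threshold check on the best memory score. The sketch below is illustrative only: `should_search_web` and its arguments are hypothetical names, not MindVault's actual API, and it assumes scores lie in the range 0 to 1. Only the two config values come from the settings above.

```python
WEB_SEARCH_AUTO_THRESHOLD = 0.45  # from config.py; 0 disables auto-search
WEB_SEARCH_MAX_RESULTS = 5        # from config.py


def should_search_web(memory_scores: list[float],
                      threshold: float = WEB_SEARCH_AUTO_THRESHOLD) -> bool:
    """Return True when the best memory score falls below the threshold.

    Hypothetical helper. Assumption: an empty result list counts as
    "no confident answer", so it triggers a search.
    """
    if threshold <= 0:  # auto-search disabled
        return False
    best = max(memory_scores, default=0.0)
    return best < threshold
```

With the default threshold, a best score of 0.40 would trigger a search and a best score of 0.90 would not.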
Reasoning modes
MindVault has six modes, cycled with Shift+Tab in the prompt bar.
| Mode | What it does |
|---|---|
| 💬 CHAT | Standard RAG — retrieve memories, synthesize an answer |
| 📋 PLAN | Break the task into structured, actionable steps |
| 🗳 DECIDE | Five-voice council votes; tally + majority verdict shown |
| ⚖ DEBATE | FOR vs AGAINST, then a moderated verdict |
| 🔍 REFLECT | Deep synthesis — what does your brain really know about this? |
| 🕸 EXPLORE | Graph traversal — follows memory links to surface surprises |
The council is five internal voices with distinct personalities:
| Voice | Orientation |
|---|---|
| 📊 The Analyst | Evidence-first, skeptical, quantitative |
| 🚀 The Visionary | Big-picture, creative, optimistic |
| 🔧 The Pragmatist | What's actionable right now |
| 😈 The Devil | Challenges every assumption, finds the flaw |
| 📜 The Historian | Patterns across time; what past memory reveals |
How it works
Memory layers
| Layer | What | Used for |
|---|---|---|
| Raw | Original text chunks | Fallback when summaries aren't confident enough |
| Compressed | LLM-generated summaries per session/document | Primary retrieval context |
| Structured | Extracted entities (persons, projects, decisions, goals) | Entity-boosted retrieval |
| Linked | Relationships between memories via shared entities + wikilinks | Graph traversal in EXPLORE mode |
| Web | Live DuckDuckGo results | Augments memory for current/unknown topics |
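The Linked layer is what makes graph traversal possible. As a rough sketch, assuming links are available as an in-memory adjacency mapping (the real ones live in brain.db, and their schema is not documented here), a bounded breadth-first walk could look like this. `explore_neighbors` and `max_hops` are hypothetical names.

```python
from collections import deque


def explore_neighbors(links: dict[str, list[str]],
                      seeds: list[str],
                      max_hops: int = 2) -> list[str]:
    """Breadth-first walk from seed memories.

    Returns ids discovered within max_hops links of any seed, in visit
    order, excluding the seeds themselves.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    found: list[str] = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found
```

Capping the hop count keeps the walk from pulling in the whole graph while still reaching memories that a pure similarity search would miss.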
Retrieval scoring
score = 0.5 × embedding_similarity
+ 0.2 × entity_overlap
+ 0.2 × recency
+ 0.1 × importance
Compressed summaries are searched first. Raw chunks are only fetched when confidence drops below the threshold. EXPLORE mode additionally walks memory_links to pull in related neighbors.
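The weighted score and the summaries-first fallback can be sketched as follows. This is a sketch under stated assumptions, not the project's implementation: the `Candidate` class and `retrieve` function are hypothetical, all four signals are assumed to be normalized to the range 0 to 1, and the threshold is assumed to be compared against the best hybrid score among the summaries.

```python
from dataclasses import dataclass

COMPRESSED_SCORE_THRESHOLD = 0.75  # default from config.py
CHAT_TOP_K = 8                     # default from config.py


@dataclass
class Candidate:
    text: str
    embedding_similarity: float
    entity_overlap: float
    recency: float
    importance: float


def hybrid_score(c: Candidate) -> float:
    """Weighted sum using the weights from the formula above."""
    return (0.5 * c.embedding_similarity
            + 0.2 * c.entity_overlap
            + 0.2 * c.recency
            + 0.1 * c.importance)


def retrieve(summaries: list[Candidate],
             raw_chunks: list[Candidate],
             top_k: int = CHAT_TOP_K,
             threshold: float = COMPRESSED_SCORE_THRESHOLD) -> list[Candidate]:
    """Rank summaries first; add raw chunks only when confidence is low."""
    ranked = sorted(summaries, key=hybrid_score, reverse=True)[:top_k]
    best = hybrid_score(ranked[0]) if ranked else 0.0
    if best < threshold:
        # Assumption: raw chunks compete with summaries on the same score.
        pool = ranked + list(raw_chunks)
        ranked = sorted(pool, key=hybrid_score, reverse=True)[:top_k]
    return ranked
```

Because the weights sum to 1.0, a candidate that scores 1.0 on every signal gets a hybrid score of 1.0, which keeps the thresholds in config.py easy to interpret.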
Session lifecycle
- During chat: turns saved live + entities extracted per exchange (background)
- On exit: LLM compresses the session into a 2–4 sentence summary
- Summary embedded and stored in the compressed memory layer
- Resume anytime with `--resume` or `/resume` during chat
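The save-and-resume step can be illustrated with gzip-compressed JSON, which matches the .json.gz files kept in sessions/. The payload layout and the function names below are assumptions made for illustration; the actual file schema is not documented in this README.

```python
import gzip
import json
from pathlib import Path


def save_session(session_id: str, turns: list[dict], summary: str,
                 directory: Path) -> Path:
    """Write a session as gzip-compressed JSON and return its path.

    Hypothetical layout: id, LLM-written summary, and the raw turns.
    """
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{session_id}.json.gz"
    payload = {"id": session_id, "summary": summary, "turns": turns}
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(payload, fh)
    return path


def load_session(path: Path) -> dict:
    """Read a session back so a chat can resume where it left off."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Storing the turns next to the summary means a resumed chat can replay the exact history, while the summary alone is what would be embedded into the compressed memory layer.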
Configuration
All settings in mindvault/config.py:
| Variable | Default | What it controls |
|---|---|---|
| `LLM_MODEL` | `llama3.2` | Model for summarization, chat, extraction |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Vector search embeddings |
| `CHAT_TOP_K` | `8` | Chunks retrieved per query |
| `COMPRESSED_SCORE_THRESHOLD` | `0.75` | Below this, also fetch raw chunks |
| `WEB_SEARCH_AUTO_THRESHOLD` | `0.45` | Auto web search below this memory score (0 = off) |
| `SUGGEST_FOLLOWUPS` | `True` | Suggest follow-up questions after each answer |
| `WRITE_SESSIONS_TO_VAULT` | `True` | Write session summary notes to Obsidian on exit |
| `CHAT_INCLUDE_PRIVATE` | `False` | Include private vault by default |
Storage
| Path | What |
|---|---|
| `brain.db` | SQLite: ingestion tracking, entities, links, importance scores |
| `.qdrant/` | Qdrant: vector index (raw + compressed collections) |
| `sessions/` | Compressed chat sessions (.json.gz) |
| `notes/` | Quick-captured notes via /note (indexed on next ingest) |
| `My Brain/` | Obsidian vault: business, projects, general knowledge |
| `Private Brain/` | Obsidian vault: personal content (separate collection) |
| `data-*/` | Export folders (excluded from git) |
Privacy
- All processing is local by default.
- Web search uses DuckDuckGo's anonymous API — no account, no tracking.
- `My Brain` and `Private Brain` are in separate Qdrant collections; private content is never implicitly included in responses.
- `.gitignore` excludes all personal data: vaults, exports, sessions, databases.
Requirements
qdrant-client     vector database
httpx             HTTP client (LLM + web requests)
python-dotenv     .env file loading
pypdf             PDF ingestion
prompt_toolkit    TUI and interactive input
rich              markdown rendering in terminal
ddgs              web search (no API key)
trafilatura       web page content extraction