Local-first second brain RAG system — chat with your own data

MindVault

A local-first second brain that turns your AI conversation exports, Obsidian notes, and documents into a searchable, conversational memory system.

Everything runs on your machine. Your data stays local; the optional web search is the one feature that contacts the internet.

What it does

Ingests Anthropic conversation exports, Obsidian vaults, PDFs, and plain text files
Indexes content into a multi-layer memory system (raw → compressed → structured → linked)
Remembers entities, decisions, and goals extracted from every chat
Retrieves using hybrid scoring — summaries first, raw text only when needed
Chats interactively with six reasoning modes powered by a council of AI voices
Searches the web automatically when your memory doesn't have a confident answer
Saves sessions — resume any previous conversation exactly where you left off

Quick start

Prerequisites

Python 3.11+
Ollama with two models:

ollama pull nomic-embed-text   # vector search
ollama pull llama3.2           # chat and summarization
ollama serve                   # start Ollama if not running
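
Before the first run it can help to confirm that both models are actually available. The sketch below is not part of MindVault. It assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint, which lists locally pulled models; the helper names missing_models and fetch_tags are hypothetical.

```python
import json
import urllib.request

REQUIRED_MODELS = ("nomic-embed-text", "llama3.2")
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumption: default Ollama port


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags payload."""
    # Ollama reports names with a tag suffix, e.g. "llama3.2:latest".
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=5.0):
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With Ollama running, missing_models(fetch_tags()) should return an empty list once both models have been pulled.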

Install

git clone https://github.com/calebthecm/MindVault
cd MindVault
python -m venv .venv && source .venv/bin/activate
pip install -e .               # installs deps + registers the mindvault CLI

Or without the CLI shortcut:

pip install -r requirements.txt

First run

mindvault setup                # or: python -m mindvault setup

Add your data

Drop your Anthropic export folder into the project directory; any folder whose name starts with data- is discovered automatically. PDFs and .txt/.md files can live anywhere: point the ingester at their folder with mindvault ingest ./folder/.

Anthropic export: claude.ai → Settings → Export Data
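
The data- naming rule above can be pictured as a simple directory scan. This is a minimal sketch of that idea, not MindVault's actual discovery code; the function name find_export_folders is hypothetical.

```python
from pathlib import Path


def find_export_folders(project_dir):
    """Return sub-folders of project_dir whose names start with 'data-', sorted by name."""
    root = Path(project_dir)
    # Files that happen to match the pattern are skipped; only directories count.
    return sorted(p for p in root.glob("data-*") if p.is_dir())
```

Calling find_export_folders(".") from the project root lists the folders a scan like this would pick up.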

Index and chat

mindvault ingest               # index everything
mindvault chat                 # start talking to your brain

Running MindVault

Three equivalent ways to run it — use whichever you prefer:

# After pip install -e . (recommended)
mindvault
mindvault chat
mindvault ingest

# As a Python module (no pip install -e . needed)
python -m mindvault
python -m mindvault chat
python -m mindvault ingest

# Legacy script (still works)
python mindvault.py
python mindvault.py chat
python mindvault.py ingest

Commands

mindvault                           chat (default)
mindvault chat                      interactive REPL
mindvault chat --resume             resume last session
mindvault chat --resume <id>        resume specific session
mindvault ingest                    auto-discover and index all exports
mindvault ingest ./folder/          index a specific folder
mindvault ingest --force            re-index even if already processed
mindvault notes                     regenerate Obsidian notes
mindvault setup                     first-run configuration wizard
mindvault stats                     show index and session statistics
mindvault sessions                  list resumable sessions
mindvault consolidate               merge near-duplicate memories

During a chat session:

Shift+Tab            cycle reasoning mode
/help                show all commands
/web <query>         search the web (DuckDuckGo, no API key needed)
/search <term>       search memory without LLM — shows scored results
/note <text>         quick-capture a note (indexed on next ingest)
/forget <topic>      suppress matching chunks from future retrieval
/mode [name]         show or switch mode (CHAT, PLAN, DECIDE, DEBATE, REFLECT, EXPLORE)
/sources             show which memories were used in the last answer
/remember <fact>     save a fact to this session
/private             toggle private vault inclusion
/resume              interactive session picker
/clear               clear conversation history
/quit, /exit         end session (compresses and saves automatically)
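
For readers curious how a REPL can tell slash commands from ordinary messages, here is a small illustrative parser. It is an assumption about the general technique, not MindVault's implementation, and parse_chat_input is a hypothetical name.

```python
def parse_chat_input(line):
    """Split a REPL line into (command, argument); plain messages get command None."""
    text = line.strip()
    if not text.startswith("/"):
        return None, text
    # Everything up to the first space is the command, the rest is its argument.
    command, _, argument = text[1:].partition(" ")
    return command.lower(), argument.strip()
```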

Web search

MindVault searches the web automatically when memory confidence is low, or on demand:

/web what is the current price of ETH?
/web latest news on local SEO in 2025

Web search uses DuckDuckGo: no API key, no Docker, no setup. Configure it in mindvault/config.py:

WEB_SEARCH_AUTO_THRESHOLD = 0.45   # auto-search when best memory score is below this
WEB_SEARCH_MAX_RESULTS    = 5      # results to include in context

Set WEB_SEARCH_AUTO_THRESHOLD = 0 to disable auto-search.
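
The threshold rule can be read as a single comparison. The sketch below illustrates it under two assumptions that the README does not state: that the comparison is strict, and that a query with no memory hits at all counts as low confidence. The function name should_auto_search is hypothetical.

```python
WEB_SEARCH_AUTO_THRESHOLD = 0.45  # default from the README; 0 disables auto-search


def should_auto_search(best_memory_score, threshold=WEB_SEARCH_AUTO_THRESHOLD):
    """Return True when memory confidence is low enough to justify a web search."""
    if threshold <= 0:
        return False  # a threshold of 0 turns the feature off
    if best_memory_score is None:
        return True  # assumption: no memory hit at all counts as low confidence
    return best_memory_score < threshold
```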

Reasoning modes

MindVault has six modes, cycled with Shift+Tab in the prompt bar.

Mode	What it does
💬 CHAT	Standard RAG — retrieve memories, synthesize an answer
📋 PLAN	Break the task into structured, actionable steps
🗳 DECIDE	Five-voice council votes; tally + majority verdict shown
⚖ DEBATE	FOR vs AGAINST, then a moderated verdict
🔍 REFLECT	Deep synthesis — what does your brain really know about this?
🕸 EXPLORE	Graph traversal — follows memory links to surface surprises

The council is five internal voices with distinct personalities:

Voice	Orientation
📊 The Analyst	Evidence-first, skeptical, quantitative
🚀 The Visionary	Big-picture, creative, optimistic
🔧 The Pragmatist	What's actionable right now
😈 The Devil	Challenges every assumption, finds the flaw
📜 The Historian	Patterns across time; what past memory reveals
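
The tally and majority verdict in DECIDE mode amount to counting one vote per voice. A minimal sketch, assuming each voice returns a single option label and that a verdict requires more than half of the votes; how MindVault actually handles ties is not documented here, and tally_votes is a hypothetical name.

```python
from collections import Counter


def tally_votes(votes):
    """Count one vote per voice and return (tally, verdict).

    votes maps a voice name to the option it chose. The verdict is the option
    backed by more than half of the voices, or None when no option has a majority.
    """
    tally = Counter(votes.values())
    if not tally:
        return tally, None
    option, count = tally.most_common(1)[0]
    verdict = option if count > len(votes) / 2 else None
    return tally, verdict
```

With five voices and two options a majority always exists; the None case matters only when voices split across three or more options.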

How it works

Memory layers

Layer	What	Used for
Raw	Original text chunks	Fallback when summaries aren't confident enough
Compressed	LLM-generated summaries per session/document	Primary retrieval context
Structured	Extracted entities (persons, projects, decisions, goals)	Entity-boosted retrieval
Linked	Relationships between memories via shared entities + wikilinks	Graph traversal in EXPLORE mode
Web	Live DuckDuckGo results	Augments memory for current/unknown topics
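
The Linked layer is what makes graph traversal possible. As an illustration, the sketch below does a breadth-first walk over an in-memory adjacency map. The real links live in brain.db with a schema not shown here, so the dict shape, the max_hops limit, and the name explore_neighbors are all assumptions.

```python
from collections import deque


def explore_neighbors(links, start_ids, max_hops=2):
    """Breadth-first walk over memory links, returning {memory_id: hop_distance}.

    links maps a memory id to the ids it is linked to (shared entities or wikilinks).
    """
    distance = {memory_id: 0 for memory_id in start_ids}
    queue = deque(start_ids)
    while queue:
        current = queue.popleft()
        if distance[current] >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance
```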

Retrieval scoring

score = 0.5 × embedding_similarity
      + 0.2 × entity_overlap
      + 0.2 × recency
      + 0.1 × importance
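
Written as code, the formula is a weighted sum. The sketch assumes every signal has already been normalized to the range 0 to 1; how MindVault normalizes recency and importance is not specified above, and hybrid_score is a hypothetical name.

```python
WEIGHTS = {
    "embedding_similarity": 0.5,
    "entity_overlap": 0.2,
    "recency": 0.2,
    "importance": 0.1,
}


def hybrid_score(embedding_similarity, entity_overlap, recency, importance):
    """Combine four signals, each assumed to lie in [0, 1], into one score."""
    signals = {
        "embedding_similarity": embedding_similarity,
        "entity_overlap": entity_overlap,
        "recency": recency,
        "importance": importance,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())
```

Because the weights sum to 1, a memory that is perfect on every signal scores 1.0, and embedding similarity alone can contribute at most 0.5.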

Compressed summaries are searched first. Raw chunks are fetched only when the best summary score falls below COMPRESSED_SCORE_THRESHOLD (default 0.75). EXPLORE mode additionally walks memory_links to pull in related neighbors.
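
The summaries-first fallback can be sketched as a two-stage lookup. Here search_compressed and search_raw are hypothetical callables standing in for the two Qdrant collections, and merging both result sets by score is an assumption rather than documented behavior.

```python
COMPRESSED_SCORE_THRESHOLD = 0.75  # default from the configuration table


def retrieve(query, search_compressed, search_raw, top_k=8,
             threshold=COMPRESSED_SCORE_THRESHOLD):
    """Search summaries first; add raw chunks only when the best summary is weak.

    search_compressed and search_raw are hypothetical callables that take
    (query, top_k) and return a list of (score, text) tuples.
    """
    hits = list(search_compressed(query, top_k))
    best = max((score for score, _ in hits), default=0.0)
    if best < threshold:
        hits.extend(search_raw(query, top_k))
    # Highest score first, trimmed to the requested number of chunks.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)[:top_k]
```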

Session lifecycle

During chat: each turn is saved live, and entities are extracted after every exchange in the background
On exit: the LLM compresses the session into a 2–4 sentence summary
The summary is embedded and stored in the compressed memory layer
Resume anytime with mindvault chat --resume, or with /resume during chat
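
Sessions are stored as gzip-compressed JSON. The sketch below shows one way such a file can be written and read back with the standard library. The session dict layout (an id key plus arbitrary fields) and the function names are assumptions, not MindVault's actual format.

```python
import gzip
import json
from pathlib import Path


def save_session(session, sessions_dir="sessions"):
    """Write a session dict to <sessions_dir>/<id>.json.gz and return the path."""
    folder = Path(sessions_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{session['id']}.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(session, handle)
    return path


def load_session(path):
    """Read a session dict back from a .json.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```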

Configuration

All settings in mindvault/config.py:

Variable	Default	What it controls
`LLM_MODEL`	`llama3.2`	Model for summarization, chat, extraction
`EMBEDDING_MODEL`	`nomic-embed-text`	Vector search embeddings
`CHAT_TOP_K`	`8`	Chunks retrieved per query
`COMPRESSED_SCORE_THRESHOLD`	`0.75`	Below this, also fetch raw chunks
`WEB_SEARCH_AUTO_THRESHOLD`	`0.45`	Auto web search below this memory score (0 = off)
`SUGGEST_FOLLOWUPS`	`True`	Suggest follow-up questions after each answer
`WRITE_SESSIONS_TO_VAULT`	`True`	Write session summary notes to Obsidian on exit
`CHAT_INCLUDE_PRIVATE`	`False`	Include private vault by default

Storage

Path	What
`brain.db`	SQLite: ingestion tracking, entities, links, importance scores
`.qdrant/`	Qdrant: vector index (raw + compressed collections)
`sessions/`	Compressed chat sessions (`.json.gz`)
`notes/`	Quick-captured notes via `/note` (indexed on next ingest)
`My Brain/`	Obsidian vault — business, projects, general knowledge
`Private Brain/`	Obsidian vault — personal content (separate collection)
`data-*/`	Export folders (excluded from git)
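
Since brain.db is a regular SQLite file, it can be inspected with Python's standard library. The table names are not documented above, so this sketch only lists whatever tables exist rather than assuming a schema; list_tables is a hypothetical helper.

```python
import sqlite3


def list_tables(db_path="brain.db"):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist yet.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```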

Privacy

All processing is local by default.
Web search uses DuckDuckGo's anonymous API — no account, no tracking.
My Brain and Private Brain are in separate Qdrant collections — private content is never implicitly included in responses.
.gitignore excludes all personal data: vaults, exports, sessions, databases.

Requirements

qdrant-client       vector database
httpx               HTTP client (LLM + web requests)
python-dotenv       .env file loading
pypdf               PDF ingestion
prompt_toolkit      TUI and interactive input
rich                markdown rendering in terminal
ddgs                web search (no API key)
trafilatura         web page content extraction

Release history

0.5.461 (Apr 5, 2026)
0.5.460 (Apr 5, 2026)
0.5.3 (Apr 4, 2026), this version
