🧬 Intellectual DNA
Turn 3 years of AI conversations into a queryable second brain
376K messages · 118K embeddings · 31 MCP tools · 256ms queries
```
You:   "What do I actually think about agency?"
Brain: Searching 118K embedded messages...

       Your position evolved:
       2023: "AI should do what I say"
       2024: "AI should preserve my decision sovereignty"
       2025: "100% human control, 100% machine execution"

       Related SEED principle (AGENCY PRESERVATION):
       "Maintain human decision-making control while automating everything else"
```
What is this?
Every conversation you have with an AI is a thought you externalized. Over 3 years, that's 376,000 thoughts — but they're scattered across ChatGPT exports, Claude sessions, Gemini chats, and code editor transcripts.
Intellectual DNA turns that scattered history into a queryable knowledge system. Not a note-taking app — a second brain that can:
- Find patterns you'd never think to search for
- Track how your thinking evolved on any topic
- Surface contradictions between what you say and what you do
- Cross-reference conversations with your GitHub commits, markdown docs, and more
It runs as an MCP server — plug it into Claude Code, Claude Desktop, or any MCP-compatible client, and your entire intellectual history becomes context.
The Numbers
| Metric | Value |
|---|---|
| Conversation messages | 376,164 |
| Embedded vectors | 118,533 (768d, nomic-v1.5) |
| GitHub commits indexed | 2,217 across 146 repos |
| Markdown docs harvested | 5,524 |
| MCP tools exposed | 31 |
| Semantic query time | ~256ms |
| Vector DB size | 493MB (was 14GB before LanceDB migration) |
Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                         MCP BRAIN SERVER                         │
│                 31 tools · Claude Code / Desktop                 │
│  semantic_search · thinking_trajectory · alignment_check · ...   │
└───────────────────────────┬──────────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
┌───────────────────────┐   ┌─────────────────────────────────────┐
│   LANCEDB VECTORS     │   │          DUCKDB + PARQUET           │
│   118K embeddings     │   │   376K messages · keyword search    │
│   768d nomic-v1.5     │   │  columnar · compressed · portable   │
│   493MB on disk       │   │      serverless SQL analytics       │
└───────────────────────┘   └─────────────────────────────────────┘
              │                           │
              └─────────────┬─────────────┘
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                     DATA SOURCES (Immutable)                     │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐         │
│  │  Claude   │ │  ChatGPT  │ │  Gemini   │ │ Clawdbot  │         │
│  │  Code/    │ │  export   │ │ sessions  │ │ sessions  │         │
│  │  Desktop  │ │           │ │           │ │           │         │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘         │
│  ┌───────────┐ ┌───────────┐ ┌───────────────────────┐           │
│  │  GitHub   │ │ Markdown  │ │ Interpretation layers │           │
│  │  2.2K     │ │ 5.5K docs │ │ focus · mood · themes │           │
│  │  commits  │ │           │ │ spend · velocity · ...│           │
│  └───────────┘ └───────────┘ └───────────────────────┘           │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                        AUTO-SYNC PIPELINE                        │
│      Claude Code hook → sync.py → parquet → embed → ready        │
│   Hourly: clawdbot sessions · Nightly: all sources + vectors     │
└──────────────────────────────────────────────────────────────────┘
```
Key Design Decisions
Facts vs Interpretations
Raw data is immutable. Derived analysis lives in versioned layers. Wrong interpretation? Delete the version and rebuild. Source data stays clean forever.
```
data/
├── facts/                  # NEVER modified — append only
│   ├── brain/              # L0 index → L1 summary → L2 content → L3 raw
│   ├── spend/              # raw → daily → monthly aggregation
│   └── sources/            # original parquets (symlinks)
└── interpretations/        # DERIVED — versioned, rebuildable
    ├── focus/v1/
    ├── mood_patterns/
    └── weekly_summaries/
```
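The same rule can be sketched in a few lines of Python. This is an illustration, not the project's actual API: the paths follow the layout above, but `append_fact`, `rebuild_interpretation`, and the `build` callback are hypothetical names. Facts only ever gain new files; an interpretation version can be wiped and regenerated from facts at any time.

```python
import json
import shutil
from pathlib import Path

DATA = Path("data")

def append_fact(stream: str, record: dict) -> None:
    """Facts are append-only: every record becomes a new file, nothing is rewritten."""
    d = DATA / "facts" / stream
    d.mkdir(parents=True, exist_ok=True)
    n = len(list(d.glob("*.json")))
    (d / f"{n:08d}.json").write_text(json.dumps(record))

def rebuild_interpretation(name: str, version: str, build) -> Path:
    """Interpretations are derived: delete the version dir and regenerate from facts."""
    out = DATA / "interpretations" / name / version
    if out.exists():
        shutil.rmtree(out)  # safe: only derived, rebuildable data lives here
    out.mkdir(parents=True)
    facts = [json.loads(p.read_text()) for p in sorted((DATA / "facts").rglob("*.json"))]
    (out / "result.json").write_text(json.dumps(build(facts)))
    return out
```

A bad `build` function can never corrupt `data/facts/`; rerunning `rebuild_interpretation` with a fixed version is always a clean slate.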
Onion Skin Layers (L0–L3)
Progressive disclosure — query only what you need:
| Layer | Contents | Use Case |
|---|---|---|
| L0 index | Event IDs, timestamps, source | Quick lookups, counts |
| L1 summary | 500-char preview + embedding | Semantic search, browsing |
| L2 content | Full text, has_code, has_url | Deep reading |
| L3 deep | Symlinks to original parquets | Source verification |
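The layering can be sketched with plain dicts. Field names here are illustrative, but the shape matches the table: L0/L1 are cheap to scan in bulk, L2 carries the full text, and L3 is only a pointer back to the source parquet, never a copy.

```python
def to_layers(message_id: str, source: str, timestamp: str, text: str) -> dict:
    """Split one message into onion-skin layers of increasing weight."""
    return {
        # L0: index only — enough for lookups and counts
        "L0": {"id": message_id, "ts": timestamp, "source": source},
        # L1: 500-char preview; the embedding would be attached alongside it
        "L1": {"id": message_id, "preview": text[:500]},
        # L2: full text plus cheap derived flags for filtering
        "L2": {"id": message_id, "content": text,
               "has_code": "```" in text,
               "has_url": "http" in text},
        # L3: a reference to the original parquet for source verification
        "L3": {"id": message_id, "raw": f"facts/sources/{source}.parquet"},
    }
```

A semantic-search pass touches only L1; a query like `get_conversation` then pulls L2 for just the handful of hits worth reading in full.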
LanceDB over DuckDB VSS
Started with DuckDB for vector search. Discovered duplicate HNSW indexes created 46x storage overhead (14GB for 300MB of data). Migrated to LanceDB: 493MB, same vectors, native incremental indexing. Same 256ms query time.
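Whatever the engine, the core query is nearest-neighbor search over embedding vectors. A minimal stdlib sketch of cosine-similarity top-k, with toy 3-d vectors standing in for the real 768-d embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force scan; LanceDB replaces this with an ANN index over 118K vectors."""
    return sorted(index, key=lambda mid: cosine(query, index[mid]), reverse=True)[:k]
```

The brute-force version is O(n) per query; the point of an ANN index (and of LanceDB's incremental indexing) is to keep the ~256ms latency flat as the corpus grows.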
MCP Tools (31)
🔍 Search (7)
| Tool | Description |
|---|---|
| `semantic_search` | Vector similarity via LanceDB (768d nomic embeddings) |
| `search_conversations` | Keyword search via DuckDB SQL on parquet |
| `unified_search` | Cross-source: conversations + GitHub + markdown |
| `search_ip_docs` | Vector search on curated IP documents |
| `search_markdown` | Keyword search on 5.5K harvested markdown docs |
| `code_to_conversation` | Semantic search across commits + conversations |
| `find_user_questions` | Recent questions asked |
🧠 Synthesis (4)
| Tool | Description |
|---|---|
| `what_do_i_think` | Synthesize your views on any topic from all evidence |
| `find_precedent` | Find similar situations from the past |
| `alignment_check` | Check whether a decision aligns with your principles |
| `thinking_trajectory` | Track how an idea evolved over months/years |
💬 Conversation (5)
| Tool | Description |
|---|---|
| `get_conversation` | Full conversation by ID |
| `conversations_by_date` | What happened on a specific date |
| `what_was_i_thinking` | Month snapshot: themes, activity, concepts |
| `concept_velocity` | How often a term appears over time |
| `first_mention` | When a concept first appeared in your history |
🐙 GitHub (4)
| Tool | Description |
|---|---|
| `github_project_timeline` | Repo creation, commits, activity windows |
| `conversation_project_context` | Conversations mentioning a project |
| `validate_date_with_github` | Verify conversation dates via commit timestamps |
| `code_to_conversation` | Bridge code changes to discussion context |
📝 Markdown Corpus (4)
| Tool | Description |
|---|---|
| `get_breakthrough_docs` | Documents tagged with high breakthrough energy |
| `get_deep_docs` | High depth-score documents |
| `get_project_docs` | All docs for a specific project |
| `get_open_todos` | Documents with open TODO items |
📈 Analysis (5)
| Tool | Description |
|---|---|
| `query_tool_stacks` | Technology stack patterns |
| `query_problem_resolution` | Debugging and problem-solving patterns |
| `query_spend` | Cost breakdown by source and time period |
| `query_timeline` | Cross-source timeline for any date |
| `query_conversation_summary` | Comprehensive conversation analysis |
⚙️ Meta (2)
| Tool | Description |
|---|---|
| `brain_stats` | Overview of all data sources and counts |
| `list_principles` / `get_principle` | Your foundational SEED principles |
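The pattern behind all 31 tools — register named functions, dispatch calls to them by name — can be sketched without any MCP dependency. This is a toy registry with a hypothetical `tool` decorator, not the FastMCP API the real server uses:

```python
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as a callable tool, keyed by its name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def brain_stats() -> dict:
    # The real tool reads parquet and LanceDB; values are hardcoded for illustration.
    return {"messages": 376_164, "vectors": 118_533, "tools": len(TOOLS)}

def call(name: str, **kwargs):
    """Dispatch an incoming tool call by name, as an MCP server would."""
    return TOOLS[name](**kwargs)
```

FastMCP does the same thing one level up: each decorated function becomes a tool the MCP client can discover and invoke over the protocol.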
Data Flow
```
Clawdbot Sessions       Claude Code           ChatGPT Export       Gemini
~/.clawdbot/agents/     ~/.claude/projects/   conversations.json   sessions
        │                     │                      │                 │
        ▼                     ▼                      ▼                 ▼
sync_clawdbot.py        live/sync.py          import pipeline   import pipeline
        │                     │                      │                 │
        └─────────────────────┴──────────┬───────────┴─────────────────┘
                                         ▼
                  data/all_conversations.parquet (376K messages)
                                         │
                           ┌─────────────┴─────────────┐
                           ▼                           ▼
               embed_new_messages.py         build_*.py (88 pipelines)
                           │                           │
                           ▼                           ▼
               vectors/brain.lance/          data/interpretations/
               (118K vectors, 493MB)         (focus, mood, themes, ...)
                           │                           │
                           └─────────────┬─────────────┘
                                         ▼
                              mcp_brain_server.py
                                 (31 MCP tools)
                                         │
                                         ▼
                        Claude Code · Claude Desktop
                        Any MCP-compatible client
```
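The incremental step in this flow can be sketched as a high-water-mark sync. `sync_new` is a hypothetical name, but the shape matches the pipeline: only messages created after the last sync are returned for appending and embedding.

```python
from datetime import datetime, timezone

def sync_new(messages: list[dict], last_synced: datetime) -> tuple[list[dict], datetime]:
    """Return messages newer than the high-water mark, plus the advanced mark."""
    fresh = [m for m in messages
             if datetime.fromisoformat(m["created"]) > last_synced]
    if fresh:
        last_synced = max(datetime.fromisoformat(m["created"]) for m in fresh)
    # The real pipeline would append `fresh` to the parquet and embed it here.
    return fresh, last_synced
```

Because the mark only moves forward, reruns are idempotent: a sync that finds nothing new appends nothing and embeds nothing.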
Quick Start
Prerequisites
- Python 3.11+
- Apple Silicon Mac recommended (MPS acceleration for embeddings)
- mcporter or any MCP client
1. Clone & Setup
```bash
git clone https://github.com/mordechaipotash/intellectual-dna.git
cd intellectual-dna

# Create a virtual environment
python -m venv mcp-env
source mcp-env/bin/activate

# Install dependencies
pip install duckdb lancedb nomic fastmcp pandas pyarrow
```
2. Prepare Your Data
The system expects conversation data in parquet format. Export your conversations:
```bash
# Import ChatGPT export
python -m pipelines import_chatgpt /path/to/conversations.json

# Import Claude Code sessions
python -m pipelines import_claude_code

# Or bring your own parquet with columns:
# [message_id, conversation_id, role, content, created, source]
```
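If you bring your own data, it's worth checking the rows against that column list before writing parquet. A stdlib sketch (the column names come from the README above; the validator itself is illustrative):

```python
REQUIRED = ["message_id", "conversation_id", "role", "content", "created", "source"]

def validate_rows(rows: list[dict]) -> list[str]:
    """Return a list of schema problems; an empty list means the rows are ready to write."""
    problems = []
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
    return problems
```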
3. Embed & Index
```bash
# Generate embeddings (uses nomic-embed-text-v1.5 locally)
python pipelines/embed_new_messages.py

# Check stats
python pipelines/embed_new_messages.py stats
```
4. Run the MCP Server
```bash
# Direct
python mordelab/02-monotropic-prosthetic/mcp_brain_server.py
```
Or via mcporter config (`~/.mcporter/mcporter.json`):
```json
{
  "brain": {
    "command": "python",
    "args": ["mordelab/02-monotropic-prosthetic/mcp_brain_server.py"],
    "lifecycle": "keep-alive"
  }
}
```
5. Query Your Brain
```
# Semantic search
semantic_search("what do I think about productivity?", limit=10)

# Track idea evolution
thinking_trajectory("agency")

# Time-travel to any month
what_was_i_thinking("2024-08")

# Cross-source search
unified_search("database optimization")
```
Tech Stack
| Component | Choice | Why |
|---|---|---|
| Vector DB | LanceDB | 32x smaller than DuckDB VSS, native incremental, no index footguns |
| Embeddings | nomic-embed-text-v1.5 | 768d, runs locally on Apple Silicon via MPS |
| Analytics | DuckDB | Fast SQL on parquet, serverless, zero config |
| Storage | Parquet | Columnar, compressed, portable, ecosystem support |
| Interface | MCP (FastMCP) | Direct integration with Claude Code/Desktop |
| Automation | launchd + hooks | Native macOS scheduling, zero external deps |
| Pipelines | 88 Python scripts | Each pipeline is standalone, composable |
The SEED Principles
Eight foundational mental models extracted from 376K messages:
| Principle | Core Idea |
|---|---|
| INVERSION | Reverse the problem — ask what prevents NOT-X |
| COMPRESSION | Reduce to essential while preserving decision quality |
| AGENCY | 100% human control, 100% machine execution |
| BOTTLENECK | Find the constraint, amplify it as leverage |
| TRANSLATION | Interface between infinite AI output and finite human comprehension |
| TEMPORAL | Human time is the ultimate scarce resource |
| SEEDS | Autonomous bounded systems with clear interfaces |
| COGNITIVE | Design systems that amplify your brain, not fight it |
Repository Structure
```
intellectual-dna/
├── mordelab/02-monotropic-prosthetic/
│   ├── mcp_brain_server.py                 # MCP server (31 tools)
│   └── SEED-MORDETROPIC-128KB-MASTER.json  # 8 principles
├── pipelines/                              # 88 data pipelines
│   ├── embed_new_messages.py               # Parquet → LanceDB vectors
│   ├── sync_clawdbot.py                    # Clawdbot sessions → parquet
│   ├── sync_github.py                      # GitHub repos + commits
│   ├── harvest_markdown.py                 # Markdown corpus builder
│   ├── build_*.py                          # 50+ interpretation builders
│   └── rebuild.py                          # Unified orchestrator
├── live/
│   ├── sync.py                             # Auto-sync from Claude Code
│   └── daily_briefing.py                   # Morning briefing agent
├── data/                                   # (gitignored)
│   ├── facts/                              # Immutable source data
│   │   └── brain/                          # L0-L3 onion layers
│   └── interpretations/                    # Derived, versioned analysis
├── vectors/                                # (gitignored)
│   └── brain.lance/                        # 118K vectors (493MB)
├── config.py                               # Central configuration
└── .claude/CLAUDE.md                       # Context engineering for Claude Code
```
Lessons Learned
- DuckDB VSS has footguns — accidentally created duplicate HNSW indexes: 14GB for 300MB of data. LanceDB just works.
- Facts vs Interpretations prevents rebuild nightmares — mixing raw data with derived analysis creates cascading corruption. Keep them separate.
- Auto-sync beats manual export — a Claude Code stop hook triggers sync.py, so new conversations flow in automatically. Zero friction means it actually gets used.
- Embeddings beat keywords — "What was I thinking about agency?" finds relevant messages even when you never used that exact word.
- 88 pipelines > 1 monolith — each pipeline is a standalone script, easy to run, debug, or replace individually.
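The stop-hook wiring lives in Claude Code's settings file. A sketch of what such an entry might look like (structure follows the Claude Code hooks documentation, but treat the exact schema as illustrative and verify it against your version):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {"type": "command", "command": "python live/sync.py"}
        ]
      }
    ]
  }
}
```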
Related Projects
- brain-canvas — give any LLM its own display (`npx brain-canvas`)
- youtube-transcription-pipeline — 31K+ videos, transcribed
- seedgarden — the SHELET Protocol for AI-human interfaces
Work With Me
Open to async contract work in context engineering, MCP server development, and AI orchestration systems.
Built by Mordechai Potash — a monotropic polymath who needed a system that works with deep focus, not against it.