🧬 Intellectual DNA
Turn 3 years of AI conversations into a queryable second brain
376K messages · 118K embeddings · 31 MCP tools · 256ms queries
```
You:   "What do I actually think about agency?"
Brain: Searching 118K embedded messages...

       Your position evolved:
       2023: "AI should do what I say"
       2024: "AI should preserve my decision sovereignty"
       2025: "100% human control, 100% machine execution"

       Related SEED principle (AGENCY PRESERVATION):
       "Maintain human decision-making control while automating everything else"
```
What is this?
Every conversation you have with an AI is a thought you externalized. Over 3 years, that's 376,000 thoughts — but they're scattered across ChatGPT exports, Claude sessions, Gemini chats, and code editor transcripts.
Intellectual DNA turns that scattered history into a queryable knowledge system. Not a note-taking app — a second brain that can:
- Find patterns you'd never think to search for
- Track how your thinking evolved on any topic
- Surface contradictions between what you say and what you do
- Cross-reference conversations with your GitHub commits, markdown docs, and more
It runs as an MCP server — plug it into Claude Code, Claude Desktop, or any MCP-compatible client, and your entire intellectual history becomes context.
The Numbers
| Metric | Value |
|---|---|
| Conversation messages | 376,164 |
| Embedded vectors | 118,533 (768d, nomic-v1.5) |
| GitHub commits indexed | 2,217 across 146 repos |
| Markdown docs harvested | 5,524 |
| MCP tools exposed | 31 |
| Semantic query time | ~256ms |
| Vector DB size | 493MB (was 14GB before LanceDB migration) |
Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                         MCP BRAIN SERVER                         │
│                 31 tools · Claude Code / Desktop                 │
│  semantic_search · thinking_trajectory · alignment_check · ...   │
└───────────────────────────┬──────────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
┌───────────────────────┐   ┌─────────────────────────────────────┐
│   LANCEDB VECTORS     │   │          DUCKDB + PARQUET           │
│   118K embeddings     │   │   376K messages · keyword search    │
│   768d nomic-v1.5     │   │  columnar · compressed · portable   │
│   493MB on disk       │   │      serverless SQL analytics       │
└───────────────────────┘   └─────────────────────────────────────┘
              │                           │
              └─────────────┬─────────────┘
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                     DATA SOURCES (Immutable)                     │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐         │
│  │  Claude   │ │  ChatGPT  │ │  Gemini   │ │ Clawdbot  │         │
│  │  Code/    │ │  export   │ │ sessions  │ │ sessions  │         │
│  │  Desktop  │ │           │ │           │ │           │         │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘         │
│  ┌───────────┐ ┌───────────┐ ┌───────────────────────┐           │
│  │  GitHub   │ │ Markdown  │ │ Interpretation layers │           │
│  │  2.2K     │ │ 5.5K docs │ │ focus · mood · themes │           │
│  │  commits  │ │           │ │ spend · velocity · ...│           │
│  └───────────┘ └───────────┘ └───────────────────────┘           │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                        AUTO-SYNC PIPELINE                        │
│      Claude Code hook → sync.py → parquet → embed → ready        │
│   Hourly: clawdbot sessions · Nightly: all sources + vectors     │
└──────────────────────────────────────────────────────────────────┘
```
Key Design Decisions
Facts vs Interpretations
Raw data is immutable. Derived analysis lives in versioned layers. Wrong interpretation? Delete the version and rebuild. Source data stays clean forever.
```
data/
├── facts/                  # NEVER modified — append only
│   ├── brain/              # L0 index → L1 summary → L2 content → L3 raw
│   ├── spend/              # raw → daily → monthly aggregation
│   └── sources/            # original parquets (symlinks)
└── interpretations/        # DERIVED — versioned, rebuildable
    ├── focus/v1/
    ├── mood_patterns/
    └── weekly_summaries/
```
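The same rule can be sketched in a few lines of Python. This is an illustration, not the project's actual API: the paths follow the layout above, but `append_fact`, `rebuild_interpretation`, and the `build` callback are hypothetical names. Facts only ever gain new files; an interpretation version can be wiped and regenerated from facts at any time.

```python
import json
import shutil
from pathlib import Path

DATA = Path("data")

def append_fact(stream: str, record: dict) -> None:
    """Facts are append-only: every record becomes a new file, nothing is rewritten."""
    d = DATA / "facts" / stream
    d.mkdir(parents=True, exist_ok=True)
    n = len(list(d.glob("*.json")))
    (d / f"{n:08d}.json").write_text(json.dumps(record))

def rebuild_interpretation(name: str, version: str, build) -> Path:
    """Interpretations are derived: delete the version dir and regenerate from facts."""
    out = DATA / "interpretations" / name / version
    if out.exists():
        shutil.rmtree(out)  # safe: only derived, rebuildable data lives here
    out.mkdir(parents=True)
    facts = [json.loads(p.read_text()) for p in sorted((DATA / "facts").rglob("*.json"))]
    (out / "result.json").write_text(json.dumps(build(facts)))
    return out
```

A bad `build` function can never corrupt `data/facts/`; rerunning `rebuild_interpretation` with a fixed version is always a clean slate.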
Onion Skin Layers (L0–L3)
Progressive disclosure — query only what you need:
| Layer | Contents | Use Case |
|---|---|---|
| L0 index | Event IDs, timestamps, source | Quick lookups, counts |
| L1 summary | 500-char preview + embedding | Semantic search, browsing |
| L2 content | Full text, has_code, has_url | Deep reading |
| L3 deep | Symlinks to original parquets | Source verification |
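The layering can be sketched with plain dicts. Field names here are illustrative, but the shape matches the table: L0/L1 are cheap to scan in bulk, L2 carries the full text, and L3 is only a pointer back to the source parquet, never a copy.

```python
def to_layers(message_id: str, source: str, timestamp: str, text: str) -> dict:
    """Split one message into onion-skin layers of increasing weight."""
    return {
        # L0: index only — enough for lookups and counts
        "L0": {"id": message_id, "ts": timestamp, "source": source},
        # L1: 500-char preview; the embedding would be attached alongside it
        "L1": {"id": message_id, "preview": text[:500]},
        # L2: full text plus cheap derived flags for filtering
        "L2": {"id": message_id, "content": text,
               "has_code": "```" in text,
               "has_url": "http" in text},
        # L3: a reference to the original parquet for source verification
        "L3": {"id": message_id, "raw": f"facts/sources/{source}.parquet"},
    }
```

A semantic-search pass touches only L1; a query like `get_conversation` then pulls L2 for just the handful of hits worth reading in full.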
LanceDB over DuckDB VSS
Started with DuckDB for vector search. Discovered duplicate HNSW indexes created 46x storage overhead (14GB for 300MB of data). Migrated to LanceDB: 493MB, same vectors, native incremental indexing. Same 256ms query time.
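Whatever the engine, the core query is nearest-neighbor search over embedding vectors. A minimal stdlib sketch of cosine-similarity top-k, with toy 3-d vectors standing in for the real 768-d embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force scan; LanceDB replaces this with an ANN index over 118K vectors."""
    return sorted(index, key=lambda mid: cosine(query, index[mid]), reverse=True)[:k]
```

The brute-force version is O(n) per query; the point of an ANN index (and of LanceDB's incremental indexing) is to keep the ~256ms latency flat as the corpus grows.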
MCP Tools (31)
🔍 Search (7)
| Tool | Description |
|---|---|
| `semantic_search` | Vector similarity via LanceDB (768d nomic embeddings) |
| `search_conversations` | Keyword search via DuckDB SQL on parquet |
| `unified_search` | Cross-source: conversations + GitHub + markdown |
| `search_ip_docs` | Vector search on curated IP documents |
| `search_markdown` | Keyword search on 5.5K harvested markdown docs |
| `code_to_conversation` | Semantic search across commits + conversations |
| `find_user_questions` | Recent questions asked |
🧠 Synthesis (4)
| Tool | Description |
|---|---|
| `what_do_i_think` | Synthesize your views on any topic from all evidence |
| `find_precedent` | Find similar situations from the past |
| `alignment_check` | Check whether a decision aligns with your principles |
| `thinking_trajectory` | Track how an idea evolved over months/years |
💬 Conversation (5)
| Tool | Description |
|---|---|
| `get_conversation` | Full conversation by ID |
| `conversations_by_date` | What happened on a specific date |
| `what_was_i_thinking` | Month snapshot: themes, activity, concepts |
| `concept_velocity` | How often a term appears over time |
| `first_mention` | When a concept first appeared in your history |
🐙 GitHub (4)
| Tool | Description |
|---|---|
| `github_project_timeline` | Repo creation, commits, activity windows |
| `conversation_project_context` | Conversations mentioning a project |
| `validate_date_with_github` | Verify conversation dates via commit timestamps |
| `code_to_conversation` | Bridge code changes to discussion context |
📝 Markdown Corpus (4)
| Tool | Description |
|---|---|
| `get_breakthrough_docs` | Documents tagged with high breakthrough energy |
| `get_deep_docs` | High depth-score documents |
| `get_project_docs` | All docs for a specific project |
| `get_open_todos` | Documents with open TODO items |
📈 Analysis (5)
| Tool | Description |
|---|---|
| `query_tool_stacks` | Technology stack patterns |
| `query_problem_resolution` | Debugging and problem-solving patterns |
| `query_spend` | Cost breakdown by source and time period |
| `query_timeline` | Cross-source timeline for any date |
| `query_conversation_summary` | Comprehensive conversation analysis |
⚙️ Meta (2)
| Tool | Description |
|---|---|
| `brain_stats` | Overview of all data sources and counts |
| `list_principles` / `get_principle` | Your foundational SEED principles |
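The pattern behind all 31 tools — register named functions, dispatch calls to them by name — can be sketched without any MCP dependency. This is a toy registry with a hypothetical `tool` decorator, not the FastMCP API the real server uses:

```python
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as a callable tool, keyed by its name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def brain_stats() -> dict:
    # The real tool reads parquet and LanceDB; values are hardcoded for illustration.
    return {"messages": 376_164, "vectors": 118_533, "tools": len(TOOLS)}

def call(name: str, **kwargs):
    """Dispatch an incoming tool call by name, as an MCP server would."""
    return TOOLS[name](**kwargs)
```

FastMCP does the same thing one level up: each decorated function becomes a tool the MCP client can discover and invoke over the protocol.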
Data Flow
```
Clawdbot Sessions       Claude Code           ChatGPT Export       Gemini
~/.clawdbot/agents/     ~/.claude/projects/   conversations.json   sessions
        │                     │                      │                 │
        ▼                     ▼                      ▼                 ▼
sync_clawdbot.py        live/sync.py          import pipeline   import pipeline
        │                     │                      │                 │
        └─────────────────────┴──────────┬───────────┴─────────────────┘
                                         ▼
                  data/all_conversations.parquet (376K messages)
                                         │
                           ┌─────────────┴─────────────┐
                           ▼                           ▼
               embed_new_messages.py         build_*.py (88 pipelines)
                           │                           │
                           ▼                           ▼
               vectors/brain.lance/          data/interpretations/
               (118K vectors, 493MB)         (focus, mood, themes, ...)
                           │                           │
                           └─────────────┬─────────────┘
                                         ▼
                              mcp_brain_server.py
                                 (31 MCP tools)
                                         │
                                         ▼
                        Claude Code · Claude Desktop
                        Any MCP-compatible client
```
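The incremental step in this flow can be sketched as a high-water-mark sync. `sync_new` is a hypothetical name, but the shape matches the pipeline: only messages created after the last sync are returned for appending and embedding.

```python
from datetime import datetime, timezone

def sync_new(messages: list[dict], last_synced: datetime) -> tuple[list[dict], datetime]:
    """Return messages newer than the high-water mark, plus the advanced mark."""
    fresh = [m for m in messages
             if datetime.fromisoformat(m["created"]) > last_synced]
    if fresh:
        last_synced = max(datetime.fromisoformat(m["created"]) for m in fresh)
    # The real pipeline would append `fresh` to the parquet and embed it here.
    return fresh, last_synced
```

Because the mark only moves forward, reruns are idempotent: a sync that finds nothing new appends nothing and embeds nothing.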
Quick Start
Prerequisites
- Python 3.11+
- Apple Silicon Mac recommended (MPS acceleration for embeddings)
- mcporter or any MCP client
1. Clone & Setup
```bash
git clone https://github.com/mordechaipotash/intellectual-dna.git
cd intellectual-dna

# Create a virtual environment
python -m venv mcp-env
source mcp-env/bin/activate

# Install dependencies
pip install duckdb lancedb nomic fastmcp pandas pyarrow
```
2. Prepare Your Data
The system expects conversation data in parquet format. Export your conversations:
```bash
# Import ChatGPT export
python -m pipelines import_chatgpt /path/to/conversations.json

# Import Claude Code sessions
python -m pipelines import_claude_code

# Or bring your own parquet with columns:
# [message_id, conversation_id, role, content, created, source]
```
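If you bring your own data, it's worth checking the rows against that column list before writing parquet. A stdlib sketch (the column names come from the README above; the validator itself is illustrative):

```python
REQUIRED = ["message_id", "conversation_id", "role", "content", "created", "source"]

def validate_rows(rows: list[dict]) -> list[str]:
    """Return a list of schema problems; an empty list means the rows are ready to write."""
    problems = []
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
    return problems
```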
3. Embed & Index
```bash
# Generate embeddings (uses nomic-embed-text-v1.5 locally)
python pipelines/embed_new_messages.py

# Check stats
python pipelines/embed_new_messages.py stats
```
4. Run the MCP Server
```bash
# Direct
python mordelab/02-monotropic-prosthetic/mcp_brain_server.py
```
Or via mcporter config (`~/.mcporter/mcporter.json`):
```json
{
  "brain": {
    "command": "python",
    "args": ["mordelab/02-monotropic-prosthetic/mcp_brain_server.py"],
    "lifecycle": "keep-alive"
  }
}
```
5. Query Your Brain
```
# Semantic search
semantic_search("what do I think about productivity?", limit=10)

# Track idea evolution
thinking_trajectory("agency")

# Time-travel to any month
what_was_i_thinking("2024-08")

# Cross-source search
unified_search("database optimization")
```
Tech Stack
| Component | Choice | Why |
|---|---|---|
| Vector DB | LanceDB | 32x smaller than DuckDB VSS, native incremental, no index footguns |
| Embeddings | nomic-embed-text-v1.5 | 768d, runs locally on Apple Silicon via MPS |
| Analytics | DuckDB | Fast SQL on parquet, serverless, zero config |
| Storage | Parquet | Columnar, compressed, portable, ecosystem support |
| Interface | MCP (FastMCP) | Direct integration with Claude Code/Desktop |
| Automation | launchd + hooks | Native macOS scheduling, zero external deps |
| Pipelines | 88 Python scripts | Each pipeline is standalone, composable |
The SEED Principles
Eight foundational mental models extracted from 376K messages:
| Principle | Core Idea |
|---|---|
| INVERSION | Reverse the problem — ask what prevents NOT-X |
| COMPRESSION | Reduce to essential while preserving decision quality |
| AGENCY | 100% human control, 100% machine execution |
| BOTTLENECK | Find the constraint, amplify it as leverage |
| TRANSLATION | Interface between infinite AI output and finite human comprehension |
| TEMPORAL | Human time is the ultimate scarce resource |
| SEEDS | Autonomous bounded systems with clear interfaces |
| COGNITIVE | Design systems that amplify your brain, not fight it |
Repository Structure
```
intellectual-dna/
├── mordelab/02-monotropic-prosthetic/
│   ├── mcp_brain_server.py                 # MCP server (31 tools)
│   └── SEED-MORDETROPIC-128KB-MASTER.json  # 8 principles
├── pipelines/                              # 88 data pipelines
│   ├── embed_new_messages.py               # Parquet → LanceDB vectors
│   ├── sync_clawdbot.py                    # Clawdbot sessions → parquet
│   ├── sync_github.py                      # GitHub repos + commits
│   ├── harvest_markdown.py                 # Markdown corpus builder
│   ├── build_*.py                          # 50+ interpretation builders
│   └── rebuild.py                          # Unified orchestrator
├── live/
│   ├── sync.py                             # Auto-sync from Claude Code
│   └── daily_briefing.py                   # Morning briefing agent
├── data/                                   # (gitignored)
│   ├── facts/                              # Immutable source data
│   │   └── brain/                          # L0-L3 onion layers
│   └── interpretations/                    # Derived, versioned analysis
├── vectors/                                # (gitignored)
│   └── brain.lance/                        # 118K vectors (493MB)
├── config.py                               # Central configuration
└── .claude/CLAUDE.md                       # Context engineering for Claude Code
```
Lessons Learned
- DuckDB VSS has footguns — accidentally created duplicate HNSW indexes: 14GB for 300MB of data. LanceDB just works.
- Facts vs Interpretations prevents rebuild nightmares — mixing raw data with derived analysis creates cascading corruption. Keep them separate.
- Auto-sync beats manual export — a Claude Code stop hook triggers sync.py, so new conversations flow in automatically. Zero friction means it actually gets used.
- Embeddings beat keywords — "What was I thinking about agency?" finds relevant messages even when you never used that exact word.
- 88 pipelines > 1 monolith — each pipeline is a standalone script, easy to run, debug, or replace individually.
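The stop-hook wiring lives in Claude Code's settings file. A sketch of what such an entry might look like (structure follows the Claude Code hooks documentation, but treat the exact schema as illustrative and verify it against your version):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {"type": "command", "command": "python live/sync.py"}
        ]
      }
    ]
  }
}
```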
Related Projects
- brain-canvas — give any LLM its own display (`npx brain-canvas`)
- youtube-transcription-pipeline — 31K+ videos, transcribed
- seedgarden — the SHELET Protocol for AI-human interfaces
Work With Me
Open to async contract work in context engineering, MCP server development, and AI orchestration systems.
Built by Mordechai Potash — a monotropic polymath who needed a system that works with deep focus, not against it.