🧠 Rebrain
Transform chat history into structured, personalized AI memory.
Rebrain processes your ChatGPT conversations through a 5-step pipeline, extracting observations, synthesizing learnings and cognitions, then building a user persona for hyper-personalized AI interactions.
🚀 Quick Start (Recommended)
Using UV - Zero Setup Required
# 1. Install UV (one-time)
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Set your API key
export GEMINI_API_KEY=your_key_here
# 3. Process your conversations (place conversations.json in current directory)
uvx rebrain pipeline run
# 4. Start MCP server (auto-loads processed data)
uvx rebrain mcp --port 9999
# Advanced: Custom paths and config
uvx rebrain pipeline run --input /path/to/conversations.json --data-path ./my-data
uvx rebrain pipeline run --config custom.yaml # Custom clustering parameters
uvx rebrain mcp --data-path ./my-data --port 9999 --user-id myproject
That's it! No Python installation, no virtual environments, no dependencies to manage.
See INSTALL.md for detailed installation options.
🎯 For Developers
# Clone and setup
git clone https://github.com/yasinsb/rebrain.git
cd rebrain
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp env.template .env # Add your GEMINI_API_KEY
# Run pipeline (bash CLI)
scripts/pipeline/cli.sh all
# Or step by step
scripts/pipeline/cli.sh step1 # Transform & filter
scripts/pipeline/cli.sh step2 # Extract & cluster observations
scripts/pipeline/cli.sh step3 # Synthesize learnings
scripts/pipeline/cli.sh step4 # Synthesize cognitions
scripts/pipeline/cli.sh step5 # Build persona
# Load into memg-core
python scripts/load_memg.py
Output: data/persona/persona.md - ready for system prompts!
🤖 MCP Integration (Claude Desktop / Cursor)
HTTP Mode (Recommended for Stability)
Start the server:
# Start with default user_id="rebrain"
uvx --from rebrain rebrain-mcp --data-path ./data --port 9999
# Or use custom user_id for multi-user setups
uvx --from rebrain rebrain-mcp --data-path ./data --port 9999 --user-id myproject
# Using rebrain CLI (equivalent)
uvx rebrain mcp --data-path ./data --port 9999
Add to ~/.cursor/mcp.json or Claude Desktop config:
{
"mcpServers": {
"rebrain": {
"url": "http://localhost:9999/mcp"
}
}
}
Direct Mode (stdio)
⚠️ Known Issue: stdio mode has stability issues with Cursor/Claude Desktop. HTTP mode is recommended.
If you still want to try stdio mode:
{
"mcpServers": {
"rebrain": {
"command": "uvx",
"args": ["--from", "rebrain", "rebrain-mcp", "--data-path", "/absolute/path/to/your/data"],
"env": {
"GEMINI_API_KEY": "your_key_here"
}
}
}
}
User ID Configuration
The MCP server supports a configurable user_id for memory isolation:
- Default: user_id="rebrain" is used if not specified
- Custom: pass --user-id myproject when starting the server
- Multi-user: each user_id maintains a separate memory space
- Agent usage: agents don't need to provide a user_id (the server default is used)
Benefits:
- 💰 Process once (~$0.10-0.20), query forever for free (local memg-core)
- ⚡ Instant restarts - database persists, no reprocessing
- 🔒 100% local - no ongoing API costs, no cloud lock-in
Quick Start (Legacy)
1. Setup
git clone <repo-url>
cd rebrain
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure
cp env.template .env
# Edit .env: add GEMINI_API_KEY
2. Prepare Data
Export your ChatGPT conversations and place the JSON file at:
data/raw/conversations.json
3. Run Pipeline
# Full pipeline (5 steps)
scripts/pipeline/cli.sh all
# Or run individual steps
scripts/pipeline/cli.sh step1
scripts/pipeline/cli.sh step2
# ... etc
4. Check Results
# View pipeline status
scripts/pipeline/cli.sh status
# Read your persona
cat data/persona/persona.md
Pipeline Overview
Rebrain uses a 5-stage synthesis pipeline:
Raw Conversations (JSON export)
↓ Step 1: Transform, Filter & Truncate
Clean Conversations (date filtered, code removed, smart truncation)
↓ Step 2: Extract & Cluster Observations (AI + K-Means)
Clustered Observations (~40 clusters by category)
↓ Step 3: Synthesize Learnings (AI + K-Means)
Clustered Learnings (~10 clusters)
↓ Step 4: Synthesize Cognitions (AI)
High-Level Cognitions (~20 patterns)
↓ Step 5: Build Persona (AI)
User Persona (3 plain text sections)
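The stages above form a chain where each artifact keeps pointers back to its sources. A minimal data-flow sketch with hypothetical names (the real steps are LLM calls over K-Means clusters, in scripts/pipeline/):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    id: str
    text: str
    conversation_id: str  # provenance: which conversation it came from

@dataclass
class Learning:
    id: str
    text: str
    observation_ids: list[str]  # provenance: which observations it synthesizes

def synthesize_learning(obs: list[Observation]) -> Learning:
    # In the real pipeline this is an LLM call over one observation cluster;
    # here we just join texts to show the data flow and lineage.
    return Learning(
        id=f"learning-{obs[0].id}",
        text=" / ".join(o.text for o in obs),
        observation_ids=[o.id for o in obs],
    )

obs = [
    Observation("o1", "prefers concise answers", "conv-42"),
    Observation("o2", "asks for code examples", "conv-42"),
]
learning = synthesize_learning(obs)
print(learning.observation_ids)  # full lineage back to the observations
```

The same pattern repeats at each stage, which is what makes conversation → observation → learning → cognition lineage traceable end to end.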
Key Features:
- Smart Truncation: Progressive head+tail strategy (2K start + 3K end) for long conversations
- Clean Formatting: LLM-optimized input format (USER/ASSISTANT, no metadata noise)
- Privacy-First: Category-specific filtering at observation extraction
- Adaptive Clustering: Finds local optima with tolerance-based K-Means
- Flexible Models: Override per-task via prompt template metadata
- Provenance Tracking: Full lineage from conversation → observation → learning → cognition
- Dual Output: JSON (structured) + Markdown (human-readable)
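The head+tail truncation above (keep the start and end of an over-long conversation, drop the middle) can be sketched in a few lines. Whether the real limits are tokens or characters, treat the numbers as illustrative; the actual values live in config/pipeline.yaml:

```python
def head_tail_truncate(text: str, head: int = 2000, tail: int = 3000) -> str:
    """Keep the start and end of an over-long conversation, drop the middle."""
    if len(text) <= head + tail:
        return text  # short enough: keep everything
    return text[:head] + "\n[... truncated ...]\n" + text[-tail:]

long_convo = "x" * 10_000
print(len(head_tail_truncate(long_convo)))  # roughly head + tail + marker
print(head_tail_truncate("hello"))          # short input passes through unchanged
```

Keeping both ends preserves the conversation's setup and its resolution, which usually carry more signal than the middle.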
Configuration
All pipeline parameters live in config/pipeline.yaml:
ingestion:
date_cutoff_days: 180
remove_code_blocks: true
observation_extraction:
max_concurrent: 20
batch_size: 40
learning_clustering:
target_clusters: 20
tolerance: 0.2
Model selection via prompt templates:
# rebrain/prompts/templates/persona_synthesis.yaml
metadata:
model_recommendation: "gemini-2.5-flash"
See config/README.md for details.
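If your custom.yaml contains only the keys you want to change, layering it over the defaults amounts to a recursive dict merge. A sketch under that assumption, operating on plain dicts (the helper name is illustrative):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override's keys recursively layered on top."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {
    "ingestion": {"date_cutoff_days": 180, "remove_code_blocks": True},
    "learning_clustering": {"target_clusters": 20, "tolerance": 0.2},
}
custom = {"ingestion": {"date_cutoff_days": 365}}  # override a single key

config = deep_merge(defaults, custom)
print(config["ingestion"])  # cutoff changed, remove_code_blocks preserved
```

Sibling keys you did not mention keep their default values, so a custom config can stay tiny.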
CLI Usage
# Run from the scripts/pipeline/ directory (or use the full path)
# Check what's been generated
./cli.sh status
# Run individual steps
./cli.sh step1 -i data/raw/my_convos.json
./cli.sh step2 --cluster-only # Re-cluster existing observations
./cli.sh step3
# Clean outputs
./cli.sh clean --all
# Full help
./cli.sh help
See scripts/pipeline/README.md for details.
Project Structure
rebrain/
├── rebrain/ # Core library
│ ├── core/ # GenAI client
│ ├── ingestion/ # Data parsing, truncation & formatting
│ ├── operations/ # Embedder, clusterer, synthesizer
│ ├── prompts/ # Prompt templates (YAML)
│ ├── persona/ # Persona formatting
│ └── schemas/ # Pydantic models
├── config/ # Pipeline configuration
├── scripts/pipeline/ # 5-step pipeline + CLI
├── data/ # Raw → processed → persona
└── notebooks/ # Exploration & testing
Output
Persona (Step 5)
JSON (data/persona/persona.json):
{
"model": "gemini-2.5-flash",
"persona": {
"personal_profile": "...",
"communication_preferences": "...",
"professional_profile": "..."
}
}
Markdown (data/persona/persona.md):
# User Persona Information for AI
## Personal Profile
...
## Communication Preferences
...
## Professional Profile
...
Copy-paste ready for system prompts!
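Wiring persona.json into a system prompt takes only a few lines. A sketch using the JSON shape shown above (the inline sample data is illustrative; normally you would read data/persona/persona.json):

```python
import json

persona_json = """{
  "model": "gemini-2.5-flash",
  "persona": {
    "personal_profile": "Curious builder.",
    "communication_preferences": "Concise, code-first answers.",
    "professional_profile": "Backend engineer."
  }
}"""  # normally: open("data/persona/persona.json").read()

persona = json.loads(persona_json)["persona"]
system_prompt = "\n\n".join(
    f"## {key.replace('_', ' ').title()}\n{value}"
    for key, value in persona.items()
)
print(system_prompt)
```

The result mirrors the persona.md layout, so either file can seed the same system prompt.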
Development
# Install dev dependencies
pip install -r requirements_dev.txt
# Run with custom config
python scripts/pipeline/01_transform_filter.py --data-path ./data --config custom.yaml
# Check specific step
python scripts/pipeline/02_extract_cluster_observations.py --skip-cluster
Documentation
- Pipeline Details: scripts/pipeline/README.md
- Configuration: config/README.md
- Data Structure: data/README.md
- Model Override Pattern: MODEL_OVERRIDE_PATTERN.md
- Persona Builder: PERSONA_BUILDER_REFACTOR.md
License
MIT License - see LICENSE file
Built by Yasin Salimibeni