Multi-modal temporal memory MCP server — RAG engine over conversation history with weight-decay model and LLM-driven consolidation

mcp-name: io.github.turbyho/mem-context

mem-context — Temporal Memory MCP Server

Multi-modal RAG engine for AI assistants. Stores conversation history, conclusions, diffs, error traces, and other development artifacts in LanceDB with vector search, multi-factor scoring, and an LLM-driven consolidation pipeline.

Why

AI assistants lose context between sessions. mem-context persists what matters — decisions, patterns, bugs, architecture choices — and surfaces them when relevant via vector search. Memories decay over time unless reinforced by repeated access, mimicking human memory.

Features

  • Vector search with dual backend — LanceDB ANN index for fast approximate nearest-neighbor queries. Primary embedding via Ollama mxbai-embed-large (1024d, ~670 MB). Local all-MiniLM-L6-v2 (384d) fallback when Ollama is unavailable — no GPU or network required. Embeddings are auto-padded to match schema dimension; switching backends is transparent.

  • Multi-factor relevance scoring — six independent factors are multiplied into a single relevance score (not capped at 1, since access_boost and type_boost can each reach 2.0). Each factor models a different aspect: vector_score (semantic similarity), weight_score (stored importance × time decay), recency_score (age in days), scope_score (project match), access_boost (usage reinforcement), type_boost (permanent > semantic > episodic). The model balances "what's relevant" with "what's still valid."

  • Weight decay with natural memory model — each memory type has a configurable decay_rate: 0.15/day for episodic (session captures fade fast), 0.03/day for semantic (extracted knowledge persists), 0 for permanent (never decays). Decay is exponential: weight × e^(−rate × days). Frequently accessed memories get a counteracting boost — the system reinforces what you use, archives what you don't.

  • Deduplication by cosine similarity — new memories are compared against existing ones before insertion. At similarity > 0.82, the new memory is merged into the existing one (weight boost + content update) instead of creating a duplicate. Prevents memory fragmentation from repeated captures of the same conclusion across sessions.

  • LLM-driven consolidation pipeline — 3-phase: extract (3 days), merge (7 days), archive (30 days). The server prepares candidates and prompts; the host model (Claude, DeepSeek, GPT, or local Ollama) does the reasoning. Episodic session captures → extracted conclusions (semantic) → merged permanent knowledge → archived if unused. Runs in the background when remember() or recall() is called — no cron needed.

  • Multi-modal storage — LanceDB columns for text content, code diffs, file lists, error traces, tags, and metadata. Each modality is indexed separately; vector search operates on the combined embedding. Stores not just "what happened" but the diff and stack trace that caused it.

  • Automatic conversation capture — hooks for Claude Code (Stop event) and OpenCode (on_session_end). The wrapper binary finds the current session's transcript, parses it into structured messages, and imports them as episodic memories. No manual action needed — every session is archived automatically.

  • Portable export/import — JSON export strips embeddings (re-generated on import), keeps all metadata. Use for backup, cross-device sync, or migrating between machines. Import deduplicates by ID — safe to run multiple times.

  • One-command provisioning: mem-context init detects installed AI tools (Claude Code, OpenCode, Codex, Cursor), registers the MCP server, injects CLAUDE.md instructions, and installs slash-command skills (6 tools: recall, remember, forget, delete, purge, status). mem-context install adds capture hooks. Two commands, ready to use.
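
The auto-padding described under vector search can be pictured as zero-padding a shorter vector up to the schema width. This is a sketch under assumptions: the README does not say which padding scheme mem-context uses, and `pad_embedding` and `cosine` are hypothetical helpers. Zero-padding leaves the cosine similarity between two padded vectors unchanged, which is why 384-d fallback vectors can share a 1024-d column; padding only fixes the shape and does not translate between the two models' embedding spaces.

```python
import math

SCHEMA_DIM = 1024  # mxbai-embed-large width, per the README


def pad_embedding(vec: list[float], dim: int = SCHEMA_DIM) -> list[float]:
    """Zero-pad a shorter embedding to the schema dimension (hypothetical helper)."""
    if len(vec) > dim:
        raise ValueError(f"embedding has {len(vec)} dims, schema allows {dim}")
    return vec + [0.0] * (dim - len(vec))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Two toy 384-d vectors standing in for all-MiniLM-L6-v2 output.
a = [math.sin(i) for i in range(384)]
b = [math.cos(i) for i in range(384)]
pa, pb = pad_embedding(a), pad_embedding(b)

print(len(pa))                                     # 1024
print(abs(cosine(a, b) - cosine(pa, pb)) < 1e-12)  # True
```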

Installation

Linux

# 1. System dependencies
sudo pacman -S python3 python-pip  # Arch / Manjaro
# or
sudo apt install python3 python3-pip python3-venv  # Debian / Ubuntu
# or
sudo dnf install python3 python3-pip  # Fedora

# 2. Install Ollama (for embedding)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &  # Start Ollama in background

# 3. Install mem-context
python3 -m venv ~/.mem-context/.venv
~/.mem-context/.venv/bin/pip install mem-context

# 4. Add to PATH (zsh shown; for bash, append to ~/.bashrc instead)
echo 'export PATH="$HOME/.mem-context/.venv/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

# 5. Pull embedding model (~670 MB)
ollama pull mxbai-embed-large

# 6. Provision — registers MCP server + injects instructions
mem-context init                          # all detected AI tools
# or target a single tool:
mem-context init --tool claude-code       # Claude Code only
mem-context init --tool opencode          # OpenCode only

# 7. Install capture hooks (Claude Code, OpenCode)
mem-context install claude-code
mem-context install opencode       # optional
mem-context install status         # verify

# 8. Restart your AI assistant

macOS

# 1. System dependencies
brew install python@3.11

# 2. Install Ollama
brew install ollama
# Start Ollama: open Ollama.app or run `ollama serve &`

# 3-8. Same as Linux (steps 3-8 above)
python3 -m venv ~/.mem-context/.venv
~/.mem-context/.venv/bin/pip install mem-context
echo 'export PATH="$HOME/.mem-context/.venv/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
ollama pull mxbai-embed-large
mem-context init
mem-context install claude-code

Verify installation

# Check CLI works
mem-context status

# Check Ollama + embedding model
mem-context init --check-ollama

# List detected AI tools
mem-context init --list-tools

# Check capture hooks
mem-context install status

Manual MCP registration

If mem-context init can't register the MCP server automatically:

Claude Code:

claude mcp add --scope user mem-context ~/.mem-context/.venv/bin/mem-context-mcp

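OpenCode: Add to ~/.config/opencode/opencode.json, replacing $HOME with the absolute path of your home directory (JSON values are not shell-expanded):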

{
  "mcp": {
    "mem-context": {
      "command": ["$HOME/.mem-context/.venv/bin/mem-context-mcp"],
      "enabled": true,
      "type": "local"
    }
  }
}
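
To script the OpenCode registration instead of editing the file by hand, one option is to merge the entry into the parsed config with the home directory already expanded. A minimal sketch, assuming the config layout shown above; `add_mem_context_entry` is a hypothetical helper, not part of mem-context, and the "theme" key in the example is just an unrelated existing setting:

```python
import json
from pathlib import Path


def add_mem_context_entry(config: dict, home: Path) -> dict:
    """Return a copy of an OpenCode config with the mem-context MCP server added."""
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    server = home / ".mem-context" / ".venv" / "bin" / "mem-context-mcp"
    updated.setdefault("mcp", {})["mem-context"] = {
        "command": [str(server)],  # absolute path: nothing left for a shell to expand
        "enabled": True,
        "type": "local",
    }
    return updated


preview = add_mem_context_entry({"theme": "dark"}, Path("/home/alice"))
print(json.dumps(preview, indent=2))
```

To apply it, parse ~/.config/opencode/opencode.json with json.loads, pass the result together with Path.home(), and write the returned dict back with json.dumps. Returning a copy keeps the function free of side effects, so the result can be previewed before the file is overwritten.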

Usage

MCP tools (from AI assistant)

Tool                                                                    Description
remember(content, type?, weight?, tags?)                                Store a memory with auto-embedding
recall(query, scope?, token_budget?, min_score?, limit?, type_filter?)  Vector search with scoring
forget(id)                                                              Archive (weight=0)
get(id)                                                                 Retrieve one memory
update(id, fields)                                                      Modify metadata
status(scope?)                                                          Memory store statistics
review()                                                                Flagged memories
consolidation_candidates(scope?)                                        Consolidation tasks for host model

CLI

mem-context status                          # Store statistics
mem-context recall "query" --limit 5        # Search memories
mem-context get <id>                        # One memory
mem-context forget <id>                     # Archive
mem-context review                          # Flagged memories
mem-context consolidate --dry-run           # Consolidation candidates
mem-context capture transcript <path>       # Import conversation
mem-context export -o memories.json         # Export all memories
mem-context import memories.json --re-embed # Import from export
mem-context init --list-tools               # Show AI tools
mem-context install status                  # Hook status
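
The export/import behavior (embeddings stripped on export, regenerated on import, duplicates skipped by ID) can be sketched with plain dicts. The field names `id`, `content`, and `vector`, the helpers, and `fake_embed` are assumptions for illustration, not mem-context's actual schema or API:

```python
import json
from typing import Callable


def export_memories(memories: list[dict]) -> str:
    """Serialize memories to JSON, dropping the embedding vector from each record."""
    return json.dumps([{k: v for k, v in m.items() if k != "vector"} for m in memories])


def import_memories(store: dict[str, dict], payload: str,
                    embed: Callable[[str], list[float]]) -> int:
    """Add records whose id is not in the store yet, re-embedding their content."""
    added = 0
    for record in json.loads(payload):
        if record["id"] in store:
            continue  # dedup by ID: re-importing the same file is a no-op
        store[record["id"]] = {**record, "vector": embed(record["content"])}
        added += 1
    return added


def fake_embed(text: str) -> list[float]:
    return [float(len(text))]  # stand-in for the real embedding backend


source = [{"id": "m1", "content": "use JWT", "type": "semantic", "vector": [0.1, 0.2]}]
payload = export_memories(source)
target: dict[str, dict] = {}
print(import_memories(target, payload, fake_embed))  # 1
print(import_memories(target, payload, fake_embed))  # 0
```

Skipping IDs that already exist is what makes repeated imports safe: the second call adds nothing.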

How It Works

Write path: capture → store → embed

Session ends
  → capture hook fires (Stop / on_session_end)
  → transcript parsed into structured messages
  → each message stored as episodic memory
  → content embedded via Ollama (1024d) or local model (384d)
  → cosine similarity check: > 0.82 → merge, else insert
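
The merge-or-insert step at the end of the write path can be sketched as follows. The 0.82 threshold comes from the README; the size of the weight boost (`WEIGHT_BOOST`), keeping the newest wording on merge, and the in-memory `Memory` list are hypothetical stand-ins for what the LanceDB-backed store does:

```python
import math
from dataclasses import dataclass

MERGE_THRESHOLD = 0.82  # from the README
WEIGHT_BOOST = 0.1      # hypothetical: the README does not give the boost size


@dataclass
class Memory:
    content: str
    vector: list[float]
    weight: float


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def store(memories: list[Memory], new: Memory) -> str:
    """Merge into the closest existing memory above the threshold, else insert."""
    best = max(memories, key=lambda m: cosine(m.vector, new.vector), default=None)
    if best is not None and cosine(best.vector, new.vector) > MERGE_THRESHOLD:
        best.weight = min(1.0, best.weight + WEIGHT_BOOST)  # reinforce, capped at 1.0
        best.content = new.content  # assumption: newest wording wins
        return "merged"
    memories.append(new)
    return "inserted"


mems: list[Memory] = []
print(store(mems, Memory("auth uses JWT", [1.0, 0.0], 0.5)))          # inserted
print(store(mems, Memory("auth uses JWT tokens", [0.95, 0.1], 0.5)))  # merged
print(store(mems, Memory("CI runs on push", [0.0, 1.0], 0.5)))        # inserted
print(len(mems), round(mems[0].weight, 2))                            # 2 0.6
```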

Read path: query → embed → search → score → return

recall("how do we handle auth?")
  → query embedded to 1024d vector
  → LanceDB ANN search (scope-filtered: same project + global)
  → raw candidates scored by 6-factor formula
  → sorted by final_score, filtered by min_score
  → token-budgeted: results accumulated until budget exhausted
  → returned to host model for use
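
The last steps of the read path (sort, min_score filter, token budget) might look like this. The scored `Candidate` records, the `select` helper, and the 4-characters-per-token estimate are assumptions; the README does not say how tokens are counted:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content: str
    final_score: float  # output of the 6-factor scoring


def estimate_tokens(text: str) -> int:
    # Hypothetical estimate: roughly 4 characters per token.
    return max(1, len(text) // 4)


def select(candidates: list[Candidate], min_score: float,
           token_budget: int, limit: int) -> list[Candidate]:
    """Sort by score, cut at min_score, and stop once the token budget is used up."""
    picked: list[Candidate] = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.final_score, reverse=True):
        if cand.final_score < min_score or len(picked) == limit:
            break  # below min_score (nothing later scores higher) or limit reached
        cost = estimate_tokens(cand.content)
        if used + cost > token_budget:
            break  # budget exhausted
        picked.append(cand)
        used += cost
    return picked


cands = [Candidate("a" * 40, 0.9), Candidate("b" * 400, 0.7),
         Candidate("c" * 40, 0.2), Candidate("d" * 40, 0.8)]
print([c.final_score for c in select(cands, 0.3, 60, 5)])   # [0.9, 0.8]
print([c.final_score for c in select(cands, 0.3, 200, 5)])  # [0.9, 0.8, 0.7]
```

Because candidates are sorted first, the loop can stop at the first score below min_score instead of scanning the rest.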

Consolidation path: age → candidate → LLM → write-back

remember() or recall() called
  → hours since last_consolidation > interval_hours (24h)?
  → build_task: scan for episodic > 3d, semantic clusters > 7d
  → send prompts + candidates to host model
  → host model extracts conclusions → new semantic memories
  → host model merges similar semantics → permanent
  → low-weight (< 0.1) memories archived (weight = 0)

The host model does all reasoning — the server only prepares structured prompts and candidate lists. This means consolidation quality scales with the host model's capability (Fable 5 > Opus > Sonnet > local Ollama).

Architecture

mem-context/src/mem_context/
├── storage/lance.py        LanceDB CRUD, ANN search, FTS, export/import
│   schemas.py              PyArrow schemas: memories, relations, conversations
├── retrieval/embedder.py   Dual-backend embedding (Ollama + local fallback)
│   scoring.py              6-factor scoring: vector × weight × decay × …
├── capture/formats.py      Transcript parsers: Claude Code, OpenCode, JSON, generic
│   wrapper.py              Hook entry-point: finds transcript, runs capture
├── consolidation/
│   pipeline.py             Build tasks, run extract/merge/archive phases
│   templates.py            Prompt templates for each consolidation phase
│   ollama.py               Local model fallback for LLM tasks
├── mcp/server.py           FastMCP server: 10 tools (remember, recall, forget, …)
├── provision.py            AI tool detection, CLAUDE.md injection, skill install
├── config.py               YAML + env config with auto-detection
└── scope.py                Project scope resolution (config → path hash → global)

Scoring

final = vector_score × weight_score × recency_score × scope_score × access_boost × type_boost

vector_score = exp(-cosine_distance)
weight_score = sqrt(weight × e^(-decay_rate × days))
recency_score = e^(-recency_decay_rate × days)
  recency_decay_rate = permanent: 0.005, semantic: 0.02, episodic: 0.05
scope_score   = same_project: 1.0, global: 0.8, other: 0.4
access_boost  = min(2.0, 1.0 + 0.1 × access_count)
type_boost    = permanent: 2.0, semantic: 1.2, episodic: 1.0
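
The formula transcribes directly into code. A sketch, assuming that `days` is the memory's age for both decay terms and that `decay_rate` in weight_score is the per-type rate from the Memory types table (0.15, 0.03, 0.0); `final_score` is a hypothetical name:

```python
import math

DECAY_RATE = {"permanent": 0.0, "semantic": 0.03, "episodic": 0.15}       # weight decay/day
RECENCY_DECAY = {"permanent": 0.005, "semantic": 0.02, "episodic": 0.05}  # recency decay/day
SCOPE_SCORE = {"same_project": 1.0, "global": 0.8, "other": 0.4}
TYPE_BOOST = {"permanent": 2.0, "semantic": 1.2, "episodic": 1.0}


def final_score(cosine_distance: float, weight: float, mem_type: str,
                days: float, scope: str, access_count: int) -> float:
    vector_score = math.exp(-cosine_distance)
    weight_score = math.sqrt(weight * math.exp(-DECAY_RATE[mem_type] * days))
    recency_score = math.exp(-RECENCY_DECAY[mem_type] * days)
    access_boost = min(2.0, 1.0 + 0.1 * access_count)
    return (vector_score * weight_score * recency_score
            * SCOPE_SCORE[scope] * access_boost * TYPE_BOOST[mem_type])


# Fresh episodic capture, close match, never recalled.
print(round(final_score(0.2, 0.5, "episodic", 0, "same_project", 0), 3))   # 0.579
# Month-old permanent decision, same match quality, recalled 5 times.
print(round(final_score(0.2, 1.0, "permanent", 30, "same_project", 5), 3))  # 2.114
```

Note that the second score is above 1: the two boost factors alone can multiply the result by up to 4.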

Memory types

Type       Default weight  Decay rate  Use
episodic   0.5             0.15/day    Session captures, debugging
semantic   0.7             0.03/day    Extracted conclusions, patterns
permanent  1.0             0.0         Architecture decisions, conventions
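
Solving weight × e^(−rate × days) = 0.1 for days gives days = ln(weight / 0.1) / rate, which shows how long each default weight lasts before it reaches the archive threshold from the consolidation table. The sketch below assumes that threshold is checked against the decayed weight and ignores any boost from repeated access; the helper names are hypothetical:

```python
import math

# type: (default weight, decay rate per day), from the table above
TYPES = {"episodic": (0.5, 0.15), "semantic": (0.7, 0.03), "permanent": (1.0, 0.0)}
ARCHIVE_WEIGHT = 0.1  # archive threshold from the consolidation table


def decayed_weight(weight: float, rate: float, days: float) -> float:
    return weight * math.exp(-rate * days)


def days_until(weight: float, rate: float, threshold: float = ARCHIVE_WEIGHT) -> float:
    """Solve weight * e^(-rate * d) = threshold for d."""
    if weight <= threshold:
        return 0.0       # already at or below the threshold
    if rate == 0:
        return math.inf  # permanent memories never decay
    return math.log(weight / threshold) / rate


for name, (weight, rate) in TYPES.items():
    print(f"{name:<9} {days_until(weight, rate):6.1f} days")
```

With the defaults, an untouched episodic memory reaches 0.1 after about 10.7 days (ln 5 / 0.15), a semantic one after about 64.9 days (ln 7 / 0.03), and a permanent one never does.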

Consolidation pipeline

Phase    Trigger  Action
Extract  3 days   Episodic → host model extracts conclusions → semantic
Merge    7 days   Semantic cluster by embedding → host model merges
Archive  30 days  weight < 0.1 → weight = 0
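
The phase table can be read as a per-memory eligibility check. A minimal sketch, assuming the day thresholds are measured from creation time and that `weight` already reflects decay; the `Mem` record and `phase_for` are hypothetical, and the embedding-based clustering that the merge phase needs is not shown:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

EXTRACT_AFTER = timedelta(days=3)   # Extract phase trigger
MERGE_AFTER = timedelta(days=7)     # Merge phase trigger
ARCHIVE_AFTER = timedelta(days=30)  # Archive phase trigger
ARCHIVE_WEIGHT = 0.1                # archive when weight falls below this


@dataclass
class Mem:
    type: str       # "episodic", "semantic" or "permanent"
    weight: float   # assumed to already include time decay
    created: datetime


def phase_for(m: Mem, now: datetime) -> Optional[str]:
    """Return the consolidation phase a memory is eligible for, if any."""
    age = now - m.created
    if age >= ARCHIVE_AFTER and m.weight < ARCHIVE_WEIGHT:
        return "archive"
    if m.type == "episodic" and age >= EXTRACT_AFTER:
        return "extract"
    if m.type == "semantic" and age >= MERGE_AFTER:
        return "merge"
    return None


now = datetime(2025, 1, 31)
mems = [
    Mem("episodic", 0.5, now - timedelta(days=4)),
    Mem("semantic", 0.7, now - timedelta(days=2)),
    Mem("semantic", 0.05, now - timedelta(days=40)),
    Mem("permanent", 1.0, now - timedelta(days=400)),
]
print([phase_for(m, now) for m in mems])  # ['extract', None, 'archive', None]
```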

The server prepares prompts and candidates; the host model (Claude, DeepSeek, GPT) does the reasoning and writes results back via MCP tools.

Automatic background consolidation

No cron needed — consolidation runs automatically in the background when remember() or recall() is called, at most once per interval_hours (default 24h).
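
The at-most-once-per-interval rule can be implemented as a small gate that remember() and recall() consult before starting a background run. This is a sketch under assumptions: `ConsolidationGate` is hypothetical, and a daemon thread stands in for whatever background mechanism the server actually uses:

```python
import threading
from datetime import datetime, timedelta
from typing import Callable, Optional


class ConsolidationGate:
    """Hypothetical gate: starts a background run at most once per interval."""

    def __init__(self, interval_hours: float = 24):  # interval_hours default
        self.interval = timedelta(hours=interval_hours)
        self.last_run: Optional[datetime] = None
        self._lock = threading.Lock()

    def maybe_run(self, now: datetime, task: Callable[[], None]) -> bool:
        with self._lock:  # concurrent remember()/recall() calls must not both fire
            if self.last_run is not None and now - self.last_run <= self.interval:
                return False
            self.last_run = now
        threading.Thread(target=task, daemon=True).start()
        return True


gate = ConsolidationGate()
t0 = datetime(2025, 1, 1, 9, 0)
print(gate.maybe_run(t0, lambda: None))                        # True
print(gate.maybe_run(t0 + timedelta(hours=2), lambda: None))   # False
print(gate.maybe_run(t0 + timedelta(hours=25), lambda: None))  # True
```

Recording last_run before the thread starts means a burst of tool calls triggers one run, not several.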

Configuration

All parameters are configurable via ~/.mem-context/config.yaml, .mem-context/config.yaml, or environment variables. See Configuration docs for all options.

# Quick overrides
export MEM_CONTEXT_CONSOLIDATION_MODEL=qwen2.5-coder:14b  # model
export MEM_CONTEXT_CONSOLIDATION_TEMPERATURE=0.1           # 0.0-1.0
export MEM_CONTEXT_CONSOLIDATION_TIMEOUT=300               # seconds

Env var names below omit the MEM_CONTEXT_ prefix used in the quick overrides above.

Parameter           Default      Env var                           Description
model               auto-detect  CONSOLIDATION_MODEL               14b→7b→3b, or override
num_ctx             8192         CONSOLIDATION_NUM_CTX             Context window tokens
temperature         0.2          CONSOLIDATION_TEMPERATURE         Determinism (0.0–1.0)
timeout             120s         CONSOLIDATION_TIMEOUT             Ollama API timeout
extract_after_days  3            CONSOLIDATION_EXTRACT_AFTER_DAYS  Episodic → extraction
merge_after_days    7            CONSOLIDATION_MERGE_AFTER_DAYS    Semantic → merge
archive_after_days  30           CONSOLIDATION_ARCHIVE_AFTER_DAYS  Low weight → archive
max_extract         20           CONSOLIDATION_MAX_EXTRACT         Candidates per run
max_merge           10           CONSOLIDATION_MAX_MERGE           Merge groups per run
interval_hours      24           -                                 Hours between runs
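
One way to combine the config sources is a layered lookup. The precedence used here (environment, then project .mem-context/config.yaml, then ~/.mem-context/config.yaml, then the default) is an assumption, as is the MEM_CONTEXT_ prefix for num_ctx; the YAML files are represented as already-parsed, flattened dicts, and `resolve` is a hypothetical helper:

```python
import os
from typing import Any, Mapping

# Defaults from the table above (subset).
DEFAULTS: dict[str, Any] = {"num_ctx": 8192, "temperature": 0.2, "timeout": 120}


def resolve(name: str, user_cfg: Mapping[str, Any], project_cfg: Mapping[str, Any],
            env: Mapping[str, str] = os.environ) -> Any:
    """Look up one consolidation setting; earlier sources win over later ones."""
    default = DEFAULTS[name]
    raw = env.get(f"MEM_CONTEXT_CONSOLIDATION_{name.upper()}")
    if raw is not None:
        return type(default)(raw)  # env values are strings: "300" -> 300, "0.1" -> 0.1
    for layer in (project_cfg, user_cfg):  # project file beats user-wide file
        if name in layer:
            return layer[name]
    return default


user = {"temperature": 0.3}                      # ~/.mem-context/config.yaml, parsed
project = {"temperature": 0.1, "num_ctx": 4096}  # .mem-context/config.yaml, parsed
env = {"MEM_CONTEXT_CONSOLIDATION_TIMEOUT": "300"}

print(resolve("temperature", user, project, env))  # 0.1
print(resolve("num_ctx", user, project, env))      # 4096
print(resolve("timeout", user, project, env))      # 300
```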

Model auto-detection

If no model is configured, the system:

  1. Detects GPU VRAM (NVIDIA, AMD, macOS Metal/Apple Silicon)
  2. Picks the best model that fits: 14b (9+ GB) → 7b (5+ GB) → 3b (4+ GB)
  3. Auto-pulls it via Ollama if not installed
  4. Falls back to smaller model on OOM errors

No GPU: the minimum is qwen2.5-coder:3b (about 4 GB of system RAM; slow on CPU). The MCP path does not need a local model at all, because the host LLM does the work.
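
The selection ladder can be sketched as a lookup table. VRAM detection itself is platform-specific and not shown. The qwen2.5-coder:14b and :3b tags appear in this README; the :7b tag and the behavior below 4 GB (falling back to the 3b model on CPU) are assumptions, and `pick_model`/`smaller_model` are hypothetical:

```python
from typing import Optional

# (minimum VRAM in GB, Ollama model tag), largest first.
LADDER = [
    (9.0, "qwen2.5-coder:14b"),
    (5.0, "qwen2.5-coder:7b"),   # assumed tag; the README only says "7b"
    (4.0, "qwen2.5-coder:3b"),
]
CPU_FALLBACK = "qwen2.5-coder:3b"  # assumption: used with no GPU or < 4 GB VRAM


def pick_model(vram_gb: Optional[float]) -> str:
    """Return the largest model whose VRAM floor fits, else the CPU fallback."""
    if vram_gb is not None:
        for floor, tag in LADDER:
            if vram_gb >= floor:
                return tag
    return CPU_FALLBACK


def smaller_model(tag: str) -> Optional[str]:
    """Return the next model down after an OOM error, or None at the bottom."""
    tags = [t for _, t in LADDER]
    i = tags.index(tag)
    return tags[i + 1] if i + 1 < len(tags) else None


print(pick_model(6.0))                     # qwen2.5-coder:7b
print(smaller_model("qwen2.5-coder:14b"))  # qwen2.5-coder:7b
```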

Scope detection

1. .mem-context/config.yaml → project_id → scope = "proj:" + hash
2. Fallback → scope = "path:" + hash(cwd)
3. `scope="global"` is explicit-only — never auto-detected
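
Scope resolution and the scope_score factor from the scoring formula can be sketched together. The README does not specify the hash, so SHA-256 truncated to 12 hex characters is a hypothetical choice, as are the function names; reading project_id out of .mem-context/config.yaml is left to the caller:

```python
import hashlib
from pathlib import Path
from typing import Optional


def _short_hash(text: str) -> str:
    # Hypothetical choice: SHA-256, first 12 hex characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def resolve_scope(cwd: Path, project_id: Optional[str] = None) -> str:
    """Rule 1: project_id from config wins. Rule 2: fall back to the cwd path."""
    if project_id:
        return "proj:" + _short_hash(project_id)
    return "path:" + _short_hash(str(cwd.resolve()))


def scope_score(memory_scope: str, current_scope: str) -> float:
    """scope_score factor: same project 1.0, global 0.8, any other project 0.4."""
    if memory_scope == current_scope:
        return 1.0
    if memory_scope == "global":
        return 0.8
    return 0.4


here = resolve_scope(Path.cwd())
print(here, scope_score("global", here), scope_score(here, here))
```

Neither function produces "global" on its own, matching rule 3: a memory is global only when it is stored with scope="global" explicitly.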

Requirements

  • Python 3.11+
  • Ollama (for embedding) — mxbai-embed-large (~670 MB, recommended)
  • Or: sentence-transformers local fallback (all-MiniLM-L6-v2, 384d)
  • Consolidation model: auto-detected and auto-installed (see above)

Installation options

mem-context init — instructions + skills (all 5 tools)

mem-context init                    # All detected AI tools
mem-context init --tool claude-code # Claude Code only
mem-context init --tool opencode    # OpenCode only
mem-context init --tool codex       # Codex only
mem-context init --tool cursor      # Cursor only (project-scoped)
mem-context init --dry-run          # Preview without changes
mem-context init --list-tools       # Show what's detected

mem-context install — capture hooks (2 tools)

mem-context install claude-code     # Stop hook → settings.local.json
mem-context install opencode        # on_session_end hook → config.yaml
mem-context install status          # Check all
mem-context install uninstall -c claude-code  # Remove

Documentation

Document        Content
Installation    Detailed setup, Ollama, config
Configuration   All parameters, explained
MCP Tools       Tool reference with schemas and examples
Architecture    Storage, scoring, retrieval pipeline
Consolidation   Pipeline phases, host model workflow
Provisioning    mem-context init, client support
Capture         Automatic transcript capture setup
Test Scenarios  28 sections, 100+ test cases

Development

git clone ssh://git@git.montyho.com/turbyho/mem-context.git
cd mem-context
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python3 -m pytest tests/ -q  # 113 tests

License

MIT

Download files

Source Distribution

mem_context-0.1.2.tar.gz (118.8 kB)

Built Distribution

mem_context-0.1.2-py3-none-any.whl (73.5 kB)

File details

Details for the file mem_context-0.1.2.tar.gz.

File metadata

  • Download URL: mem_context-0.1.2.tar.gz
  • Upload date:
  • Size: 118.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for mem_context-0.1.2.tar.gz
Algorithm Hash digest
SHA256 4415f6bf0daf3bb68a251087f5e2e9f0c9c8dbd9a727377d5d5b0a37563a4a85
MD5 ae67ed42b3e1f380eeeecd645b5d89cf
BLAKE2b-256 2733c020e1a37fb2cad5733178854fd9889c4bff1ef220cb29334cd17740fa13

File details

Details for the file mem_context-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: mem_context-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 73.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for mem_context-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a61c71fb44a21e32d68ed7e073c6296a4bc13c1e16c5272dc9d37afbdf42013a
MD5 a89ff63159c7b5888c777e60e8869c96
BLAKE2b-256 5fed46bb53e10469849e83cd8c60e75991bac9e77aeb55fd08b5a48450fec78a
