
Layered Memory MCP Server

Extend AI agent memory beyond token limits with a 4-tier knowledge architecture.


PyPI version MCP Compatible Python 3.10+ License: MIT

The Problem

AI agents have limited memory — typically 2-4KB of persistent context injected every turn. Once it's full, the agent forgets everything else. You can't store project configurations, user preferences, API conventions, or domain knowledge without constantly fighting the space limit.

The Solution

Layered Memory organizes knowledge into 4 tiers, trading immediacy for capacity:

┌─────────────────────────────────────────────────────┐
│  L0 — Index Layer (2-4KB, injected every turn)      │
│  Pure pointers: "what knowledge exists and where"    │
├─────────────────────────────────────────────────────┤
│  L1 — Knowledge Files (unlimited, loaded on-demand)  │
│  Structured markdown: configs, conventions, facts    │
├─────────────────────────────────────────────────────┤
│  L2 — Skills Layer (loaded when needed)              │
│  Procedures, workflows, tool-specific knowledge      │
├─────────────────────────────────────────────────────┤
│  L3 — Raw Sessions (searched rarely)                 │
│  Full conversation history, searchable by keyword    │
└─────────────────────────────────────────────────────┘

L0 is your table of contents. L1 is your bookshelf. L2 is your cookbook. L3 is your diary.

Features

  • Keyword Search — Find relevant knowledge across all L1 files with relevance scoring
  • Session Scanning — Extract knowledge candidates from recent agent sessions
  • Space Analytics — Monitor memory usage and get optimization suggestions
  • Agent Agnostic — Works with any MCP-compatible agent (Hermes, Claude, Cursor, etc.)
  • Minimal Dependencies — the core engine uses only the Python stdlib; `fastmcp` is the sole dependency, used for MCP transport
  • Privacy First — All data stays local, no external API calls

Quick Start

Install

pip install layered-memory-mcp

Hermes Agent

Add to ~/.hermes/config.yaml:

mcp_servers:
  layered-memory:
    command: layered-memory-mcp
    timeout: 30

OpenClaw

Install the MCP server, then register it:

pip install layered-memory-mcp

# Register as an MCP server
openclaw mcp set layered-memory --command layered-memory-mcp

Layered Memory complements OpenClaw's built-in vector-based memory:

  • OpenClaw memory: semantic search over session transcripts (heavy, needs embeddings)
  • Layered Memory: structured keyword search over curated knowledge files (light, instant)
  • Use both: OpenClaw for "what did I say about X?" and Layered Memory for "what's the database connection string?"

Claude Desktop

Add to your Claude Desktop MCP config:

{
  "mcpServers": {
    "layered-memory": {
      "command": "layered-memory-mcp"
    }
  }
}

Cursor / Other MCP Clients

# stdio mode (default)
layered-memory-mcp

# HTTP mode
layered-memory-mcp --transport http --port 8080

# Verbose logging
layered-memory-mcp --verbose

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LAYERED_MEMORY_HOME` | Root directory for memory data | `~/.layered-memory/` |
| `LAYERED_MEMORY_SESSIONS_DIR` | Agent sessions directory (auto-detected) | `~/.hermes/sessions/` |
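
As an illustrative sketch (not the package's actual code), the server could resolve these directories like so; the variable names and defaults come from the table above, while `resolve_memory_dirs` itself is a hypothetical helper:

```python
import os
from pathlib import Path

# Hypothetical helper: resolve the data directories from the environment,
# falling back to the documented defaults.
def resolve_memory_dirs(env=os.environ):
    home = Path(env.get("LAYERED_MEMORY_HOME", "~/.layered-memory/")).expanduser()
    sessions = Path(env.get("LAYERED_MEMORY_SESSIONS_DIR", "~/.hermes/sessions/")).expanduser()
    return home, sessions
```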

Usage

1. Initialize Knowledge Base

Create markdown files in ~/.layered-memory/knowledge/:

mkdir -p ~/.layered-memory/knowledge

Create your first knowledge file:

<!-- ~/.layered-memory/knowledge/infrastructure.md -->
## Server Configuration
- Production server: prod.example.com (port 22)
- Staging server: stage.example.com
- Deploy via: `./deploy.sh --env production`

## Database
- PostgreSQL 15 on prod-db:5432
- Connection pool: 20 max connections

2. Build L0 Index

In your agent's persistent memory (the 2-4KB injected every turn), store only pointers:

[L0] infrastructure: server config, DB, deploy → knowledge/infrastructure.md
[L0] api-conventions: REST patterns, auth, errors → knowledge/api-conventions.md
[L0] user-prefs: coding style, tool preferences → knowledge/user-prefs.md
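
The pointer format is simple enough to parse mechanically. A minimal sketch, assuming the `[L0] name: description → path` layout shown above (`parse_l0_index` is illustrative, not part of the package's API):

```python
import re

# Matches "[L0] topic: description → relative/path.md"
L0_LINE = re.compile(r"^\[L0\]\s*(?P<topic>[^:]+):\s*(?P<desc>.+?)\s*→\s*(?P<path>\S+)$")

def parse_l0_index(text):
    """Parse L0 pointer lines into (topic, description, path) tuples."""
    entries = []
    for line in text.splitlines():
        m = L0_LINE.match(line.strip())
        if m:
            entries.append((m["topic"].strip(), m["desc"].strip(), m["path"]))
    return entries
```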

3. Search Knowledge (MCP Tool)

The agent calls recall_knowledge when it needs details:

Agent: "What's the database connection string?"
→ recall_knowledge(keyword="database")
← Returns relevant sections from infrastructure.md

4. Session Compression (Cron Job)

Set up a daily cron to extract new knowledge from conversations:

1. scan_recent_sessions → get session summaries
2. AI analyzes summaries → identifies stable facts
3. New facts → written to L1 knowledge files
4. L0 index → updated with new pointers
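
Steps 2-4 above can be sketched as follows, assuming the AI analysis step has already produced a list of stable facts. `write_facts` and the file layout are hypothetical, shown only to make the pipeline concrete:

```python
from pathlib import Path

# Hypothetical: persist extracted facts to an L1 file and record an L0 pointer.
def write_facts(facts, topic, knowledge_dir, l0_index):
    knowledge_dir = Path(knowledge_dir)
    knowledge_dir.mkdir(parents=True, exist_ok=True)
    # Step 3: write the stable facts to an L1 knowledge file
    l1_file = knowledge_dir / f"{topic}.md"
    body = f"## {topic.title()}\n" + "".join(f"- {fact}\n" for fact in facts)
    l1_file.write_text(body, encoding="utf-8")
    # Step 4: append a pointer so the agent's L0 index can find the new file
    l0_index.append(f"[L0] {topic}: auto-extracted facts → knowledge/{topic}.md")
    return l1_file
```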

MCP Tools

| Tool | Description |
|------|-------------|
| `recall_knowledge` | Search L1 knowledge files by keyword |
| `scan_recent_sessions` | Scan recent sessions for knowledge candidates |
| `get_knowledge_file` | Read a specific knowledge file |
| `list_memory_stats` | Get space statistics and optimization suggestions |
| `search_sessions_by_keyword` | Search session history for a keyword |

MCP Resources

| Resource | Description |
|----------|-------------|
| `memory://status` | Overall system status and configuration |
| `knowledge://files` | List all knowledge files with metadata |

MCP Prompts

| Prompt | Description |
|--------|-------------|
| `knowledge_compression_prompt` | Template for AI-driven knowledge extraction from sessions |

Architecture Deep Dive

Why 4 Tiers?

| Tier | Cost | Capacity | Use Case |
|------|------|----------|----------|
| L0 (Index) | Tokens per turn | ~2KB | Quick lookup table |
| L1 (Knowledge) | 1 file read | Unlimited | Structured facts |
| L2 (Skills) | 1 skill load | Unlimited | Procedures |
| L3 (Sessions) | Full search | Unlimited | Historical recall |

Relevance Scoring

When you call recall_knowledge, files are scored by:

  1. Filename match (+10 points) — keyword appears in filename
  2. Heading match (+3 points) — keyword appears in a ## heading
  3. Content frequency (+0.5 per occurrence, capped at 5) — how often keyword appears

Results are sorted by score, and only matching ## sections are returned (not entire files).
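
A minimal sketch of these scoring rules (+10 filename, +3 heading, +0.5 per occurrence capped at 5); the package's actual implementation may differ in detail:

```python
import re

def score_file(keyword, filename, content):
    """Score one knowledge file for a keyword, per the rules above."""
    kw = keyword.lower()
    score = 0.0
    if kw in filename.lower():          # 1. filename match
        score += 10
    headings = re.findall(r"^##\s+(.*)$", content, flags=re.MULTILINE)
    if any(kw in h.lower() for h in headings):  # 2. heading match
        score += 3
    occurrences = content.lower().count(kw)     # 3. content frequency, capped
    score += min(occurrences * 0.5, 5)
    return score
```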

Session Compression

The scan_recent_sessions tool is designed for cron-job automation:

  1. It scans session files from the past N days
  2. Extracts user messages, assistant topics, and tool calls
  3. Returns a structured JSON for an AI to analyze
  4. The AI identifies stable knowledge and writes it to L1 files

This creates a self-improving memory system — the agent gets smarter over time as more knowledge is distilled from conversations.

Agent Compatibility

Layered Memory is an MCP server — it works with any MCP-compatible agent.

| Agent | Config Method | Notes |
|-------|---------------|-------|
| Hermes Agent | `mcp_servers` in `config.yaml` | Native MCP client, zero config |
| OpenClaw | `openclaw mcp set` | Complements built-in vector memory |
| Claude Desktop | `claude_desktop_config.json` | Full MCP support |
| Cursor | Settings → MCP | Full MCP support |
| Codex CLI | Codex MCP config | Full MCP support |
| Any MCP client | stdio or HTTP transport | Standard MCP protocol |

When to use Layered Memory vs. built-in memory

Most agents have limited persistent memory (2-4KB per turn). Layered Memory solves this by:

  1. Separating index from content — L0 stays small (fits in agent memory), L1 holds unlimited knowledge
  2. On-demand loading — the agent only reads what it needs, when it needs it
  3. Self-improving — session compression automatically extracts new knowledge over time

Integration patterns

Agent (2KB memory limit)
  └── L0 index (injected every turn, ~500 bytes)
        ├── [L0] infrastructure: servers, DB → knowledge/infrastructure.md
        ├── [L0] api: REST conventions → knowledge/api-conventions.md
        └── [L0] dev: code style, testing → knowledge/development.md
              │
              ↓ (on demand via recall_knowledge)
        L1 knowledge files (unlimited, loaded by keyword)

Development

# Clone
git clone https://github.com/LAIguapi/layered-memory-mcp.git
cd layered-memory-mcp

# Install in dev mode
pip install -e ".[dev]"

# Run tests
pytest

# Run locally
python -m layered_memory_mcp.server

License

MIT License — see LICENSE for details.
