Layered Memory MCP Server — Extend AI agent memory beyond token limits with a 4-tier knowledge architecture

These details have not been verified by PyPI

Project links

Project description

Layered Memory MCP Server

Extend AI agent memory beyond token limits with a 4-tier knowledge architecture.

The Problem

AI agents have limited memory — typically 2-4KB of persistent context injected every turn. Once it's full, the agent forgets everything else. You can't store project configurations, user preferences, API conventions, or domain knowledge without constantly fighting the space limit.

The Solution

Layered Memory organizes knowledge into 4 tiers, trading immediacy for capacity:

┌─────────────────────────────────────────────────────┐
│  L0 — Index Layer (2-4KB, injected every turn)      │
│  Pure pointers: "what knowledge exists and where"    │
├─────────────────────────────────────────────────────┤
│  L1 — Knowledge Files (unlimited, loaded on-demand)  │
│  Structured markdown: configs, conventions, facts    │
├─────────────────────────────────────────────────────┤
│  L2 — Skills Layer (loaded when needed)              │
│  Procedures, workflows, tool-specific knowledge      │
├─────────────────────────────────────────────────────┤
│  L3 — Raw Sessions (searched rarely)                 │
│  Full conversation history, searchable by keyword    │
└─────────────────────────────────────────────────────┘

L0 is your table of contents. L1 is your bookshelf. L2 is your cookbook. L3 is your diary.

Features

Keyword Search — Find relevant knowledge across all L1 files with relevance scoring
Session Scanning — Extract knowledge candidates from recent agent sessions
Space Analytics — Monitor memory usage and get optimization suggestions
Agent Agnostic — Works with any MCP-compatible agent (Hermes, Claude, Cursor, etc.)
Zero Dependencies — Core engine uses only Python stdlib; only fastmcp for MCP transport
Privacy First — All data stays local, no external API calls

Quick Start

Install

pip install layered-memory-mcp

Hermes Agent

Add to ~/.hermes/config.yaml:

mcp_servers:
  layered-memory:
    command: layered-memory-mcp
    timeout: 30

OpenClaw

Install the MCP server, then register it:

pip install layered-memory-mcp

# Register as an MCP server
openclaw mcp set layered-memory --command layered-memory-mcp

Layered Memory complements OpenClaw's built-in vector-based memory:

OpenClaw memory: semantic search over session transcripts (heavy, needs embeddings)
Layered Memory: structured keyword search over curated knowledge files (light, instant)
Use both: OpenClaw for "what did I say about X?" and Layered Memory for "what's the database connection string?"

Claude Desktop

Add to your Claude Desktop MCP config:

{
  "mcpServers": {
    "layered-memory": {
      "command": "layered-memory-mcp"
    }
  }
}

Cursor / Other MCP Clients

# stdio mode (default)
layered-memory-mcp

# HTTP mode
layered-memory-mcp --transport http --port 8080

Environment Variables

Variable	Description	Default
`LAYERED_MEMORY_HOME`	Root directory for memory data	`~/.layered-memory/`
`LAYERED_MEMORY_SESSIONS_DIR`	Agent sessions directory (auto-detected)	`~/.hermes/sessions/`

Usage

1. Initialize Knowledge Base

Create markdown files in ~/.layered-memory/knowledge/:

mkdir -p ~/.layered-memory/knowledge

Create your first knowledge file:

<!-- ~/.layered-memory/knowledge/infrastructure.md -->
## Server Configuration
- Production server: prod.example.com (port 22)
- Staging server: stage.example.com
- Deploy via: `./deploy.sh --env production`

## Database
- PostgreSQL 15 on prod-db:5432
- Connection pool: 20 max connections

2. Build L0 Index

In your agent's persistent memory (the 2-4KB injected every turn), store only pointers:

[L0] infrastructure: server config, DB, deploy → knowledge/infrastructure.md
[L0] api-conventions: REST patterns, auth, errors → knowledge/api-conventions.md
[L0] user-prefs: coding style, tool preferences → knowledge/user-prefs.md

3. Search Knowledge (MCP Tool)

The agent calls recall_knowledge when it needs details:

Agent: "What's the database connection string?"
→ recall_knowledge(keyword="database")
← Returns relevant sections from infrastructure.md

4. Session Compression (Cron Job)

Set up a daily cron to extract new knowledge from conversations:

1. scan_recent_sessions → get session summaries
2. AI analyzes summaries → identifies stable facts
3. New facts → written to L1 knowledge files
4. L0 index → updated with new pointers

MCP Tools

Tool	Description
`recall_knowledge`	Search L1 knowledge files by keyword
`scan_recent_sessions`	Scan recent sessions for knowledge candidates
`get_knowledge_file`	Read a specific knowledge file
`list_memory_stats`	Get space statistics and optimization suggestions
`search_sessions_by_keyword`	Search session history for a keyword

MCP Resources

Resource	Description
`memory://status`	Overall system status and configuration
`knowledge://files`	List all knowledge files with metadata

MCP Prompts

Prompt	Description
`knowledge_compression_prompt`	Template for AI-driven knowledge extraction from sessions

Architecture Deep Dive

Why 4 Tiers?

Tier	Cost	Capacity	Use Case
L0 (Index)	Tokens per turn	~2KB	Quick lookup table
L1 (Knowledge)	1 file read	Unlimited	Structured facts
L2 (Skills)	1 skill load	Unlimited	Procedures
L3 (Sessions)	Full search	Unlimited	Historical recall

Relevance Scoring

When you call recall_knowledge, files are scored by:

Filename match (+10 points) — keyword appears in filename
Heading match (+3 points) — keyword appears in a ## heading
Content frequency (+0.5 per occurrence, capped at 5) — how often keyword appears

Results are sorted by score, and only matching ## sections are returned (not entire files).

Session Compression

The scan_recent_sessions tool is designed for cron-job automation:

It scans session files from the past N days
Extracts user messages, assistant topics, and tool calls
Returns a structured JSON for an AI to analyze
The AI identifies stable knowledge and writes it to L1 files

This creates a self-improving memory system — the agent gets smarter over time as more knowledge is distilled from conversations.

Agent Compatibility

Layered Memory is an MCP server — it works with any MCP-compatible agent.

Agent	Config Method	Notes
Hermes Agent	`config.yaml` → `mcp_servers`	Native MCP client, zero config
OpenClaw	`openclaw mcp set`	Complements built-in vector memory
Claude Desktop	`claude_desktop_config.json`	Full MCP support
Cursor	Settings → MCP	Full MCP support
Codex CLI	Codex MCP config	Full MCP support
Any MCP client	stdio or HTTP transport	Standard MCP protocol

When to use Layered Memory vs. built-in memory

Most agents have limited persistent memory (2-4KB per turn). Layered Memory solves this by:

Separating index from content — L0 stays small (fits in agent memory), L1 holds unlimited knowledge
On-demand loading — the agent only reads what it needs, when it needs it
Self-improving — session compression automatically extracts new knowledge over time

Integration patterns

Agent (2KB memory limit)
  └── L0 index (injected every turn, ~500 bytes)
        ├── [L0] infrastructure: servers, DB → knowledge/infrastructure.md
        ├── [L0] api: REST conventions → knowledge/api-conventions.md
        └── [L0] dev: code style, testing → knowledge/development.md
              │
              ↓ (on demand via recall_knowledge)
        L1 knowledge files (unlimited, loaded by keyword)

Cognitive Decision Framework

The 4-tier architecture only works if the agent follows a disciplined decision process. This framework should be injected into the agent's system prompt (or loaded via the cognitive_decision_prompt MCP prompt) to ensure consistent behavior.

Decision Tree

Agent encounters a problem or receives a request
  │
  ├─ Step 1: Scan L0 index for relevant domains
  │
  ├─ Step 2: Match found?
  │   ├─ YES → Load the corresponding L1 knowledge file / L2 skill
  │   │   │
  │   │   ├─ Knowledge solves it → Use it. Do NOT bypass with guessing.
  │   │   ├─ Knowledge partially covers it → Use it, then enhance the entry.
  │   │   └─ Knowledge insufficient → Treat as new problem (Step 3).
  │   │
  │   └─ NO → Treat as new problem (Step 3).
  │
  ├─ Step 3: Handle as new problem/requirement
  │   Use standard tools and reasoning to solve.
  │
  └─ Step 4: Post-solution evaluation
      Is this worth preserving?
      ├─ YES → Write to L1 (knowledge) or L2 (skill) for future reuse.
      └─ NO  → Done.

Why This Matters

Without this decision framework, agents tend to:

Ignore existing knowledge — they see the L0 index but forget to load L1 files, then waste time guessing
Repeat mistakes — solved problems aren't captured, so the agent re-learns from scratch next session
Bypass established conventions — each session starts from zero instead of building on accumulated knowledge

The framework turns the memory system from a passive storage into an active cognitive loop: consult → act → learn → improve.

Integration

Add this to your agent's system prompt:

You use a 4-tier layered memory system. Before tackling any problem:
1. Check L0 index for matching domains
2. If matched, load and follow L1/L2 before acting
3. If unmatched, solve normally
4. After solving, consider if knowledge should be preserved

Or use the built-in MCP prompt cognitive_decision_prompt to get the full decision framework at runtime.

Development

# Clone
git clone https://github.com/LAIguapi/layered-memory-mcp.git
cd layered-memory-mcp

# Install in dev mode
pip install -e ".[dev]"

# Run tests
pytest

# Run locally
python -m layered_memory_mcp.server

License

MIT License — see LICENSE for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.9.2

Jun 29, 2026

2.9.1

Jun 24, 2026

2.9.0

Jun 24, 2026

2.7.0

Jun 21, 2026

2.6.0

Jun 21, 2026

2.5.0

Jun 21, 2026

2.3.0

Jun 16, 2026

2.2.1

Jun 13, 2026

2.2.0

Jun 13, 2026

2.1.4

May 31, 2026

2.1.3

May 26, 2026

2.1.2

May 20, 2026

2.1.1

May 15, 2026

2.1.0

May 15, 2026

2.0.0

May 12, 2026

1.2.0

May 10, 2026

1.1.0

May 8, 2026

1.0.0

May 8, 2026

0.7.3

May 8, 2026

0.7.2

May 8, 2026

0.7.1

May 8, 2026

0.7.0

May 8, 2026

0.6.0

May 8, 2026

This version

0.5.0

May 8, 2026

0.3.1

May 4, 2026

0.2.0

May 4, 2026

0.1.0

May 4, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

layered_memory_mcp-0.5.0.tar.gz (54.1 kB view details)

Uploaded May 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

layered_memory_mcp-0.5.0-py3-none-any.whl (34.0 kB view details)

Uploaded May 8, 2026 Python 3

File details

Details for the file layered_memory_mcp-0.5.0.tar.gz.

File metadata

Download URL: layered_memory_mcp-0.5.0.tar.gz
Upload date: May 8, 2026
Size: 54.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for layered_memory_mcp-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`160a68cccc60bb1605f51700b477df4dad07e35475973254dec3d4d2396048c0`
MD5	`2d36871916199e7ed8f81689b8899ed1`
BLAKE2b-256	`b207c619acfa5c99f3a72ceebcb513061507e198fe0d0c833f4d0dc9c11deeb8`

See more details on using hashes here.

File details

Details for the file layered_memory_mcp-0.5.0-py3-none-any.whl.

File metadata

Download URL: layered_memory_mcp-0.5.0-py3-none-any.whl
Upload date: May 8, 2026
Size: 34.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for layered_memory_mcp-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`71cda93ed500d967b4724263077c6b7da9219e98429327b76ca8e82b73839371`
MD5	`9587ee78274c5179df50a2df40cdbc86`
BLAKE2b-256	`43a0618eba0b36334365c0415aeddb50c8c3662296a1760bf7a96bc51b984e40`

See more details on using hashes here.

layered-memory-mcp 0.5.0

Navigation

Verified details

Project links

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Layered Memory MCP Server

The Problem

The Solution

Features

Quick Start

Install

Hermes Agent

OpenClaw

Claude Desktop

Cursor / Other MCP Clients

Environment Variables

Usage

1. Initialize Knowledge Base

2. Build L0 Index

3. Search Knowledge (MCP Tool)

4. Session Compression (Cron Job)

MCP Tools

MCP Resources

MCP Prompts

Architecture Deep Dive

Why 4 Tiers?

Relevance Scoring

Session Compression

Agent Compatibility

When to use Layered Memory vs. built-in memory

Integration patterns

Cognitive Decision Framework

Decision Tree

Why This Matters

Integration

Development

License

Project details

Verified details

Project links

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes