Layered Memory MCP Server — Extend AI agent memory beyond token limits with a 4-tier knowledge architecture

The Problem

AI agents have limited memory — typically 2-4KB of persistent context injected every turn. Once it's full, the agent forgets everything else. You can't store project configurations, user preferences, API conventions, or domain knowledge without constantly fighting the space limit.

The Solution

Layered Memory organizes knowledge into 4 tiers, trading immediacy for capacity:

┌─────────────────────────────────────────────────────┐
│  L0 — Index Layer (2-4KB, injected every turn)      │
│  Pure pointers: "what knowledge exists and where"    │
├─────────────────────────────────────────────────────┤
│  L1 — Knowledge Files (unlimited, loaded on-demand)  │
│  Structured markdown: configs, conventions, facts    │
├─────────────────────────────────────────────────────┤
│  L2 — Skills Layer (loaded when needed)              │
│  Procedures, workflows, tool-specific knowledge      │
├─────────────────────────────────────────────────────┤
│  L3 — Raw Sessions (searched rarely)                 │
│  Full conversation history, searchable by keyword    │
└─────────────────────────────────────────────────────┘

L0 is your table of contents. L1 is your bookshelf. L2 is your cookbook. L3 is your diary.

Features

Smart Knowledge Injection — Write-once, fully-visible: inject with dedup, section targeting, and automatic L0 index sync
Keyword Search — Find relevant knowledge across all L1 files with relevance scoring
Session Scanning — Extract knowledge candidates from recent agent sessions
Health Validation — Check L0↔L1 consistency, detect orphans and stale entries
Space Analytics — Monitor memory usage and get optimization suggestions
Agent Agnostic — Works with any MCP-compatible agent (Hermes, Claude, Cursor, etc.)
Zero Core Dependencies: the core engine uses only the Python standard library; fastmcp is the sole external dependency and is used only for MCP transport
Privacy First — All data stays local, no external API calls

Quick Start

Install

pip install layered-memory-mcp

Hermes Agent

Add to ~/.hermes/config.yaml:

mcp_servers:
  layered-memory:
    command: layered-memory-mcp
    timeout: 30

OpenClaw

Install the MCP server, then register it:

pip install layered-memory-mcp

# Register as an MCP server
openclaw mcp set layered-memory --command layered-memory-mcp

Layered Memory complements OpenClaw's built-in vector-based memory:

OpenClaw memory: semantic search over session transcripts (heavy, needs embeddings)
Layered Memory: structured keyword search over curated knowledge files (light, instant)
Use both: OpenClaw for "what did I say about X?" and Layered Memory for "what's the database connection string?"

Claude Desktop

Add to your Claude Desktop MCP config:

{
  "mcpServers": {
    "layered-memory": {
      "command": "layered-memory-mcp"
    }
  }
}

Cursor / Other MCP Clients

# stdio mode (default)
layered-memory-mcp

# HTTP mode
layered-memory-mcp --transport http --port 8080

# Verbose logging
layered-memory-mcp --verbose

Environment Variables

Variable	Description	Default
`LAYERED_MEMORY_HOME`	Root directory for memory data	`~/.layered-memory/`
`LAYERED_MEMORY_SESSIONS_DIR`	Agent sessions directory (auto-detected)	`~/.hermes/sessions/`
`LAYERED_MEMORY_AUTO_SYNC_L0`	Auto-sync L0 index after writes	`true`
`LAYERED_MEMORY_DEDUP_THRESHOLD`	Dedup similarity threshold (0.3-1.0)	`0.7`
`LAYERED_MEMORY_L0_FORMAT`	L0 index format: `hermes` or `generic`	`hermes`

Usage

1. Write Knowledge (Recommended)

The inject_knowledge tool is the primary write path for all agents. It handles deduplication, section targeting, and automatic L0 index sync in a single call.

Agent learns: "Production DB is PostgreSQL 15 on prod-db:5432"
→ inject_knowledge(
    domain="infrastructure",
    section="Database",
    content="PostgreSQL 15 on prod-db:5432, connection pool: 20 max",
    mode="upsert"
  )
← Creates/updates infrastructure.md, auto-syncs L0 index

Write modes:

Mode	Behavior
`upsert` (default)	Replace if similar content exists, append if new
`append`	Always append, skip dedup check
`merge`	Combine unique parts from new + existing
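
The three modes can be sketched with `difflib.SequenceMatcher`, which the changelog names as the dedup engine. This is a minimal illustration, not the server's implementation: the `apply_write` function, the list-of-entries model of a section, and the comma-based reading of "unique parts" for `merge` are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def apply_write(existing: list[str], new: str,
                mode: str = "upsert", threshold: float = 0.7) -> list[str]:
    """Return a section's entries after writing `new` with the given mode."""
    if mode not in ("upsert", "append", "merge"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "append":
        return existing + [new]  # always append, no dedup check

    # Find the most similar existing entry, if any.
    best_i, best_score = -1, 0.0
    for i, entry in enumerate(existing):
        score = similarity(entry, new)
        if score > best_score:
            best_i, best_score = i, score

    if best_score < threshold:
        return existing + [new]  # nothing similar enough: treat as new

    if mode == "upsert":
        replacement = new
    else:
        # Assumption: "unique parts" are comma-separated fragments
        # of `new` that the existing entry does not already contain.
        old_parts = [p.strip() for p in existing[best_i].split(",")]
        extra = [p.strip() for p in new.split(",") if p.strip() not in old_parts]
        replacement = ", ".join(old_parts + extra)
    return existing[:best_i] + [replacement] + existing[best_i + 1:]
```

With the default threshold of 0.7, an entry that differs from an existing one only in the pool size replaces it under `upsert`, while an unrelated entry is appended.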

2. Read Knowledge

Agent: "What's the database connection string?"
→ recall_knowledge(keyword="database")
← Returns relevant sections from infrastructure.md

3. Validate Health

→ validate_knowledge()
← Checks L0↔L1 consistency, orphaned files, stale entries, file health
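
The L0↔L1 part of this check amounts to comparing two sets of file names. The sketch below assumes that an "orphan" is an L1 file no L0 entry points to and that a "stale entry" is an L0 entry whose file is missing; the function name `check_consistency` and both definitions are illustrative guesses, not taken from the package.

```python
from pathlib import Path


def check_consistency(indexed_files: set[str], knowledge_dir: Path) -> dict[str, list[str]]:
    """Compare file names referenced by L0 with the .md files actually on disk."""
    on_disk = {p.name for p in knowledge_dir.glob("*.md")}
    return {
        # L1 files that no L0 entry points to (assumed meaning of "orphan")
        "orphaned_files": sorted(on_disk - indexed_files),
        # L0 entries whose target file no longer exists (assumed meaning of "stale")
        "stale_entries": sorted(indexed_files - on_disk),
    }
```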

4. Session Compression (Cron Job)

Set up a daily cron job to extract new knowledge from conversations:

1. scan_recent_sessions → get session summaries
2. AI analyzes summaries → identifies stable facts
3. New facts → written via inject_knowledge (auto L0 sync)
4. L0 index → always up to date

5. Legacy CRUD (Also Available)

For direct file manipulation:

Tool	Description
`create_knowledge_file`	Create a new .md file (auto L0 sync)
`update_knowledge_file`	Overwrite an existing file (auto L0 sync)
`delete_knowledge_file`	Delete a file (auto L0 sync)

MCP Tools

Read Tools

Tool	Description
`recall_knowledge`	Search L1 knowledge files by keyword with relevance scoring
`get_knowledge_file`	Read a specific knowledge file by name
`list_memory_stats`	Get space statistics, file sizes, and optimization suggestions
`scan_recent_sessions`	Scan recent sessions for knowledge extraction candidates
`search_sessions_by_keyword`	Search session history for a keyword

Write Tools

Tool	Description
`inject_knowledge`	Primary write path — smart injection with dedup, section targeting, auto L0 sync
`create_knowledge_file`	Create a new .md file (auto L0 sync)
`update_knowledge_file`	Overwrite an existing file (auto L0 sync)
`delete_knowledge_file`	Delete a file (auto L0 sync)

Management Tools

Tool	Description
`sync_l0_index`	Manually rebuild L0 index from L1 files (supports `dry_run`)
`validate_knowledge`	Health check: L0↔L1 consistency, file quality, duplicates
`manage_l0_entry`	Add / remove / replace individual L0 entries

MCP Resources

Resource	Description
`memory://status`	Overall system status and configuration
`knowledge://files`	List all knowledge files with metadata

MCP Prompts

Prompt	Description
`knowledge_compression_prompt`	Template for AI-driven knowledge extraction from sessions
`cognitive_decision_prompt`	Decision framework for disciplined memory usage

Architecture Deep Dive

Why 4 Tiers?

Tier	Cost	Capacity	Use Case
L0 (Index)	Tokens per turn	2-4KB	Quick lookup table
L1 (Knowledge)	1 file read	Unlimited	Structured facts
L2 (Skills)	1 skill load	Unlimited	Procedures
L3 (Sessions)	Full search	Unlimited	Historical recall

Write-Once, Fully-Visible Pipeline (v0.5.0)

The key innovation in v0.5.0 is that every write path automatically syncs the L0 index:

Agent calls inject_knowledge(domain="infra", section="Proxy", content="...")
  │
  ├─ 1. Dedup check (SequenceMatcher, threshold=0.7)
  ├─ 2. Resolve action: upsert / append / merge / skip
  ├─ 3. Section targeting (finds or creates ## heading)
  ├─ 4. File write (with fcntl.flock for concurrency safety)
  └─ 5. Auto L0 index sync
        │
        ↓
  L0 index updated → Agent sees it next turn

This eliminates the "write-but-invisible" problem where agents write L1 files but the L0 index (injected into every turn) doesn't update, causing future sessions to ignore the new knowledge.
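
Steps 3 and 4 of the pipeline (section targeting and a locked file write) can be sketched as follows. The function names are hypothetical and the real server may structure this differently. The example replaces the whole body of a section rather than deduplicating individual entries, and `fcntl` is available on POSIX systems only.

```python
import fcntl
import os
import re
from pathlib import Path


def upsert_section(text: str, section: str, content: str) -> str:
    """Replace the body of `## section`, or append the section if it is missing."""
    pattern = re.compile(
        rf"^## {re.escape(section)}[ \t]*\n.*?(?=^## |\Z)",
        flags=re.MULTILINE | re.DOTALL,
    )
    block = f"## {section}\n{content.strip()}\n\n"
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `block`.
        return pattern.sub(lambda _m: block, text, count=1)
    if not text or text.endswith("\n\n"):
        sep = ""
    else:
        sep = "\n" if text.endswith("\n") else "\n\n"
    return text + sep + block


def locked_update(path: Path, section: str, content: str) -> None:
    """Read-modify-write a knowledge file under an exclusive advisory lock."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers release the lock
        try:
            updated = upsert_section(fh.read(), section, content)
            fh.seek(0)
            fh.truncate()
            fh.write(updated)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Holding the lock across both the read and the write is what prevents two concurrent writers from overwriting each other's sections.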

Relevance Scoring

When you call recall_knowledge, files are scored by:

Filename match (+10 points) — keyword appears in filename
Heading match (+3 points) — keyword appears in a ## heading
Content frequency (+0.5 per occurrence, capped at 5) — how often keyword appears

Results are sorted by score, and only matching ## sections are returned (not entire files).
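
A minimal sketch of these scoring rules follows. Two points are not specified above and are assumptions here: the heading bonus is counted once per file, and the cap of 5 applies to the points earned from content frequency rather than to the number of occurrences. The function names are hypothetical.

```python
import re


def score_file(filename: str, text: str, keyword: str) -> float:
    """Score one knowledge file against a keyword using the three rules."""
    kw = keyword.lower()
    score = 0.0
    if kw in filename.lower():
        score += 10  # filename match
    headings = re.findall(r"^## (.+)$", text, flags=re.MULTILINE)
    if any(kw in h.lower() for h in headings):
        score += 3  # heading match (assumed: counted once per file)
    # Assumed: the cap of 5 limits the points from content frequency.
    score += min(5.0, 0.5 * text.lower().count(kw))
    return score


def matching_sections(text: str, keyword: str) -> list[str]:
    """Return only the `##` sections that mention the keyword."""
    sections = re.split(r"(?m)^(?=## )", text)
    return [s.strip() for s in sections
            if s.startswith("## ") and keyword.lower() in s.lower()]
```

Under these assumptions, a file named for the keyword always outranks one that merely mentions it, since the filename bonus exceeds the combined maximum of the other two rules.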

L0 Index Formats

Two formats are supported:

Format	Example	Best For
`hermes`	`[L0索引] infra: servers, DB → knowledge/infra.md`	Hermes Agent memory injection
`generic`	`[infra.md] Server Configuration → proxy, db, deploy`	Standalone / other agents

Configure via LAYERED_MEMORY_L0_FORMAT env var or the l0_format constructor argument.
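
To make the difference concrete, the sketch below reproduces the two example lines from the table. The `[L0索引]` prefix is kept verbatim (索引 means "index"). The function `format_l0_entry` and its arguments are hypothetical; how the server actually derives the title and topic list from an L1 file is not documented here.

```python
def format_l0_entry(filename: str, title: str, topics: list[str],
                    l0_format: str = "hermes") -> str:
    """Render one L0 index line in either documented format."""
    joined = ", ".join(topics)
    if l0_format == "hermes":
        domain = filename.removesuffix(".md")  # requires Python 3.9+
        return f"[L0索引] {domain}: {joined} → knowledge/{filename}"
    if l0_format == "generic":
        return f"[{filename}] {title} → {joined}"
    raise ValueError(f"unsupported L0 format: {l0_format!r}")
```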

Session Compression

The scan_recent_sessions tool is designed for cron-job automation:

It scans session files from the past N days
Extracts user messages, assistant topics, and tool calls
Returns structured JSON for an AI to analyze
The AI identifies stable knowledge and writes it to L1 files via inject_knowledge

This creates a self-improving memory system — the agent gets smarter over time as more knowledge is distilled from conversations.
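
The scanning step might look roughly like the sketch below. The on-disk session format is not described in this document, so the example assumes one JSON file per session containing a list of `{"role": ..., "content": ...}` messages. That layout and the `scan_recent` function are purely illustrative, and assistant topic extraction is left out.

```python
import json
import time
from pathlib import Path
from typing import Optional


def scan_recent(sessions_dir: Path, days: int = 1,
                now: Optional[float] = None) -> list[dict]:
    """Summarize session files modified within the last `days` days."""
    cutoff = (time.time() if now is None else now) - days * 86400
    summaries = []
    for path in sorted(sessions_dir.glob("*.json")):
        if path.stat().st_mtime < cutoff:
            continue  # older than the scan window
        # Assumed format: a JSON list of message dicts with a "role" key.
        messages = json.loads(path.read_text(encoding="utf-8"))
        summaries.append({
            "session": path.name,
            "user_messages": [m["content"] for m in messages
                              if m.get("role") == "user"],
            "tool_calls": [m.get("name", "") for m in messages
                           if m.get("role") == "tool"],
        })
    return summaries
```

The returned list is plain data, so it can be serialized with `json.dumps` and handed to a model for the analysis step.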

Agent Compatibility

Layered Memory is an MCP server — it works with any MCP-compatible agent.

Agent	Config Method	Notes
Hermes Agent	`config.yaml` → `mcp_servers`	Native MCP client, L0 auto-injection via memory
OpenClaw	`openclaw mcp set`	Complements built-in vector memory
Claude Desktop	`claude_desktop_config.json`	Full MCP support, L0 via tool calls
Cursor	Settings → MCP	Full MCP support
Codex CLI	Codex MCP config	Full MCP support
Any MCP client	stdio or HTTP transport	Standard MCP protocol

When to use Layered Memory vs. built-in memory

Most agents have limited persistent memory (2-4KB per turn). Layered Memory solves this by:

Separating index from content: L0 stays small (fits in agent memory), while L1 holds unlimited knowledge
Loading on demand: the agent reads only what it needs, when it needs it
Improving over time: session compression automatically extracts new knowledge from conversations

Integration patterns

Agent (2KB memory limit)
  └── L0 index (injected every turn, ~500 bytes)
        ├── [L0] infrastructure: servers, DB → knowledge/infrastructure.md
        ├── [L0] api: REST conventions → knowledge/api-conventions.md
        └── [L0] dev: code style, testing → knowledge/development.md
              │
              ↓ (on demand via recall_knowledge)
        L1 knowledge files (unlimited, loaded by keyword)

Cognitive Decision Framework

The 4-tier architecture only works if the agent follows a disciplined decision process. This framework should be injected into the agent's system prompt (or loaded via the cognitive_decision_prompt MCP prompt) to ensure consistent behavior.

Decision Tree

Agent encounters a problem or receives a request
  │
  ├─ Step 1: Scan L0 index for relevant domains
  │
  ├─ Step 2: Match found?
  │   ├─ YES → Load the corresponding L1 knowledge file / L2 skill
  │   │   │
  │   │   ├─ Knowledge solves it → Use it. Do NOT bypass with guessing.
  │   │   ├─ Knowledge partially covers it → Use it, then enhance the entry.
  │   │   └─ Knowledge insufficient → Treat as new problem (Step 3).
  │   │
  │   └─ NO → Treat as new problem (Step 3).
  │
  ├─ Step 3: Handle as new problem/requirement
  │   Use standard tools and reasoning to solve.
  │
  └─ Step 4: Post-solution evaluation
      Is this worth preserving?
      ├─ YES → Write to L1 (via inject_knowledge) or L2 (skill) for future reuse.
      └─ NO  → Done.
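
The same flow, reduced to code, might look like the sketch below. It collapses the three "knowledge solves / partially covers / insufficient" branches into a single `solve` call that receives whatever knowledge was loaded. Every callback is a hypothetical placeholder that an agent integration would supply (for example, `preserve` could wrap a call to inject_knowledge).

```python
from typing import Callable, Optional


def handle_request(
    request: str,
    find_match: Callable[[str], Optional[str]],    # Steps 1-2: scan L0, return an L1/L2 name
    load: Callable[[str], str],                    # load the matched knowledge file or skill
    solve: Callable[[str, Optional[str]], str],    # Step 3: solve, with knowledge if available
    worth_preserving: Callable[[str, str], bool],  # Step 4: post-solution evaluation
    preserve: Callable[[str, str], None],          # write the result back for future reuse
) -> str:
    """Run one pass of the consult, act, learn loop for a single request."""
    match = find_match(request)
    knowledge = load(match) if match else None
    answer = solve(request, knowledge)
    if worth_preserving(request, answer):
        preserve(request, answer)
    return answer
```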

Why This Matters

Without this decision framework, agents tend to:

Ignore existing knowledge — they see the L0 index but forget to load L1 files, then waste time guessing
Repeat mistakes — solved problems aren't captured, so the agent re-learns from scratch next session
Bypass established conventions — each session starts from zero instead of building on accumulated knowledge

The framework turns the memory system from passive storage into an active cognitive loop: consult → act → learn → improve.

Integration

Add this to your agent's system prompt:

You use a 4-tier layered memory system. Before tackling any problem:
1. Check L0 index for matching domains
2. If matched, load and follow L1/L2 before acting
3. If unmatched, solve normally
4. After solving, use inject_knowledge to preserve new knowledge

Or use the built-in MCP prompt cognitive_decision_prompt to get the full decision framework at runtime.

Development

# Clone
git clone https://github.com/LAIguapi/layered-memory-mcp.git
cd layered-memory-mcp

# Install in dev mode
pip install -e ".[dev]"

# Run tests
pytest

# Run locally
python -m layered_memory_mcp.server

Changelog

v0.5.0 — Write-Once, Fully-Visible

inject_knowledge tool — Primary write path with dedup, section targeting, auto L0 sync
sync_l0_index tool — Manual L0 index rebuild with dry_run preview
validate_knowledge tool — L0↔L1 consistency check, health diagnostics
manage_l0_entry tool — Fine-grained L0 entry add/remove/replace
Auto L0 sync — All write tools (create/update/delete/inject) automatically sync L0 index
Dedup engine — SequenceMatcher-based similarity detection with configurable threshold
File locking — fcntl.flock for concurrent write safety
Knowledge watcher — File changes trigger debounced L0 sync (HTTP mode)
cognitive_decision_prompt — Built-in decision framework prompt

v0.4.0 — Initial Release

4-tier knowledge architecture (L0/L1/L2/L3)
Keyword search with relevance scoring
Session scanning and compression
MCP protocol support (stdio + HTTP)
Zero external dependencies (core engine)

License

MIT License — see LICENSE for details.
