A compact lifelong memory framework for LLM Agents.


LycheeMem



LycheeMem is a compact memory framework for LLM agents. It starts from efficient conversational memory—through structured organization, lightweight consolidation, and adaptive retrieval—and gradually extends toward action-aware, usage-aware memory for more capable agentic systems.


🔥 News

[03/30/2026] We evaluated LycheeMem on PinchBench with the OpenClaw plugin: compared to OpenClaw's native memory, it achieved an ~6% score improvement, while reducing token consumption by ~71% and cost by ~55%!
[03/28/2026] Semantic memory has been upgraded to Compact Semantic Memory (SQLite + LanceDB), no Neo4j required. See /quick-start for details.
[03/27/2026] The OpenClaw Plugin is now available at /openclaw-plugin! See the setup guide in openclaw-plugin/INSTALL_OPENCLAW.md.
[03/26/2026] MCP support is now available at /mcp!
[03/23/2026] LycheeMem is now open source on GitHub: https://github.com/LycheeMem/LycheeMem

📚 Memory Architecture

LycheeMem organizes memory into three complementary stores:

Working Memory (Episodic)	Semantic Memory (Typed Action Store)	Procedural Memory (Skills)
Session turns, summaries, token budget management	7 MemoryRecord types, conflict-aware Record Fusion, hierarchical memory tree, action-grounded retrieval planning, usage feedback loop + RL-ready statistics	Skill entries, HyDE retrieval

💾 Working Memory

The working memory window holds the active conversation context for a session. It operates under a dual-threshold token budget:

Warn threshold (70%) — triggers asynchronous background pre-compression; the current request is not blocked.
Block threshold (90%) — the pipeline pauses and flushes older turns to a compressed summary before proceeding.

Compression produces summary anchors (past context, distilled) + raw recent turns (last N turns, verbatim). Both are passed downstream as the conversation history.
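The dual-threshold policy can be sketched as a pure function. This is an illustrative sketch, not WMManager's code: the names `check_budget`, `BudgetAction`, `render_history`, and `keep_last_n` are hypothetical, and only the 70% / 90% ratios come from the text above.

```python
from enum import Enum


class BudgetAction(Enum):
    NONE = "none"                            # below the warn threshold
    PRECOMPRESS_ASYNC = "precompress_async"  # warn: compress in the background
    FLUSH_BLOCKING = "flush_blocking"        # block: compress before proceeding


def check_budget(used_tokens: int, max_tokens: int,
                 warn_ratio: float = 0.70, block_ratio: float = 0.90) -> BudgetAction:
    """Map working-memory usage onto the dual-threshold policy (hypothetical helper)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ratio = used_tokens / max_tokens
    if ratio >= block_ratio:
        return BudgetAction.FLUSH_BLOCKING
    if ratio >= warn_ratio:
        return BudgetAction.PRECOMPRESS_ASYNC
    return BudgetAction.NONE


def render_history(summaries: list[str], turns: list[str], keep_last_n: int = 4) -> list[str]:
    """Summary anchors first, then the last N raw turns verbatim."""
    # Guard keep_last_n == 0: turns[-0:] would return the whole list.
    recent = turns[-keep_last_n:] if keep_last_n > 0 else []
    return [f"[summary] {s}" for s in summaries] + recent
```

For example, `check_budget(750, 1000)` returns `BudgetAction.PRECOMPRESS_ASYNC`: the request proceeds while compression is scheduled in the background.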

🗺️ Semantic Memory — Compact Semantic Memory

Semantic memory is organised around typed MemoryRecords plus action-grounded retrieval state. The storage layer is SQLite (FTS5 full-text search) + LanceDB (vector index), while retrieval is conditioned on recent context, tentative action, constraints, and missing slots.

Memory Record Types

Each memory entry is stored as a MemoryRecord. The memory_type field distinguishes seven semantic categories:

Type	Description
`fact`	Objective facts about the user, environment, or world
`preference`	User preferences (style, habits, likes/dislikes)
`event`	Specific events that have occurred
`constraint`	Conditions that must be respected
`procedure`	Reusable step-by-step procedures / methods
`failure_pattern`	Previously failed action paths and their causes
`tool_affordance`	Capabilities and applicable scenarios of tools/APIs

Beyond text, every MemoryRecord carries action-facing metadata (tool_tags, constraint_tags, failure_tags, affordance_tags) and usage statistics (retrieval_count, action_success_count, etc.) to seed future reinforcement-learning signals. Retrieval logs also persist retrieval_plan, action_state, response excerpts, and later user feedback so the system can close a lightweight action-outcome loop without training.
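A minimal sketch of what a MemoryRecord might look like as a Python dataclass, assuming the field names mentioned in this README (`memory_type`, the four tag lists, `retrieval_count`, `action_success_count`, and the `semantic_text` / `normalized_text` fields used by the vector channels). The `record_usage` method is hypothetical, and the real record carries more usage counters than shown.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

MemoryType = Literal[
    "fact", "preference", "event", "constraint",
    "procedure", "failure_pattern", "tool_affordance",
]
MEMORY_TYPES: tuple[str, ...] = get_args(MemoryType)


@dataclass
class MemoryRecord:
    record_id: str
    memory_type: MemoryType
    semantic_text: str    # text embedded for the semantic vector channel
    normalized_text: str  # text embedded for the normalised vector channel
    # Action-facing metadata used by the tag filter channel.
    tool_tags: list[str] = field(default_factory=list)
    constraint_tags: list[str] = field(default_factory=list)
    failure_tags: list[str] = field(default_factory=list)
    affordance_tags: list[str] = field(default_factory=list)
    # Usage statistics (only two of the counters are sketched here).
    retrieval_count: int = 0
    action_success_count: int = 0

    def __post_init__(self) -> None:
        if self.memory_type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory_type: {self.memory_type!r}")

    def record_usage(self, action_succeeded: bool) -> None:
        """Hypothetical helper: bump counters after retrieval and an observed outcome."""
        self.retrieval_count += 1
        if action_succeeded:
            self.action_success_count += 1
```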

Related MemoryRecords can be fused online by the Record Fusion Engine into denser CompositeRecords. Composite entries persist direct child_composite_ids, so long-term semantic memory is organised as a hierarchical memory tree instead of a flat bag of summaries.

Four-Module Pipeline

Module 1: Compact Semantic Encoding

A single-pass pipeline that converts conversation turns into a list of MemoryRecords:

Typed extraction — LLM extracts self-contained facts and assigns a semantic category to each record.
Decontextualization — Pronouns and context-dependent phrases are expanded into full expressions, so each record is understandable without the original dialogue.
Action metadata annotation — LLM annotates each record with memory_type, tool_tags, constraint_tags, failure_tags, affordance_tags, and other structured labels.

record_id = SHA256(normalized_text) — naturally idempotent; duplicate content is deduplicated automatically.
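The content-addressed ID makes writes idempotent: re-ingesting the same normalized text yields the same key, so the second write is a no-op. The sketch below assumes a simple normalization (Unicode NFKC, case-folding, whitespace collapsing); the real normalized_text comes from the encoder, so `normalize_text` and `upsert` are hypothetical stand-ins.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Assumed normalization: NFKC, case-fold, collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def record_id(text: str) -> str:
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


def upsert(store: dict[str, str], text: str) -> bool:
    """Insert if new; return False when an identical record already exists."""
    rid = record_id(text)
    if rid in store:
        return False
    store[rid] = text
    return True
```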

Module 2: Record Fusion, Conflict Update, and Hierarchical Consolidation

Triggered online after each consolidation:

FTS / vector recall gathers related existing atomic records around the new records (candidate pool).
The synthesis judge prompt decides whether each candidate set should produce a new CompositeRecord or perform a conflict_update against an existing atomic record.
On conflict_update, the existing anchor record is updated in place, conflicting incoming records are soft-expired, and composites covering affected source records are invalidated.
On synthesis, the engine writes a new CompositeRecord to SQLite + LanceDB.
Additional hierarchy rounds can synthesize record -> composite and composite -> composite, persisting child_composite_ids so the memory tree can keep growing upward.
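The three effects of a conflict_update can be sketched over in-memory structures. This is an illustration under assumptions: `Record`, `Composite`, `source_record_ids`, and the `expired` / `valid` flags are hypothetical names, and the real engine applies these changes to SQLite + LanceDB rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    text: str
    expired: bool = False  # soft-expired records are kept, not deleted


@dataclass
class Composite:
    composite_id: str
    source_record_ids: set[str]
    valid: bool = True


def conflict_update(records: dict[str, Record], composites: list[Composite],
                    anchor_id: str, new_text: str, conflicting_ids: list[str]) -> None:
    # 1. Update the existing anchor record in place.
    records[anchor_id].text = new_text
    # 2. Soft-expire the conflicting incoming records.
    for rid in conflicting_ids:
        records[rid].expired = True
    # 3. Invalidate every composite that covers an affected source record.
    affected = {anchor_id, *conflicting_ids}
    for comp in composites:
        if comp.source_record_ids & affected:
            comp.valid = False
```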

Module 3: Action-Grounded Retrieval Planning

Before retrieval, ActionAwareRetrievalPlanner analyses the user query + recent context + ActionState and emits a SearchPlan:

mode: answer (factual Q&A) / action (needs execution) / mixed
semantic_queries: content-facing search terms
pragmatic_queries: action/tool/constraint-facing search terms
tool_hints: tools likely needed for this request
required_constraints: constraints that must be respected
required_affordances: capabilities the retrieved memory should provide
missing_slots: parameters / slots that are absent
tree_retrieval_mode / tree_expansion_depth / include_leaf_records: whether retrieval should stay at high-level composites (root_only) or descend into child composites / direct leaf records (balanced / descend)

ActionState can carry fields such as current_subgoal, tentative_action, known_constraints, available_tools, failure_signal, and a recent-context excerpt. The planner merges this state with the LLM-produced plan so retrieval is conditioned on the current decision state rather than the query alone.
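The sketch below models ActionState and SearchPlan as dataclasses using the field names listed above, plus one possible merge. The merge rules (union the known constraints, turn a failure signal into a pragmatic query, upgrade `answer` to `mixed` when an action is pending) are hypothetical illustrations of "conditioning on decision state", not the planner's documented logic.

```python
from dataclasses import dataclass, field, replace
from typing import Literal


@dataclass
class ActionState:
    current_subgoal: str = ""
    tentative_action: str = ""
    known_constraints: list[str] = field(default_factory=list)
    available_tools: list[str] = field(default_factory=list)
    failure_signal: str = ""
    recent_context: str = ""


@dataclass
class SearchPlan:
    mode: Literal["answer", "action", "mixed"] = "answer"
    semantic_queries: list[str] = field(default_factory=list)
    pragmatic_queries: list[str] = field(default_factory=list)
    tool_hints: list[str] = field(default_factory=list)
    required_constraints: list[str] = field(default_factory=list)
    required_affordances: list[str] = field(default_factory=list)
    missing_slots: list[str] = field(default_factory=list)
    tree_retrieval_mode: Literal["root_only", "balanced", "descend"] = "root_only"
    tree_expansion_depth: int = 0
    include_leaf_records: bool = False


def _union(*lists: list[str]) -> list[str]:
    """Order-preserving, de-duplicated concatenation."""
    return list(dict.fromkeys(item for lst in lists for item in lst))


def merge_plan(plan: SearchPlan, state: ActionState) -> SearchPlan:
    """Hypothetical merge rules; returns a new plan and leaves `plan` untouched."""
    merged = replace(plan, required_constraints=_union(plan.required_constraints,
                                                       state.known_constraints))
    if state.failure_signal:
        # Surface a recent failure as an action-facing query.
        merged = replace(merged, pragmatic_queries=_union(merged.pragmatic_queries,
                                                          [state.failure_signal]))
    if state.tentative_action and merged.mode == "answer":
        # A pending action means pure Q&A retrieval is not enough.
        merged = replace(merged, mode="mixed")
    return merged
```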

The plan drives multi-channel recall:

FTS channel — SQLite FTS5 keyword recall over MemoryRecord + CompositeRecord
Semantic vector channel — LanceDB ANN over semantic_text embeddings
Normalised vector channel — LanceDB ANN over normalized_text embeddings (for pragmatic queries)
Tag filter channel — exact filter by tool_hints / required_constraints / required_affordances
Temporal channel — filter by SearchPlan.temporal_filter time window
Slot-hint supplementation — when missing_slots is non-empty, extra FTS/tag recall is triggered to find records that can fill missing parameters

After base recall, retrieval can also expand along the memory tree. root_only keeps high-level composite summaries, balanced descends one level when tree hints match, and descend pulls child composites plus direct leaf records when the current action needs finer-grained detail.
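The three tree modes can be sketched as a breadth-first walk over `child_composite_ids`. This is a simplified sketch under assumptions: `Node`, `leaf_record_ids`, and `expand` are hypothetical, and "balanced" here always descends one level instead of checking whether tree hints match.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    child_composite_ids: list[str] = field(default_factory=list)
    leaf_record_ids: list[str] = field(default_factory=list)


def expand(hits: list[str], tree: dict[str, Node], mode: str = "root_only",
           depth: int = 1, include_leaf_records: bool = False) -> list[str]:
    if mode == "root_only":
        return list(dict.fromkeys(hits))  # keep only the high-level hits
    if mode == "balanced":
        depth, include_leaf_records = min(depth, 1), False  # one level, composites only
    elif mode != "descend":
        raise ValueError(f"unknown tree mode: {mode}")
    out = list(dict.fromkeys(hits))
    seen = set(out)
    frontier = [h for h in out if h in tree]
    for _ in range(depth):
        next_frontier = []
        for node_id in frontier:
            node = tree[node_id]
            children = node.child_composite_ids + (
                node.leaf_record_ids if include_leaf_records else [])
            for child in children:
                if child not in seen:
                    seen.add(child)
                    out.append(child)
                    if child in tree:  # only composites can be expanded further
                        next_frontier.append(child)
        frontier = next_frontier
    return out
```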

Module 4: Multi-Dimensional Scorer

Candidates from all channels are de-duplicated and ranked by MemoryScorer using a weighted linear combination. Final top-k selection is composite-first: covering parent composites are preferred, covered child records are folded away unless they add unique value, and near-duplicate fragments are suppressed.

$$\text{Score} = \alpha \cdot S_\text{sem} + \beta \cdot S_\text{action} + \kappa \cdot S_\text{slot} + \gamma \cdot S_\text{temporal} + \delta \cdot S_\text{recency} + \eta \cdot S_\text{evidence} - \lambda \cdot C_\text{token}$$

Weight	Meaning	Default
α	SemanticRelevance (vector distance -> similarity)	0.25
β	ActionUtility (tag match score, mode-aware)	0.25
κ	SlotUtility (whether the memory helps fill missing action slots)	0.15
γ	TemporalFit (temporal reference match)	0.15
δ	Recency (memory freshness)	0.10
η	EvidenceDensity (evidence span density)	0.10
λ	TokenCost penalty (text length penalty)	0.10
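The weighted sum can be written directly from the formula and the default weights above. This sketch assumes every component signal is already normalised to [0, 1]; `Signals`, `rank`, and the `1 / (1 + d)` distance-to-similarity mapping are hypothetical, and composite-first folding is not shown.

```python
from dataclasses import dataclass

# Default weights from the table above.
WEIGHTS = {"alpha": 0.25, "beta": 0.25, "kappa": 0.15, "gamma": 0.15,
           "delta": 0.10, "eta": 0.10, "lambda": 0.10}


@dataclass
class Signals:
    """Per-candidate component scores, each assumed to lie in [0, 1]."""
    sem: float
    action: float
    slot: float
    temporal: float
    recency: float
    evidence: float
    token_cost: float


def similarity_from_distance(distance: float) -> float:
    """Hypothetical vector-distance-to-similarity mapping."""
    return 1.0 / (1.0 + distance)


def score(s: Signals, w: dict[str, float] = WEIGHTS) -> float:
    return (w["alpha"] * s.sem + w["beta"] * s.action + w["kappa"] * s.slot
            + w["gamma"] * s.temporal + w["delta"] * s.recency
            + w["eta"] * s.evidence - w["lambda"] * s.token_cost)


def rank(candidates: dict[str, Signals], top_k: int = 5) -> list[tuple[str, float]]:
    scored = [(cid, score(sig)) for cid, sig in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

As a worked example, `Signals(0.8, 0.5, 0.0, 1.0, 0.5, 0.4, 0.2)` scores 0.2 + 0.125 + 0 + 0.15 + 0.05 + 0.04 - 0.02 = 0.545. Because the positive weights sum to 1.0, a candidate that is perfect on every signal and free of token cost scores exactly 1.0.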

🛠️ Procedural Memory — Skill Store

The skill store preserves reusable how-to knowledge as structured skill entries, each carrying:

Intent — a short description of what the skill does.
doc_markdown — a full Markdown document describing the procedure, commands, parameters, and caveats.
Embedding — a dense vector of the intent text, used for similarity search.
Metadata — usage counters, last-used timestamp, preconditions.

Skill retrieval uses HyDE (Hypothetical Document Embeddings): the query is first expanded into a hypothetical ideal answer by the LLM, then that draft text is embedded to produce a query vector that matches well against stored procedure descriptions, even when the user's original phrasing is vague.
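The HyDE idea can be shown end to end with a toy bag-of-words "embedding" in place of a dense model, and a stub in place of the LLM. `embed`, `cosine`, `hyde_search`, and the prompt wording are all hypothetical; only the flow (draft an ideal answer, embed the draft, match it against skill intents) follows the description above.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a dense embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(query: str, skill_intents: dict[str, str],
                generate: Callable[[str], str], top_k: int = 3) -> list[tuple[str, float]]:
    # 1. Ask the LLM for a hypothetical ideal answer to the (possibly vague) query.
    draft = generate(f"Write a short how-to that answers: {query}")
    # 2. Embed the draft instead of the raw query.
    query_vec = embed(draft)
    # 3. Compare against the embedded intent text of each stored skill.
    scored = [(sid, cosine(query_vec, embed(intent))) for sid, intent in skill_intents.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

With a vague query such as "how do i save my db?", the raw query shares no words with a skill intent like "back up postgres database with pg_dump to s3". A draft that mentions pg_dump and postgres does share words with it, so the skill ranks first.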

⚙️ Pipeline

Every request passes through a fixed sequence of five agents. Four are synchronous stages in the LangGraph pipeline; one is a background post-processing task.

START
  ↓
1. WMManager: token budget check + compress/render
  ↓
2. SearchCoordinator: planner → semantic + skill retrieval
  ↓
3. SynthesizerAgent: LLM-as-Judge scoring + context fusion
  ↓
4. ReasoningAgent: final response generation
  ↓
END

Background: asyncio.create_task(ConsolidatorAgent)
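The control flow of four synchronous stages plus a fire-and-forget background task can be sketched with plain asyncio. This is not the LangGraph graph itself. `handle_request` and the dict-based state are hypothetical, and using `asyncio.to_thread` to reconcile `asyncio.create_task` with "runs in a thread pool" is an assumption.

```python
import asyncio
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], State]


async def handle_request(query: str, stages: list[Stage],
                         consolidate: Callable[[State], Any],
                         background: set[asyncio.Task]) -> str:
    """Run the synchronous stages in order, then schedule consolidation."""
    state: State = {"query": query}
    # Stages 1-4: WMManager -> SearchCoordinator -> SynthesizerAgent -> ReasoningAgent.
    for stage in stages:
        state = stage(state)
    # Background: run consolidation in a worker thread without awaiting it here.
    task = asyncio.create_task(asyncio.to_thread(consolidate, dict(state)))
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return state["response"]
```

The reply is returned as soon as stage 4 finishes. Holding the task in a set follows the asyncio guidance that un-referenced tasks may be garbage-collected before they complete.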

Stage 1 — WMManager

Rule-based agent (no LLM prompt). Appends the user turn to the session log, counts tokens, and fires compression if either threshold is crossed. Produces compressed_history and raw_recent_turns for downstream stages.

Stage 2 — SearchCoordinator

SearchCoordinator first builds recent_context from compressed summaries + raw recent turns, then derives an ActionState from the current query, constraints, recent failures, token budget, and recent tool use. ActionAwareRetrievalPlanner uses that state to produce a SearchPlan containing mode, semantic_queries, pragmatic_queries, tool_hints, required_affordances, missing_slots, tree-traversal strategy, and more.

Multi-channel recall (FTS, semantic vector, normalised vector, tag/affordance filter, temporal filter, slot-hint supplementation, plus tree expansion when needed) then queries SQLite + LanceDB. Skill retrieval is mode-aware (answer / action / mixed) and runs HyDE against the skill store only when it is likely to help.

This stage returns raw semantic fragments, skill hits, retrieval provenance, and a dedicated novelty_retrieved_context built from pre-synthesis semantic fragments for later novelty checking. It does not build the final background_context; that happens in Stage 3.

When a new user turn arrives, SearchCoordinator also tries to apply lightweight feedback to the most recent unresolved action/mixed retrieval log, so the next turn can mark the prior memory usage as success / fail / correction.

Stage 3 — SynthesizerAgent

Acts as an LLM-as-Judge: scores every retrieved memory fragment on an absolute 0-1 relevance scale, discards fragments below the threshold (default 0.6), and fuses the survivors into a single dense background_context string. It also identifies skill_reuse_plan entries that can directly guide the final response. This stage is where the final answer-time context is built; it outputs provenance — a citation list containing scoring breakdown and source references for each kept memory item.
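The keep/drop step can be sketched with the judge injected as a callable. This is a sketch under assumptions: `Fragment`, `synthesize`, and the provenance keys are hypothetical, and the newline join is only a placeholder for the LLM fusion that produces the real background_context. The output keys mirror the `/memory/synthesize` response fields.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fragment:
    source_id: str
    text: str


def synthesize(fragments: list[Fragment], judge: Callable[[Fragment], float],
               threshold: float = 0.6) -> dict:
    kept, provenance = [], []
    for frag in fragments:
        score = judge(frag)  # absolute relevance in [0, 1]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"judge score out of range: {score}")
        if score >= threshold:  # fragments below the threshold are discarded
            kept.append(frag.text)
            provenance.append({"source_id": frag.source_id, "score": score})
    return {
        # Placeholder fusion: the real stage asks an LLM to write one dense context.
        "background_context": "\n".join(kept),
        "provenance": provenance,
        "kept_count": len(kept),
        "dropped_count": len(fragments) - len(kept),
    }
```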

Stage 4 — ReasoningAgent

Receives compressed_history, background_context, and skill_reuse_plan and generates the final assistant reply. It appends the assistant turn back to the session store, and the pipeline finalizes the semantic usage log with a response excerpt so the next user turn can provide outcome feedback.

Background — ConsolidatorAgent

Triggered immediately after ReasoningAgent completes, the ConsolidatorAgent runs in a thread pool and does not block the response. It performs:

Novelty check: an LLM judges whether the conversation introduced new information worth persisting and skips consolidation for pure retrieval exchanges. The check compares against the search-stage novelty_retrieved_context (raw semantic fragments), not the answer-time background_context, so query-conditioned synthesis does not suppress valid new-memory ingestion.
Compact consolidation: calls CompactSemanticEngine.ingest_conversation(), which runs a single-pass encoder (typed extraction → decontextualization → action metadata annotation), writes MemoryRecords to SQLite + LanceDB, and then triggers conflict-aware Record Fusion.
Skill extraction: identifies successful tool-usage patterns in the conversation and adds skill entries to the skill store. Runs in parallel with compact consolidation (ThreadPoolExecutor).
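The consolidation steps can be sketched as a small function with the LLM-backed parts injected as callables. The names `consolidate`, `is_novel`, `ingest`, `extract_skills`, and the returned keys are hypothetical. Only the ordering (novelty check first, then the two extractions in parallel on a ThreadPoolExecutor) follows the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Conversation = list[dict[str, str]]


def consolidate(conversation: Conversation,
                novelty_retrieved_context: str,
                is_novel: Callable[[Conversation, str], bool],
                ingest: Callable[[Conversation], list],
                extract_skills: Callable[[Conversation], list]) -> dict:
    # 1. Novelty check against raw search-stage fragments, not background_context.
    if not is_novel(conversation, novelty_retrieved_context):
        return {"status": "skipped", "records_added": 0, "skills_added": 0}
    # 2 + 3. Compact consolidation and skill extraction run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        records_future = pool.submit(ingest, conversation)
        skills_future = pool.submit(extract_skills, conversation)
        records, skills = records_future.result(), skills_future.result()
    return {"status": "done", "records_added": len(records), "skills_added": len(skills)}
```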

⚡ Quick Start

Prerequisites

Python 3.11+
An LLM API key (OpenAI, Gemini, or any litellm-compatible provider)

Installation

git clone https://github.com/LycheeMem/LycheeMem.git
cd LycheeMem
pip install -e .

Configuration

Copy .env.example to .env and fill in your values. The full template in .env.example also includes session/user DB paths, JWT settings, and working-memory thresholds; the snippet below shows the most important ones:

# LLM — litellm format: provider/model
LLM_MODEL=openai/gpt-4o-mini
LLM_API_KEY=sk-...
LLM_API_BASE=                     # optional

# Embedder
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIM=1536
EMBEDDING_API_KEY=                # optional
EMBEDDING_API_BASE=               # optional

Supported LLM providers (via litellm):
openai/gpt-4o-mini · gemini/gemini-2.0-flash · ollama_chat/qwen2.5 · any OpenAI-compatible endpoint

Start the Server

python main.py

The API is served at http://localhost:8000, with interactive docs at /docs.

main.py currently starts Uvicorn without enabling live reload. For development reload, run Uvicorn directly, for example:
uvicorn src.api.server:create_app --factory --reload

🎨 Web Demo

A frontend demo is included under web-demo/. It provides a chat interface alongside live views of the semantic memory tree, skill library, and working memory state.

cd web-demo
npm install
npm run dev      # served at http://localhost:5173

Make sure the backend is running on port 8000 (or update proxy settings in web-demo/vite.config.ts) before starting the frontend.

🦞 OpenClaw Plugin

LycheeMem ships a native OpenClaw plugin that gives any OpenClaw session persistent long-term memory with zero manual wiring.

What the plugin provides:

lychee_memory_smart_search — default long-term memory retrieval entry point
Automatic turn mirroring via hooks — the model does not need to call append_turn manually
- User messages are appended automatically
- Assistant messages are appended automatically
/new, /reset, /stop, and session_end automatically trigger boundary consolidation
Proactive consolidation on strong long-term knowledge signals

Under normal operation:

The model only calls lychee_memory_smart_search when recalling long-term context
The model may call lychee_memory_consolidate manually when an immediate persist is warranted
The model does not need to call lychee_memory_append_turn at all

Quick Install

openclaw plugins install "/path/to/LycheeMem/openclaw-plugin"
openclaw gateway restart

See the full setup guide: openclaw-plugin/INSTALL_OPENCLAW.md

🔧 MCP

LycheeMem also exposes an HTTP MCP endpoint at http://localhost:8000/mcp.

Available tools: lychee_memory_smart_search, lychee_memory_search, lychee_memory_append_turn, lychee_memory_synthesize, lychee_memory_consolidate
Use Authorization: Bearer <token> if you want per-user memory isolation
lychee_memory_consolidate works for sessions that already contain mirrored turns from /chat, /memory/reason, or lychee_memory_append_turn

MCP Transport

POST /mcp handles JSON-RPC requests
GET /mcp exposes the SSE stream used by some MCP clients
The server returns Mcp-Session-Id during initialize; reuse that header on later requests

Authentication

If you want isolated memory per user, first obtain a JWT token from /auth/register or /auth/login, then send:

Authorization: Bearer <token>

Without a token, requests run with an empty user_id, so anonymous traffic shares the same namespace.

Client Configuration

For any MCP client that supports remote HTTP servers, configure the MCP URL as:

http://localhost:8000/mcp

Generic config example:

{
  "mcpServers": {
    "lycheemem": {
      "url": "http://localhost:8000/mcp",
      "headers": {
        "Authorization": "Bearer <token>"
      }
    }
  }
}

Manual JSON-RPC Flow

Call initialize
Reuse the returned Mcp-Session-Id
Send initialized
Call tools/list
Call tools/call

Initialize example:

curl -i -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "capabilities": {},
      "clientInfo": {
        "name": "debug-client",
        "version": "0.1.0"
      }
    }
  }'

Tool call example:

curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -H "Mcp-Session-Id: <session-id>" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "lychee_memory_smart_search",
      "arguments": {
        "query": "what tools do I use for database backups",
        "top_k": 5,
        "mode": "compact",
        "include_graph": true,
        "include_skills": true
      }
    }
  }'

Recommended MCP Usage Pattern

Use /chat or /memory/reason with a stable session_id to write conversation turns, or mirror external host turns with lychee_memory_append_turn.
Use lychee_memory_smart_search in compact mode for the default one-shot recall path.
Use lychee_memory_search + lychee_memory_synthesize only when you explicitly want search and synthesis as separate stages.
After the conversation ends, call lychee_memory_consolidate with the same session_id.

🔌 API Reference

`POST /memory/search` — Unified Memory Retrieval

Query both the semantic memory channel and the skill store in a single call. New integrations should prefer semantic_results; graph_results is kept as a backward-compatible alias. The response also includes novelty_retrieved_context, which is the correct input for later /memory/consolidate calls.

// Request
{
  "query": "what tools do I use for database backups",
  "top_k": 5,
  "include_graph": true,
  "include_skills": true
}

// Response
{
  "query": "...",
  "graph_results": [
    {
      "anchor": {
        "node_id": "compact_context",
        "name": "CompactSemanticMemory",
        "label": "SemanticContext",
        "score": 1.0
      },
      "constructed_context": "...",
      "provenance": [ { "record_id": "...", "source": "record", "semantic_source_type": "record", "score": 0.91, ... } ]
    }
  ],
  "semantic_results": [
    {
      "anchor": { "node_id": "compact_context", "name": "CompactSemanticMemory", "label": "SemanticContext", "score": 1.0 },
      "constructed_context": "...",
      "provenance": [ { "record_id": "...", "source": "record", "semantic_source_type": "record", "score": 0.91, ... } ]
    }
  ],
  "novelty_retrieved_context": "[1] (procedure, source=record) Use pg_dump with cron ...",
  "skill_results": [ { "id": "...", "intent": "pg_dump backup to S3", "score": 0.87, ... } ],
  "total": 6
}

`POST /memory/smart-search` — One-Shot Recall

Runs search and, optionally, synthesis in one API call. mode=compact is the default integration path when you want a concise background_context without handling intermediate payloads yourself. Even in compact mode, the response still returns novelty_retrieved_context so a host can consolidate against raw retrieved memory instead of answer-time synthesis.

// Request
{
  "query": "what tools do I use for database backups",
  "top_k": 5,
  "synthesize": true,
  "mode": "compact"
}

// Response
{
  "query": "...",
  "mode": "compact",
  "synthesized": true,
  "background_context": "User regularly uses pg_dump with a cron job...",
  "skill_reuse_plan": [ { "skill_id": "...", "intent": "...", "doc_markdown": "..." } ],
  "provenance": [ { "record_id": "...", "source": "record", "score": 0.91, ... } ],
  "novelty_retrieved_context": "[1] (procedure, source=record) Use pg_dump with cron ...",
  "kept_count": 4,
  "dropped_count": 2,
  "total": 6
}

`POST /memory/synthesize` — Memory Fusion

Takes raw retrieval results and produces a fused memory context using LLM-as-Judge.

// Request
{
  "user_query": "what tools do I use for database backups",
  "semantic_results": [...], // preferred from /memory/search
  "graph_results": [...],    // compatibility alias also accepted
  "skill_results": [...]
}

// Response
{
  "background_context": "User regularly uses pg_dump with a cron job...",
  "skill_reuse_plan": [ { "skill_id": "...", "intent": "...", "doc_markdown": "..." } ],
  "provenance": [ { "record_id": "...", "source": "semantic", "semantic_source_type": "record", "score": 0.91, ... } ],
  "kept_count": 4,
  "dropped_count": 2
}

`POST /memory/reason` — Grounded Reasoning

Runs the ReasoningAgent given pre-synthesized context. Can be chained after /memory/synthesize for full pipeline control.

// Request
{
  "session_id": "my-session",
  "user_query": "what tools do I use for database backups",
  "background_context": "User regularly uses pg_dump...",
  "skill_reuse_plan": [...],
  "append_to_session": true   // write result to session history (default: true)
}

// Response
{
  "response": "You typically use pg_dump scheduled via cron...",
  "session_id": "my-session",
  "wm_token_usage": 3412
}

`POST /memory/append-turn` — Mirror External Host Turns

Appends one user or assistant turn into LycheeMem's session store so it can be consolidated later.

// Request
{
  "session_id": "my-session",
  "role": "user",
  "content": "I usually back up PostgreSQL with pg_dump to S3."
}

// Response
{
  "status": "appended",
  "session_id": "my-session",
  "turn_count": 3
}

`POST /memory/consolidate` — Trigger Consolidation

Manually trigger memory consolidation for a session. This is the primary consolidation endpoint and supports both background and synchronous modes.

For retrieved_context, pass the novelty_retrieved_context returned by /memory/search or /memory/smart-search (the raw search-stage semantic fragments), not the background_context from /memory/synthesize.

// Request
{
  "session_id": "my-session",
  "retrieved_context": "[1] (procedure, source=record) Use pg_dump with cron ...",
  "background": true
}

// Response (background mode)
{
  "status": "started",
  "entities_added": 0,
  "skills_added": 0,
  "facts_added": 0
}

Legacy compatibility endpoint: POST /memory/consolidate/{session_id}.

`GET /memory/graph` — Semantic Memory Tree

Returns the current semantic memory as a hierarchy. mode=cleaned (default) emits tree_roots plus direct tree edges for the frontend memory-tree view; mode=debug exposes the lower-level flattened relations for inspection.

`GET /pipeline/status` and `GET /pipeline/last-consolidation`

Use these endpoints for operational checks and background consolidation polling:

GET /pipeline/status returns aggregate counts for sessions, semantic memory, and skills.
GET /pipeline/last-consolidation?session_id=<id> returns the latest consolidation result for a session, or pending if the background task has not finished yet.
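Polling for a background consolidation result can be sketched as below. `wait_for_consolidation` and `http_fetch` are hypothetical helpers. The sketch also assumes that the pending state shows up as `"status": "pending"` in the JSON body, which this README does not spell out. The fetch function is injectable so the loop can be exercised without a server.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def wait_for_consolidation(fetch: Callable[[str], dict], session_id: str,
                           timeout_s: float = 30.0, interval_s: float = 1.0,
                           sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the result is no longer pending or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch(session_id)
        if result.get("status") != "pending":  # assumed response field
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"consolidation for {session_id!r} still pending")
        sleep(interval_s)


def http_fetch(session_id: str, base_url: str = "http://localhost:8000") -> dict:
    query = urllib.parse.urlencode({"session_id": session_id})
    with urllib.request.urlopen(f"{base_url}/pipeline/last-consolidation?{query}") as resp:
        return json.load(resp)
```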

Usage Examples

# Basic single-turn demo (automatically registers 'demo_user')
python examples/api_pipeline_demo.py

# Multi-turn chat demo (3 consecutive turns, followed by consolidation)
python examples/api_pipeline_demo.py --multi-turn

# Custom query and user credentials
python examples/api_pipeline_demo.py --username alice --password secret123 \
  --query "How do I backup my database with pg_dump?"

# Use a fixed session_id (useful for accumulating history across multiple runs)
python examples/api_pipeline_demo.py --session-id my-test-session

Release history

0.1.4 (May 13, 2026)
0.1.3 (May 13, 2026)
0.1.2 (Apr 3, 2026)
0.1.0 (Apr 3, 2026, this version)

Download files

Source Distribution

lycheemem-0.1.0.tar.gz (46.5 MB), uploaded Apr 3, 2026, Source

Built Distribution

lycheemem-0.1.0-py3-none-any.whl (166.1 kB), uploaded Apr 3, 2026, Python 3

File details

Details for the file lycheemem-0.1.0.tar.gz.

File metadata

Download URL: lycheemem-0.1.0.tar.gz
Upload date: Apr 3, 2026
Size: 46.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for lycheemem-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`bf6b43e0874682b161648d6ab9d0119d421913af433a4a1811175b490edaaf8d`
MD5	`2a32b2d3bb9976214fe8db633f866da9`
BLAKE2b-256	`4c385e40eda10a112efd0f95ee40570721db67a974e58bb08aa474a0b323623a`


File details

Details for the file lycheemem-0.1.0-py3-none-any.whl.

File metadata

Download URL: lycheemem-0.1.0-py3-none-any.whl
Upload date: Apr 3, 2026
Size: 166.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for lycheemem-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`800f501a71c3c2e704c81841718441a17577b3b9dd0aa0d60149c3f44fd87d04`
MD5	`3b8f0d61e83054b15839d5de8263fc85`
BLAKE2b-256	`e6e0429fcb4ae75b40420591faad7b01247c882c066836e83d9bb6dd48b90dc3`
