SimpleMem: Efficient Lifelong Memory for LLM Agents
News
- [01/14/2026] SimpleMem MCP Server is now LIVE and Open Source! Experience SimpleMem as a cloud-hosted memory service at mcp.simplemem.cloud. Easily integrate with your favorite chat platforms (LM Studio, Cherry Studio) and AI agents (Cursor, Claude Desktop) using the Streamable HTTP MCP protocol. The MCP implementation features production-ready optimizations including multi-tenant user isolation, faster response times, and enhanced security. View the MCP Documentation.
- [01/08/2026] We've set up a Discord server and a WeChat group to make it easier to collaborate and exchange ideas on this project. Join to share your thoughts, ask questions, or contribute ideas. Join our Discord or WeChat group now!
- [01/05/2026] SimpleMem paper was released on arXiv!
Table of Contents
- Overview
- Key Contributions
- Performance Highlights
- Installation
- Quick Start
- MCP Server
- Evaluation
- File Structure
- Citation
- License
- Acknowledgments
Overview
SimpleMem achieves the best F1 score (43.24%) among compared systems at minimal token cost (~550 tokens), occupying the ideal top-left region of the accuracy-versus-cost plot.
SimpleMem addresses the fundamental challenge of efficient long-term memory for LLM agents through a three-stage pipeline grounded in Semantic Lossless Compression. Unlike existing systems that either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes information density and token utilization through:
- Stage 1: Semantic Structured Compression. Entropy-based filtering and de-linearization of dialogue into self-contained atomic facts.
- Stage 2: Structured Indexing. Asynchronous evolution from fragmented atoms into higher-order molecular insights.
- Stage 3: Adaptive Retrieval. Complexity-aware pruning across semantic, lexical, and symbolic layers.
The SimpleMem Architecture: A three-stage pipeline for efficient lifelong memory through semantic lossless compression
Performance Comparison
Speed Comparison Demo
SimpleMem vs. Baseline: Real-time speed comparison demonstration
LoCoMo-10 Benchmark Results (GPT-4.1-mini)
| Model | Construction Time | Retrieval Time | Total Time | Average F1 |
|---|---|---|---|---|
| A-Mem | 5140.5s | 796.7s | 5937.2s | 32.58% |
| LightMem | 97.8s | 577.1s | 675.9s | 24.63% |
| Mem0 | 1350.9s | 583.4s | 1934.3s | 34.20% |
| SimpleMem (ours) | 92.6s | 388.3s | 480.9s | 43.24% |
Key Advantages:
- Highest F1 score: 43.24% (+26.4% vs. Mem0, +75.6% vs. LightMem)
- Fastest retrieval: 388.3s (32.7% faster than LightMem, 51.3% faster than A-Mem)
- Fastest end-to-end: 480.9s total processing time (12.3× faster than A-Mem)
Key Contributions
1. Semantic Lossless Compression Pipeline
SimpleMem transforms raw, ambiguous dialogue streams into atomic entries: self-contained facts with resolved coreferences and absolute timestamps. This write-time disambiguation eliminates downstream reasoning overhead.
Example Transformation:
- Input:  "He'll meet Bob tomorrow at 2pm" [relative, ambiguous]
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00" [absolute, atomic]
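Conceptually, each atomic entry can be pictured as a small structured record that carries everything needed to interpret it in isolation. The sketch below is illustrative only; the field names (text, timestamp, entities, source_speaker) are assumptions for exposition, not SimpleMem's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AtomicEntry:
    """Illustrative shape of a self-contained memory fact (field names are assumed)."""
    text: str                                           # disambiguated, self-contained statement
    timestamp: str                                      # absolute ISO-8601 time resolved at write time
    entities: List[str] = field(default_factory=list)   # named entities for symbolic filtering
    source_speaker: str = ""                            # who stated the fact

entry = AtomicEntry(
    text="Alice will meet Bob at Starbucks on 2025-11-16T14:00:00",
    timestamp="2025-11-16T14:00:00",
    entities=["Alice", "Bob", "Starbucks"],
    source_speaker="Alice",
)
```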
2. Structured Multi-View Indexing
Memory is indexed across three structured dimensions for robust, multi-granular retrieval:
| Layer | Type | Purpose | Implementation |
|---|---|---|---|
| Semantic | Dense | Conceptual similarity | Vector embeddings (1024-d) |
| Lexical | Sparse | Exact term matching | BM25-style keyword index |
| Symbolic | Metadata | Structured filtering | Timestamps, entities, persons |
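A minimal sketch of how the three views could be combined at query time. The weighting scheme, the simple keyword-overlap stand-in for a BM25 score, and the dictionary-based entries are assumptions for illustration, not SimpleMem's internal API.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Semantic layer: dense similarity between query and entry embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_overlap(terms: List[str], text: str) -> float:
    """Lexical layer: stand-in for a BM25-style score (fraction of query terms present)."""
    words = set(text.lower().split())
    return sum(t.lower() in words for t in terms) / max(len(terms), 1)

def hybrid_search(query_vec, query_terms, filters: Dict, entries, alpha=0.7, top_k=10):
    """Rank entries by a weighted mix of semantic and lexical scores,
    after hard filtering on symbolic metadata (timestamps, entities, persons)."""
    scored = []
    for e in entries:
        if any(e.get(k) != v for k, v in filters.items()):  # symbolic layer
            continue
        score = (alpha * cosine(query_vec, e["embedding"])
                 + (1 - alpha) * keyword_overlap(query_terms, e["text"]))
        scored.append((score, e))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]
```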
3. Complexity-Aware Adaptive Retrieval
Instead of fixed-depth retrieval, SimpleMem dynamically estimates query complexity ($C_q$) to modulate retrieval depth:
$$k_{dyn} = \lfloor k_{base} \cdot (1 + \delta \cdot C_q) \rfloor$$
Low-complexity queries keep the retrieval depth close to $k_{base}$, while high-complexity queries (e.g., multi-hop or temporal reasoning) expand it in proportion to $C_q$.
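A worked sketch of the depth formula above. The complexity values, $k_{base}$, and $\delta$ used here are placeholder numbers chosen for illustration, not the settings used in the paper.

```python
import math

def dynamic_k(c_q: float, k_base: int = 5, delta: float = 1.0) -> int:
    """k_dyn = floor(k_base * (1 + delta * C_q)); C_q is a normalized complexity estimate."""
    return math.floor(k_base * (1 + delta * c_q))

# A simple single-hop query vs. a multi-hop query (illustrative complexity values):
print(dynamic_k(0.1))  # 5  -> shallow retrieval
print(dynamic_k(0.9))  # 9  -> deeper retrieval
```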
Result: 43.24% F1 score with 30× fewer tokens than full-context methods.
Performance Highlights
Benchmark Results (LoCoMo)
High-Capability Models (GPT-4.1-mini)
| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|---|---|---|---|
| MultiHop | 43.46% | 30.14% | +43.8% |
| Temporal | 58.62% | 48.91% | +19.9% |
| SingleHop | 51.12% | 41.3% | +23.8% |
Efficient Models (Qwen2.5-1.5B)
| Metric | SimpleMem | Mem0 | Notes |
|---|---|---|---|
| Average F1 | 25.23% | 23.77% | Competitive even with a 99× smaller model |
Installation
Requirements
- Python 3.10+
- OpenAI-compatible API (OpenAI, Qwen, Azure OpenAI, etc.)
Quick Install (PyPI)
# Install from PyPI
pip install simplemem
# With GPU support (for faster embeddings)
pip install simplemem[gpu]
# For development
pip install simplemem[dev]
Install from Source
# Clone repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem
# Install in editable mode
pip install -e .
# Or install dependencies only
pip install -r requirements.txt
Configuration
SimpleMem uses environment variables for configuration:
# Required: Set your OpenAI API key
export OPENAI_API_KEY="your-api-key"
# Optional: Custom API endpoint (for Qwen, Azure, etc.)
export OPENAI_BASE_URL="https://api.example.com/v1"
# Optional: Override model settings
export SIMPLEMEM_MODEL="gpt-4.1-mini"
export SIMPLEMEM_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"
Or configure programmatically:
from simplemem import set_config
set_config(
openai_api_key="your-api-key",
llm_model="gpt-4.1-mini",
embedding_model="Qwen/Qwen3-Embedding-0.6B"
)
Quick Start
Basic Usage
from simplemem import SimpleMemSystem
# Initialize system
system = SimpleMemSystem(clear_db=True)
# Add dialogues (Stage 1: Semantic Structured Compression)
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report", "2025-11-15T14:31:00")
# Finalize atomic encoding
system.finalize()
# Query with adaptive retrieval (Stage 3: Adaptive Query-Aware Retrieval)
answer = system.ask("When and where will Alice and Bob meet?")
print(answer)
# Output: "16 November 2025 at 2:00 PM at Starbucks"
Advanced: Parallel Processing
For large-scale dialogue processing, enable parallel mode:
from simplemem import SimpleMemSystem
system = SimpleMemSystem(
clear_db=True,
enable_parallel_processing=True,  # Parallel memory building
max_parallel_workers=8,
enable_parallel_retrieval=True,  # Parallel query execution
max_retrieval_workers=4
)
Pro Tip: Parallel processing significantly reduces latency for batch operations.
MCP Server
SimpleMem is available as a cloud-hosted memory service via the Model Context Protocol (MCP), enabling seamless integration with AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients.
Cloud Service: mcp.simplemem.cloud
Key Features
| Feature | Description |
|---|---|
| Streamable HTTP | MCP 2025-03-26 protocol with JSON-RPC 2.0 |
| Multi-tenant Isolation | Per-user data tables with token authentication |
| Hybrid Retrieval | Semantic search + keyword matching + metadata filtering |
| Production Optimized | Faster response times with OpenRouter integration |
Quick Configuration
{
"mcpServers": {
"simplemem": {
"url": "https://mcp.simplemem.cloud/mcp",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}
For detailed setup instructions and a self-hosting guide, see the MCP Documentation.
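For a quick connectivity check outside a full MCP client, the endpoint can be exercised directly with a JSON-RPC 2.0 request over HTTP. The snippet below is only a sketch: the Accept header and any initialization handshake the hosted service expects are assumptions based on the Streamable HTTP transport, and real clients (Claude Desktop, Cursor, LM Studio) handle this negotiation for you.

```python
import requests

# Illustrative JSON-RPC 2.0 call asking the MCP server to list its tools.
# Header values and the lack of a prior "initialize" exchange are assumptions;
# consult the MCP Documentation for the authoritative client flow.
resp = requests.post(
    "https://mcp.simplemem.cloud/mcp",
    headers={
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    },
    json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
    timeout=30,
)
print(resp.status_code)
print(resp.text)
```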
Evaluation
Run Benchmark Tests
# Full LoCoMo benchmark
python test_locomo10.py
# Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5
# Custom output file
python test_locomo10.py --result-file my_results.json
Reproduce Paper Results
Use the exact configurations in config.py:
- High-capability: GPT-4.1-mini, Qwen3-Plus
- Efficient: Qwen2.5-1.5B, Qwen2.5-3B
- Embedding: Qwen3-Embedding-0.6B (1024-d)
Citation
If you use SimpleMem in your research, please cite:
@article{simplemem2025,
title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
journal={arXiv preprint arXiv:2601.02553},
year={2025},
url={https://github.com/aiming-lab/SimpleMem}
}
License
This project is licensed under the MIT License; see the LICENSE file for details.
Acknowledgments
We would like to thank the following projects and teams:
- Embedding Model: Qwen3-Embedding (state-of-the-art retrieval performance)
- Vector Database: LanceDB (high-performance columnar storage)
- Benchmark: LoCoMo (long-context memory evaluation framework)
File details
Details for the file simplemem-0.1.0.tar.gz.
File metadata
- Download URL: simplemem-0.1.0.tar.gz
- Upload date:
- Size: 38.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae70c518205ba8dccc51f8731007fb1b86fb8d33254b32e192a829e0a6a86196 |
| MD5 | 83485910227b0f9b52ebdc5ee3a91a3b |
| BLAKE2b-256 | 321b1e57f0719d7e455ac7b1e560ffcde9fcdcbbde3b8f47508226913f084ec5 |
File details
Details for the file simplemem-0.1.0-py3-none-any.whl.
File metadata
- Download URL: simplemem-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6754a09f2f8474103d454b2e815daa4a4a1be1cd16b329fe2815322b9aef6b6 |
| MD5 | 6c3080f951f3b201f8143fdf39ff863a |
| BLAKE2b-256 | e3f5a1b50fc41772bc028d0b7a15a1845f8f9a0cbfbc5b2170d1edab3a6139ed |