

SimpleMem: Efficient Lifelong Memory for LLM Agents

Project Page



News

  • [01/14/2026] The SimpleMem MCP Server is now live and open source! Experience SimpleMem as a cloud-hosted memory service at mcp.simplemem.cloud. It integrates with chat platforms (LM Studio, Cherry Studio) and AI agents (Cursor, Claude Desktop) via the Streamable HTTP MCP protocol, and features production-ready optimizations including multi-tenant user isolation, faster response times, and enhanced security. View MCP Documentation →
  • [01/08/2026] We have set up a Discord server and a WeChat group to make it easier to collaborate and exchange ideas on this project. Join either group to share your thoughts, ask questions, or contribute ideas!
  • [01/05/2026] The SimpleMem paper was released on arXiv!



Overview

Performance vs. Efficiency Trade-off

SimpleMem achieves the highest F1 score among compared systems (43.24%) at a minimal token cost (~550 tokens), occupying the ideal top-left position of the trade-off plot.

SimpleMem addresses the fundamental challenge of efficient long-term memory for LLM agents through a three-stage pipeline grounded in Semantic Lossless Compression. Unlike existing systems, which either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes information density and token utilization through:

๐Ÿ” Stage 1

Semantic Structured Compression

Entropy-based filtering and de-linearization of dialogue into self-contained atomic facts

๐Ÿ—‚๏ธ Stage 2

Structured Indexing

Asynchronous evolution from fragmented atoms to higher-order molecular insights

๐ŸŽฏ Stage 3

Adaptive Retrieval

Complexity-aware pruning across semantic, lexical, and symbolic layers
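Stage 1's entropy-based filtering can be sketched with a toy heuristic. The Shannon-entropy scorer and the threshold below are my own illustrative stand-ins, not SimpleMem's actual filtering criterion:

```python
import math
from collections import Counter

def token_entropy(text: str) -> float:
    """Shannon entropy (bits) over the token distribution of an utterance."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def filter_low_information(turns, threshold=1.5):
    """Keep only turns whose entropy suggests substantive content."""
    return [t for t in turns if token_entropy(t) >= threshold]

turns = [
    "ok ok ok",                                        # low entropy: filler
    "Alice will meet Bob at Starbucks at 2pm Friday",  # high entropy: factual
]
print(filter_low_information(turns))
# → ['Alice will meet Bob at Starbucks at 2pm Friday']
```

The intuition is the same as the pipeline's: low-information turns are dropped before indexing so the memory store holds only substantive facts.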

SimpleMem Framework

The SimpleMem Architecture: A three-stage pipeline for efficient lifelong memory through semantic lossless compression


๐Ÿ† Performance Comparison

Speed Comparison Demo

SimpleMem vs. Baseline: Real-time speed comparison demonstration

LoCoMo-10 Benchmark Results (GPT-4.1-mini)

Model โฑ๏ธ Construction Time ๐Ÿ”Ž Retrieval Time โšก Total Time ๐ŸŽฏ Average F1
A-Mem 5140.5s 796.7s 5937.2s 32.58%
LightMem 97.8s 577.1s 675.9s 24.63%
Mem0 1350.9s 583.4s 1934.3s 34.20%
SimpleMem โญ 92.6s 388.3s 480.9s 43.24%

Key Advantages:

  • Highest F1 score: 43.24% (+26.4% vs. Mem0, +75.6% vs. LightMem)
  • Fastest retrieval: 388.3s (32.7% faster than LightMem, 51.3% faster than Mem0)
  • Fastest end-to-end: 480.9s total processing time (12.5× faster than A-Mem)

Key Contributions

1. Semantic Lossless Compression Pipeline

SimpleMem transforms raw, ambiguous dialogue streams into atomic entries: self-contained facts with resolved coreferences and absolute timestamps. This write-time disambiguation eliminates downstream reasoning overhead.

Example Transformation:

- Input:  "He'll meet Bob tomorrow at 2pm"                           [relative, ambiguous]
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00"  [absolute, atomic]
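The timestamp half of this transformation, pinning a relative expression to the message's own timestamp, can be sketched in a few lines. `resolve_relative_time` and its rule table are hypothetical illustrations (a real system would resolve such expressions with an LLM or a date parser), not SimpleMem's API:

```python
from datetime import datetime, timedelta

def resolve_relative_time(expr: str, message_time: str) -> str:
    """Resolve a relative day expression against the message timestamp."""
    base = datetime.fromisoformat(message_time)
    # Toy rule table for illustration only.
    offsets = {"today": 0, "tomorrow": 1, "yesterday": -1}
    day = base + timedelta(days=offsets[expr.lower()])
    return day.date().isoformat()

# "tomorrow", said at 2025-11-15T14:30:00, pins to an absolute date
date = resolve_relative_time("tomorrow", "2025-11-15T14:30:00")
print(f"Alice will meet Bob at Starbucks on {date}T14:00:00")
# → Alice will meet Bob at Starbucks on 2025-11-16T14:00:00
```

Doing this once at write time is what lets later queries skip any relative-date reasoning.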

2๏ธโƒฃ Structured Multi-View Indexing

Memory is indexed across three structured dimensions for robust, multi-granular retrieval:

๐Ÿ” Layer ๐Ÿ“Š Type ๐ŸŽฏ Purpose ๐Ÿ› ๏ธ Implementation
Semantic Dense Conceptual similarity Vector embeddings (1024-d)
Lexical Sparse Exact term matching BM25-style keyword index
Symbolic Metadata Structured filtering Timestamps, entities, persons
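How the layers might combine at query time can be sketched as follows. The keyword-overlap scorer and dict-based metadata filter are deliberately simplified stand-ins for the real BM25 index and dense vector search:

```python
def lexical_score(query: str, entry: str) -> float:
    """Keyword-overlap stand-in for the sparse BM25-style layer."""
    q, e = set(query.lower().split()), set(entry.lower().split())
    return len(q & e) / max(len(q), 1)

def symbolic_match(meta: dict, filters: dict) -> bool:
    """Symbolic layer: structured filtering on metadata fields."""
    return all(meta.get(k) == v for k, v in filters.items())

def retrieve(memory, query, filters, top_k=2):
    # The symbolic layer prunes first; the lexical layer ranks survivors.
    candidates = [m for m in memory if symbolic_match(m["meta"], filters)]
    candidates.sort(key=lambda m: lexical_score(query, m["text"]), reverse=True)
    return [m["text"] for m in candidates[:top_k]]

memory = [
    {"text": "Alice will meet Bob at Starbucks on 2025-11-16",
     "meta": {"person": "Alice"}},
    {"text": "Bob will bring the market analysis report",
     "meta": {"person": "Bob"}},
]
print(retrieve(memory, "where will Alice meet Bob", {"person": "Alice"}))
# → ['Alice will meet Bob at Starbucks on 2025-11-16']
```

In the real system the semantic layer would contribute an embedding-similarity score alongside the lexical one; the pruning-then-ranking shape is the point of the sketch.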

3๏ธโƒฃ Complexity-Aware Adaptive Retrieval

Instead of fixed-depth retrieval, SimpleMem dynamically estimates query complexity ($C_q$) to modulate retrieval depth:

$$k_{dyn} = \lfloor k_{base} \cdot (1 + \delta \cdot C_q) \rfloor$$

Low-Complexity Queries

  • Retrieve minimal molecular headers
  • ~100 tokens
  • Fast response time

High-Complexity Queries

  • Expand to detailed atomic contexts
  • ~1000 tokens
  • Comprehensive coverage

Result: 43.24% F1 score with 30× fewer tokens than full-context methods.
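The depth rule above is direct to compute; the parameter values below (k_base = 5, delta = 2.0) are illustrative, not taken from the paper:

```python
import math

def dynamic_depth(k_base: int, delta: float, c_q: float) -> int:
    """k_dyn = floor(k_base * (1 + delta * C_q))."""
    return math.floor(k_base * (1 + delta * c_q))

# Simple lookup (low C_q) stays shallow; multi-hop (high C_q) goes deeper.
print(dynamic_depth(5, 2.0, 0.25))  # → 7
print(dynamic_depth(5, 2.0, 1.0))   # → 15
```

Because C_q only scales the base depth, a mis-estimated complexity degrades gracefully to a slightly shallower or deeper retrieval rather than a wrong answer path.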


Performance Highlights

Benchmark Results (LoCoMo)

High-Capability Models (GPT-4.1-mini)

| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|-----------|--------------|---------|-------------|
| MultiHop  | 43.46%       | 30.14%  | +43.8%      |
| Temporal  | 58.62%       | 48.91%  | +19.9%      |
| SingleHop | 51.12%       | 41.30%  | +23.8%      |

Efficient Models (Qwen2.5-1.5B)

| Metric     | SimpleMem | Mem0   | Notes                                            |
|------------|-----------|--------|--------------------------------------------------|
| Average F1 | 25.23%    | 23.77% | Competitive performance from a 99× smaller model |

Installation

Requirements

  • Python 3.10+
  • OpenAI-compatible API (OpenAI, Qwen, Azure OpenAI, etc.)

Quick Install (PyPI)

# Install from PyPI
pip install simplemem

# With GPU support (for faster embeddings)
pip install simplemem[gpu]

# For development
pip install simplemem[dev]

๐Ÿ› ๏ธ Install from Source

# ๐Ÿ“ฅ Clone repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem

# ๐Ÿ“ฆ Install in editable mode
pip install -e .

# Or install dependencies only
pip install -r requirements.txt

โš™๏ธ Configuration

SimpleMem uses environment variables for configuration:

# Required: Set your OpenAI API key
export OPENAI_API_KEY="your-api-key"

# Optional: Custom API endpoint (for Qwen, Azure, etc.)
export OPENAI_BASE_URL="https://api.example.com/v1"

# Optional: Override model settings
export SIMPLEMEM_MODEL="gpt-4.1-mini"
export SIMPLEMEM_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"

Or configure programmatically:

from simplemem import set_config

set_config(
    openai_api_key="your-api-key",
    llm_model="gpt-4.1-mini",
    embedding_model="Qwen/Qwen3-Embedding-0.6B"
)

Quick Start

Basic Usage

from simplemem import SimpleMemSystem

# Initialize system
system = SimpleMemSystem(clear_db=True)

# Add dialogues (Stage 1: Semantic Structured Compression)
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report", "2025-11-15T14:31:00")

# Finalize atomic encoding
system.finalize()

# Query with adaptive retrieval (Stage 3: Adaptive Query-Aware Retrieval)
answer = system.ask("When and where will Alice and Bob meet?")
print(answer)
# Output: "16 November 2025 at 2:00 PM at Starbucks"

Advanced: Parallel Processing

For large-scale dialogue processing, enable parallel mode:

from simplemem import SimpleMemSystem

system = SimpleMemSystem(
    clear_db=True,
    enable_parallel_processing=True,  # Parallel memory building
    max_parallel_workers=8,
    enable_parallel_retrieval=True,   # Parallel query execution
    max_retrieval_workers=4
)

Pro tip: parallel processing significantly reduces latency for batch operations.


MCP Server

SimpleMem is available as a cloud-hosted memory service via the Model Context Protocol (MCP), enabling seamless integration with AI assistants such as Claude Desktop, Cursor, and other MCP-compatible clients.

Cloud Service: mcp.simplemem.cloud

Key Features

| Feature                | Description                                             |
|------------------------|---------------------------------------------------------|
| Streamable HTTP        | MCP 2025-03-26 protocol with JSON-RPC 2.0               |
| Multi-tenant Isolation | Per-user data tables with token authentication          |
| Hybrid Retrieval       | Semantic search + keyword matching + metadata filtering |
| Production Optimized   | Faster response times with OpenRouter integration       |

Quick Configuration

{
  "mcpServers": {
    "simplemem": {
      "url": "https://mcp.simplemem.cloud/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}

For detailed setup instructions and a self-hosting guide, see the MCP Documentation.


Evaluation

Run Benchmark Tests

# Full LoCoMo benchmark
python test_locomo10.py

# Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5

# Custom output file
python test_locomo10.py --result-file my_results.json

Reproduce Paper Results

Use the exact configurations in config.py:

  • High-capability: GPT-4.1-mini, Qwen3-Plus
  • Efficient: Qwen2.5-1.5B, Qwen2.5-3B
  • Embedding: Qwen3-Embedding-0.6B (1024-d)

๐Ÿ“ Citation

If you use SimpleMem in your research, please cite:

@article{simplemem2025,
  title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
  author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and  Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2601.02553},
  year={2025},
  url={https://github.com/aiming-lab/SimpleMem}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ™ Acknowledgments

We would like to thank the following projects and teams:

  • ๐Ÿ” Embedding Model: Qwen3-Embedding - State-of-the-art retrieval performance
  • ๐Ÿ—„๏ธ Vector Database: LanceDB - High-performance columnar storage
  • ๐Ÿ“Š Benchmark: LoCoMo - Long-context memory evaluation framework
