MemLayer: plug-and-play persistent memory for your LLMs
The memory layer for LLMs - add persistent, intelligent memory to any LLM in minutes.
MemLayer transforms stateless LLMs into memory-enabled AI assistants that remember context across conversations, extract structured knowledge, and proactively surface relevant information when needed.
Contents
- Features
- Quick Start
- Key Concepts
- Memory Modes
- Search Tiers
- Providers
- Advanced Features
- Examples
- Performance
- Documentation
- Contributing
Features
- Universal LLM Support: Works with OpenAI, Claude, Gemini, Ollama models
- Plug-and-play: Install with pip install memlayer and get started in minutes; minimal setup required.
- Intelligent Memory Filtering: Three operation modes (LOCAL/ONLINE/LIGHTWEIGHT) automatically filter important information
- Hybrid Search: Combines vector similarity + knowledge graph traversal for accurate retrieval
- Three Search Tiers: Fast (<100ms), Balanced (<500ms), Deep (<2s) optimized for different use cases
- Knowledge Graph: Automatically extracts entities, relationships, and facts from conversations
- Proactive Reminders: Schedule tasks and get automatic reminders when they're due
- Built-in Observability: Trace every search operation with detailed performance metrics
- Flexible Storage: ChromaDB (vector) + NetworkX (graph) or graph-only mode
- Production Ready: Serverless-friendly with fast cold starts using online mode
Quick Start
Installation
pip install memlayer
Basic Usage
from memlayer.wrappers.openai import OpenAI

# Initialize with memory capabilities
client = OpenAI(
    model="gpt-4.1-mini",
    storage_path="./memories",
    user_id="user_123"
)

# Store information automatically
client.chat([
    {"role": "user", "content": "My name is Alice and I work at TechCorp"}
])

# Retrieve information automatically (no manual prompting needed!)
response = client.chat([
    {"role": "user", "content": "Where do I work?"}
])
# Response: "You work at TechCorp."
That's it! MemLayer automatically:
- ✅ Filters salient information using ML-based classification
- ✅ Extracts structured facts, entities, and relationships
- ✅ Stores memories in hybrid vector + graph storage
- ✅ Retrieves relevant context for each query
- ✅ Injects memories seamlessly into LLM context
Key Concepts
Salience Filtering
Not all conversation content is worth storing. MemLayer uses salience gates to intelligently filter:
- ✅ Save: Facts, preferences, user info, decisions, relationships
- ❌ Skip: Greetings, acknowledgments, filler words, meta-conversation
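To make the save/skip split concrete, here is a minimal keyword-based gate in the spirit of the filtering described above. It is only a sketch: the pattern lists and the is_salient helper are hypothetical and are not MemLayer's actual rules or API.

```python
import re

# Hypothetical patterns, for illustration only (not MemLayer's real rules).
SKIP_PATTERNS = [
    r"^(hi|hello|hey|thanks|thank you|ok|okay|got it|bye)\b[\s!.,]*$",
]
SAVE_PATTERNS = [
    r"\bmy name is\b",
    r"\bi (work|live|prefer|like|decided)\b",
    r"\bremind me\b",
]


def is_salient(text: str) -> bool:
    """Return True if the message looks worth storing."""
    lowered = text.strip().lower()
    # Greetings and acknowledgments are dropped first.
    if any(re.search(p, lowered) for p in SKIP_PATTERNS):
        return False
    # Anything that states a fact or preference about the user is kept.
    return any(re.search(p, lowered) for p in SAVE_PATTERNS)


messages = [
    "Hello!",
    "My name is Alice and I work at TechCorp",
    "Thanks",
    "I prefer dark mode",
]
kept = [m for m in messages if is_salient(m)]
print(kept)
```

A keyword gate like this is cheap but brittle, which is the trade-off the ML-based modes are meant to address.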
Hybrid Storage
Memories are stored in two complementary systems:
- Vector Store (ChromaDB): Semantic similarity search for facts
- Knowledge Graph (NetworkX): Entity relationships and structured knowledge
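The two stores answer different questions: the vector side ranks facts by similarity to a query, while the graph side looks up what is connected to an entity. The toy below illustrates that split using only the standard library. ToyHybridStore, the bag-of-words embed function, and all method names are assumptions made for this sketch; MemLayer itself uses ChromaDB and NetworkX.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyHybridStore:
    def __init__(self):
        self.facts = []                 # (text, vector) pairs: the "vector store"
        self.graph = defaultdict(list)  # subject -> [(predicate, object)]: the "graph"

    def add_fact(self, text):
        self.facts.append((text, embed(text)))

    def add_relationship(self, subject, predicate, obj):
        self.graph[subject].append((predicate, obj))

    def search(self, query, top_k=2):
        """Rank stored facts by similarity to the query."""
        q = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, f[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

    def neighbors(self, entity):
        """Return relationships that start at the given entity."""
        return list(self.graph.get(entity, []))


store = ToyHybridStore()
store.add_fact("Alice works at TechCorp")
store.add_fact("Alice likes green tea")
store.add_relationship("Alice", "works_at", "TechCorp")

top = store.search("where does Alice work at", top_k=1)
print(top)
print(store.neighbors("Alice"))
```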
Automatic Consolidation
After each conversation, background threads:
- Extract facts, entities, and relationships using an LLM
- Store facts in vector database with embeddings
- Build knowledge graph with entities and relationships
- Index everything for fast retrieval
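The general pattern behind non-blocking consolidation is a work queue drained by a background thread. The sketch below shows that pattern with the standard library; BackgroundConsolidator and split_sentences are hypothetical stand-ins, and the real pipeline calls an LLM where this example merely splits sentences.

```python
import queue
import threading


class BackgroundConsolidator:
    """Toy worker: submitted text is processed off the caller's thread."""

    def __init__(self, extract):
        self._extract = extract  # callable: text -> list of facts (hypothetical)
        self._queue = queue.Queue()
        self.stored = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, text):
        """Enqueue text and return immediately; the caller does not wait."""
        self._queue.put(text)

    def _run(self):
        while True:
            text = self._queue.get()
            try:
                self.stored.extend(self._extract(text))
            finally:
                self._queue.task_done()

    def wait(self):
        """Block until everything submitted so far has been processed."""
        self._queue.join()


def split_sentences(text):
    """Stand-in for LLM-based fact extraction."""
    return [s.strip() for s in text.split(".") if s.strip()]


consolidator = BackgroundConsolidator(split_sentences)
consolidator.submit("Alice leads Project Phoenix. The project uses Python and React.")
consolidator.wait()  # only needed here so the result can be printed
print(consolidator.stored)
```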
Memory Modes
MemLayer offers three modes that control both memory filtering (salience) and storage:
1. LOCAL Mode (Default)
client = OpenAI(salience_mode="local")
- Filtering: Sentence-transformers ML model (high accuracy)
- Storage: ChromaDB (vector) + NetworkX (graph)
- Startup: ~10s (model loading)
- Best for: High-volume production, offline apps
- Cost: Free (no API calls)
2. ONLINE Mode
client = OpenAI(salience_mode="online")
- Filtering: OpenAI embeddings API (high accuracy)
- Storage: ChromaDB (vector) + NetworkX (graph)
- Startup: ~2s (no model loading!)
- Best for: Serverless, cloud functions, fast cold starts
- Cost: ~$0.0001 per operation
3. LIGHTWEIGHT Mode
client = OpenAI(salience_mode="lightweight")
- Filtering: Keyword-based (medium accuracy)
- Storage: NetworkX only (no vector storage!)
- Startup: <1s (instant)
- Best for: Prototyping, testing, low-resource environments
- Cost: Free (no embeddings at all)
Performance Comparison:
Mode          Startup Time   Accuracy   API Cost     Storage
──────────────────────────────────────────────────────────────
LOCAL         ~10s           High       Free         Vector+Graph
ONLINE        ~2s            High       $0.0001/op   Vector+Graph
LIGHTWEIGHT   <1s            Medium     Free         Graph-only
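One way to turn the comparison above into a decision is a small helper that maps deployment constraints to a mode string. choose_salience_mode and the MODES dictionary are hypothetical conveniences written for this sketch; only the three mode names and their listed properties come from the table.

```python
# Properties copied from the comparison table; the helper itself is hypothetical.
MODES = {
    "local":       {"startup": "~10s", "accuracy": "high",   "api_cost": False, "vector_store": True},
    "online":      {"startup": "~2s",  "accuracy": "high",   "api_cost": True,  "vector_store": True},
    "lightweight": {"startup": "<1s",  "accuracy": "medium", "api_cost": False, "vector_store": False},
}


def choose_salience_mode(needs_semantic_search: bool,
                         allow_api_calls: bool,
                         fast_cold_start: bool) -> str:
    """Pick a mode name from coarse deployment constraints."""
    if not needs_semantic_search:
        return "lightweight"  # graph-only, fastest startup
    if fast_cold_start and allow_api_calls:
        return "online"       # embeddings via API, no local model load
    return "local"            # local model, no API cost


mode = choose_salience_mode(
    needs_semantic_search=True, allow_api_calls=True, fast_cold_start=True
)
print(mode, MODES[mode])
```

The returned string can then be passed as the salience_mode argument shown in the snippets above.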
Search Tiers
MemLayer provides three search tiers optimized for different latency requirements:
Fast Tier (<100ms)
# Automatic - LLM chooses based on query complexity
response = client.chat([{"role": "user", "content": "What's my name?"}])
- 2 vector search results
- No graph traversal
- Perfect for: Real-time chat, simple factual recall
Balanced Tier (<500ms, default)
# Automatic - handles most queries well
response = client.chat([{"role": "user", "content": "Tell me about my projects"}])
- 5 vector search results
- No graph traversal
- Perfect for: General conversation, most use cases
Deep Tier (<2s)
# Explicit request or auto-detected for complex queries
response = client.chat([{
    "role": "user",
    "content": "Use deep search: Tell me everything about Alice and her relationships"
}])
- 10 vector search results
- Graph traversal enabled (entity extraction + 1-hop relationships)
- Perfect for: Research, "tell me everything", multi-hop reasoning
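The three tiers differ in just two parameters: how many vector results are fetched and whether the graph is traversed. The sketch below records those values and adds a rough selector. In MemLayer the LLM chooses the tier, so pick_tier is a purely hypothetical heuristic, and TierConfig is an illustrative name rather than a library type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    vector_results: int
    graph_traversal: bool


# Values taken from the tier descriptions above.
TIERS = {
    "fast": TierConfig(vector_results=2, graph_traversal=False),
    "balanced": TierConfig(vector_results=5, graph_traversal=False),
    "deep": TierConfig(vector_results=10, graph_traversal=True),
}


def pick_tier(query: str) -> str:
    """Hypothetical keyword/length heuristic, for illustration only."""
    q = query.lower()
    if "deep search" in q or "everything" in q:
        return "deep"
    if len(q.split()) <= 4:
        return "fast"
    return "balanced"


for query in [
    "What's my name?",
    "Tell me about my projects",
    "Use deep search: Tell me everything about Alice and her relationships",
]:
    tier = pick_tier(query)
    print(f"{tier}: {TIERS[tier]}")
```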
Providers
MemLayer ships wrappers for OpenAI, Claude, Gemini, and Ollama:
OpenAI
from memlayer.wrappers.openai import OpenAI
client = OpenAI(
    model="gpt-4.1-mini",  # or gpt-4.1, gpt-5, etc.
    storage_path="./memories",
    user_id="user_123"
)
Claude (Anthropic)
from memlayer.wrappers.claude import Claude
client = Claude(
    model="claude-4-sonnet",
    storage_path="./memories",
    user_id="user_123"
)
Google Gemini
from memlayer.wrappers.gemini import Gemini
client = Gemini(
    model="gemini-2.5-flash",
    storage_path="./memories",
    user_id="user_123"
)
Ollama (Local)
from memlayer.wrappers.ollama import Ollama
client = Ollama(
    host="http://localhost:11434",
    model="qwen3:1.7b",  # or llama3.2, mistral, etc.
    storage_path="./memories",
    user_id="user_123",
    salience_mode="local"  # Run 100% offline!
)
All providers share the same API - switch between them seamlessly!
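Because the wrappers share one interface, the provider can be chosen at runtime. The factory below is a hypothetical helper, not part of MemLayer; it relies only on the module paths and class names shown in the snippets above and imports the wrapper lazily, so unused provider SDKs are never loaded.

```python
import importlib

# Module paths and class names as shown in the provider snippets above.
WRAPPERS = {
    "openai": ("memlayer.wrappers.openai", "OpenAI"),
    "claude": ("memlayer.wrappers.claude", "Claude"),
    "gemini": ("memlayer.wrappers.gemini", "Gemini"),
    "ollama": ("memlayer.wrappers.ollama", "Ollama"),
}


def make_client(provider: str, **kwargs):
    """Hypothetical helper: import the chosen wrapper and construct it."""
    try:
        module_name, class_name = WRAPPERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(**kwargs)


# Usage (requires memlayer to be installed and provider credentials to be set):
# client = make_client(
#     "claude",
#     model="claude-4-sonnet",
#     storage_path="./memories",
#     user_id="user_123",
# )
```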
Advanced Features
Proactive Task Reminders
# User schedules a task
client.chat([{
    "role": "user",
    "content": "Remind me to submit the report next Friday at 9am"
}])

# Later, when the task is due, MemLayer automatically injects it
response = client.chat([{"role": "user", "content": "What should I do today?"}])
# Response includes: "Don't forget to submit the report - it's due today at 9am!"
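Conceptually, a reminder is a stored task with a due time that is checked on later turns and surfaced as extra context. The following is a minimal sketch of that check. Task, due_tasks, and reminder_context are hypothetical names; MemLayer's scheduler internals are not described in this README.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    description: str
    due: datetime
    done: bool = False


def due_tasks(tasks, now):
    """Return unfinished tasks whose due time has passed."""
    return [t for t in tasks if not t.done and t.due <= now]


def reminder_context(tasks, now):
    """Build a text block that could be injected into the LLM context."""
    due = due_tasks(tasks, now)
    if not due:
        return ""
    lines = [f"- {t.description} (due {t.due:%Y-%m-%d %H:%M})" for t in due]
    return "Pending reminders:\n" + "\n".join(lines)


tasks = [
    Task("Submit the report", datetime(2025, 1, 10, 9, 0)),
    Task("Book flights", datetime(2025, 2, 1, 12, 0)),
]
context = reminder_context(tasks, now=datetime(2025, 1, 10, 9, 30))
print(context)
```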
Observability & Tracing
response = client.chat(messages)

# Inspect search performance
if client.last_trace:
    print(f"Search tier: {client.last_trace.events[0].metadata.get('tier')}")
    print(f"Total time: {client.last_trace.total_duration_ms}ms")
    for event in client.last_trace.events:
        print(f"  {event.event_type}: {event.duration_ms}ms")
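For readers curious how a trace of this shape can be collected, the sketch below times named spans with a context manager. The attribute names (events, event_type, duration_ms, metadata, total_duration_ms) mirror those used in the snippet above, but the Trace and TraceEvent classes and the span method are an assumed, simplified reimplementation, not MemLayer's code.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    event_type: str
    duration_ms: float
    metadata: dict = field(default_factory=dict)


@dataclass
class Trace:
    events: list = field(default_factory=list)

    @property
    def total_duration_ms(self):
        return sum(e.duration_ms for e in self.events)

    @contextmanager
    def span(self, event_type, **metadata):
        """Time the enclosed block and record it as one event."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.events.append(TraceEvent(event_type, elapsed_ms, metadata))


trace = Trace()
with trace.span("vector_search", tier="balanced"):
    sum(range(10_000))  # stand-in for real work
with trace.span("context_injection"):
    pass

for event in trace.events:
    print(f"  {event.event_type}: {event.duration_ms:.3f}ms")
print(f"Total time: {trace.total_duration_ms:.3f}ms")
```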
Custom Salience Threshold
# Control memory filtering strictness
client = OpenAI(
    salience_threshold=-0.1  # Permissive (saves more)
    # salience_threshold=0.0  # Balanced (default)
    # salience_threshold=0.1  # Strict (saves less)
)
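The effect of the threshold is easiest to see on a handful of scored messages. The sketch below assumes that each message receives a salience score and is stored when the score exceeds the threshold. Both the comparison rule and the example scores are assumptions chosen for illustration; the README does not document the scoring scale.

```python
def should_store(score: float, threshold: float = 0.0) -> bool:
    """Assumed rule: keep a message when its score exceeds the threshold."""
    return score > threshold


# Hypothetical scores, chosen only to show the effect of the threshold.
scores = {
    "greeting": -0.3,
    "borderline remark": -0.05,
    "mild preference": 0.05,
    "clear fact": 0.4,
}

results = {}
for threshold in (-0.1, 0.0, 0.1):
    results[threshold] = [
        name for name, score in scores.items() if should_store(score, threshold)
    ]
    print(f"threshold={threshold:+.1f}: {results[threshold]}")
```

Under these assumptions, lowering the threshold admits borderline content and raising it keeps only the clearest facts, matching the permissive/strict labels above.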
Knowledge Graph Extraction
# Manually extract structured knowledge
kg = client.analyze_and_extract_knowledge(
    "Alice leads Project Phoenix in the London office. The project uses Python and React."
)
print(kg["facts"])          # ["Alice leads Project Phoenix", ...]
print(kg["entities"])       # [{"name": "Alice", "type": "Person"}, ...]
print(kg["relationships"])  # [{"subject": "Alice", "predicate": "leads", "object": "Project Phoenix"}]
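Once relationships are available in the subject/predicate/object shape shown above, a 1-hop lookup is a simple adjacency query. The sketch below uses a hand-written sample dictionary in that shape instead of calling the library. build_adjacency, one_hop, and the inverse: edge labels are hypothetical helpers for illustration.

```python
from collections import defaultdict

# Sample shaped like the output shown above; the values are illustrative.
kg = {
    "facts": ["Alice leads Project Phoenix"],
    "entities": [
        {"name": "Alice", "type": "Person"},
        {"name": "Project Phoenix", "type": "Project"},
    ],
    "relationships": [
        {"subject": "Alice", "predicate": "leads", "object": "Project Phoenix"},
        {"subject": "Project Phoenix", "predicate": "uses", "object": "Python"},
    ],
}


def build_adjacency(relationships):
    """Index relationships in both directions for neighbor lookups."""
    graph = defaultdict(list)
    for rel in relationships:
        graph[rel["subject"]].append((rel["predicate"], rel["object"]))
        graph[rel["object"]].append((f"inverse:{rel['predicate']}", rel["subject"]))
    return graph


def one_hop(graph, entity):
    """Return every (predicate, neighbor) pair directly linked to the entity."""
    return sorted(graph.get(entity, []))


graph = build_adjacency(kg["relationships"])
print(one_hop(graph, "Project Phoenix"))
```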
Examples
Explore the examples/ directory for comprehensive examples:
Basics
# Getting started
python examples/01_basics/getting_started.py
Search Tiers
# Try all three search tiers
python examples/02_search_tiers/fast_tier_example.py
python examples/02_search_tiers/balanced_tier_example.py
python examples/02_search_tiers/deep_tier_example.py
# Compare them side-by-side
python examples/02_search_tiers/tier_comparison.py
Advanced Features
# Proactive task reminders
python examples/03_features/task_reminders.py
# Knowledge graph visualization
python examples/03_features/test_knowledge_graph.py
Benchmarks
# Compare salience modes
python examples/04_benchmarks/compare_salience_modes.py
Providers
# Try different LLM providers
python examples/05_providers/openai_example.py
python examples/05_providers/claude_example.py
python examples/05_providers/gemini_example.py
python examples/05_providers/ollama_example.py
See examples/README.md for full documentation.
Performance
Salience Mode Comparison
Real-world startup times from benchmarks:
Mode          First Use   Savings         Trade-off
─────────────────────────────────────────────────────────
LIGHTWEIGHT   ~5s         No embeddings   No semantic search
ONLINE        ~5s         5s faster       Small API cost
LOCAL         ~10s        No API cost     11s model loading
Search Tier Latency
Typical query latencies (measured ranges can exceed the nominal tier targets):
Tier       Latency      Vector Results   Graph   Use Case
────────────────────────────────────────────────────────────
Fast       50-150ms     2                No      Real-time chat
Balanced   200-600ms    5                No      General use
Deep       800-2500ms   10               Yes     Research queries
Memory Consolidation
Background processing (non-blocking):
Step                   Time    Async
──────────────────────────────────────────────
Salience filtering     ~10ms   Yes
Knowledge extraction   ~1-2s   Yes (background thread)
Vector storage         ~50ms   Yes
Graph storage          ~20ms   Yes
Total (non-blocking)   ~0ms    User doesn't wait!
Documentation
Getting Started
- Basics Overview - Architecture, components, and how MemLayer works
- Quickstart Guide - Get up and running in 5 minutes
- Streaming Mode - Complete guide to streaming responses
Provider Setup
- Providers Overview - Compare all providers, choose the right one
- Ollama Setup - Run completely offline with local models
- OpenAI - OpenAI configuration
- Claude - Anthropic Claude setup
- Gemini - Google Gemini configuration
Examples
- Examples Index - Comprehensive examples by category
- Provider Examples - Provider comparison and usage
Tunable features (quick index)
The project exposes several runtime/configuration knobs you can tune to match latency, cost, and accuracy trade-offs. Detailed docs for each area live in the docs/ folder:
- docs/tuning/operation_mode.md - Architecture deep dive: how to choose between online, local, and lightweight modes, performance implications, storage composition, and deployment strategies.
- docs/tuning/intervals.md - Scheduler and curation interval configuration (scheduler_interval_seconds, curation_interval_seconds) and practical guidance.
- docs/tuning/salience_threshold.md - How to adjust salience_threshold and expected behavior.
- docs/services/consolidation.md - Consolidation pipeline internals and how to call it programmatically (including update_from_text).
- docs/services/curation.md - How memory curation works, archiving rules, and how to run/stop the curation service.
- docs/storage/chroma.md - ChromaDB notes: metadata types, connection handling, and Windows file-lock guidance.
- docs/storage/networkx.md - Knowledge graph persistence, expected node schemas, and backup/restore tips.
Consult these docs when tuning for production; they provide detailed, practical guidance.
Development
Setup
# Clone repository
git clone https://github.com/divagr18/memlayer.git
cd memlayer
# Install dependencies
pip install -e .
# Run tests
python -m pytest tests/
# Run examples
python examples/01_basics/getting_started.py
Project Structure
memlayer/
├── memlayer/ # Core library
│ ├── wrappers/ # LLM provider wrappers
│ ├── storage/ # Storage backends (ChromaDB, NetworkX)
│ ├── services.py # Search & consolidation services
│ ├── ml_gate.py # Salience filtering
│ └── embedding_models.py # Embedding model implementations
├── examples/ # Organized examples by category
│ ├── 01_basics/
│ ├── 02_search_tiers/
│ ├── 03_features/
│ ├── 04_benchmarks/
│ └── 05_providers/
├── tests/ # Tests and benchmarks
├── docs/ # Documentation
└── README.md # This file
Contributing
Contributions are welcome! Here's how you can help:
- Report bugs - Open an issue with reproduction steps
- Suggest features - Share your use case and requirements
- Submit PRs - Fix bugs, add features, improve docs
- Share examples - Show us what you've built!
Please keep PRs focused and include tests for new features.
Contact & Support
- Author/Maintainer: Divyansh Agrawal
- Email: keshav.r.1925@gmail.com
- GitHub: divagr18
- Issues: Report bugs or request features via GitHub Issues
For security vulnerabilities, please email directly with SECURITY in the subject line instead of opening a public issue.
License
MIT License - see LICENSE for details.
Acknowledgments
- Built with ChromaDB for vector storage
- Uses NetworkX for knowledge graph operations
- Powered by sentence-transformers for local embeddings
- Supports OpenAI, Anthropic, Google Gemini, and Ollama
Made with ❤️ for the AI community
Give your LLMs memory. Try MemLayer today!