The memory layer for LLMs — add persistent, intelligent memory to any model in minutes.

MemLayer – The Plug-and-play persistent memory for your LLMs

MemLayer transforms stateless LLMs into memory-enabled AI assistants that remember context across conversations, extract structured knowledge, and proactively surface relevant information when needed.

Features
Quick Start
Key Concepts
Memory Modes
Search Tiers
Providers
Advanced Features
Examples
Performance
Documentation
Contributing

Features

Universal LLM Support: Works with OpenAI, Claude, Gemini, and Ollama models
Plug-and-play: Install with pip install memlayer and get started in minutes — minimal setup required.
Intelligent Memory Filtering: Three operation modes (LOCAL/ONLINE/LIGHTWEIGHT) automatically filter important information
Hybrid Search: Combines vector similarity + knowledge graph traversal for accurate retrieval
Three Search Tiers: Fast (<100ms), Balanced (<500ms), and Deep (<2s) latency targets, each optimized for different use cases
Knowledge Graph: Automatically extracts entities, relationships, and facts from conversations
Proactive Reminders: Schedule tasks and get automatic reminders when they're due
Built-in Observability: Trace every search operation with detailed performance metrics
Flexible Storage: ChromaDB (vector) + NetworkX (graph) or graph-only mode
Production Ready: Serverless-friendly with fast cold starts using online mode

Quick Start

Installation

pip install memlayer

Basic Usage

from memlayer.wrappers.openai import OpenAI

# Initialize with memory capabilities
client = OpenAI(
    model="gpt-4.1-mini",
    storage_path="./memories",
    user_id="user_123"
)

# Store information automatically
client.chat([
    {"role": "user", "content": "My name is Alice and I work at TechCorp"}
])

# Retrieve information automatically (no manual prompting needed!)
response = client.chat([
    {"role": "user", "content": "Where do I work?"}
])
# Response: "You work at TechCorp."

That's it! MemLayer automatically:

✅ Filters salient information using ML-based classification
✅ Extracts structured facts, entities, and relationships
✅ Stores memories in hybrid vector + graph storage
✅ Retrieves relevant context for each query
✅ Injects memories seamlessly into LLM context

Key Concepts

Salience Filtering

Not all conversation content is worth storing. MemLayer uses salience gates to intelligently filter:

✅ Save: Facts, preferences, user info, decisions, relationships
❌ Skip: Greetings, acknowledgments, filler words, meta-conversation
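
As a rough illustration of the save/skip split above, the sketch below implements a toy keyword-based gate in plain Python. It is not MemLayer's classifier: `FILLER`, `FACT_CUES`, and `is_salient` are hypothetical names, and the rules are deliberately simplistic.

```python
import re

# Hypothetical word lists for illustration only; MemLayer's real gates are not shown here.
FILLER = {"hi", "hello", "hey", "thanks", "thank", "you", "ok", "okay", "sure", "bye", "cool", "great"}
FACT_CUES = ("my name is", "i work", "i live", "i prefer", "i like", "i decided", "remind me")


def is_salient(message: str) -> bool:
    """Return True when a message looks worth storing."""
    text = message.lower().strip()
    words = re.findall(r"[a-z']+", text)
    # Skip empty input and messages made up entirely of filler words.
    if not words or all(w in FILLER for w in words):
        return False
    # Save messages that contain a first-person fact or preference cue.
    return any(cue in text for cue in FACT_CUES)


messages = [
    "Hello!",
    "My name is Alice and I work at TechCorp",
    "Ok, thanks",
    "I prefer dark mode",
]
saved = [m for m in messages if is_salient(m)]
print(saved)
```

Only the two messages that carry user facts or preferences pass the gate; the greeting and the acknowledgment are dropped.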

Hybrid Storage

Memories are stored in two complementary systems:

Vector Store (ChromaDB): Semantic similarity search for facts
Knowledge Graph (NetworkX): Entity relationships and structured knowledge
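
To make the two-store idea concrete, here is a minimal standard-library sketch. It assumes a bag-of-words count vector as a stand-in for real embeddings and a plain dictionary as a stand-in for the graph, so it uses neither ChromaDB nor NetworkX. `HybridMemory` and its methods are hypothetical names, not part of MemLayer.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HybridMemory:
    """Hypothetical two-part store: facts by similarity, relationships by entity."""

    def __init__(self) -> None:
        self.facts: list[tuple[str, Counter]] = []
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_fact(self, fact: str) -> None:
        self.facts.append((fact, embed(fact)))

    def add_relationship(self, subject: str, predicate: str, obj: str) -> None:
        self.edges[subject].append((predicate, obj))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        """Return the top_k stored facts ranked by similarity to the query."""
        q = embed(query)
        ranked = sorted(self.facts, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:top_k]]

    def neighbors(self, entity: str) -> list[tuple[str, str]]:
        """Return (predicate, object) pairs for edges leaving the entity."""
        return list(self.edges.get(entity, []))


memory = HybridMemory()
memory.add_fact("Alice works at TechCorp")
memory.add_fact("Alice prefers dark mode")
memory.add_relationship("Alice", "works_at", "TechCorp")

print(memory.search("Who works at TechCorp", top_k=1))
print(memory.neighbors("Alice"))
```

The similarity search answers "which stored fact resembles this query", while the graph lookup answers "what is connected to this entity". That split is why the two stores complement each other.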

Automatic Consolidation

After each conversation, background threads perform the following steps:

Extract facts, entities, and relationships using the LLM
Store facts in vector database with embeddings
Build knowledge graph with entities and relationships
Index everything for fast retrieval
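
The background-thread pattern behind these steps can be sketched with the standard library alone. This is an assumption about the general shape, not MemLayer's actual pipeline: `extract_facts` is a placeholder for the LLM extraction call, and `pending` and `stored_facts` are hypothetical names.

```python
import queue
import threading

pending: queue.Queue = queue.Queue()
stored_facts: list[str] = []


def extract_facts(text: str) -> list[str]:
    """Placeholder for the LLM extraction step: one 'fact' per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def consolidation_worker() -> None:
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        text = pending.get()
        if text is None:
            break
        stored_facts.extend(extract_facts(text))


worker = threading.Thread(target=consolidation_worker, daemon=True)
worker.start()

# The caller enqueues the conversation and returns immediately.
pending.put("Alice leads Project Phoenix. The project uses Python and React.")

# Shut down cleanly for the demo: send the sentinel and wait for the worker.
pending.put(None)
worker.join()
print(stored_facts)
```

Because the caller only enqueues text, the chat response does not wait for extraction or storage to finish.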

Memory Modes

MemLayer offers three modes that control both memory filtering (salience) and storage:

1. LOCAL Mode (Default)

client = OpenAI(salience_mode="local")

Filtering: Sentence-transformers ML model (high accuracy)
Storage: ChromaDB (vector) + NetworkX (graph)
Startup: ~10s (model loading)
Best for: High-volume production, offline apps
Cost: Free (no API calls)

2. ONLINE Mode

client = OpenAI(salience_mode="online")

Filtering: OpenAI embeddings API (high accuracy)
Storage: ChromaDB (vector) + NetworkX (graph)
Startup: ~2s (no model loading!)
Best for: Serverless, cloud functions, fast cold starts
Cost: ~$0.0001 per operation

3. LIGHTWEIGHT Mode

client = OpenAI(salience_mode="lightweight")

Filtering: Keyword-based (medium accuracy)
Storage: NetworkX only (no vector storage!)
Startup: <1s (instant)
Best for: Prototyping, testing, low-resource environments
Cost: Free (no embeddings at all)

Performance Comparison:

Mode          Startup Time    Accuracy    API Cost    Storage
──────────────────────────────────────────────────────────────
LOCAL         ~10s            High        Free        Vector+Graph
ONLINE        ~2s             High        $0.0001/op  Vector+Graph  
LIGHTWEIGHT   <1s             Medium      Free        Graph-only
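
One way to encode the table as a rule of thumb is sketched below. `MODES` copies the figures from the comparison above; `choose_salience_mode` is a hypothetical helper based on the "Best for" notes, not something MemLayer provides.

```python
# Figures copied from the comparison table above.
MODES = {
    "local": {"startup": "~10s", "accuracy": "High", "api_cost": "Free", "storage": "Vector+Graph"},
    "online": {"startup": "~2s", "accuracy": "High", "api_cost": "$0.0001/op", "storage": "Vector+Graph"},
    "lightweight": {"startup": "<1s", "accuracy": "Medium", "api_cost": "Free", "storage": "Graph-only"},
}


def choose_salience_mode(*, offline: bool = False, serverless: bool = False, prototyping: bool = False) -> str:
    """Hypothetical rule of thumb derived from the 'Best for' notes."""
    if prototyping:
        return "lightweight"  # instant startup, medium accuracy
    if serverless and not offline:
        return "online"  # fast cold starts, small API cost
    return "local"  # default: no API calls, slower startup


mode = choose_salience_mode(serverless=True)
print(mode, MODES[mode])
```

The returned string could then be passed as `salience_mode=` when constructing a wrapper, as in the snippets above.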

Search Tiers

MemLayer provides three search tiers optimized for different latency requirements:

Fast Tier (<100ms)

# Automatic - LLM chooses based on query complexity
response = client.chat([{"role": "user", "content": "What's my name?"}])

2 vector search results
No graph traversal
Perfect for: Real-time chat, simple factual recall

Balanced Tier (<500ms) (Default)

# Automatic - handles most queries well
response = client.chat([{"role": "user", "content": "Tell me about my projects"}])

5 vector search results
No graph traversal
Perfect for: General conversation, most use cases

Deep Tier (<2s)

# Explicit request or auto-detected for complex queries
response = client.chat([{
    "role": "user",
    "content": "Use deep search: Tell me everything about Alice and her relationships"
}])

10 vector search results
Graph traversal enabled (entity extraction + 1-hop relationships)
Perfect for: Research, "tell me everything", multi-hop reasoning
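
The tier parameters listed above can be summarized as data. The sketch below is illustrative only: `SearchTier`, `TIERS`, and `pick_tier` are hypothetical names, and choosing a tier from a latency budget is an assumption made for the example (the text above says the LLM chooses the tier from query complexity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTier:
    name: str
    target_ms: int          # nominal latency target from the headings above
    vector_results: int     # number of vector search results
    graph_traversal: bool   # whether graph traversal is enabled


# Ordered from least to most thorough.
TIERS = (
    SearchTier("fast", 100, 2, False),
    SearchTier("balanced", 500, 5, False),
    SearchTier("deep", 2000, 10, True),
)


def pick_tier(latency_budget_ms: int) -> SearchTier:
    """Pick the most thorough tier whose target fits the budget; fall back to fast."""
    fitting = [t for t in TIERS if t.target_ms <= latency_budget_ms]
    return fitting[-1] if fitting else TIERS[0]


for budget in (50, 600, 5000):
    tier = pick_tier(budget)
    print(budget, tier.name, tier.vector_results, tier.graph_traversal)
```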

Providers

MemLayer works with all major LLM providers:

OpenAI

from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    model="gpt-4.1-mini",  # or gpt-4.1, gpt-5, etc.
    storage_path="./memories",
    user_id="user_123"
)

Claude (Anthropic)

from memlayer.wrappers.claude import Claude

client = Claude(
    model="claude-4-sonnet",
    storage_path="./memories",
    user_id="user_123"
)

Google Gemini

from memlayer.wrappers.gemini import Gemini

client = Gemini(
    model="gemini-2.5-flash",
    storage_path="./memories",
    user_id="user_123"
)

Ollama (Local)

from memlayer.wrappers.ollama import Ollama

client = Ollama(
    host="http://localhost:11434",
    model="qwen3:1.7b",  # or llama3.2, mistral, etc.
    storage_path="./memories",
    user_id="user_123",
    salience_mode="local"  # Run 100% offline!
)

All providers share the same API - switch between them seamlessly!
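
Because the wrappers share constructor arguments, a small factory can select one by name. The module paths and class names below are taken from the snippets above; `WRAPPERS` and `make_client` are hypothetical helpers, and the import only succeeds when memlayer is installed.

```python
import importlib

# Module paths and class names as they appear in the provider snippets above.
WRAPPERS = {
    "openai": ("memlayer.wrappers.openai", "OpenAI"),
    "claude": ("memlayer.wrappers.claude", "Claude"),
    "gemini": ("memlayer.wrappers.gemini", "Gemini"),
    "ollama": ("memlayer.wrappers.ollama", "Ollama"),
}


def make_client(provider: str, **kwargs):
    """Hypothetical helper: import the chosen wrapper lazily and construct it."""
    if provider not in WRAPPERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    module_name, class_name = WRAPPERS[provider]
    module = importlib.import_module(module_name)  # requires memlayer to be installed
    return getattr(module, class_name)(**kwargs)


# Example call (not executed here, since it needs memlayer and a running provider):
# client = make_client("ollama", model="qwen3:1.7b", storage_path="./memories", user_id="user_123")
```

Importing lazily keeps the helper usable even when only one provider's dependencies are available.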

Advanced Features

Proactive Task Reminders

# User schedules a task
client.chat([{
    "role": "user",
    "content": "Remind me to submit the report next Friday at 9am"
}])

# Later, when the task is due, MemLayer automatically injects it
response = client.chat([{"role": "user", "content": "What should I do today?"}])
# Response includes: "Don't forget to submit the report - it's due today at 9am!"
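
The due-task check behind this behavior can be approximated as follows. This is a sketch under assumptions: `Task`, `due_tasks`, and `reminder_context` are hypothetical names, and how MemLayer actually stores and schedules tasks is not shown in this README.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    description: str
    due: datetime
    done: bool = False


def due_tasks(tasks: list[Task], now: datetime) -> list[Task]:
    """Return unfinished tasks whose due time has passed, earliest first."""
    return sorted((t for t in tasks if not t.done and t.due <= now), key=lambda t: t.due)


def reminder_context(tasks: list[Task], now: datetime) -> str:
    """Build a text block that could be prepended to the LLM context."""
    lines = [f"- {t.description} (due {t.due:%Y-%m-%d %H:%M})" for t in due_tasks(tasks, now)]
    return "Due tasks:\n" + "\n".join(lines) if lines else ""


tasks = [
    Task("Submit the report", datetime(2025, 11, 21, 9, 0)),
    Task("Book flights", datetime(2025, 12, 1, 12, 0)),
]
print(reminder_context(tasks, now=datetime(2025, 11, 21, 9, 30)))
```

Passing `now` explicitly keeps the function deterministic and easy to test; a real scheduler would supply the current time.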

Observability & Tracing

# Example input; any chat message list works here
messages = [{"role": "user", "content": "What's my name?"}]
response = client.chat(messages)

# Inspect search performance (last_trace may be unset or have no events)
trace = client.last_trace
if trace and trace.events:
    print(f"Search tier: {trace.events[0].metadata.get('tier')}")
    print(f"Total time: {trace.total_duration_ms}ms")

    for event in trace.events:
        print(f"  {event.event_type}: {event.duration_ms}ms")
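
For readers curious how such a trace might be collected, here is a self-contained sketch. It mirrors the attribute names used above (`events`, `event_type`, `duration_ms`, `metadata`, `total_duration_ms`), but the `Trace` and `TraceEvent` classes and the `span` helper are assumptions for illustration, not MemLayer's internals.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    event_type: str
    duration_ms: float
    metadata: dict = field(default_factory=dict)


@dataclass
class Trace:
    events: list = field(default_factory=list)

    @property
    def total_duration_ms(self) -> float:
        """Sum of all recorded event durations."""
        return sum(e.duration_ms for e in self.events)

    @contextmanager
    def span(self, event_type: str, **metadata):
        """Time the enclosed block and record it as one event."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.events.append(TraceEvent(event_type, elapsed_ms, metadata))


trace = Trace()
with trace.span("vector_search", tier="balanced"):
    sum(range(10_000))  # stand-in for real work
with trace.span("context_injection"):
    pass

for event in trace.events:
    print(f"{event.event_type}: {event.duration_ms:.3f}ms")
```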

Custom Salience Threshold

# Control memory filtering strictness
client = OpenAI(
    salience_threshold=-0.1  # Permissive (saves more)
    # salience_threshold=0.0   # Balanced (default)
    # salience_threshold=0.1   # Strict (saves less)
)
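
The effect of the three settings can be illustrated with made-up scores. The comparison rule (`score >= threshold`) and every score below are assumptions chosen for the example; `should_store` and `kept_at` are hypothetical names, and the README does not specify how MemLayer computes salience scores.

```python
def should_store(score: float, threshold: float = 0.0) -> bool:
    """Assumed semantics: store when the salience score meets or exceeds the threshold."""
    return score >= threshold


# Invented scores for illustration only.
scores = {
    "My name is Alice": 0.35,
    "I might try hiking": 0.05,
    "Sounds good": -0.05,
    "Hi": -0.4,
}


def kept_at(threshold: float) -> list[str]:
    """List the messages that would be stored at a given threshold."""
    return [text for text, score in scores.items() if should_store(score, threshold)]


for threshold in (-0.1, 0.0, 0.1):
    print(threshold, kept_at(threshold))
```

Under these assumptions, lowering the threshold admits borderline messages and raising it keeps only the clearest facts.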

Knowledge Graph Extraction

# Manually extract structured knowledge
kg = client.analyze_and_extract_knowledge(
    "Alice leads Project Phoenix in the London office. The project uses Python and React."
)

print(kg["facts"])         # ["Alice leads Project Phoenix", ...]
print(kg["entities"])      # [{"name": "Alice", "type": "Person"}, ...]
print(kg["relationships"]) # [{"subject": "Alice", "predicate": "leads", "object": "Project Phoenix"}]
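
Relationship triples in the shape shown above lend themselves to a simple one-hop lookup. In the sketch below, only the first triple comes from the example output; the other two are invented for illustration, and `one_hop` is a hypothetical helper rather than a MemLayer function.

```python
# Triples use the same keys as the example output above.
relationships = [
    {"subject": "Alice", "predicate": "leads", "object": "Project Phoenix"},
    # The next two entries are illustrative additions, not confirmed output.
    {"subject": "Project Phoenix", "predicate": "located_in", "object": "London office"},
    {"subject": "Project Phoenix", "predicate": "uses", "object": "Python"},
]


def one_hop(rels: list[dict], entity: str) -> list[tuple[str, str, str]]:
    """Return every triple in which the entity appears as subject or object."""
    return [
        (r["subject"], r["predicate"], r["object"])
        for r in rels
        if entity in (r["subject"], r["object"])
    ]


print(one_hop(relationships, "Alice"))
print(one_hop(relationships, "Project Phoenix"))
```

Looking up "Project Phoenix" returns both the edge from Alice and the edges leaving the project. A one-hop traversal of this kind surfaces that sort of connected context.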

Examples

Explore the examples/ directory for comprehensive examples:

Basics

# Getting started
python examples/01_basics/getting_started.py

Search Tiers

# Try all three search tiers
python examples/02_search_tiers/fast_tier_example.py
python examples/02_search_tiers/balanced_tier_example.py
python examples/02_search_tiers/deep_tier_example.py

# Compare them side-by-side
python examples/02_search_tiers/tier_comparison.py

Advanced Features

# Proactive task reminders
python examples/03_features/task_reminders.py

# Knowledge graph visualization
python examples/03_features/test_knowledge_graph.py

Benchmarks

# Compare salience modes
python examples/04_benchmarks/compare_salience_modes.py

Providers

# Try different LLM providers
python examples/05_providers/openai_example.py
python examples/05_providers/claude_example.py
python examples/05_providers/gemini_example.py
python examples/05_providers/ollama_example.py

See examples/README.md for full documentation.

Performance

Salience Mode Comparison

Real-world first-use times from benchmarks:

Mode          First Use    Benefit           Trade-off
─────────────────────────────────────────────────────────
LIGHTWEIGHT   ~5s          No embeddings     No semantic search
ONLINE        ~5s          5s faster         Small API cost
LOCAL         ~10s         No API cost       11s model loading

Search Tier Latency

Typical query latencies (the upper end of each range can exceed the tier's nominal target):

Tier        Latency    Vector Results    Graph    Use Case
────────────────────────────────────────────────────────────
Fast        50-150ms   2                 No       Real-time chat
Balanced    200-600ms  5                 No       General use
Deep        800-2500ms 10                Yes      Research queries

Memory Consolidation

Background processing (non-blocking):

Step                        Time       Async
───────────────────────────────────────────────
Salience filtering          ~10ms      Yes
Knowledge extraction        ~1-2s      Yes (background thread)
Vector storage              ~50ms      Yes
Graph storage               ~20ms      Yes
Added response latency      ~0ms       All steps run in the background

Documentation

Getting Started

Basics Overview - Architecture, components, and how MemLayer works
Quickstart Guide - Get up and running in 5 minutes
Streaming Mode - Complete guide to streaming responses

Provider Setup

Providers Overview - Compare all providers, choose the right one
Ollama Setup - Run completely offline with local models
OpenAI - OpenAI configuration
Claude - Anthropic Claude setup
Gemini - Google Gemini configuration

Examples

Examples Index - Comprehensive examples by category
Provider Examples - Provider comparison and usage

Tunable features (quick index)

The project exposes several runtime/configuration knobs you can tune to match latency, cost, and accuracy trade-offs. Detailed docs for each area live in the docs/ folder:

docs/tuning/operation_mode.md — Architecture deep dive: How to choose between online, local, and lightweight modes, performance implications, storage composition, and deployment strategies.
docs/tuning/intervals.md — Scheduler and curation interval configuration (scheduler_interval_seconds, curation_interval_seconds) and practical guidance.
docs/tuning/salience_threshold.md — How to adjust salience_threshold and expected behavior.
docs/services/consolidation.md — Consolidation pipeline internals and how to call it programmatically (including update_from_text).
docs/services/curation.md — How memory curation works, archiving rules, and how to run/stop the curation service.
docs/storage/chroma.md — ChromaDB notes: metadata types, connection handling, and Windows file-lock guidance.
docs/storage/networkx.md — Knowledge graph persistence, expected node schemas, and backup/restore tips.

The docs/ files listed above provide detailed, practical guidance; consult them when tuning for production.

Development

Setup

# Clone repository
git clone https://github.com/divagr18/memlayer.git
cd memlayer

# Install dependencies
pip install -e .

# Run tests
python -m pytest tests/

# Run examples
python examples/01_basics/getting_started.py

Project Structure

memlayer/
├── memlayer/              # Core library
│   ├── wrappers/          # LLM provider wrappers
│   ├── storage/           # Storage backends (ChromaDB, NetworkX)
│   ├── services.py        # Search & consolidation services
│   ├── ml_gate.py         # Salience filtering
│   └── embedding_models.py # Embedding model implementations
├── examples/              # Organized examples by category
│   ├── 01_basics/
│   ├── 02_search_tiers/
│   ├── 03_features/
│   ├── 04_benchmarks/
│   └── 05_providers/
├── tests/                 # Tests and benchmarks
├── docs/                  # Documentation
└── README.md              # This file

Contributing

Contributions are welcome! Here's how you can help:

Report bugs - Open an issue with reproduction steps
Suggest features - Share your use case and requirements
Submit PRs - Fix bugs, add features, improve docs
Share examples - Show us what you've built!

Please keep PRs focused and include tests for new features.

Contact & Support

Author/Maintainer: Divyansh Agrawal
Email: keshav.r.1925@gmail.com
GitHub: divagr18
Issues: Report bugs or request features via GitHub Issues

For security vulnerabilities, please email directly with SECURITY in the subject line instead of opening a public issue.

License

MIT License - see LICENSE for details.

Acknowledgments

Built with ChromaDB for vector storage
Uses NetworkX for knowledge graph operations
Powered by sentence-transformers for local embeddings
Supports OpenAI, Anthropic, Google Gemini, and Ollama

Made with ❤️ for the AI community

Give your LLMs memory. Try MemLayer today!

Release history

0.1.8 - Nov 22, 2025
0.1.7 - Nov 21, 2025
0.1.6 - Nov 21, 2025
0.1.5 - Nov 17, 2025 (this version)
0.1.3 - Nov 16, 2025
0.1.2 - Nov 16, 2025
0.1.1 - Nov 16, 2025
0.1.0 - Nov 16, 2025

Download files

Source Distribution

memlayer-0.1.5.tar.gz (106.6 kB view details)

Uploaded Nov 17, 2025 Source

Built Distribution

memlayer-0.1.5-py3-none-any.whl (77.3 kB view details)

Uploaded Nov 17, 2025 Python 3

File details

Details for the file memlayer-0.1.5.tar.gz.

File metadata

Download URL: memlayer-0.1.5.tar.gz
Upload date: Nov 17, 2025
Size: 106.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for memlayer-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`2c7dba06f1e20b1dcc535401ebb0d1f59c74c8b45f0e67f109f1ec161eefd01f`
MD5	`363cc1f7d648d7dde6d17b604e72b14e`
BLAKE2b-256	`d43ecb4c8a9ef7647b82f4536cb34ed24f0f80861e6d89bb8e2add9d0c4e7498`

File details

Details for the file memlayer-0.1.5-py3-none-any.whl.

File metadata

Download URL: memlayer-0.1.5-py3-none-any.whl
Upload date: Nov 17, 2025
Size: 77.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for memlayer-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a421a46aef4d2f29c67f87d430bffd19912dc3be9daebe63c361dbe06c25329`
MD5	`70eca87c2493ae6bd1eab988a96cdcf6`
BLAKE2b-256	`8525c5a4551f3d69829f214161d91b02d2a1e4ccfd4c0fe3fa46ec8011fbfe5d`
