
LLMem

Smart memory management for LLM conversations - topic-aware compression that just works.

Features

  • LLM-agnostic - Works with OpenAI, Gemini, Anthropic, local models, or any LLM
  • Topic-aware compression - Intelligently compresses based on conversation topics, not just token count
  • Storage-agnostic - Works with PostgreSQL, MongoDB, or in-memory
  • LangChain/LangGraph compatible - Works seamlessly with popular frameworks
  • Zero-config start - Works out of the box with smart defaults
  • Multi-user safe - Thread isolation for millions of users via thread_id
  • Fast - Target <100ms for context retrieval

Installation

pip install llmem

With optional dependencies:

pip install llmem[postgres]    # PostgreSQL storage
pip install llmem[mongo]       # MongoDB storage
pip install llmem[all]         # Everything

Quick Start

from llmem import Memory

# Create memory (zero config)
memory = Memory()

# Add conversation turns
memory.add("How do I set up my VR headset?", role="user")
memory.add("To set up your VR headset, first...", role="assistant")
memory.add("What games do you recommend?", role="user")
memory.add("I recommend these games...", role="assistant")

# Get optimized context for next LLM call
context = memory.get_context()

# Check health
health = memory.check_health()
print(f"Status: {health.status.value}, Tokens: {health.token_count}")

With Persistent Storage

PostgreSQL

import asyncpg
from llmem import Memory
from llmem.storage.postgres import PostgresStorage

# Must run inside an async function (create_pool is awaited)
pool = await asyncpg.create_pool("postgresql://user:pass@localhost/db")
storage = PostgresStorage(pool=pool)
memory = Memory(storage=storage)

# Thread ID for multi-user isolation
memory.add("Hello", role="user", thread_id="user-123")
context = memory.get_context(thread_id="user-123")

MongoDB

from motor.motor_asyncio import AsyncIOMotorClient
from llmem import Memory
from llmem.storage.mongo import MongoStorage

client = AsyncIOMotorClient("mongodb://localhost:27017")
storage = MongoStorage(db=client.mydb)
memory = Memory(storage=storage)

With Any LLM

LLMem is LLM-agnostic - it manages the conversation memory while you bring your own model:

from llmem import Memory

memory = Memory()

# Add user message
memory.add(user_input, role="user")

# Get optimized context
context = memory.get_context()

# Use with ANY LLM - OpenAI, Gemini, Anthropic, local models, etc.
response = your_llm.generate(context)

# Track response
memory.add(response, role="assistant")

OpenAI Example

from openai import OpenAI
from llmem import Memory

client = OpenAI()
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()

response = client.chat.completions.create(
    model="your-model",
    messages=context
)
memory.add(response.choices[0].message.content, role="assistant")

Google Gemini Example

import google.generativeai as genai
from llmem import Memory

genai.configure(api_key="your-key")
model = genai.GenerativeModel("your-model")
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()
response = model.generate_content(str(context))
memory.add(response.text, role="assistant")

Anthropic Claude Example

from anthropic import Anthropic
from llmem import Memory

client = Anthropic()
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()

response = client.messages.create(
    model="your-model",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=context
)
memory.add(response.content[0].text, role="assistant")

With LangChain (Any Provider)

from langchain_core.messages import HumanMessage, AIMessage
from llmem import Memory

# Use any LangChain-supported LLM
# from langchain_openai import ChatOpenAI
# from langchain_google_genai import ChatGoogleGenerativeAI
# from langchain_anthropic import ChatAnthropic

llm = YourLangChainLLM()
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()

# Convert to LangChain messages
messages = [HumanMessage(content=m["content"]) if m["role"] == "user" 
            else AIMessage(content=m["content"]) for m in context]

response = llm.invoke(messages)
memory.add(response.content, role="assistant")

Health Monitoring

health = memory.check_health()
print(f"Status: {health.status.value}")        # healthy, warning, critical
print(f"Token usage: {health.token_usage:.1%}")
print(f"Recommendation: {health.recommendation.value}")

stats = memory.get_stats()
print(f"Total turns: {stats['total_turns']}")
print(f"Total tokens: {stats['total_tokens']}")
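The status values reflect token pressure on the context window. As a rough illustration of how token usage could map to a status (the 0.7/0.9 cutoffs below are assumptions for demonstration, not LLMem's actual internals):

```python
# Illustrative sketch of a token-pressure health check.
# The 0.7 and 0.9 thresholds are assumed values, for demonstration only.

def classify_health(token_count: int, max_tokens: int) -> str:
    """Map token usage to a coarse health status."""
    usage = token_count / max_tokens
    if usage < 0.7:
        return "healthy"
    if usage < 0.9:
        return "warning"
    return "critical"

print(classify_health(50_000, 128_000))   # well under budget
print(classify_health(120_000, 128_000))  # near the limit
```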

Callbacks

memory = Memory(
    on_compress=lambda info: print(f"Compressed: {info}"),
    on_health_change=lambda health: print(f"Health: {health.status.value}")
)

Examples

See the examples/ folder for complete working demos:

  • 01_basic_usage.py - Core functionality: add, get, health, stats
  • 02_callbacks.py - Compression and health callbacks
  • 03_multi_user.py - Thread isolation for multi-user apps
  • 04_with_openai.py - Integration with OpenAI GPT
  • 04_with_gemini.py - Integration with Google Gemini
  • 05_langchain_integration.py - LangChain with any LLM provider
  • 06_langgraph_integration.py - LangGraph agents
  • 07_postgres_storage.py - PostgreSQL persistent storage
  • 08_mongodb_storage.py - MongoDB persistent storage
  • 09_e2e_agent_test.py - End-to-end test with all backends
  • 10_custom_storage.py - Build your own storage backend

Running Examples

# Clone and setup
git clone https://github.com/sharanharsoor/llmem.git
cd llmem
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"

# Create .env file with your credentials
echo "GOOGLE_API_KEY=your-key" > .env
echo "DATABASE_URL=postgresql://user:pass@localhost/db" >> .env
echo "MONGODB_URL=mongodb://localhost:27017" >> .env

# Run examples
python examples/01_basic_usage.py
python examples/04_with_gemini.py

API Reference

Memory Class

  • add(content, role, thread_id=None) - Add a conversation turn
  • get_context(thread_id=None) - Get optimized context
  • get_context_for(query, thread_id=None) - Get context relevant to a query
  • check_health(thread_id=None) - Get context health metrics
  • get_stats(thread_id=None) - Get statistics
  • compress(thread_id=None) - Force compression
  • clear(thread_id=None) - Clear memory

Storage Backends

  • InMemoryStorage - Default, no persistence
  • PostgresStorage - PostgreSQL with asyncpg
  • MongoStorage - MongoDB with motor
  • Custom - Implement StorageBackend for any database

Custom Storage Backend

LLMem supports any database. Implement the StorageBackend interface:

from llmem.storage.base import StorageBackend
from llmem.types import Turn, Topic

class MyCustomStorage(StorageBackend):
    """Your custom storage (Redis, SQLite, DynamoDB, etc.)"""
    
    async def save_turn(self, turn: Turn, thread_id: str) -> None:
        # Save turn to your database
        pass
    
    async def get_turns(self, thread_id: str, limit=None, offset=0) -> list:
        # Retrieve turns from your database
        pass
    
    async def get_turn_count(self, thread_id: str) -> int:
        # Return count of turns
        pass
    
    async def update_turn(self, turn: Turn, thread_id: str) -> None:
        # Update existing turn
        pass
    
    async def delete_turns(self, turn_ids: list, thread_id: str) -> None:
        # Delete specific turns
        pass
    
    async def clear(self, thread_id: str) -> None:
        # Clear all turns for thread
        pass

# Use your custom storage
storage = MyCustomStorage()
memory = Memory(storage=storage)

See examples/10_custom_storage.py for complete Redis and SQLite reference implementations.
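As a sanity check, the shape of this interface can be exercised with a dict-backed toy class. The sketch below runs standalone (it stubs a minimal Turn and skips subclassing StorageBackend, so no llmem install is needed) and covers only a subset of the methods:

```python
import asyncio
from dataclasses import dataclass

# Stand-in for llmem.types.Turn so the sketch runs standalone;
# the real Turn type lives in llmem.types.
@dataclass
class Turn:
    content: str
    role: str

class DictStorage:
    """Toy dict-backed storage following the StorageBackend shape."""

    def __init__(self):
        self._threads = {}  # thread_id -> list of Turn

    async def save_turn(self, turn: Turn, thread_id: str) -> None:
        self._threads.setdefault(thread_id, []).append(turn)

    async def get_turns(self, thread_id: str, limit=None, offset=0) -> list:
        turns = self._threads.get(thread_id, [])[offset:]
        return turns[:limit] if limit is not None else turns

    async def get_turn_count(self, thread_id: str) -> int:
        return len(self._threads.get(thread_id, []))

    async def clear(self, thread_id: str) -> None:
        self._threads.pop(thread_id, None)

async def main():
    storage = DictStorage()
    await storage.save_turn(Turn("Hello", "user"), thread_id="user-123")
    await storage.save_turn(Turn("Hi!", "assistant"), thread_id="user-123")
    print(await storage.get_turn_count("user-123"))  # 2

asyncio.run(main())
```

Note that per-thread keying is what gives the multi-user isolation described earlier: each thread_id maps to its own turn list.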

Configuration

memory = Memory(
    max_tokens=128000,          # Max context tokens
    compression_threshold=0.7,  # Compress at 70% usage
)
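compression_threshold is a fraction of max_tokens, so the settings above trigger compression once the context crosses roughly 89,600 tokens. A sketch of the trigger condition this implies (illustrative only, not the library's internals):

```python
# Illustrative trigger condition implied by the configuration above.
def should_compress(token_count: int, max_tokens: int = 128_000,
                    compression_threshold: float = 0.7) -> bool:
    """Compression fires once usage crosses the configured fraction."""
    return token_count >= max_tokens * compression_threshold

print(should_compress(80_000))   # 80k / 128k ≈ 0.63 -> below threshold
print(should_compress(100_000))  # 100k / 128k ≈ 0.78 -> over threshold
```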

License

MIT
