LLM SmartMem

Smart memory management for LLM conversations - topic-aware compression that just works.

Features

  • LLM-agnostic - Works with OpenAI, Gemini, Anthropic, local models, or any LLM
  • Topic-aware compression - Intelligently compresses based on conversation topics, not just token count
  • Storage-agnostic - Works with PostgreSQL, MongoDB, or in-memory
  • LangChain/LangGraph compatible - Works seamlessly with popular frameworks
  • Zero-config start - Works out of the box with smart defaults
  • Multi-user safe - Thread isolation for millions of users via thread_id
  • Fast - targets <100ms context retrieval

Installation

pip install llm-smartmem

With optional dependencies:

pip install llm-smartmem[postgres]    # PostgreSQL storage
pip install llm-smartmem[mongo]       # MongoDB storage
pip install llm-smartmem[all]         # Everything

Quick Start

from llmem import Memory

# Create memory (zero config)
memory = Memory()

# Add conversation turns
memory.add("How do I set up my VR headset?", role="user")
memory.add("To set up your VR headset, first...", role="assistant")
memory.add("What games do you recommend?", role="user")
memory.add("I recommend these games...", role="assistant")

# Get optimized context for next LLM call
context = memory.get_context()

# Check health
health = memory.check_health()
print(f"Status: {health.status.value}, Tokens: {health.token_count}")

With Persistent Storage

PostgreSQL

import asyncpg
from llmem import Memory
from llmem.storage.postgres import PostgresStorage

# create_pool must be awaited, so run this inside an async function (e.g. via asyncio.run)
pool = await asyncpg.create_pool("postgresql://user:pass@localhost/db")
storage = PostgresStorage(pool=pool)
memory = Memory(storage=storage)

# Thread ID for multi-user isolation
memory.add("Hello", role="user", thread_id="user-123")
context = memory.get_context(thread_id="user-123")

MongoDB

from motor.motor_asyncio import AsyncIOMotorClient
from llmem import Memory
from llmem.storage.mongo import MongoStorage

client = AsyncIOMotorClient("mongodb://localhost:27017")
storage = MongoStorage(db=client.mydb)
memory = Memory(storage=storage)

With Any LLM

LLMem is LLM-agnostic: it manages the conversation memory, and you bring your own model:

from llmem import Memory

memory = Memory()

# Add user message
memory.add(user_input, role="user")

# Get optimized context
context = memory.get_context()

# Use with ANY LLM - OpenAI, Gemini, Anthropic, local models, etc.
response = your_llm.generate(context)

# Track response
memory.add(response, role="assistant")

OpenAI Example

from openai import OpenAI
from llmem import Memory

client = OpenAI()
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()

response = client.chat.completions.create(
    model="your-model",
    messages=context
)
memory.add(response.choices[0].message.content, role="assistant")

Google Gemini Example

import google.generativeai as genai
from llmem import Memory

genai.configure(api_key="your-key")
model = genai.GenerativeModel("your-model")
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()
response = model.generate_content(str(context))
memory.add(response.text, role="assistant")

Anthropic Claude Example

from anthropic import Anthropic
from llmem import Memory

client = Anthropic()
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()

response = client.messages.create(
    model="your-model",
    max_tokens=1024,  # required by the Messages API
    messages=context
)
memory.add(response.content[0].text, role="assistant")

With LangChain (Any Provider)

from langchain_core.messages import HumanMessage, AIMessage
from llmem import Memory

# Use any LangChain-supported LLM
# from langchain_openai import ChatOpenAI
# from langchain_google_genai import ChatGoogleGenerativeAI
# from langchain_anthropic import ChatAnthropic

llm = YourLangChainLLM()  # placeholder - instantiate one of the chat models above
memory = Memory()

memory.add(user_input, role="user")
context = memory.get_context()

# Convert to LangChain messages
messages = [HumanMessage(content=m["content"]) if m["role"] == "user" 
            else AIMessage(content=m["content"]) for m in context]

response = llm.invoke(messages)
memory.add(response.content, role="assistant")
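The inline conditional above covers only user and assistant roles; a lookup-table variant also handles system messages. In this dependency-free sketch the message classes are namedtuple stand-ins for LangChain's, so it runs without LangChain installed - with the real library you would map to HumanMessage, AIMessage, and SystemMessage instead:

```python
from collections import namedtuple

# Stand-ins for LangChain's message classes, so this sketch is self-contained.
HumanMessage = namedtuple("HumanMessage", "content")
AIMessage = namedtuple("AIMessage", "content")
SystemMessage = namedtuple("SystemMessage", "content")

ROLE_MAP = {"user": HumanMessage, "assistant": AIMessage, "system": SystemMessage}

def to_langchain(context: list[dict]) -> list:
    """Convert role/content dicts to message objects via a role lookup table."""
    return [ROLE_MAP[m["role"]](m["content"]) for m in context]

msgs = to_langchain([
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
])
print([type(m).__name__ for m in msgs])  # → ['SystemMessage', 'HumanMessage', 'AIMessage']
```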

Health Monitoring

health = memory.check_health()
print(f"Status: {health.status.value}")        # healthy, warning, critical
print(f"Token usage: {health.token_usage:.1%}")
print(f"Recommendation: {health.recommendation.value}")

stats = memory.get_stats()
print(f"Total turns: {stats['total_turns']}")
print(f"Total tokens: {stats['total_tokens']}")
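As a rough intuition, the status values can be thought of as bands over token usage. The 0.7 and 0.9 cutoffs below are assumptions for the sketch, not LLM SmartMem's documented thresholds:

```python
# Illustrative only: one way a health status could be derived from token usage.
# The 0.7/0.9 cutoffs are assumptions, not the library's actual values.

def classify_health(token_count: int, max_tokens: int) -> str:
    usage = token_count / max_tokens
    if usage >= 0.9:
        return "critical"
    if usage >= 0.7:
        return "warning"
    return "healthy"

print(classify_health(50_000, 128_000))   # → healthy
print(classify_health(120_000, 128_000))  # → critical
```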

Callbacks

memory = Memory(
    on_compress=lambda info: print(f"Compressed: {info}"),
    on_health_change=lambda health: print(f"Health: {health.status.value}")
)

Examples

See the examples/ folder for complete working demos:

| Example | Description |
|---|---|
| 01_basic_usage.py | Core functionality - add, get, health, stats |
| 02_callbacks.py | Compression and health callbacks |
| 03_multi_user.py | Thread isolation for multi-user apps |
| 04_with_openai.py | Integration with OpenAI GPT |
| 04_with_gemini.py | Integration with Google Gemini |
| 05_langchain_integration.py | LangChain with any LLM provider |
| 06_langgraph_integration.py | LangGraph agents |
| 07_postgres_storage.py | PostgreSQL persistent storage |
| 08_mongodb_storage.py | MongoDB persistent storage |
| 09_e2e_agent_test.py | End-to-end test with all backends |
| 10_custom_storage.py | Build your own storage backend |

Running Examples

# Clone and setup
git clone https://github.com/sharanharsoor/llmem.git
cd llmem
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"

# Create .env file with your credentials
echo "GOOGLE_API_KEY=your-key" > .env
echo "DATABASE_URL=postgresql://user:pass@localhost/db" >> .env
echo "MONGODB_URL=mongodb://localhost:27017" >> .env

# Run examples
python examples/01_basic_usage.py
python examples/04_with_gemini.py

API Reference

Memory Class

| Method | Description |
|---|---|
| add(content, role, thread_id=None) | Add a conversation turn |
| get_context(thread_id=None) | Get optimized context |
| get_context_for(query, thread_id=None) | Get context relevant to a query |
| check_health(thread_id=None) | Get context health metrics |
| get_stats(thread_id=None) | Get statistics |
| compress(thread_id=None) | Force compression |
| clear(thread_id=None) | Clear memory |
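As a rough intuition for what get_context_for(query) does (the library's actual relevance scoring is not specified here), a naive word-overlap ranker looks like:

```python
# Naive sketch of query-relevant retrieval; not LLM SmartMem's real algorithm.

def relevant_turns(turns: list[str], query: str, top_k: int = 2) -> list[str]:
    """Rank turns by word overlap with the query and keep the top_k best."""
    q_words = set(query.lower().split())
    scored = sorted(turns,
                    key=lambda t: len(q_words & set(t.lower().split())),
                    reverse=True)
    return scored[:top_k]

turns = [
    "How do I set up my VR headset?",
    "What games do you recommend?",
    "The weather is nice today.",
]
print(relevant_turns(turns, "vr headset setup"))  # the VR turn ranks first
```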

Storage Backends

| Backend | Description |
|---|---|
| InMemoryStorage | Default, no persistence |
| PostgresStorage | PostgreSQL with asyncpg |
| MongoStorage | MongoDB with motor |
| Custom | Implement StorageBackend for any database |

Custom Storage Backend

LLMem supports any database. Implement the StorageBackend interface:

from llmem.storage.base import StorageBackend
from llmem.types import Turn, Topic

class MyCustomStorage(StorageBackend):
    """Your custom storage (Redis, SQLite, DynamoDB, etc.)"""
    
    async def save_turn(self, turn: Turn, thread_id: str) -> None:
        # Save turn to your database
        pass
    
    async def get_turns(self, thread_id: str, limit=None, offset=0) -> list:
        # Retrieve turns from your database
        pass
    
    async def get_turn_count(self, thread_id: str) -> int:
        # Return count of turns
        pass
    
    async def update_turn(self, turn: Turn, thread_id: str) -> None:
        # Update existing turn
        pass
    
    async def delete_turns(self, turn_ids: list, thread_id: str) -> None:
        # Delete specific turns
        pass
    
    async def clear(self, thread_id: str) -> None:
        # Clear all turns for thread
        pass

# Use your custom storage
storage = MyCustomStorage()
memory = Memory(storage=storage)

See examples/10_custom_storage.py for complete Redis and SQLite reference implementations.
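To get a self-contained feel for the contract, here is a minimal dict-backed sketch covering a subset of the methods above. The Turn dataclass and the class itself are simplified stand-ins, not llmem's real types - with the library installed you would subclass llmem.storage.base.StorageBackend instead:

```python
import asyncio
from dataclasses import dataclass

# Simplified stand-in for llmem.types.Turn, so this sketch runs standalone.
@dataclass
class Turn:
    id: str
    role: str
    content: str

class DictStorage:
    """Minimal dict-backed backend illustrating the interface contract."""

    def __init__(self):
        self._threads: dict[str, list[Turn]] = {}

    async def save_turn(self, turn: Turn, thread_id: str) -> None:
        self._threads.setdefault(thread_id, []).append(turn)

    async def get_turns(self, thread_id: str, limit=None, offset=0) -> list[Turn]:
        turns = self._threads.get(thread_id, [])[offset:]
        return turns[:limit] if limit is not None else turns

    async def get_turn_count(self, thread_id: str) -> int:
        return len(self._threads.get(thread_id, []))

    async def clear(self, thread_id: str) -> None:
        self._threads.pop(thread_id, None)

async def demo():
    storage = DictStorage()
    await storage.save_turn(Turn("1", "user", "Hello"), thread_id="user-123")
    await storage.save_turn(Turn("2", "assistant", "Hi!"), thread_id="user-123")
    assert await storage.get_turn_count("user-123") == 2
    assert await storage.get_turn_count("user-456") == 0  # threads stay isolated
    return await storage.get_turns("user-123")

turns = asyncio.run(demo())
print([t.content for t in turns])  # → ['Hello', 'Hi!']
```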

Configuration

memory = Memory(
    max_tokens=128000,          # Max context tokens
    compression_threshold=0.7,  # Compress at 70% usage
)
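Under these settings, compression triggers once usage crosses the threshold fraction of the token budget:

```python
max_tokens = 128_000
compression_threshold = 0.7

# Token count at which compression kicks in under the settings above.
trigger_point = int(max_tokens * compression_threshold)
print(trigger_point)  # → 89600
```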

License

MIT
