
Capacity-bounded warm memory for LLM agents, with a LangGraph BaseStore implementation, embeddings-based importance scoring, and a comparative benchmark.


WarmMemory


WarmMemory is a Python package for short-term memory management in LLM agents. It adds a small in-process working-memory layer that keeps the most recent or most relevant interactions close to the agent, reducing repeated retrieval work and helping control prompt growth.

The repository provides:

  • a reusable Python package for warm-memory buffering,
  • a decorator for automatic interaction capture,
  • a pluggable importance scoring interface,
  • a deterministic benchmark for recency vs relevance vs fallback memory policies,
  • a LangGraph BaseStore integration with per-namespace eviction, embeddings-based ranking, and a pre-built agent,
  • HTML documentation for architecture and usage.

Why This Exists

Many agent systems use one of two expensive patterns:

  • they keep appending conversation history to the prompt,
  • or they query long-term memory on nearly every turn.

Both increase latency and cost. WarmMemory introduces a hot path:

  • keep a small working set in RAM,
  • retrieve from that working set first,
  • fall back to longer-term retrieval only when needed,
  • and send only a compact context window to the model.
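
The last step, sending only a compact context window, can be illustrated with a small budgeted selection. This is a minimal sketch and not part of the package: the function name compact_context and the word-count budget (a crude proxy for tokens) are assumptions made for illustration.

```python
def compact_context(rows: list[str], max_words: int) -> str:
    """Keep the newest rows that fit within a word budget (hypothetical helper)."""
    kept: list[str] = []
    used = 0
    for row in reversed(rows):  # walk from newest to oldest
        cost = len(row.split())
        if used + cost > max_words:
            break  # stop at the first row that would exceed the budget
        kept.append(row)
        used += cost
    # Restore chronological order before joining into a prompt fragment.
    return "\n".join(reversed(kept))


history = [
    "user: hi there",
    "agent: hello, how can I help?",
    "user: where is my invoice?",
]
context = compact_context(history, max_words=11)
```

With a budget of 11 words, only the two newest rows fit, so the oldest row is left out of the prompt.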

Core Ideas

1. Sliding-Window Memory

The system can keep the last N interactions using recent(k).
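
As a rough illustration of the sliding-window idea (not the package's Pandas-backed implementation), a bounded buffer can be sketched with collections.deque. The class name SlidingWindow and its methods are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical capacity-bounded buffer; not WarmMemory's own class."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item automatically once full.
        self._rows = deque(maxlen=capacity)

    def add(self, prompt: str, response: str) -> None:
        self._rows.append({"prompt": prompt, "response": response})

    def recent(self, k: int) -> list[dict]:
        # Return up to k of the newest rows, oldest first.
        if k <= 0:
            return []
        return list(self._rows)[-k:]


window = SlidingWindow(capacity=3)
for i in range(5):
    window.add(f"question {i}", f"answer {i}")

latest = window.recent(2)
```

After five writes into a buffer of capacity 3, only the last three interactions remain, and recent(2) returns the newest two.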

2. Relevance-Aware Memory

Instead of keeping only the latest messages, the system can rank rows against the current query using relevant(query, limit) and compact the active working set with retain_relevant(query, limit).
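
One simple way to rank rows against a query is keyword overlap. The sketch below assumes that scoring rule purely for illustration; the package's default heuristic scorer may work differently, and the function names here are hypothetical stand-ins that operate on a plain list.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercase alphanumeric word tokens; a deliberately simple assumption.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, row: str) -> float:
    # Fraction of query tokens that also appear in the row.
    q = tokens(query)
    return len(q & tokens(row)) / len(q) if q else 0.0


def relevant(rows: list[str], query: str, limit: int) -> list[str]:
    # sorted() is stable, so rows with equal scores keep insertion order.
    ranked = sorted(rows, key=lambda r: overlap_score(query, r), reverse=True)
    return ranked[:limit]


def retain_relevant(rows: list[str], query: str, limit: int) -> None:
    # Keep the top-`limit` rows in place, preserving their original order.
    order = sorted(
        range(len(rows)),
        key=lambda i: overlap_score(query, rows[i]),
        reverse=True,
    )
    keep = sorted(order[:limit])
    rows[:] = [rows[i] for i in keep]


rows = [
    "How do I reset my password?",
    "Where is my billing invoice?",
    "Is the invoice portal down?",
    "What is the weather?",
]
top_two = relevant(rows, "billing invoice", limit=2)
retain_relevant(rows, "billing invoice", limit=2)
```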

3. Automatic Agent Capture

The @remember_interaction decorator records agent inputs and outputs without forcing changes into the core agent logic.
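
To show how this kind of capture can stay outside the agent logic, here is a minimal sketch of a recording decorator. It is not the package's implementation of @remember_interaction: the name remember and the list-based log are assumptions for illustration.

```python
import functools
from typing import Callable


def remember(log: list[dict]) -> Callable:
    """Hypothetical capture decorator: records each call's input and output in `log`."""

    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(func)  # preserve the wrapped function's name and docstring
        def wrapper(prompt: str) -> str:
            response = func(prompt)
            # Record only after the call succeeds, so failed calls leave no row.
            log.append({"prompt": prompt, "response": response})
            return response

        return wrapper

    return decorator


captured: list[dict] = []


@remember(captured)
def agent(prompt: str) -> str:
    return f"Echo: {prompt}"


reply = agent("hello")
```

The agent function body is unchanged; the decorator is the only place that knows about the log.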

4. Two-Tier Memory Architecture

The benchmark models a practical split:

  • warm memory for fast in-process access,
  • long-term memory for slower fallback retrieval.

Repository Layout

  • warm_memory/: package source code
  • warm_memory/buffer.py: Pandas-backed warm-memory store
  • warm_memory/scoring.py: scoring interface and default heuristic scorer
  • warm_memory/decorators.py: function decorator for interaction capture
  • warm_memory/benchmark.py: deterministic benchmark harness
  • warm_memory/workload.py: synthetic workload for evaluation
  • warm_memory/langgraph/: LangGraph integration (optional extra)
    • store.py: WarmStore(BaseStore) with per-namespace eviction
    • embeddings.py: bring-your-own embeddings scorer
    • agent.py: pre-built build_warm_memory_agent graph
    • benchmark.py: full-history vs vector-only vs warm-fallback benchmark
  • examples/langgraph_warm_agent.py: runnable LangGraph agent example
  • scripts/run_benchmark.py: legacy benchmark entrypoint
  • scripts/run_langgraph_benchmark.py: LangGraph-based benchmark entrypoint
  • reports/warm_memory_benchmark.md: legacy benchmark output
  • reports/warm_memory_langgraph_benchmark.md: LangGraph benchmark output
  • docs/warm_memory_guide.html: public-facing HTML documentation
  • tests/: unit tests

Installation

python3 -m pip install -e .

Quick Start

from warm_memory import WarmMemoryBuffer, remember_interaction

memory = WarmMemoryBuffer(capacity=8)

@remember_interaction(memory)
def agent(prompt: str) -> str:
    if "billing" in prompt.lower():
        return "Your invoice is available in the billing portal."
    return f"Echo: {prompt}"

agent("How do I reset my password?")
agent("Where is my billing invoice?")

recent_rows = memory.recent(4)
relevant_rows = memory.relevant("invoice", limit=2)
memory.retain_relevant("invoice", limit=4)

Example Usage Pattern

Use WarmMemory in front of a larger memory system:

  1. Receive a new user query.
  2. Search the warm buffer first.
  3. If warm memory is sufficient, build a compact prompt from those rows.
  4. If warm memory is weak, fall back to long-term retrieval.
  5. Write the new interaction back into warm memory.

This pattern is useful for:

  • coding agents,
  • research assistants,
  • task-oriented copilots,
  • customer support agents,
  • and any multi-turn system with repeated local context.
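
The five-step usage pattern above can be sketched in a few lines. Everything here is an assumption for illustration: the toy word-overlap score, the 0.5 threshold for deciding that warm memory is "sufficient", and the plain list standing in for a long-term store.

```python
from collections import deque


def score(query: str, text: str) -> float:
    # Toy relevance: share of query words present in the text.
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def build_context(query, warm, long_term, k=2, threshold=0.5):
    """Return (rows, source). `threshold` is an arbitrary cutoff for this sketch."""
    ranked = sorted(warm, key=lambda row: score(query, row), reverse=True)[:k]
    if ranked and score(query, ranked[0]) >= threshold:
        return ranked, "warm"
    # Warm memory is weak: fall back to the slower long-term store.
    fallback = sorted(long_term, key=lambda row: score(query, row), reverse=True)[:k]
    return fallback, "long_term"


warm = deque(["invoice is in the billing portal"], maxlen=4)
long_term = [
    "password reset link is on the login page",
    "invoice is in the billing portal",
]

rows_a, source_a = build_context("billing invoice", warm, long_term)
rows_b, source_b = build_context("password reset", warm, long_term)

# Step 5: write the new interaction back into warm memory.
warm.append("password reset link is on the login page")
rows_c, source_c = build_context("password reset", warm, long_term)
```

The first query is served from warm memory, the second falls back to long-term retrieval, and after the write-back the same query is served warm.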

Benchmark

The repository includes a deterministic benchmark that compares:

  • recency: always use the latest warm-memory rows,
  • relevance: rank and retain the top relevant warm-memory rows,
  • fallback: use warm relevance first, then long-term retrieval on misses.

Run it with:

python3 scripts/run_benchmark.py

This writes a report to reports/warm_memory_benchmark.md.

On the current synthetic workload, the tradeoff looks like this:

  • recency is the fastest policy,
  • fallback is the most accurate policy,
  • relevance sits between the two and provides a cleaner hot working set.

The benchmark is designed to surface that tradeoff rather than name a single winner: each policy occupies a different point on the latency-accuracy curve.
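
The shape of that tradeoff can be reproduced on a toy workload. The sketch below is not the repository's benchmark harness and its numbers say nothing about the real report: the data, the word-overlap score, and the use of long-term lookups as a cost proxy are all assumptions for illustration.

```python
def score(query: str, row: str) -> float:
    # Toy relevance: share of query words present in the row.
    q = set(query.lower().split())
    return len(q & set(row.lower().split())) / len(q) if q else 0.0


def top(rows, query, k):
    return sorted(rows, key=lambda r: score(query, r), reverse=True)[:k]


def run(policy: str, workload, warm, long_term, k=1):
    """Count hits and long-term lookups for one policy on a fixed workload."""
    hits = lookups = 0
    for query, expected in workload:
        if policy == "recency":
            rows = warm[-k:]  # newest rows, no ranking
        else:
            rows = top(warm, query, k)
            if policy == "fallback" and (not rows or score(query, rows[0]) == 0.0):
                lookups += 1  # warm miss: pay for a long-term lookup
                rows = top(long_term, query, k)
        hits += expected in rows
    return {"hits": hits, "long_term_lookups": lookups}


warm = ["a1 alpha", "b2 beta", "c3 gamma"]
long_term = warm + ["d4 delta"]
workload = [("alpha", "a1 alpha"), ("gamma", "c3 gamma"), ("delta", "d4 delta")]

results = {p: run(p, workload, warm, long_term) for p in ("recency", "relevance", "fallback")}
```

On this toy data, recency does the least work but finds the fewest expected rows, fallback finds all of them at the price of one long-term lookup, and relevance lands in between.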

Documentation

  • HTML guide: docs/warm_memory_guide.html
  • Benchmark report: reports/warm_memory_benchmark.md
  • README visual: docs/warm_memory_architecture.svg

The HTML guide explains:

  • how the architecture works,
  • where latency is saved,
  • how to use the package,
  • and how the components fit together.

Architecture Preview

[Figure: WarmMemory Architecture]

For a richer visual walkthrough, open docs/warm_memory_guide.html locally or publish it with GitHub Pages.

Development

Run tests:

python3 -m unittest discover -s tests -v

LangGraph Integration

WarmMemory ships an optional warm_memory.langgraph module that plugs directly into the LangGraph ecosystem. Install the extra:

python3 -m pip install -e ".[langgraph]"

Drop-in BaseStore

WarmStore implements LangGraph's BaseStore interface with per-namespace warm buffers — each namespace gets its own bounded buffer, so multi-tenant agents don't evict each other's memory.

from warm_memory.langgraph import WarmStore

store = WarmStore(capacity=16)
store.put(("alice",), "preferences", {"text": "wants concise answers"})
store.put(("alice",), "billing", {"text": "invoice overdue", "topic": "billing"})

# query-based recall (keyword scorer by default)
hits = store.search(("alice",), query="how do I pay my invoice?")

# filter operators: $eq, $ne, $gt, $gte, $lt, $lte
billing = store.search(("alice",), filter={"topic": "billing"})
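
The per-namespace isolation described above can be pictured as one bounded buffer per namespace tuple. This is a minimal sketch assuming oldest-first eviction; WarmStore's actual eviction policy may differ, and the class NamespacedBuffers is hypothetical.

```python
from collections import OrderedDict


class NamespacedBuffers:
    """Hypothetical sketch of per-namespace bounded storage; not WarmStore itself."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._buffers: dict[tuple, OrderedDict] = {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        buf = self._buffers.setdefault(namespace, OrderedDict())
        buf[key] = value
        buf.move_to_end(key)  # an updated key counts as the newest entry
        while len(buf) > self._capacity:
            buf.popitem(last=False)  # evict the oldest entry in this namespace only

    def keys(self, namespace: tuple) -> list[str]:
        return list(self._buffers.get(namespace, {}))


buffers = NamespacedBuffers(capacity=2)
for key in ("a", "b", "c"):
    buffers.put(("alice",), key, {"text": key})
buffers.put(("bob",), "x", {"text": "x"})

alice_keys = buffers.keys(("alice",))
bob_keys = buffers.keys(("bob",))
```

Filling alice's buffer past capacity evicts only alice's oldest entry; bob's namespace is untouched.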

Bring-your-own embeddings

Swap the default keyword scorer for any LangChain Embeddings:

from langchain_openai import OpenAIEmbeddings
from warm_memory.langgraph import EmbeddingsImportanceScorer, WarmStore

scorer = EmbeddingsImportanceScorer(OpenAIEmbeddings())
store = WarmStore(scorer=scorer)

This works with any LangChain Embeddings implementation, such as OpenAI, HuggingFace, or Voyage, and with DeterministicFakeEmbedding for tests.
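
Embeddings-based scoring usually comes down to comparing a query vector with each row's vector. The sketch below assumes cosine similarity and uses a toy hashed bag-of-words embedding so it runs without any provider; it does not show how EmbeddingsImportanceScorer is implemented, and embed, cosine, and rank are hypothetical names.

```python
import math


def embed(text: str, dim: int = 16) -> list[float]:
    """Toy deterministic embedding (hashed bag of words); a stand-in for a real model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Sum of code points keeps buckets stable across runs (built-in hash() is salted).
        vec[sum(map(ord, word)) % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: str, rows: list[str]) -> list[str]:
    # Highest cosine similarity first; ties keep their original order.
    q = embed(query)
    return sorted(rows, key=lambda r: cosine(q, embed(r)), reverse=True)


rows = ["wants concise answers", "invoice overdue"]
ranked = rank("invoice overdue", rows)
```

With a real embeddings provider, only embed would change; the ranking step stays the same.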

Pre-built agent

build_warm_memory_agent returns a compiled LangGraph that reads warm memory before responding and writes the new exchange back on the way out:

from warm_memory.langgraph import WarmStore, build_warm_memory_agent

store = WarmStore(capacity=8)
agent = build_warm_memory_agent(model=my_chat_model, store=store)
agent.invoke({"query": "Where's my invoice?", "namespace": ("alice",)})

A runnable example using FakeListChatModel (no API keys) lives at examples/langgraph_warm_agent.py.

Comparative benchmark

scripts/run_langgraph_benchmark.py compares three retrieval strategies through the LangGraph store API:

  • full-history: every prior turn in the prompt (naive baseline)
  • vector-only: LangGraph's InMemoryStore with an embedding index
  • warm-fallback: WarmStore in front of the vector store

Run it with:

python3 scripts/run_langgraph_benchmark.py

This writes reports/warm_memory_langgraph_benchmark.md. The benchmark uses synthetic embeddings by default; set WARM_BENCH_EMBEDDINGS=openai (and OPENAI_API_KEY) to compare against real semantic search.

Roadmap

  • add an embedding-based or reranker-based importance scorer (done via EmbeddingsImportanceScorer)
  • compare against vector-store-first baselines (done via warm-fallback strategy in the LangGraph benchmark)
  • benchmark against real agent traces instead of only synthetic workloads
  • record actual model latency and token usage from a live LLM pipeline
  • add charts and experiment summaries for publication-style reporting
  • TTL support for the LangGraph BaseStore
  • publish warm-memory to PyPI and propose inclusion in LangGraph's third-party store list

License

This project is released under the MIT License. See LICENSE.
