Capacity-bounded warm memory for LLM agents, with a LangGraph BaseStore implementation, embeddings-based importance scoring, and a comparative benchmark.
WarmMemory
WarmMemory is a Python package for short-term memory management in LLM agents. It adds a small in-process working-memory layer that keeps the most recent or most relevant interactions close to the agent, reducing repeated retrieval work and helping control prompt growth.
The repository provides:
- a reusable Python package for warm-memory buffering,
- a decorator for automatic interaction capture,
- a pluggable importance scoring interface,
- a deterministic benchmark for recency vs relevance vs fallback memory policies,
- a LangGraph BaseStore integration with per-namespace eviction, embeddings-based ranking, and a pre-built agent,
- HTML documentation for architecture and usage.
Why This Exists
Many agent systems use one of two expensive patterns:
- they keep appending conversation history to the prompt,
- or they query long-term memory on nearly every turn.
Both increase latency and cost. WarmMemory introduces a hot path:
- keep a small working set in RAM,
- retrieve from that working set first,
- fall back to longer-term retrieval only when needed,
- and send only a compact context window to the model.
Core Ideas
1. Sliding-Window Memory
The buffer holds at most capacity interactions; recent(k) returns the last k of them.
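To make the sliding-window semantics concrete, here is a minimal standalone sketch of a capacity-bounded buffer with a recent(k) method. It is not WarmMemory's implementation (the real buffer is Pandas-backed); the SlidingWindow class is hypothetical and assumes the oldest row is evicted once capacity is reached.

```python
from collections import deque
from typing import Any


class SlidingWindow:
    """Hypothetical capacity-bounded buffer; not the WarmMemoryBuffer API."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest row when it is full
        self._rows: deque = deque(maxlen=capacity)

    def add(self, prompt: str, response: str) -> None:
        self._rows.append({"prompt": prompt, "response": response})

    def recent(self, k: int) -> list[dict[str, Any]]:
        # Up to k of the newest rows, kept in arrival order
        if k <= 0:
            return []
        return list(self._rows)[-k:]


window = SlidingWindow(capacity=3)
for i in range(5):
    window.add(f"q{i}", f"a{i}")

print([row["prompt"] for row in window.recent(2)])  # ['q3', 'q4']
print(len(window.recent(10)))  # 3
```

Because the window never grows past capacity, the context built from it stays bounded no matter how long the conversation runs.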
2. Relevance-Aware Memory
Instead of only keeping the latest messages, the system can rank rows against the
current query with relevant(query, limit) and compact the active working set with
retain_relevant(query, limit).
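The difference between ranking and compacting can be sketched with plain functions. This is an illustration under assumptions: the word-overlap score stands in for the package's default KeywordImportanceScorer (whose actual heuristic may differ), and the free functions relevant and retain_relevant only mirror the method names. Restoring arrival order after compaction is also an assumption.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased word set; a stand-in for the package's importance scorer
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, row: str) -> float:
    # Fraction of query words that also appear in the row
    q = tokens(query)
    return len(q & tokens(row)) / len(q) if q else 0.0


def ranked_indices(rows: list[str], query: str) -> list[int]:
    # Highest score first; sorted() stays stable with reverse=True, so ties keep arrival order
    return sorted(range(len(rows)), key=lambda i: score(query, rows[i]), reverse=True)


def relevant(rows: list[str], query: str, limit: int) -> list[str]:
    # Read-only: return the top `limit` rows, best first
    return [rows[i] for i in ranked_indices(rows, query)[:limit]]


def retain_relevant(rows: list[str], query: str, limit: int) -> list[str]:
    # Compaction: keep only the top `limit` rows, restored to arrival order
    keep = sorted(ranked_indices(rows, query)[:limit])
    return [rows[i] for i in keep]


rows = [
    "How do I reset my password?",
    "Where is my billing invoice?",
    "Invoice PDF was emailed last week",
    "What is the weather today?",
]
print(relevant(rows, "billing invoice", limit=2))
print(retain_relevant(rows, "billing invoice", limit=3))
```

relevant leaves the working set untouched, while retain_relevant shrinks it, which is what keeps the hot set clean between turns.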
3. Automatic Agent Capture
The @remember_interaction decorator records agent inputs and outputs without requiring
changes to the core agent logic.
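The general decorator pattern behind automatic capture looks roughly like the sketch below. It is not the source of remember_interaction: remember_calls and the plain-list log are hypothetical stand-ins for the decorator and the warm buffer.

```python
import functools
from typing import Callable


def remember_calls(log: list[dict[str, str]]) -> Callable:
    """Hypothetical capture decorator: appends each prompt/response pair to `log`."""

    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)  # keep the wrapped agent's name and docstring
        def wrapper(prompt: str) -> str:
            response = fn(prompt)  # the agent logic itself runs unchanged
            log.append({"prompt": prompt, "response": response})
            return response

        return wrapper

    return decorator


history: list[dict[str, str]] = []


@remember_calls(history)
def agent(prompt: str) -> str:
    return f"Echo: {prompt}"


agent("hello")
print(history)  # [{'prompt': 'hello', 'response': 'Echo: hello'}]
print(agent.__name__)  # agent
```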
4. Two-Tier Memory Architecture
The benchmark models a practical split:
- warm memory for fast in-process access,
- long-term memory for slower fallback retrieval.
Repository Layout
- warm_memory/: package source code
- warm_memory/buffer.py: Pandas-backed warm-memory store
- warm_memory/scoring.py: scoring interface and default heuristic scorer
- warm_memory/decorators.py: function decorator for interaction capture
- warm_memory/benchmark.py: deterministic benchmark harness
- warm_memory/workload.py: synthetic workload for evaluation
- warm_memory/langgraph/: LangGraph integration (optional extra)
  - store.py: WarmStore(BaseStore) with per-namespace eviction
  - embeddings.py: bring-your-own embeddings scorer
  - agent.py: pre-built build_warm_memory_agent graph
  - benchmark.py: full-history vs vector-only vs warm-fallback benchmark
- examples/langgraph_warm_agent.py: runnable LangGraph agent example
- scripts/run_benchmark.py: legacy benchmark entrypoint
- scripts/run_langgraph_benchmark.py: LangGraph-based benchmark entrypoint
- reports/warm_memory_benchmark.md: legacy benchmark output
- reports/warm_memory_langgraph_benchmark.md: LangGraph benchmark output
- docs/warm_memory_guide.html: public-facing HTML documentation
- tests/: unit tests
Installation
pip install warm-memory
# or with the LangGraph integration:
pip install "warm-memory[langgraph]"
Or install from source for development:
python3 -m pip install -e ".[langgraph]"
Quick Start
from warm_memory import WarmMemoryBuffer, remember_interaction

# Bounded working set: holds at most 8 interactions
memory = WarmMemoryBuffer(capacity=8)

# Each call's prompt and response is recorded in `memory`
@remember_interaction(memory)
def agent(prompt: str) -> str:
    if "billing" in prompt.lower():
        return "Your invoice is available in the billing portal."
    return f"Echo: {prompt}"

agent("How do I reset my password?")
agent("Where is my billing invoice?")

recent_rows = memory.recent(4)  # up to the 4 most recent interactions
relevant_rows = memory.relevant("invoice", limit=2)  # top 2 rows ranked against the query
memory.retain_relevant("invoice", limit=4)  # compact the buffer to the 4 most relevant rows
Example Usage Pattern
Use WarmMemory in front of a larger memory system:
- Receive a new user query.
- Search the warm buffer first.
- If warm memory is sufficient, build a compact prompt from those rows.
- If warm memory is weak, fall back to long-term retrieval.
- Write the new interaction back into warm memory.
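The five steps above can be sketched as one standalone function. Everything here is hypothetical: overlap is a toy word-overlap scorer, build_context and fake_vector_search are illustrative names, and threshold=0.5 stands in for whatever warm-hit threshold you configure. It shows only the control flow, not WarmMemory's API.

```python
from typing import Callable


def overlap(query: str, text: str) -> float:
    # Hypothetical scorer: fraction of query words that appear in the row
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def build_context(
    query: str,
    warm: list[str],
    long_term_search: Callable[[str], list[str]],
    threshold: float = 0.5,
    top_k: int = 2,
) -> tuple[str, list[str]]:
    # Search the warm buffer first, best match first
    ranked = sorted(warm, key=lambda row: overlap(query, row), reverse=True)
    hits = [row for row in ranked if overlap(query, row) > 0]
    # Warm hit: build the prompt from the top-k warm rows only
    if hits and overlap(query, hits[0]) >= threshold:
        return "warm", hits[:top_k]
    # Warm miss: fall back to the slower long-term tier
    return "long_term", long_term_search(query)


vector_calls: list[str] = []


def fake_vector_search(query: str) -> list[str]:
    # Hypothetical long-term store that records each call so they can be counted
    vector_calls.append(query)
    return ["refund policy is 30 days"]


warm = ["billing invoice is in the portal", "password reset link sent"]

route, context = build_context("where is my invoice", warm, fake_vector_search)
route2, context2 = build_context("what is the refund policy", warm, fake_vector_search)

# Write the new interaction back so the next similar query stays warm
warm.append("what is the refund policy " + context2[0])
route3, _ = build_context("refund policy", warm, fake_vector_search)
print(route, route2, route3, len(vector_calls))
```

Only the second query reaches the long-term tier; after write-back, the follow-up query is answered from warm memory without another vector call.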
This pattern is useful for:
- coding agents,
- research assistants,
- task-oriented copilots,
- customer support agents,
- and any multi-turn system with repeated local context.
Benchmark
The repository includes a deterministic benchmark that compares:
- recency: always use the latest warm-memory rows,
- relevance: rank and retain the top relevant warm-memory rows,
- fallback: use warm relevance first, then long-term retrieval on misses.
Run it with:
python3 scripts/run_benchmark.py
This writes a report to reports/warm_memory_benchmark.md.
On the current synthetic workload, the tradeoff looks like this:
- recency is the fastest policy,
- fallback is the most accurate policy,
- relevance sits between the two and provides a cleaner hot working set.
The benchmark is designed to surface that tradeoff rather than name a single winner: each policy occupies a different point on the latency-accuracy curve.
Documentation
- HTML guide: docs/warm_memory_guide.html
- Benchmark report: reports/warm_memory_benchmark.md
- README visual: docs/warm_memory_architecture.svg
The HTML guide explains:
- how the architecture works,
- where latency is saved,
- how to use the package,
- and how the components fit together.
Architecture
The pipeline:
- Agent Runtime receives the user query in a per-user namespace and
triggers two reads: a fast lookup against WarmMemory (the in-process
working set) and a Retrieval Ranker scoring pass over those rows
(KeywordImportanceScorer by default; swap in EmbeddingsImportanceScorer for semantic ranking).
- Warm Hit? compares the best score against the configured threshold.
- Green path (warm hit): results flow to the Prompt Builder, which injects only the top-K rows into the system prompt before invoking the LLM. The vector tier is never touched.
- Orange path (warm miss): the query falls through to Long-Term Memory
(LangGraph's InMemoryStore with an embedding index, PostgresStore, or any BaseStore), and the LLM consumes those results as fallback.
- Dashed write-back loop: the LLM response is captured by the decorator
and written back to WarmMemory (and mirrored to Long-Term Memory by the
memory_write node), so future turns can recall it.
On the synthetic benchmark, about 50% of turns take the green path, so those turns skip the vector-store call entirely.
The diagram ships in two paired formats:
- docs/warm_memory_architecture.drawio.svg: the rendered SVG that GitHub displays inline. The decision arrows are animated ("marching ants" SMIL animation) so the hot and cold paths read at a glance. Open the file directly in a browser to see the animation; GitHub also renders the animation when displaying the SVG.
- docs/warm_memory_architecture.drawio: the editable mxgraph source. Open it at diagrams.net (File → Open from device) to edit, then re-export the SVG when done.
The .drawio.svg also embeds the mxgraph XML in its content attribute, so
either file round-trips through the editor. Both are produced by the same
generator script, which keeps them in sync.
For a richer narrated walkthrough, open
docs/warm_memory_guide.html locally or publish
it with GitHub Pages.
Development
Run tests:
python3 -m unittest discover -s tests -v
LangGraph Integration
WarmMemory ships an optional warm_memory.langgraph module that plugs directly
into the LangGraph ecosystem. Install with the extra:
pip install "warm-memory[langgraph]"
Drop-in BaseStore
WarmStore implements LangGraph's BaseStore interface with per-namespace
warm buffers: each namespace gets its own bounded buffer, so multi-tenant
agents don't evict each other's memory.
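Per-namespace isolation means eviction pressure in one tenant never touches another. The standalone sketch below illustrates that idea; NamespacedBuffers is hypothetical, and it assumes oldest-first eviction, whereas WarmStore's actual eviction rule may differ (for example, it could evict by importance score).

```python
from collections import OrderedDict
from typing import Any

Namespace = tuple[str, ...]


class NamespacedBuffers:
    """Hypothetical sketch: one bounded, insertion-ordered buffer per namespace."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._buffers: dict[Namespace, OrderedDict] = {}

    def put(self, namespace: Namespace, key: str, value: Any) -> None:
        buf = self._buffers.setdefault(namespace, OrderedDict())
        buf[key] = value
        buf.move_to_end(key)  # writing a key makes it the newest entry
        if len(buf) > self.capacity:
            buf.popitem(last=False)  # evict the oldest entry of this namespace only

    def keys(self, namespace: Namespace) -> list[str]:
        return list(self._buffers.get(namespace, {}))


store = NamespacedBuffers(capacity=2)
store.put(("alice",), "a1", 1)
store.put(("alice",), "a2", 2)
store.put(("bob",), "b1", 1)
store.put(("alice",), "a3", 3)  # evicts a1; bob's buffer is untouched
print(store.keys(("alice",)))  # ['a2', 'a3']
print(store.keys(("bob",)))  # ['b1']
```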
from warm_memory.langgraph import WarmStore
store = WarmStore(capacity=16)
store.put(("alice",), "preferences", {"text": "wants concise answers"})
store.put(("alice",), "billing", {"text": "invoice overdue", "topic": "billing"})
# query-based recall (keyword scorer by default)
hits = store.search(("alice",), query="how do I pay my invoice?")
# filter operators: $eq, $ne, $gt, $gte, $lt, $lte
billing = store.search(("alice",), filter={"topic": "billing"})
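The filter operators listed above can be read as per-field comparisons against each stored value. The sketch below shows one way to evaluate them; matches and OPS are hypothetical names, the {"field": {"$op": operand}} shape is an assumption, and treating a missing field as a non-match is also an assumption, so WarmStore's own matching rules may differ.

```python
import operator
from typing import Any, Callable

# Hypothetical operator table mirroring the operators listed above
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(value: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `value` satisfies every condition in `flt` (sketch)."""
    for field, cond in flt.items():
        if field not in value:
            return False  # assumption: a missing field never matches
        actual = value[field]
        if isinstance(cond, dict):
            # {"field": {"$op": operand}}: every listed operator must hold
            if not all(OPS[op](actual, operand) for op, operand in cond.items()):
                return False
        elif actual != cond:
            # A bare value means equality, as in filter={"topic": "billing"}
            return False
    return True


items = [
    {"text": "invoice overdue", "topic": "billing", "priority": 3},
    {"text": "wants concise answers", "topic": "preferences", "priority": 1},
]
billing = [i["text"] for i in items if matches(i, {"topic": "billing"})]
urgent = [i["text"] for i in items if matches(i, {"priority": {"$gte": 2}})]
other = [i["text"] for i in items if matches(i, {"topic": {"$ne": "billing"}})]
print(billing, urgent, other)
```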
Bring-your-own embeddings
Swap the default keyword scorer for any LangChain Embeddings implementation:
from langchain_openai import OpenAIEmbeddings
from warm_memory.langgraph import EmbeddingsImportanceScorer, WarmStore
scorer = EmbeddingsImportanceScorer(OpenAIEmbeddings())
store = WarmStore(scorer=scorer)
Works with any LangChain embeddings provider, such as OpenAI, HuggingFace, or
Voyage, or with DeterministicFakeEmbedding for tests.
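Why embeddings help: a semantic scorer can rank a stored row highly even when it shares no words with the query. The sketch below assumes cosine similarity as the ranking measure, which is a common choice but not confirmed as EmbeddingsImportanceScorer's exact formula. FAKE_VECTORS is a hypothetical stand-in for Embeddings.embed_query output so the example runs without a model or API key.

```python
import math

# Hypothetical 3-d vectors standing in for Embeddings.embed_query() output;
# real embedding models return hundreds or thousands of dimensions.
FAKE_VECTORS: dict[str, list[float]] = {
    "how do I settle my bill?": [0.85, 0.15, 0.05],
    "invoice overdue": [0.8, 0.2, 0.1],
    "wants concise answers": [0.0, 0.3, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: str, texts: list[str]) -> list[tuple[str, float]]:
    # Score each stored text by cosine similarity to the query, best first
    q = FAKE_VECTORS[query]
    scored = [(t, cosine(q, FAKE_VECTORS[t])) for t in texts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank("how do I settle my bill?", ["wants concise answers", "invoice overdue"])
# "invoice overdue" ranks first even though it shares no words with the query
print(ranking)
```

A keyword scorer would give both rows a score of zero for this query, which is the gap the embeddings scorer is meant to close.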
Pre-built agent
build_warm_memory_agent returns a compiled LangGraph that reads warm memory
before responding and writes the new exchange back on the way out:
from warm_memory.langgraph import WarmStore, build_warm_memory_agent
store = WarmStore(capacity=8)
# my_chat_model stands in for your chat model instance
agent = build_warm_memory_agent(model=my_chat_model, store=store)
agent.invoke({"query": "Where's my invoice?", "namespace": ("alice",)})
A runnable example using FakeListChatModel (no API keys) lives at
examples/langgraph_warm_agent.py.
Comparative benchmark
scripts/run_langgraph_benchmark.py compares three retrieval strategies through
the LangGraph store API:
- full-history: every prior turn in the prompt (naive baseline),
- vector-only: LangGraph's InMemoryStore with an embedding index,
- warm-fallback: WarmStore in front of the vector store.
python3 scripts/run_langgraph_benchmark.py
This writes reports/warm_memory_langgraph_benchmark.md. By default it uses synthetic
embeddings; set WARM_BENCH_EMBEDDINGS=openai (and OPENAI_API_KEY) to compare against
real semantic search.
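The cost gap the full-history baseline exposes is easy to see with simple arithmetic. The sketch below is a simplified, hypothetical model (prompt_rows is not part of the benchmark): it counts prior rows sent per turn rather than tokens, and assumes the warm strategy sends at most a fixed window of rows. The totals are arithmetic on that model, not benchmark results.

```python
def prompt_rows(turn: int, strategy: str, window: int = 8) -> int:
    """Rows of prior context sent on a 1-indexed turn (hypothetical model)."""
    prior = turn - 1
    if strategy == "full-history":
        return prior  # every earlier turn goes into the prompt
    if strategy == "warm":
        return min(prior, window)  # capped at the warm window size
    raise ValueError(f"unknown strategy: {strategy}")


turns = 50
total_full = sum(prompt_rows(t, "full-history") for t in range(1, turns + 1))
total_warm = sum(prompt_rows(t, "warm") for t in range(1, turns + 1))
print(total_full, total_warm)  # 1225 364
```

Per-turn context grows linearly under full history, so the cumulative cost grows quadratically with conversation length, while a capped window keeps it linear.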
Roadmap
- add an embedding-based or reranker-based importance scorer (done via EmbeddingsImportanceScorer)
- compare against vector-store-first baselines (done via the warm-fallback strategy in the LangGraph benchmark)
- benchmark against real agent traces instead of only synthetic workloads
- record actual model latency and token usage from a live LLM pipeline
- add charts and experiment summaries for publication-style reporting
- TTL support for the LangGraph BaseStore
- publish warm-memory to PyPI (live at v0.2.1)
- propose inclusion in LangGraph's third-party store list (LangChain Forum proposal in flight)
License
This project is released under the MIT License. See LICENSE.
File details
Details for the file warm_memory-0.2.2.tar.gz (source distribution).
- Size: 29.8 kB
- Uploaded via: twine/6.1.0 CPython/3.13.13, using Trusted Publishing

| Algorithm | Hash digest |
|---|---|
| SHA256 | 11066768f086fc9e6f2a36691809da1dd2afa06a8bc789b96263dfb7d1fc6771 |
| MD5 | ee01512bfa5c85d44c716d2e052d979b |
| BLAKE2b-256 | 99f22d2cfaf8d9c9b1f75d25b71a33ea88352be2d9b47c17b689decd5c6aaa2a |

Provenance: an attestation bundle was made for warm_memory-0.2.2.tar.gz.
- Publisher: publish.yml on vsingh45/WarmMemory
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: warm_memory-0.2.2.tar.gz
- Subject digest: 11066768f086fc9e6f2a36691809da1dd2afa06a8bc789b96263dfb7d1fc6771
- Sigstore transparency entry: 1553908254
- Permalink: vsingh45/WarmMemory@2abe482e6eda00addb99a03315029ad0018974a5
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/vsingh45
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2abe482e6eda00addb99a03315029ad0018974a5
- Trigger Event: release
File details
Details for the file warm_memory-0.2.2-py3-none-any.whl (built distribution, Python 3).
- Size: 25.3 kB
- Uploaded via: twine/6.1.0 CPython/3.13.13, using Trusted Publishing

| Algorithm | Hash digest |
|---|---|
| SHA256 | 346387ef6bedcf03db9732333aeb0011d25e2c4bc6c74e63913a024b55db9cda |
| MD5 | 0745a09e4fea4fa800f32b0b6aa4c4a6 |
| BLAKE2b-256 | ce2cd95daa29b694a65675434ba0a95f01e9ae9a0b325c2e5cce3a4c73f5d7b2 |

Provenance: an attestation bundle was made for warm_memory-0.2.2-py3-none-any.whl.
- Publisher: publish.yml on vsingh45/WarmMemory
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: warm_memory-0.2.2-py3-none-any.whl
- Subject digest: 346387ef6bedcf03db9732333aeb0011d25e2c4bc6c74e63913a024b55db9cda
- Sigstore transparency entry: 1553908271
- Permalink: vsingh45/WarmMemory@2abe482e6eda00addb99a03315029ad0018974a5
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/vsingh45
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2abe482e6eda00addb99a03315029ad0018974a5
- Trigger Event: release