Capacity-bounded warm memory for LLM agents, with a LangGraph BaseStore implementation, embeddings-based importance scoring, and a comparative benchmark.

Maintainers

vsingh45

Project description

WarmMemory

WarmMemory is a Python package for short-term memory management in LLM agents. It adds a small in-process working-memory layer that keeps the most recent or most relevant interactions close to the agent, reducing repeated retrieval work and helping control prompt growth.

The repository provides:

a reusable Python package for warm-memory buffering,
a decorator for automatic interaction capture,
a pluggable importance scoring interface,
a deterministic benchmark for recency vs relevance vs fallback memory policies,
a LangGraph BaseStore integration with per-namespace eviction, embeddings-based ranking, and a pre-built agent,
HTML documentation for architecture and usage.

Why This Exists

Many agent systems use one of two expensive patterns:

they keep appending conversation history to the prompt,
or they query long-term memory on nearly every turn.

Both increase latency and cost. WarmMemory introduces a hot path:

keep a small working set in RAM,
retrieve from that working set first,
fall back to longer-term retrieval only when needed,
and send only a compact context window to the model.

Core Ideas

1. Sliding-Window Memory

The buffer holds a bounded number of interactions, and recent(k) returns the k most recent of them.
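
As a rough illustration of the sliding-window idea, the sketch below uses a standard-library deque with a fixed maxlen. The SlidingWindow class and its add method are hypothetical names for this example only; this is not the WarmMemoryBuffer implementation, which the layout above describes as Pandas-backed.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical stand-in for a capacity-bounded buffer (not WarmMemoryBuffer)."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item automatically once full.
        self._rows = deque(maxlen=capacity)

    def add(self, prompt: str, response: str) -> None:
        self._rows.append({"prompt": prompt, "response": response})

    def recent(self, k: int) -> list:
        # Return the k newest rows, oldest first. Guard k <= 0 because
        # a slice of [-0:] would return the whole list.
        if k <= 0:
            return []
        return list(self._rows)[-k:]


window = SlidingWindow(capacity=3)
for i in range(5):
    window.add(f"question {i}", f"answer {i}")

print([row["prompt"] for row in window.recent(2)])
# → ['question 3', 'question 4']
```

Only the last three interactions survive the five writes, so asking for more rows than the capacity simply returns everything that is still buffered.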

2. Relevance-Aware Memory

Instead of only keeping the latest messages, the system can rank rows against the current query using relevant(query, limit) and compact the active working set with retain_relevant(query, limit).
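
One simple way to rank rows against a query is token overlap. The sketch below assumes that approach; the function bodies, the tokenizer, and the choice to drop zero-score rows are illustrative assumptions, not the package's default scorer.

```python
import re


def tokenize(text: str) -> set:
    # Lowercase alphanumeric tokens; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, row_text: str) -> float:
    """Fraction of query tokens that also appear in the row."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(row_text)) / len(query_tokens)


def relevant(rows: list, query: str, limit: int) -> list:
    # sorted() is stable, so rows with equal scores keep their original order.
    scored = [(overlap_score(query, row), row) for row in rows]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [row for score, row in ranked[:limit] if score > 0]


def retain_relevant(rows: list, query: str, limit: int) -> None:
    # Compact the working set in place to the top-ranked rows.
    rows[:] = relevant(rows, query, limit)


rows = [
    "How do I reset my password?",
    "Where is my billing invoice?",
    "Is the invoice overdue?",
]
print(relevant(rows, "invoice overdue", limit=2))
# → ['Is the invoice overdue?', 'Where is my billing invoice?']
```

Calling retain_relevant(rows, "invoice overdue", limit=2) would shrink rows to those same two entries, which is the compaction step described above.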

3. Automatic Agent Capture

The @remember_interaction decorator records agent inputs and outputs without requiring changes to the core agent logic.
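
The capture mechanism can be pictured as an ordinary decorator factory. The sketch below is a simplified, hypothetical version (remember and the plain-list log are names invented here); the real @remember_interaction writes to a WarmMemoryBuffer and may record more fields.

```python
import functools
from typing import Callable


def remember(log: list) -> Callable:
    """Hypothetical decorator factory: appends each call's input and output to `log`."""

    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(prompt: str) -> str:
            response = func(prompt)
            log.append({"prompt": prompt, "response": response})
            return response

        return wrapper

    return decorator


history = []


@remember(history)
def agent(prompt: str) -> str:
    return f"Echo: {prompt}"


agent("hello")
print(history)
# → [{'prompt': 'hello', 'response': 'Echo: hello'}]
```

The agent function itself contains no memory code; the decorator observes the call from the outside.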

4. Two-Tier Memory Architecture

The benchmark models a practical split:

warm memory for fast in-process access,
long-term memory for slower fallback retrieval.

Repository Layout

warm_memory/: package source code
warm_memory/buffer.py: Pandas-backed warm-memory store
warm_memory/scoring.py: scoring interface and default heuristic scorer
warm_memory/decorators.py: function decorator for interaction capture
warm_memory/benchmark.py: deterministic benchmark harness
warm_memory/workload.py: synthetic workload for evaluation
warm_memory/langgraph/: LangGraph integration (optional extra)
- store.py: WarmStore(BaseStore) with per-namespace eviction
- embeddings.py: bring-your-own embeddings scorer
- agent.py: pre-built build_warm_memory_agent graph
- benchmark.py: full-history vs vector-only vs warm-fallback benchmark
examples/langgraph_warm_agent.py: runnable LangGraph agent example
scripts/run_benchmark.py: legacy benchmark entrypoint
scripts/run_langgraph_benchmark.py: LangGraph-based benchmark entrypoint
reports/warm_memory_benchmark.md: legacy benchmark output
reports/warm_memory_langgraph_benchmark.md: LangGraph benchmark output
docs/warm_memory_guide.html: public-facing HTML documentation
tests/: unit tests

Installation

pip install warm-memory
# or with the LangGraph integration (quoted so shells such as zsh do not expand the brackets):
pip install "warm-memory[langgraph]"

Or install from source for development:

python3 -m pip install -e ".[langgraph]"

Quick Start

from warm_memory import WarmMemoryBuffer, remember_interaction

memory = WarmMemoryBuffer(capacity=8)

@remember_interaction(memory)
def agent(prompt: str) -> str:
    if "billing" in prompt.lower():
        return "Your invoice is available in the billing portal."
    return f"Echo: {prompt}"

agent("How do I reset my password?")
agent("Where is my billing invoice?")

recent_rows = memory.recent(4)
relevant_rows = memory.relevant("invoice", limit=2)
memory.retain_relevant("invoice", limit=4)

Example Usage Pattern

Use WarmMemory in front of a larger memory system:

Receive a new user query.
Search the warm buffer first.
If warm memory is sufficient, build a compact prompt from those rows.
If warm memory is weak, fall back to long-term retrieval.
Write the new interaction back into warm memory.

This pattern is useful for:

coding agents,
research assistants,
task-oriented copilots,
customer support agents,
and any multi-turn system with repeated local context.
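
The five-step pattern above can be sketched with the standard library alone. Everything in this example is an assumption made for illustration: the overlap scorer, the 0.5 threshold, the list standing in for long-term memory, and the respond callback standing in for the LLM.

```python
import re
from collections import deque


def overlap(query: str, text: str) -> float:
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def build_context(query, warm, long_term, threshold=0.5, limit=2):
    # Step 2: search the warm buffer first.
    scored = sorted(((overlap(query, row), row) for row in warm),
                    key=lambda pair: pair[0], reverse=True)
    hits = [row for score, row in scored[:limit] if score >= threshold]
    if hits:
        # Step 3: warm memory is sufficient.
        return "warm", hits
    # Step 4: warm memory is weak, so fall back to long-term retrieval.
    cold = sorted(long_term, key=lambda row: overlap(query, row), reverse=True)
    return "long_term", cold[:limit]


def handle_turn(query, warm, long_term, respond):
    source, context = build_context(query, warm, long_term)
    response = respond(query, context)
    # Step 5: write the new interaction back into warm memory.
    warm.append(f"{query} -> {response}")
    return source, response


warm = deque(["Where is my billing invoice? -> In the billing portal."], maxlen=4)
long_term = ["Password resets are done from the account settings page."]
respond = lambda query, context: f"used {len(context)} row(s)"

first = handle_turn("billing invoice status", warm, long_term, respond)
second = handle_turn("reset password", warm, long_term, respond)
print(first[0], second[0])
# → warm long_term
```

The first query overlaps the buffered billing row and stays on the warm path; the second finds nothing relevant in the buffer and falls through to the long-term list.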

Benchmark

The repository includes a deterministic benchmark that compares:

recency: always use the latest warm-memory rows,
relevance: rank and retain the top relevant warm-memory rows,
fallback: use warm relevance first, then long-term retrieval on misses.

Run it with:

python3 scripts/run_benchmark.py

This writes a report to reports/warm_memory_benchmark.md.

On the current synthetic workload, the tradeoff looks like this:

recency is the fastest policy,
fallback is the most accurate policy,
relevance sits between the two and provides a cleaner hot working set.

The benchmark is designed to surface that tradeoff rather than name a single winner: each policy occupies a different point on the latency-accuracy curve.

Documentation

HTML guide: docs/warm_memory_guide.html
Benchmark report: reports/warm_memory_benchmark.md
README visual: docs/warm_memory_architecture.svg

The HTML guide explains:

how the architecture works,
where latency is saved,
how to use the package,
and how the components fit together.

Architecture

WarmMemory architecture

The pipeline:

Agent Runtime receives the user query in a per-user namespace and triggers two reads: a fast lookup against WarmMemory (the in-process working set) and a Retrieval Ranker scoring pass over those rows (KeywordImportanceScorer by default; swap in EmbeddingsImportanceScorer for semantic ranking).
Warm Hit? checks the best score against the configured threshold.
Green path (warm hit): results flow to Prompt Builder, which injects only the top-K rows into the system prompt before invoking the LLM. The vector tier is never touched.
Orange path (warm miss): the query falls through to Long-Term Memory (LangGraph's InMemoryStore with an embedding index, PostgresStore, or any BaseStore) and the LLM consumes those results as fallback.
Dashed write-back loop: the LLM response is captured by the decorator and written back to WarmMemory (and mirrored to Long-Term Memory by the memory_write node), so future turns can recall it.

On the synthetic benchmark, ~50% of turns take the green path, so roughly half of the turns never call the vector store.
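
To see what a given hit rate means for retrieval latency, a back-of-the-envelope model helps: every turn pays the warm lookup, and only misses pay the long-term lookup as well. The millisecond figures below are placeholders chosen for the arithmetic, not measurements from the benchmark.

```python
def expected_retrieval_ms(hit_rate: float, warm_ms: float, cold_ms: float) -> float:
    """Expected per-turn retrieval cost when misses fall through to the cold tier."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return warm_ms + (1.0 - hit_rate) * cold_ms


# Hypothetical costs: 1 ms for the in-process lookup, 40 ms for a vector-store call.
print(expected_retrieval_ms(hit_rate=0.5, warm_ms=1.0, cold_ms=40.0))
# → 21.0
```

Under these assumed costs, a 50% hit rate roughly halves retrieval time compared with calling the vector store on every turn (40 ms), while a 0% hit rate would be slightly worse than vector-only because the warm lookup is still paid.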

The diagram ships in two paired formats:

docs/warm_memory_architecture.drawio.svg: the rendered SVG that GitHub displays inline. The decision arrows are animated (a "marching ants" SMIL animation) so the hot and cold paths read at a glance. The animation plays both in GitHub's inline rendering and when the file is opened directly in a browser.
docs/warm_memory_architecture.drawio: the editable mxgraph source. Open it at diagrams.net (File → Open from device) to edit, and re-export the SVG when done.

The .drawio.svg also embeds the mxgraph XML in its content attribute, so either file round-trips through the editor. Both are kept in sync by the same generator script.

For a richer narrated walkthrough, open docs/warm_memory_guide.html locally or publish it with GitHub Pages.

Development

Run tests:

python3 -m unittest discover -s tests -v

LangGraph Integration

WarmMemory ships an optional warm_memory.langgraph module that plugs directly into the LangGraph ecosystem. Install with the extra:

pip install "warm-memory[langgraph]"

Drop-in `BaseStore`

WarmStore implements LangGraph's BaseStore interface with per-namespace warm buffers — each namespace gets its own bounded buffer, so multi-tenant agents don't evict each other's memory.

from warm_memory.langgraph import WarmStore

store = WarmStore(capacity=16)
store.put(("alice",), "preferences", {"text": "wants concise answers"})
store.put(("alice",), "billing", {"text": "invoice overdue", "topic": "billing"})

# query-based recall (keyword scorer by default)
hits = store.search(("alice",), query="how do I pay my invoice?")

# filter operators: $eq, $ne, $gt, $gte, $lt, $lte
billing = store.search(("alice",), filter={"topic": "billing"})
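
The isolation that per-namespace buffers provide can be shown with a small standalone sketch. NamespacedBuffers is a hypothetical class, and evicting the oldest write is an assumption made here for simplicity; WarmStore's actual eviction policy may weigh importance as well.

```python
from collections import OrderedDict, defaultdict


class NamespacedBuffers:
    """Hypothetical illustration: one bounded buffer per namespace tuple."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._buffers = defaultdict(OrderedDict)

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        buffer = self._buffers[namespace]
        buffer[key] = value
        buffer.move_to_end(key)  # rewriting a key counts as a fresh write
        while len(buffer) > self._capacity:
            buffer.popitem(last=False)  # evict the oldest entry in this namespace only

    def keys(self, namespace: tuple) -> list:
        # .get avoids creating an empty buffer for unknown namespaces.
        return list(self._buffers.get(namespace, {}))


buffers = NamespacedBuffers(capacity=2)
for key in ("a", "b", "c"):
    buffers.put(("alice",), key, {"text": key})
buffers.put(("bob",), "x", {"text": "x"})

print(buffers.keys(("alice",)), buffers.keys(("bob",)))
# → ['b', 'c'] ['x']
```

Alice's third write evicts her own oldest entry, while Bob's buffer is untouched, which is the multi-tenant property described above.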

Bring-your-own embeddings

Swap the default keyword scorer for any LangChain Embeddings:

from langchain_openai import OpenAIEmbeddings
from warm_memory.langgraph import EmbeddingsImportanceScorer, WarmStore

scorer = EmbeddingsImportanceScorer(OpenAIEmbeddings())
store = WarmStore(scorer=scorer)

Works with any LangChain Embeddings implementation, such as the OpenAI, HuggingFace, or Voyage integrations, or with DeterministicFakeEmbedding for tests.
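
Embeddings-based ranking usually comes down to cosine similarity between the query vector and each row vector. The sketch below assumes that approach and uses a toy three-word embedding (toy_embed, a made-up function) in place of a real provider; it does not show how EmbeddingsImportanceScorer is implemented.

```python
import math
import re


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def toy_embed(text: str) -> list:
    # Hypothetical 3-dimensional "embedding": counts of three fixed words.
    vocab = ("invoice", "password", "concise")
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [float(words.count(word)) for word in vocab]


def rank(rows: dict, query: str) -> list:
    query_vec = toy_embed(query)
    return sorted(rows.items(),
                  key=lambda item: cosine(query_vec, toy_embed(item[1])),
                  reverse=True)


rows = {"preferences": "wants concise answers", "billing": "invoice overdue"}
print([key for key, _ in rank(rows, "how do I pay my invoice?")])
# → ['billing', 'preferences']
```

With a real provider, the vectors would come from the Embeddings object's embed_query and embed_documents methods rather than from toy_embed.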

Pre-built agent

build_warm_memory_agent returns a compiled LangGraph that reads warm memory before responding and writes the new exchange back on the way out:

from warm_memory.langgraph import WarmStore, build_warm_memory_agent

store = WarmStore(capacity=8)
agent = build_warm_memory_agent(model=my_chat_model, store=store)
agent.invoke({"query": "Where's my invoice?", "namespace": ("alice",)})

A runnable example using FakeListChatModel (no API keys) lives at examples/langgraph_warm_agent.py.

Comparative benchmark

scripts/run_langgraph_benchmark.py compares three retrieval strategies through the LangGraph store API:

full-history: every prior turn in the prompt (naive baseline)
vector-only: LangGraph's InMemoryStore with an embedding index
warm-fallback: WarmStore in front of the vector store

python3 scripts/run_langgraph_benchmark.py

This writes reports/warm_memory_langgraph_benchmark.md. The benchmark uses synthetic embeddings by default; set WARM_BENCH_EMBEDDINGS=openai (and OPENAI_API_KEY) to compare against real semantic search.

Roadmap

~~add an embedding-based or reranker-based importance scorer~~ (done via EmbeddingsImportanceScorer)
~~compare against vector-store-first baselines~~ (done via warm-fallback strategy in the LangGraph benchmark)
benchmark against real agent traces instead of only synthetic workloads
record actual model latency and token usage from a live LLM pipeline
add charts and experiment summaries for publication-style reporting
TTL support for the LangGraph BaseStore
~~publish warm-memory to PyPI~~ (live at v0.2.1)
propose inclusion in LangGraph's third-party store list (LangChain Forum proposal in flight)

License

This project is released under the MIT License. See LICENSE.

Release history

0.2.2 (this version): May 16, 2026
0.2.1: May 16, 2026

Download files

Source Distribution

warm_memory-0.2.2.tar.gz (29.8 kB)

Uploaded May 16, 2026 Source

Built Distribution

warm_memory-0.2.2-py3-none-any.whl (25.3 kB)

Uploaded May 16, 2026 Python 3

File details

Details for the file warm_memory-0.2.2.tar.gz.

File metadata

Download URL: warm_memory-0.2.2.tar.gz
Upload date: May 16, 2026
Size: 29.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for warm_memory-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`11066768f086fc9e6f2a36691809da1dd2afa06a8bc789b96263dfb7d1fc6771`
MD5	`ee01512bfa5c85d44c716d2e052d979b`
BLAKE2b-256	`99f22d2cfaf8d9c9b1f75d25b71a33ea88352be2d9b47c17b689decd5c6aaa2a`
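
A downloaded file can be checked against the SHA256 digest listed above with Python's hashlib. This is a generic sketch; sha256_of and matches are helper names introduced here, and the file is assumed to sit in the current directory when the check is run.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    # Published digests are lowercase hex; normalise the input defensively.
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be matches(Path("warm_memory-0.2.2.tar.gz"), "11066768f086fc9e6f2a36691809da1dd2afa06a8bc789b96263dfb7d1fc6771"); a False result means the file differs from the one PyPI published.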

Provenance

The following attestation bundles were made for warm_memory-0.2.2.tar.gz:

Publisher: publish.yml on vsingh45/WarmMemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: warm_memory-0.2.2.tar.gz
- Subject digest: 11066768f086fc9e6f2a36691809da1dd2afa06a8bc789b96263dfb7d1fc6771
- Sigstore transparency entry: 1553908254
- Sigstore integration time: May 16, 2026
Source repository:
- Permalink: vsingh45/WarmMemory@2abe482e6eda00addb99a03315029ad0018974a5
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/vsingh45
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2abe482e6eda00addb99a03315029ad0018974a5
- Trigger Event: release

File details

Details for the file warm_memory-0.2.2-py3-none-any.whl.

File metadata

Download URL: warm_memory-0.2.2-py3-none-any.whl
Upload date: May 16, 2026
Size: 25.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for warm_memory-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`346387ef6bedcf03db9732333aeb0011d25e2c4bc6c74e63913a024b55db9cda`
MD5	`0745a09e4fea4fa800f32b0b6aa4c4a6`
BLAKE2b-256	`ce2cd95daa29b694a65675434ba0a95f01e9ae9a0b325c2e5cce3a4c73f5d7b2`

Provenance

The following attestation bundles were made for warm_memory-0.2.2-py3-none-any.whl:

Publisher: publish.yml on vsingh45/WarmMemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: warm_memory-0.2.2-py3-none-any.whl
- Subject digest: 346387ef6bedcf03db9732333aeb0011d25e2c4bc6c74e63913a024b55db9cda
- Sigstore transparency entry: 1553908271
- Sigstore integration time: May 16, 2026
Source repository:
- Permalink: vsingh45/WarmMemory@2abe482e6eda00addb99a03315029ad0018974a5
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/vsingh45
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2abe482e6eda00addb99a03315029ad0018974a5
- Trigger Event: release
