Capacity-bounded warm memory for LLM agents, with a LangGraph BaseStore implementation, embeddings-based importance scoring, and a comparative benchmark.

Maintainers

vsingh45

Project description

WarmMemory

WarmMemory is a Python package for short-term memory management in LLM agents. It adds a small in-process working-memory layer that keeps the most recent or most relevant interactions close to the agent, reducing repeated retrieval work and helping control prompt growth.

The repository provides:

a reusable Python package for warm-memory buffering,
a decorator for automatic interaction capture,
a pluggable importance scoring interface,
a deterministic benchmark for recency vs relevance vs fallback memory policies,
a LangGraph BaseStore integration with per-namespace eviction, embeddings-based ranking, and a pre-built agent,
HTML documentation for architecture and usage.

Why This Exists

Many agent systems use one of two expensive patterns:

they keep appending conversation history to the prompt,
or they query long-term memory on nearly every turn.

Both increase latency and cost. WarmMemory introduces a hot path:

keep a small working set in RAM,
retrieve from that working set first,
fall back to longer-term retrieval only when needed,
and send only a compact context window to the model.

Core Ideas

1. Sliding-Window Memory

The buffer keeps a bounded number of interactions (its capacity), and recent(k) returns the k most recent of them.
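As a rough illustration of the sliding-window idea, here is a minimal standalone sketch. It does not use the package: `SlidingWindow` and its methods are hypothetical names, and it is backed by a deque, whereas the real buffer is described as Pandas-backed.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical stand-in for a capacity-bounded buffer (not the package's class)."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item automatically once full.
        self._rows = deque(maxlen=capacity)

    def add(self, prompt: str, response: str) -> None:
        self._rows.append({"prompt": prompt, "response": response})

    def recent(self, k: int) -> list[dict]:
        # Return up to k of the newest rows, ordered oldest to newest.
        if k <= 0:
            return []
        return list(self._rows)[-k:]


window = SlidingWindow(capacity=3)
for i in range(5):
    window.add(f"question {i}", f"answer {i}")

latest = window.recent(2)  # the two newest interactions
```

With a capacity of 3, the first two interactions have already been evicted by the time `recent(2)` is called, so memory use stays constant no matter how long the conversation runs.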

2. Relevance-Aware Memory

Instead of only keeping the latest messages, the system can rank rows against the current query using relevant(query, limit=k) and compact the active working set down to the top-ranked rows with retain_relevant(query, limit=k).
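The ranking step can be pictured with a simple keyword-overlap scorer. This is a sketch under stated assumptions, not the package's scorer: `tokenize`, `overlap_score`, and the free functions `relevant` and `retain_relevant` below are hypothetical and operate on a plain list of dicts.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, row: dict) -> float:
    """Fraction of query tokens that appear in the row's prompt or response."""
    q = tokenize(query)
    if not q:
        return 0.0
    doc = tokenize(row["prompt"] + " " + row["response"])
    return len(q & doc) / len(q)


def relevant(rows: list[dict], query: str, limit: int) -> list[dict]:
    # sorted() is stable, so rows with equal scores keep their insertion order.
    ranked = sorted(rows, key=lambda r: overlap_score(query, r), reverse=True)
    return ranked[:limit]


def retain_relevant(rows: list[dict], query: str, limit: int) -> None:
    # Compact the working set in place to the top `limit` rows.
    rows[:] = relevant(rows, query, limit)


rows = [
    {"prompt": "How do I reset my password?", "response": "Use the settings page."},
    {"prompt": "Where is my billing invoice?", "response": "See the billing portal."},
    {"prompt": "What is the weather?", "response": "Sunny."},
]
top = relevant(rows, "billing invoice", limit=1)
retain_relevant(rows, "billing invoice", limit=2)
```

A production scorer would likely weigh recency or use embeddings as well; the point here is only that ranking and compaction are the same operation with different side effects.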

3. Automatic Agent Capture

The @remember_interaction decorator records agent inputs and outputs without forcing changes into the core agent logic.
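For readers curious how such a decorator can stay out of the agent's way, here is a minimal sketch of the general technique. The `remember` factory and the list-based log are hypothetical; the real @remember_interaction writes into a WarmMemoryBuffer and may record more fields.

```python
import functools
from typing import Callable


def remember(log: list[dict]) -> Callable:
    """Hypothetical decorator factory: appends each call's input and output to `log`."""

    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)  # keep the wrapped function's name and docstring
        def wrapper(prompt: str) -> str:
            response = fn(prompt)
            # Record only after the agent returns, so calls that raise are not stored.
            log.append({"prompt": prompt, "response": response})
            return response

        return wrapper

    return decorator


captured: list[dict] = []


@remember(captured)
def echo_agent(prompt: str) -> str:
    return f"Echo: {prompt}"


echo_agent("hello")
```

The agent function itself contains no memory code, which is the property the decorator is meant to preserve.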

4. Two-Tier Memory Architecture

The benchmark models a practical split:

warm memory for fast in-process access,
long-term memory for slower fallback retrieval.

Repository Layout

warm_memory/: package source code
warm_memory/buffer.py: Pandas-backed warm-memory store
warm_memory/scoring.py: scoring interface and default heuristic scorer
warm_memory/decorators.py: function decorator for interaction capture
warm_memory/benchmark.py: deterministic benchmark harness
warm_memory/workload.py: synthetic workload for evaluation
warm_memory/langgraph/: LangGraph integration (optional extra)
- store.py: WarmStore(BaseStore) with per-namespace eviction
- embeddings.py: bring-your-own embeddings scorer
- agent.py: pre-built build_warm_memory_agent graph
- benchmark.py: full-history vs vector-only vs warm-fallback benchmark
examples/langgraph_warm_agent.py: runnable LangGraph agent example
scripts/run_benchmark.py: legacy benchmark entrypoint
scripts/run_langgraph_benchmark.py: LangGraph-based benchmark entrypoint
reports/warm_memory_benchmark.md: legacy benchmark output
reports/warm_memory_langgraph_benchmark.md: LangGraph benchmark output
docs/warm_memory_guide.html: public-facing HTML documentation
tests/: unit tests

Installation

From PyPI:

python3 -m pip install warm-memory

From a repository checkout (editable install):

python3 -m pip install -e .

Quick Start

from warm_memory import WarmMemoryBuffer, remember_interaction

memory = WarmMemoryBuffer(capacity=8)

@remember_interaction(memory)
def agent(prompt: str) -> str:
    if "billing" in prompt.lower():
        return "Your invoice is available in the billing portal."
    return f"Echo: {prompt}"

agent("How do I reset my password?")
agent("Where is my billing invoice?")

recent_rows = memory.recent(4)
relevant_rows = memory.relevant("invoice", limit=2)
memory.retain_relevant("invoice", limit=4)

Example Usage Pattern

Use WarmMemory in front of a larger memory system:

1. Receive a new user query.
2. Search the warm buffer first.
3. If warm memory is sufficient, build a compact prompt from those rows.
4. If warm memory is weak, fall back to long-term retrieval.
5. Write the new interaction back into warm memory.
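The steps above can be sketched in plain Python. Everything here is illustrative: `build_context`, the `min_hits` threshold, and the string-based "interaction" format are assumptions, and both memory tiers are modeled as simple lists rather than the package's buffer or a real vector store.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def keyword_hits(rows: list[str], query: str) -> list[str]:
    """Rows sharing at least one word with the query (a deliberately crude matcher)."""
    q = tokens(query)
    return [row for row in rows if q & tokens(row)]


def build_context(query, warm, long_term, min_hits=1, capacity=4):
    """Return (context_rows, source) using the warm-first pattern."""
    hits = keyword_hits(warm, query)  # search the warm buffer first
    source = "warm"
    if len(hits) < min_hits:
        # Warm memory is too weak for this query: pay for the slower lookup.
        hits = keyword_hits(long_term, query)
        source = "long_term"
    # Write the interaction back, then evict the oldest rows beyond capacity
    # (assumes capacity >= 1).
    warm.append(query + " | " + "; ".join(hits))
    del warm[:-capacity]
    return hits, source


warm = ["reset password via settings"]
long_term = ["invoice is in the billing portal", "refunds take five days"]

first = build_context("where is my invoice", warm, long_term)
second = build_context("invoice status", warm, long_term)
```

The first query misses warm memory and falls through to the long-term tier; the second is served from warm memory because the earlier interaction was written back.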

This pattern is useful for:

coding agents,
research assistants,
task-oriented copilots,
customer support agents,
and any multi-turn system with repeated local context.

Benchmark

The repository includes a deterministic benchmark that compares:

recency: always use the latest warm-memory rows,
relevance: rank and retain the top relevant warm-memory rows,
fallback: use warm relevance first, then long-term retrieval on misses.

Run it with:

python3 scripts/run_benchmark.py

This writes a report to reports/warm_memory_benchmark.md.

On the current synthetic workload, the tradeoff looks like this:

recency is the fastest policy,
fallback is the most accurate policy,
relevance sits between the two and provides a cleaner hot working set.

The benchmark is designed to surface that tradeoff rather than name a single winner: each policy occupies a different point on the latency-accuracy curve.
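To make the three policies concrete, here is a toy harness in the same spirit. It is not the repository's benchmark: the workload, the scoring function, and the use of a long-term lookup count as a stand-in for latency are all assumptions, and its numbers say nothing about the real report.

```python
def score(query: str, row: str) -> int:
    return len(set(query.split()) & set(row.split()))


def top_k(rows, query, k):
    ranked = sorted(rows, key=lambda r: score(query, r), reverse=True)
    return [r for r in ranked[:k] if score(query, r) > 0]


# Each policy returns (context_rows, number_of_slow_long_term_lookups).
def recency(query, warm, long_term, k):
    return warm[-k:], 0


def relevance(query, warm, long_term, k):
    return top_k(warm, query, k), 0


def fallback(query, warm, long_term, k):
    hits = top_k(warm, query, k)
    if hits:
        return hits, 0
    return top_k(long_term, query, k), 1


def run(policy, workload, warm, long_term, k=1):
    correct = lookups = 0
    for query, needed in workload:
        context, slow = policy(query, warm, long_term, k)
        correct += needed in context
        lookups += slow
    return {"accuracy": correct / len(workload), "long_term_lookups": lookups}


facts = ["invoice portal billing", "password reset settings",
         "refund five days", "shipping two weeks"]
warm = facts[1:]  # the warm tier holds only the three newest facts
workload = [("billing invoice", facts[0]),
            ("password reset", facts[1]),
            ("shipping time", facts[3])]

results = {p.__name__: run(p, workload, warm, facts)
           for p in (recency, relevance, fallback)}
```

On this toy workload, recency never touches long-term memory but answers only one of three queries, relevance answers two, and fallback answers all three at the cost of one slow lookup.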

Documentation

HTML guide: docs/warm_memory_guide.html
Benchmark report: reports/warm_memory_benchmark.md
README visual: docs/warm_memory_architecture.svg

The HTML guide explains:

how the architecture works,
where latency is saved,
how to use the package,
and how the components fit together.

Architecture Preview

[Figure: WarmMemory architecture diagram, docs/warm_memory_architecture.svg]

For a richer visual walkthrough, open docs/warm_memory_guide.html locally or publish it with GitHub Pages.

Development

Run tests:

python3 -m unittest discover -s tests -v

LangGraph Integration

WarmMemory ships an optional warm_memory.langgraph module that plugs directly into the LangGraph ecosystem. Install the extra:

python3 -m pip install -e ".[langgraph]"

Drop-in `BaseStore`

WarmStore implements LangGraph's BaseStore interface with per-namespace warm buffers. Each namespace gets its own bounded buffer, so multi-tenant agents don't evict each other's memory.

from warm_memory.langgraph import WarmStore

store = WarmStore(capacity=16)
store.put(("alice",), "preferences", {"text": "wants concise answers"})
store.put(("alice",), "billing", {"text": "invoice overdue", "topic": "billing"})

# query-based recall (keyword scorer by default)
hits = store.search(("alice",), query="how do I pay my invoice?")

# filter operators: $eq, $ne, $gt, $gte, $lt, $lte
billing = store.search(("alice",), filter={"topic": "billing"})

Bring-your-own embeddings

Swap the default keyword scorer for any LangChain Embeddings:

from langchain_openai import OpenAIEmbeddings
from warm_memory.langgraph import EmbeddingsImportanceScorer, WarmStore

scorer = EmbeddingsImportanceScorer(OpenAIEmbeddings())
store = WarmStore(scorer=scorer)

This works with any LangChain Embeddings implementation (for example OpenAI, HuggingFace, or Voyage) and with DeterministicFakeEmbedding for tests.
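Conceptually, an embeddings-based scorer ranks stored rows by vector similarity to the query. The sketch below shows that idea with a toy hashing embedding so it runs without any provider; `fake_embed`, `cosine`, and `rank_by_embedding` are hypothetical names, and a real scorer would call an Embeddings object's `embed_query` and `embed_documents` instead.

```python
import hashlib
import math


def fake_embed(text: str, dim: int = 16) -> list[float]:
    """Deterministic bag-of-words hashing embedding (toy, for illustration only)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # bucket each token by its first hash byte
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors score 0 instead of dividing by 0


def rank_by_embedding(rows: list[str], query: str, embed=fake_embed) -> list[str]:
    q = embed(query)
    # Highest cosine similarity first; ties keep their original order.
    return sorted(rows, key=lambda r: cosine(q, embed(r)), reverse=True)


rows = ["invoice overdue", "wants concise answers"]
ranked = rank_by_embedding(rows, "invoice overdue")
```

The toy embedding only captures exact token overlap; swapping in a real embeddings model is what adds semantic matching.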

Pre-built agent

build_warm_memory_agent returns a compiled LangGraph that reads warm memory before responding and writes the new exchange back on the way out:

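from langchain_core.language_models.fake_chat_models import FakeListChatModel
from warm_memory.langgraph import WarmStore, build_warm_memory_agent

# Any LangChain chat model can be passed; a fake model avoids needing API keys.
my_chat_model = FakeListChatModel(responses=["Your invoice is in the billing portal."])

store = WarmStore(capacity=8)
agent = build_warm_memory_agent(model=my_chat_model, store=store)
agent.invoke({"query": "Where's my invoice?", "namespace": ("alice",)})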

A runnable example using FakeListChatModel (no API keys) lives at examples/langgraph_warm_agent.py.

Comparative benchmark

scripts/run_langgraph_benchmark.py compares three retrieval strategies through the LangGraph store API:

full-history: every prior turn in the prompt (naive baseline)
vector-only: LangGraph's InMemoryStore with an embedding index
warm-fallback: WarmStore in front of the vector store

python3 scripts/run_langgraph_benchmark.py

This writes reports/warm_memory_langgraph_benchmark.md. The benchmark uses synthetic embeddings by default; set WARM_BENCH_EMBEDDINGS=openai (and OPENAI_API_KEY) to compare against real semantic search.
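A script can honor that switch with a few lines of environment handling. This is a guess at the shape of the logic, not the benchmark's code: the function name, the "synthetic" default label, and the decision to fail fast on a missing key are all assumptions.

```python
import os


def pick_embeddings_backend(env=None) -> str:
    """Return the backend name selected by WARM_BENCH_EMBEDDINGS."""
    env = os.environ if env is None else env
    # "synthetic" is an assumed label for the default, key-free backend.
    choice = env.get("WARM_BENCH_EMBEDDINGS", "synthetic").strip().lower()
    if choice == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("WARM_BENCH_EMBEDDINGS=openai requires OPENAI_API_KEY")
    return choice


default_backend = pick_embeddings_backend({})
```

Accepting the environment as a parameter keeps the selection logic testable without mutating the real process environment.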

Roadmap

~~add an embedding-based or reranker-based importance scorer~~ (done via EmbeddingsImportanceScorer)
~~compare against vector-store-first baselines~~ (done via warm-fallback strategy in the LangGraph benchmark)
benchmark against real agent traces instead of only synthetic workloads
record actual model latency and token usage from a live LLM pipeline
add charts and experiment summaries for publication-style reporting
TTL support for the LangGraph BaseStore
~~publish warm-memory to PyPI~~ (done)
propose inclusion in LangGraph's third-party store list

License

This project is released under the MIT License. See LICENSE.

Release history

0.2.2: May 16, 2026
0.2.1 (this version): May 16, 2026

Download files

Source Distribution

warm_memory-0.2.1.tar.gz (27.9 kB)

Uploaded May 16, 2026 Source

Built Distribution

warm_memory-0.2.1-py3-none-any.whl (24.3 kB)

Uploaded May 16, 2026 Python 3

File details

Details for the file warm_memory-0.2.1.tar.gz.

File metadata

Download URL: warm_memory-0.2.1.tar.gz
Upload date: May 16, 2026
Size: 27.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for warm_memory-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`9c3ef82586b64d5dce8f3cb4759696d9f0fbe0c1aa5d5da66966f895f0daff63`
MD5	`57a1c9344894a821fbc98be9efdad1db`
BLAKE2b-256	`98509e9eaf0ffba796fda98de090bb6245f5a43cf96753b78be9093776d1a665`
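A downloaded file can be checked against the published SHA256 digest with the standard library. The helper names below are illustrative, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for warm_memory-0.2.1.tar.gz.
EXPECTED_SHA256 = "9c3ef82586b64d5dce8f3cb4759696d9f0fbe0c1aa5d5da66966f895f0daff63"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str) -> bool:
    return sha256_of(path) == expected.strip().lower()


# Usage, once the archive has been downloaded:
# verify(Path("warm_memory-0.2.1.tar.gz"), EXPECTED_SHA256)
```

pip can enforce the same check at install time through hash-pinned requirements files.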

Provenance

The following attestation bundles were made for warm_memory-0.2.1.tar.gz:

Publisher: publish.yml on vsingh45/WarmMemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: warm_memory-0.2.1.tar.gz
- Subject digest: 9c3ef82586b64d5dce8f3cb4759696d9f0fbe0c1aa5d5da66966f895f0daff63
- Sigstore transparency entry: 1552395189
- Sigstore integration time: May 16, 2026
Source repository:
- Permalink: vsingh45/WarmMemory@bd8bad2297a2a8d43d182babb2a2d6dd9e727198
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/vsingh45
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bd8bad2297a2a8d43d182babb2a2d6dd9e727198
- Trigger Event: release

File details

Details for the file warm_memory-0.2.1-py3-none-any.whl.

File metadata

Download URL: warm_memory-0.2.1-py3-none-any.whl
Upload date: May 16, 2026
Size: 24.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for warm_memory-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`680343fef30c699b43688054c54b605960fcdeb80374cd57317a592d2b77e9ea`
MD5	`65a0d51ce03da9d234a71c1abd29aeb4`
BLAKE2b-256	`a42c2223b3f8c1e51e6047cabc7fa9bf996cc136759aef54ad7d4234ba3ed1f7`

Provenance

The following attestation bundles were made for warm_memory-0.2.1-py3-none-any.whl:

Publisher: publish.yml on vsingh45/WarmMemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: warm_memory-0.2.1-py3-none-any.whl
- Subject digest: 680343fef30c699b43688054c54b605960fcdeb80374cd57317a592d2b77e9ea
- Sigstore transparency entry: 1552395191
- Sigstore integration time: May 16, 2026
Source repository:
- Permalink: vsingh45/WarmMemory@bd8bad2297a2a8d43d182babb2a2d6dd9e727198
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/vsingh45
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bd8bad2297a2a8d43d182babb2a2d6dd9e727198
- Trigger Event: release
