Standalone, dependency-free rolling conversation memory (summary + buffer), inspired by LangChain's ConversationSummaryBufferMemory.
rollmem
Standalone, dependency-free rolling conversation memory for LLM apps —
a running summary plus a recent-message buffer, inspired by LangChain's
ConversationSummaryBufferMemory, but with no LangChain (or any) dependency.
Handy for conversation memory, context compression, summarization, and gist-style long-chat handling — a tiny LangChain alternative when you only need the summary-buffer pattern.
Why
ConversationSummaryBufferMemory is a great pattern: keep recent turns
verbatim, fold older turns into a running summary so context stays bounded.
But pulling in all of LangChain just for that is heavy. rollmem extracts the
idea into a tiny, provider-agnostic package. You inject how to summarize and
how to count tokens — rollmem stays neutral.
Install
pip install rollmem
Requires Python 3.9+. Zero runtime dependencies, and fully typed (ships
py.typed, so your type checker sees the annotations).
Usage
from rollmem import RollingMemory

def summarize(existing_summary, messages):
    # plug in any LLM here; return the new summary string
    folded = " ".join(m.content for m in messages)
    return (existing_summary + " " + folded).strip()

mem = RollingMemory(
    max_tokens=2000,
    summarize_fn=summarize,  # optional; without it, evicted turns are dropped
    # token_counter=...      # optional; defaults to a word-count estimate.
    #                        # In production inject a model-accurate counter, e.g.
    #                        # token_counter=lambda text: len(enc.encode(text))
)

mem.add_user_message("Hi, I'm planning a trip to Korea.")
mem.add_assistant_message("Great! When are you going?")

print(mem.get_context())   # -> str: summary (as a system turn) + buffer, joined
print(mem.get_messages())  # -> list[Message]: summary prepended as a system turn
Agentic conversations work too — messages can carry tool calls, an id, and opaque metadata:
from rollmem import ASSISTANT, ToolCall

mem.add_message(
    ASSISTANT,
    "",
    tool_calls=[ToolCall(id="c1", name="get_weather", arguments='{"city": "Seoul"}')],
)
mem.add_tool_message("sunny, 23C", tool_call_id="c1")
A tool call and its results are an atomic unit: pruning evicts them together
or keeps them together. The eviction boundary is also aligned so that after
pruning the buffer never starts with an assistant turn, a tool turn, or an
orphaned tool result — openings most provider APIs reject (Anthropic, for
example, requires the first message to use the user role).
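The atomic-unit and boundary-alignment rules above can be pictured with a small standalone sketch. It is not rollmem's implementation. Turns are plain dicts standing in for Message, group_units and split_for_eviction are hypothetical names, linkage is inferred from position rather than from tool_call_id, and "the buffer must start on a user turn" is a simplified stand-in for the alignment rule.

```python
from typing import Any, Callable, Dict, List, Tuple

Turn = Dict[str, Any]  # hypothetical stand-in for rollmem's Message


def group_units(turns: List[Turn]) -> List[List[Turn]]:
    """Group turns so an assistant tool call and its results form one unit."""
    units: List[List[Turn]] = []
    for turn in turns:
        in_tool_unit = bool(units) and any(t.get("tool_calls") for t in units[-1])
        if turn["role"] == "tool" and in_tool_unit:
            units[-1].append(turn)  # the result travels with its call
        else:
            units.append([turn])
    return units


def split_for_eviction(
    turns: List[Turn],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> Tuple[List[Turn], List[Turn]]:
    """Return (evicted, kept), cutting only on unit boundaries."""
    units = group_units(turns)
    total = sum(count_tokens(t["content"]) for t in turns)
    if total <= max_tokens:
        return [], list(turns)  # within budget: nothing is evicted

    cut = 0
    # Evict whole units until the rest fits AND opens with a user turn.
    # Simplification: this may evict everything if no user turn remains.
    while cut < len(units) and (
        total > max_tokens or units[cut][0]["role"] != "user"
    ):
        total -= sum(count_tokens(t["content"]) for t in units[cut])
        cut += 1

    evicted = [t for unit in units[:cut] for t in unit]
    kept = [t for unit in units[cut:] for t in unit]
    return evicted, kept
```

With a word-count counter and a small budget, the tool call and its result leave the buffer in the same step, and the following assistant turn is evicted too so that the remaining buffer opens with a user turn.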
In asyncio applications, use AsyncRollingMemory — same behaviour and
serialization format, but the add_* methods are coroutines and
summarize_fn may be a coroutine function:
import asyncio

from rollmem import AsyncRollingMemory

async def summarize(existing_summary, messages):
    folded = " ".join(m.content for m in messages)
    return (existing_summary + " " + folded).strip()  # await your LLM here

async def main():
    mem = AsyncRollingMemory(max_tokens=2000, summarize_fn=summarize)
    await mem.add_user_message("Hi, I'm planning a trip to Korea.")

asyncio.run(main())
AsyncRollingMemory is safe for concurrent use by multiple asyncio tasks on
one event loop — pruning is serialized internally, and a summarizer failure
never loses turns. It is not thread-safe, and one instance must stay on one
event loop.
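One way to get both guarantees (serialized pruning, no lost turns on a summarizer failure) is a single asyncio.Lock around a summarize-then-commit step. The sketch below illustrates that idea only; it is not taken from rollmem's source. SerializedPruner is a hypothetical class, and it budgets by item count instead of tokens to stay short.

```python
import asyncio
from typing import Awaitable, Callable, List


class SerializedPruner:
    """Hypothetical illustration: one lock guards summarize-then-commit."""

    def __init__(
        self,
        max_items: int,
        summarize: Callable[[str, List[str]], Awaitable[str]],
    ) -> None:
        self.max_items = max_items
        self.summarize = summarize
        self.summary = ""
        self.buffer: List[str] = []
        self._lock = asyncio.Lock()

    async def add(self, text: str) -> None:
        self.buffer.append(text)  # appended before any await, so never lost
        async with self._lock:  # only one task prunes at a time
            overflow = len(self.buffer) - self.max_items
            if overflow <= 0:
                return
            evicted = self.buffer[:overflow]
            # If summarize raises, the two lines below never run,
            # so the evicted turns are still in the buffer.
            new_summary = await self.summarize(self.summary, evicted)
            self.summary = new_summary
            # Concurrent adds only append at the end, so the front
            # slice still holds exactly the turns that were summarized.
            del self.buffer[:overflow]
```

Because asyncio.Lock is tied to the event loop that uses it, a design like this shares the "one instance, one event loop" restriction described above.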
max_tokens is the budget for the verbatim recent-message buffer — not the
running summary, and not a model's generation max_tokens (output limit). When
the buffer exceeds it, the oldest turns are folded into the summary.
token_counter takes a single message's text (str) and returns an int. The
default is a crude word count — fine for demos, but pass a model-accurate counter
(such as tiktoken) for real token budgets. The text it receives is
Message.token_text() — the content plus any tool-call names, arguments, and
linkage — so tool payloads count toward the budget.
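The str -> int contract means any tokenizer with an encode function can be adapted in a few lines. Below is a minimal sketch of such an adapter. make_token_counter is a hypothetical helper (not part of rollmem), the caching is an optional extra, and fake_encode is a stand-in so the example runs without a tokenizer installed.

```python
from functools import lru_cache
from typing import Callable, List, Sequence


def make_token_counter(
    encode: Callable[[str], Sequence[int]],
    cache_size: int = 4096,
) -> Callable[[str], int]:
    """Wrap a tokenizer's encode function as a cached str -> int counter."""

    @lru_cache(maxsize=cache_size)
    def count(text: str) -> int:
        return len(encode(text))

    return count


def fake_encode(text: str) -> List[int]:
    """Demo-only encoder: one fake token id per whitespace-separated word."""
    return list(range(len(text.split())))


counter = make_token_counter(fake_encode)
print(counter("Hi, I'm planning a trip to Korea."))
```

With tiktoken installed, passing the encode method of an encoding object (for example one obtained from tiktoken.get_encoding) in place of fake_encode should give model-accurate counts; the result can then be passed as token_counter=.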
Persistence
to_dict() / from_dict() serialize the memory state (running summary plus
buffer) to and from a plain dict — you choose the storage format:
import json

raw = json.dumps(mem.to_dict())  # save anywhere: file, DB column, cache...

mem = RollingMemory.from_dict(
    json.loads(raw),
    max_tokens=2000,
    summarize_fn=summarize,  # callbacks are NOT serialized; re-inject them
    # token_counter=...
)
max_tokens and the callbacks are runtime configuration, not saved state, so you
pass them again on restore. The buffer is restored verbatim; the token budget is
re-applied on the next added message.
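If the chosen storage is a local file, writing through a temporary file keeps a crash from leaving a half-written state behind. The helpers below are a sketch under that assumption. save_state and load_state are hypothetical names, not rollmem functions, and they accept any JSON-serializable dict such as the one returned by to_dict().

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def save_state(path: Union[str, Path], state: Dict[str, Any]) -> None:
    """Write state as JSON to a temp file, then swap it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, ensure_ascii=False)
        os.replace(tmp_name, path)  # replaces the old file in one step
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave stray temp files behind
        raise


def load_state(path: Union[str, Path]) -> Dict[str, Any]:
    """Read back a dict previously written by save_state."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage would look like save_state("chat.json", mem.to_dict()) on the way out and RollingMemory.from_dict(load_state("chat.json"), max_tokens=2000, summarize_fn=summarize) on the way back in.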
How it works
- New turns go into buffer.
- When buffer exceeds max_tokens, the oldest turns are folded into summary via summarize_fn (or dropped if none is provided). Eviction is atomic over tool-call units (an assistant message with tool_calls and its linked tool results always travel together) and boundary-aligned, so the buffer is kept from starting on a response-like turn whenever possible.
- get_messages() -> list[Message] returns the buffer with the summary prepended as a system turn. get_context() -> str is the string form of the same thing (prompt-ready), so the two never diverge. Neither adds a language-specific label; relabel the summary in your own prompt assembly if you need to.
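The three steps above fit in a few dozen lines. The class below is a deliberately reduced sketch of the summary-buffer pattern, not rollmem's code. TinySummaryBuffer is a hypothetical name, turns are (role, content) tuples, tool-call atomicity and boundary alignment are left out, and the newline join in get_context is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]  # (role, content); stand-in for rollmem's Message


class TinySummaryBuffer:
    def __init__(
        self,
        max_tokens: int,
        summarize_fn: Optional[Callable[[str, List[Turn]], str]] = None,
        token_counter: Optional[Callable[[str], int]] = None,
    ) -> None:
        self.max_tokens = max_tokens
        self.summarize_fn = summarize_fn
        # Default mirrors the README's "crude word count".
        self.token_counter = token_counter or (lambda text: len(text.split()))
        self.summary = ""
        self.buffer: List[Turn] = []

    def add(self, role: str, content: str) -> None:
        self.buffer.append((role, content))  # step 1: new turns go into buffer
        self._prune()

    def _prune(self) -> None:
        total = sum(self.token_counter(c) for _, c in self.buffer)
        cut = 0
        while cut < len(self.buffer) and total > self.max_tokens:
            total -= self.token_counter(self.buffer[cut][1])
            cut += 1
        if cut == 0:
            return
        evicted = self.buffer[:cut]
        if self.summarize_fn is not None:
            # step 2: fold the oldest turns into the running summary
            self.summary = self.summarize_fn(self.summary, evicted)
        del self.buffer[:cut]  # without a summarize_fn they are just dropped

    def get_messages(self) -> List[Turn]:
        # step 3: summary (if any) is prepended as a system turn
        head: List[Turn] = [("system", self.summary)] if self.summary else []
        return head + self.buffer

    def get_context(self) -> str:
        # Built from get_messages(), so the two views cannot diverge.
        return "\n".join(f"{role}: {content}" for role, content in self.get_messages())
```

Building get_context() on top of get_messages() is what keeps the string form and the message list in agreement.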
API
RollingMemory(max_tokens=2000, summarize_fn=None, token_counter=None)
- add_message(role, content, *, id=None, tool_calls=(), tool_call_id=None, metadata=None): append a turn with any role string, optionally carrying tool calls, an id, and metadata.
- add_user_message / add_assistant_message / add_system_message / add_tool_message: convenience wrappers over add_message using the USER / ASSISTANT / SYSTEM / TOOL role constants (add_tool_message also takes tool_call_id=).
- get_messages() -> list[Message] / get_context() -> str: read the state back (see How it works).
- to_dict() / from_dict(data, *, max_tokens=..., summarize_fn=..., token_counter=...): serialize and restore (see Persistence).
- clear(): reset the summary and buffer.
- summary (str) and buffer (list[Message]): the live state, exposed as plain public attributes.
AsyncRollingMemory(max_tokens=2000, summarize_fn=None, token_counter=None) is
the asyncio variant: the same API, but the add_* methods are coroutines and
summarize_fn may be a regular or coroutine function (AsyncSummarizeFn).
Reads, clear(), and serialization stay synchronous, and state saved by either
class loads in the other. Passing a coroutine function as summarize_fn to the
synchronous RollingMemory raises TypeError instead of failing silently.
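The "failing silently" case is worth spelling out: calling a coroutine function from synchronous code returns an un-awaited coroutine object rather than a summary string. A guard of the following shape catches that at construction time. This is a sketch of the general technique using the standard library's inspect.iscoroutinefunction; require_sync_callback is a hypothetical name, and rollmem's actual check may differ.

```python
import inspect
from typing import Callable, Optional


def require_sync_callback(fn: Optional[Callable]) -> Optional[Callable]:
    """Reject coroutine functions where a plain function is expected."""
    if fn is not None and inspect.iscoroutinefunction(fn):
        raise TypeError(
            "summarize_fn is a coroutine function; "
            "use the asyncio variant or pass a regular function"
        )
    return fn
```

Note that this check only sees functions declared with async def; a regular function that returns an awaitable would pass it.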
Message(role, content, id=None, tool_calls=(), tool_call_id=None, metadata={})
is the provider-neutral turn type: a frozen dataclass with to_dict() /
from_dict(), a "role: content" string form, and token_text() — the
canonical text used for token counting (content plus tool-call payloads).
tool_calls holds ToolCall(id, name, arguments) entries on assistant turns;
tool_call_id links a tool-result turn back to its call; metadata is opaque
and round-tripped verbatim. The exported role constants are USER,
ASSISTANT, SYSTEM, and TOOL — but any string is accepted as a role.
Limitations
- Lossy by design. Older turns are folded into the summary repeatedly, so each pass can blur or drop detail (a "telephone game" effect). Keep max_tokens large enough that anything you can't afford to lose stays in the verbatim buffer.
- The summary is not bounded for you. max_tokens limits only the verbatim buffer, not the running summary. rollmem hands your summarize_fn the current summary plus the evicted turns and stores whatever it returns, so keeping the summary compact is your summarize_fn's job. If it merely concatenates, the summary (and thus get_context()) grows without limit. Prompt it to compress, or cap the summary length inside the callback.
- Only as accurate as your counter. The default token counter is a rough word count; inject a model-accurate one (e.g. tiktoken) for real budgets.
- In-memory by default. State lives in memory, but to_dict() / from_dict() let you persist and restore it (see Persistence). Callbacks are not serialized and must be re-injected on restore.
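For the unbounded-summary point, a hard cap inside the callback can serve as a last-resort guard alongside a prompt that asks the model to compress. The wrapper below is one possible sketch. cap_summary is a hypothetical helper, and keeping the tail of the string (the most recently folded material) is a design assumption rather than anything rollmem prescribes.

```python
from typing import Any, Callable, Sequence

SummarizeFn = Callable[[str, Sequence[Any]], str]


def cap_summary(summarize_fn: SummarizeFn, max_chars: int) -> SummarizeFn:
    """Wrap a summarize_fn so its result never exceeds max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")

    def wrapped(existing_summary: str, messages: Sequence[Any]) -> str:
        summary = summarize_fn(existing_summary, messages)
        if len(summary) <= max_chars:
            return summary
        # Crude truncation: keep the tail, where the newest material lands
        # for append-style summarizers. Older detail is lost for good.
        return summary[-max_chars:]

    return wrapped
```

The wrapped function keeps the (existing_summary, messages) signature, so it can be passed as summarize_fn= in place of the original.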
Development
pip install -e ".[dev]" # editable install with dev tools (pytest, build, twine)
pytest # run the test suite
License
MIT