Skip to main content

Token-aware chat-history compaction — summarize old turns, keep system + recent. Zero dependencies.

Project description

chatcram

PyPI version Python versions CI License: MIT

Keep a long chat history within a token budget — by summarizing the old middle and keeping the system prompt + recent turns verbatim. Tiny, zero-dependency, framework-agnostic. Bring your own summarizer.

As a conversation grows, you eventually blow past the context window. Dropping old turns loses information; keeping everything is impossible. chatcram collapses the older middle into a single summary while preserving what matters most — the system prompt and the most recent turns.

from chatcram import Compactor

# `summarize` is any callable you provide — usually an LLM call
compactor = Compactor(budget=4000, summarize=my_llm_summarizer, keep_recent=1500)

result = compactor.compact(messages)   # list of {"role", "content"} dicts

for m in result.messages:
    print(m["role"], "->", m["content"][:60])

print(result.summarized)    # True if the middle was collapsed
print(result.used_tokens)   # tokens in the compacted history

What you get back:

  • System messages — always kept, verbatim, at the front.
  • A single summary message — the older middle, collapsed via your summarizer.
  • Recent turns — the latest keep_recent tokens, kept verbatim.

Why

  • Zero dependencies. Pure Python. A fast characters-per-token heuristic by default; plug in tiktoken or any tokenizer for exact counts.
  • Bring your own summarizer. Any str -> str callable (an LLM call, a local model, anything). No provider lock-in, no hidden API calls.
  • Framework-agnostic. Works on plain message dicts — not tied to LangChain or LlamaIndex.
  • Composes with contextcram. Compact the history, then pack it into a full prompt budget.

Installation

pip install chatcram
# optional: exact token counts via tiktoken
pip install "chatcram[tiktoken]"

How it works

from chatcram import Compactor

def summarize(transcript: str) -> str:
    # call your LLM here; return a short summary string
    return my_client.complete(f"Summarize this conversation:\n{transcript}")

compactor = Compactor(
    budget=4000,          # if the history exceeds this, compact it
    summarize=summarize,
    keep_recent=1500,     # tokens of the most recent turns to keep verbatim
)

result = compactor.compact(messages)
messages = result.messages   # ready to send to the model

If the history is already under budget, it's returned unchanged (summarized=False). The most recent turn is always kept, even if it alone exceeds keep_recent.

Pairs with contextcram

from chatcram import Compactor
from contextcram import Packer

history = Compactor(budget=3000, summarize=summarize).compact(messages).messages

ctx = (
    Packer(model="gpt-4o", reserve=1500)
    .add(SYSTEM_PROMPT, priority="required")
    .add([f"{m['role']}: {m['content']}" for m in history], priority="high", strategy="trim")
    .add(retrieved_docs, priority="medium", strategy="drop")
    .fit()
)

Alternatives

Summarizing old turns isn't new, but it's almost always bundled into a framework or a heavyweight memory platform. chatcram is the standalone, dependency-free building block:

Library Approach When to prefer it over chatcram
LangChain ConversationSummaryBufferMemory Summary + buffer memory, inside LangChain You're already all-in on LangChain
mem0 / Zep Hosted "memory layer" with fact extraction + embeddings You want long-term, retrieval-based memory
tokentrim Drops messages to fit a token limit You only need to drop, not summarize

Choose chatcram when you want a tiny, framework-agnostic helper that summarizes the old middle of a conversation, with your own summarizer and no dependencies.

Development

git clone https://github.com/Waelr1985/chatcram.git
cd chatcram
uv sync
uv run pytest
uv run ruff check .
uv run mypy

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chatcram-0.1.0.tar.gz (8.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

chatcram-0.1.0-py3-none-any.whl (7.7 kB view details)

Uploaded Python 3

File details

Details for the file chatcram-0.1.0.tar.gz.

File metadata

  • Download URL: chatcram-0.1.0.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chatcram-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2f8c9b788b6732bcfe880f6f934542aff13aef7908e76b9a5d44ad4de383d377
MD5 87f74a1b2c7af8b3d7984cbae6b6d34c
BLAKE2b-256 b56d9e4f85c02c7ea1612dd7ab2e10fc377f228f301d56d13579fd18f7e56916

See more details on using hashes here.

Provenance

The following attestation bundles were made for chatcram-0.1.0.tar.gz:

Publisher: publish.yml on Waelr1985/chatcram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file chatcram-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: chatcram-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chatcram-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0fac5bce8568d07b0c9b6b022fe2137210646401207c82ede1f5e96a0e4c851c
MD5 744918bac4393787b9de57f1b8a95150
BLAKE2b-256 945fa458f6e13f4f7b80bc325e205f0024bf51e60dafaef51eb6f94bc18a5433

See more details on using hashes here.

Provenance

The following attestation bundles were made for chatcram-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Waelr1985/chatcram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page