chatfit

Trim conversation history to fit an LLM token budget — without forgetting.

When a chat with an LLM gets long, you eventually blow past the model's context window and the API errors out. chatfit trims the conversation down to a token budget you choose. It keeps the system prompt and the most recent turns, and condenses the older turns into a single summary so the model retains the gist of earlier context instead of forgetting it.

contextfit packs your RAG chunks. chatfit packs your chat history.

  • 🧠 Remembers, doesn't just delete — old turns become a summary
  • 🪶 Tiny & dependency-free — pure Python, tiktoken optional
  • 📌 Pins your system prompt so it's never dropped
  • Always fits — even an oversized summary is truncated to the budget
  • 📊 Tells you what happened — tokens before/after, messages dropped

Install

pip install chatfit               # pure-Python word-count estimate
pip install "chatfit[tiktoken]"   # accurate token counts
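The two install options differ only in how tokens are counted. As a rough sketch of that difference (the helper names `estimate_tokens` and `count_tokens` are hypothetical and not part of chatfit's documented API, and the one-token-per-word rule is an assumption about what a word-count estimate might look like):

```python
def estimate_tokens(text: str) -> int:
    """Assumed fallback: one token per whitespace-separated word."""
    return len(text.split())


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Hypothetical helper: use tiktoken when installed, else the word estimate."""
    try:
        import tiktoken  # optional dependency
    except ImportError:
        return estimate_tokens(text)
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a general-purpose encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))


print(estimate_tokens("Trim conversation history to fit a token budget."))  # 8
```

Word counts tend to undercount real tokens, so with the estimate it may be sensible to leave some headroom below the model's actual limit.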

Quick start

from chatfit import fit

messages = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    # ... 50 more turns ...
]

result = fit(messages, max_tokens=4000)

send_to_llm(result.messages)     # guaranteed to fit in 4000 tokens
print(result)                    # what got trimmed and why

How it works

  1. If the conversation already fits the budget → returned unchanged.
  2. Otherwise: keep the system prompt + the newest turns that fit.
  3. The older turns are condensed into one [Summary of earlier conversation] message so their gist is preserved.
  4. The result is guaranteed to fit max_tokens.
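The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not chatfit's actual implementation: `trim_sketch` is a hypothetical name, tokens are estimated as whitespace-separated words, a quarter of the budget is reserved for the summary, and the summary is emitted as a system message.

```python
def estimate_tokens(message):
    # Assumed estimator: one token per whitespace-separated word.
    return len(message["content"].split())


def total_tokens(messages):
    return sum(estimate_tokens(m) for m in messages)


def trim_sketch(messages, max_tokens, summarize):
    """Keep system messages and the newest turns; summarize the rest."""
    # Step 1: already within budget, so return the conversation unchanged.
    if total_tokens(messages) <= max_tokens:
        return list(messages)

    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    # Assumption: hold back a quarter of the budget for the summary.
    reserve = max_tokens // 4
    remaining = max_tokens - total_tokens(system) - reserve

    # Step 2: walk backwards from the newest turn, keeping turns while they fit.
    kept = []
    for message in reversed(turns):
        cost = estimate_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]

    # Step 3: condense the dropped turns into one summary message.
    words = ("[Summary of earlier conversation] " + summarize(dropped)).split()

    # Step 4: truncate the summary to whatever budget is left.
    cap = max(reserve + remaining, 0)
    result = list(system)
    if cap > 0:
        result.append({"role": "system", "content": " ".join(words[:cap])})
    return result + kept
```

In this sketch a system prompt that is itself larger than the budget would still overflow, since pinned messages are never dropped; how chatfit handles that case is not described here.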

Bring your own summarizer

chatfit never calls an LLM itself. By default it uses a no-LLM summarizer that lists the topics the user raised. For real AI summaries, pass your own:

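from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_summarizer(dropped_messages):
    # Join the dropped turns into one block of text for the model to summarize.
    text = "\n".join(m["content"] for m in dropped_messages)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize:\n{text}"}],
    )
    return response.choices[0].message.content

result = fit(messages, max_tokens=4000, summarizer=my_summarizer)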
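For a sense of what a no-LLM summarizer that lists user topics could look like, here is a minimal sketch. `topic_summarizer` and its two keyword parameters are hypothetical, and chatfit's built-in summarizer may work quite differently; the only thing taken from the text above is the shape of the callable (dropped messages in, summary string out).

```python
def topic_summarizer(dropped_messages, max_topics=5, words_per_topic=8):
    """Hypothetical no-LLM summarizer: list the opening words of each user turn."""
    topics = []
    for message in dropped_messages:
        if message["role"] != "user":
            continue  # only the user's side is treated as a "topic"
        words = message["content"].split()
        if not words:
            continue
        topic = " ".join(words[:words_per_topic])
        if len(words) > words_per_topic:
            topic += "..."
        topics.append(topic)

    if not topics:
        return "No earlier user topics."
    # Keep only the most recent topics so the summary stays short.
    return "Earlier, the user raised: " + "; ".join(topics[-max_topics:])
```

Because the extra parameters have defaults, a function like this could be passed wherever a one-argument summarizer is expected.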

ChatMemory — rolling memory for ongoing chats

fit() is one-shot. For a live conversation, use ChatMemory: you add() turns as they happen and it keeps recent turns verbatim while incrementally folding older ones into a single rolling summary — far cheaper than re-summarizing from scratch every turn, and always within budget.

from openai import OpenAI
from chatfit import ChatMemory

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# my_summarizer is the callable from "Bring your own summarizer"
mem = ChatMemory(max_tokens=2000, summarizer=my_summarizer)
mem.set_system("You are a helpful assistant.")

mem.add_user("Hi!")
mem.add_assistant("Hello! How can I help?")
# ... many turns later ...

messages = mem.render()   # always fits 2000 tokens; oldest turns summarized
response = client.chat.completions.create(model="gpt-4", messages=messages)

The summary stays bounded because folding is hierarchical: each fold re-summarizes the previous summary together with the newly dropped turn, so the summary is rewritten rather than appended to and never grows without limit.
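The folding idea can be illustrated with a small stand-alone class. This is a sketch, not ChatMemory's implementation: `RollingSummary` and `first_words` are hypothetical names, and for brevity it caps the number of recent turns rather than counting tokens.

```python
class RollingSummary:
    """Illustrative rolling memory: recent turns verbatim, older ones folded."""

    def __init__(self, max_recent, summarizer):
        self.max_recent = max_recent  # assumption: cap by turn count, not tokens
        self.summarizer = summarizer
        self.summary = ""
        self.recent = []

    def add(self, role, content):
        self.recent.append({"role": role, "content": content})
        while len(self.recent) > self.max_recent:
            oldest = self.recent.pop(0)
            # Fold: summarize the previous summary together with the dropped turn.
            fold_input = []
            if self.summary:
                fold_input.append({"role": "system", "content": self.summary})
            fold_input.append(oldest)
            self.summary = self.summarizer(fold_input)

    def render(self):
        head = []
        if self.summary:
            head.append({
                "role": "system",
                "content": "[Summary of earlier conversation] " + self.summary,
            })
        return head + self.recent


def first_words(messages, limit=6):
    # Toy summarizer with bounded output: keep only the first few words.
    text = " ".join(m["content"] for m in messages)
    return " ".join(text.split()[:limit])


mem = RollingSummary(max_recent=2, summarizer=first_words)
for i in range(10):
    mem.add("user", f"turn {i} says hello")
print(len(mem.render()))  # 3: one summary message plus the two newest turns
```

Each fold sends the summarizer only the old summary and one dropped turn, so the cost per turn stays roughly constant instead of growing with the length of the conversation.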

The fit() function

fit(
    messages,            # list of {"role": ..., "content": ...} dicts
    max_tokens,          # the budget the result must fit within
    pin_system=True,     # never drop system messages
    model="gpt-4",       # used for token counting
    summarizer=None,     # your callable; defaults to a built-in no-LLM one
)

Returns a TrimResult:

Attribute                         Meaning
.messages                         the trimmed conversation
.tokens_before / .tokens_after    token counts before/after
.tokens_saved                     tokens removed
.dropped_count / .kept_count      original messages dropped / messages kept
.fits                             is it within budget?
.was_trimmed                      did anything get dropped?
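One plausible way these attributes relate to each other is sketched below. `TrimReport` is a hypothetical stand-in, not chatfit's TrimResult, and the three derived properties are assumptions about the semantics (for example, that "tokens removed" means the before/after difference).

```python
from dataclasses import dataclass


@dataclass
class TrimReport:
    """Hypothetical stand-in showing how the reported numbers could fit together."""
    messages: list
    tokens_before: int
    tokens_after: int
    dropped_count: int
    kept_count: int
    max_tokens: int

    @property
    def tokens_saved(self) -> int:
        # Assumption: tokens removed is the before/after difference.
        return self.tokens_before - self.tokens_after

    @property
    def fits(self) -> bool:
        # Assumption: "within budget" means at or below max_tokens.
        return self.tokens_after <= self.max_tokens

    @property
    def was_trimmed(self) -> bool:
        # Assumption: trimmed means at least one original message was dropped.
        return self.dropped_count > 0


report = TrimReport(
    messages=[], tokens_before=5200, tokens_after=3900,
    dropped_count=40, kept_count=13, max_tokens=4000,
)
print(report.tokens_saved, report.fits, report.was_trimmed)  # 1300 True True
```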

Run the demo & tests

pip install -e ".[dev]"
python examples/demo.py
python examples/try_it.py
pytest

Roadmap

  • keep_relevant — keep the most relevant old turns, not just the newest (powered by the relevance engine from its sister library, contextfit)
  • semantic de-duplication of repeated turns
  • auto-detect a model's context window

License

MIT
