Trim conversation history to fit an LLM token budget.

chatfit

Trim conversation history to fit an LLM token budget — without forgetting.

When a chat with an LLM gets long, you eventually blow past the model's context window and the API errors out. chatfit trims the conversation down to a token budget you choose. It keeps the system prompt and the most recent turns, and condenses the older turns into a single summary so the model retains the gist of earlier context instead of forgetting it.

contextfit packs your RAG chunks. chatfit packs your chat history.

🧠 Remembers, doesn't just delete — old turns become a summary
🪶 Tiny & dependency-free — pure Python, tiktoken optional
📌 Pins your system prompt so it's never dropped
✅ Always fits — even an oversized summary is truncated to the budget
📊 Tells you what happened — tokens before/after, messages dropped

Install

pip install chatfit               # pure-Python word-count estimate
pip install "chatfit[tiktoken]"   # accurate token counts
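
The two install options differ in how tokens are counted. As a rough illustration of that trade-off (not chatfit's actual code), the sketch below uses tiktoken when it can be imported and falls back to a word-based estimate otherwise. The function names and the ratio of 4 tokens per 3 words are assumptions made for this example.

```python
import math


def estimate_tokens(text: str) -> int:
    """Dependency-free estimate: roughly 4 tokens per 3 words (assumed ratio)."""
    words = len(text.split())
    return math.ceil(words * 4 / 3)


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Use tiktoken when available, otherwise fall back to the estimate."""
    try:
        import tiktoken  # optional dependency
    except ImportError:
        return estimate_tokens(text)
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a general-purpose encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))
```

The estimate is only a heuristic, so a budget enforced with it is approximate; exact counts need the tokenizer.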

Quick start

from chatfit import fit

messages = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    # ... 50 more turns ...
]

result = fit(messages, max_tokens=4000)

send_to_llm(result.messages)     # placeholder for your own API call; fits in 4000 tokens
print(result)                    # what got trimmed and why

How it works

1. If the conversation already fits the budget → returned unchanged.
2. Otherwise: keep the system prompt + the newest turns that fit.
3. The older turns are condensed into one [Summary of earlier conversation] message so their gist is preserved.
4. The result is guaranteed to fit max_tokens.
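
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not chatfit's implementation: the word-based counter, the placeholder summarizer, the one-quarter share reserved for the summary, and the choice of the system role for the summary message are all hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
HEADER = "[Summary of earlier conversation]"


def count_tokens(message: Message) -> int:
    """Crude stand-in counter: one token per whitespace-separated word."""
    return len(message["content"].split())


def placeholder_summary(dropped: List[Message]) -> str:
    """Trivial stand-in summarizer; a real one would capture the gist."""
    return f"{len(dropped)} earlier messages were condensed."


def fit_sketch(
    messages: List[Message],
    max_tokens: int,
    summarizer: Callable[[List[Message]], str] = placeholder_summary,
) -> List[Message]:
    # Step 1: a conversation that already fits is returned unchanged.
    if sum(count_tokens(m) for m in messages) <= max_tokens:
        return list(messages)

    # Step 2: pin system messages, then keep the newest turns that fit.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    reserve = max(budget // 4, 0)  # assumed share held back for the summary
    remaining = budget - reserve

    kept: List[Message] = []
    for message in reversed(turns):
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]

    # Step 3: condense the dropped turns into one summary message.
    # Step 4: truncate the summary so the total never exceeds max_tokens.
    room = max(reserve + remaining, 0)
    words = f"{HEADER} {summarizer(dropped)}".split()[:room]
    summary = [{"role": "system", "content": " ".join(words)}] if words else []
    return system + summary + kept
```

The sketch does not handle the case where the pinned system messages alone exceed the budget; a production version would need a policy for that.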

Bring your own summarizer

chatfit never calls an LLM itself. By default it uses a no-LLM summarizer that lists the topics the user raised. For real AI summaries, pass your own:

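import openai  # requires the openai package and an API key in your environment

def my_summarizer(dropped_messages):
    # Join the dropped turns into one block of text to summarize.
    text = "\n".join(m["content"] for m in dropped_messages)
    # Ask a small model for the summary and return it as a plain string.
    return openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize:\n{text}"}],
    ).choices[0].message.content

result = fit(messages, max_tokens=4000, summarizer=my_summarizer)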
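
A summarizer does not have to call an API. The sketch below is one possible offline callable with the same shape as my_summarizer (a list of dropped messages in, a string out). It lists the openings of earlier user turns, loosely in the spirit of the built-in default described above. It is not chatfit's built-in summarizer; the name topic_summarizer and its max_topics and max_words limits are hypothetical.

```python
from typing import Dict, List


def topic_summarizer(
    dropped_messages: List[Dict[str, str]],
    max_topics: int = 5,
    max_words: int = 8,
) -> str:
    """List the opening words of each distinct user turn that was dropped."""
    topics: List[str] = []
    for message in dropped_messages:
        if message.get("role") != "user":
            continue
        words = message.get("content", "").split()
        if not words:
            continue
        snippet = " ".join(words[:max_words])
        if len(words) > max_words:
            snippet += "..."
        if snippet not in topics:  # skip exact repeats
            topics.append(snippet)
    if not topics:
        return "No user topics in the earlier conversation."
    # Keep only the most recent topics so the summary stays short.
    return "The user previously raised: " + "; ".join(topics[-max_topics:])
```

Going by the fit() signature shown in this README, it would be passed as summarizer=topic_summarizer. It is handy in tests, where calling a real model is slow or costly.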

`ChatMemory` — rolling memory for ongoing chats

fit() is one-shot. For a live conversation, use ChatMemory: you add turns as they happen with add_user() and add_assistant(), and it keeps recent turns verbatim while incrementally folding older ones into a single rolling summary. This is far cheaper than re-summarizing from scratch every turn, and the rendered history always stays within budget.

import openai
from chatfit import ChatMemory

# my_llm_summarizer is your own summarizer callable (see "Bring your own summarizer").
mem = ChatMemory(max_tokens=2000, summarizer=my_llm_summarizer)
mem.set_system("You are a helpful assistant.")

mem.add_user("Hi!")
mem.add_assistant("Hello! How can I help?")
# ... many turns later ...

messages = mem.render()   # always fits 2000 tokens; oldest turns summarized
response = openai.chat.completions.create(model="gpt-4", messages=messages)

The summary stays bounded (hierarchical): each fold re-summarizes the previous summary together with the newly dropped turn, so it never grows without limit.
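
To make the folding idea concrete, here is a small self-contained sketch of a rolling memory. It is an illustration under assumptions, not the ChatMemory implementation: the class name, the word-based counter, and the stand-in summarizer that keeps only the last ten words are all hypothetical. The summary stays bounded only because the summarizer's output is bounded.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
HEADER = "[Summary of earlier conversation]"


def count_words(text: str) -> int:
    """Crude stand-in for a token counter."""
    return len(text.split())


def keep_last_words(messages: List[Message], limit: int = 10) -> str:
    """Stand-in summarizer with bounded output: keeps the last `limit` words."""
    words = " ".join(m["content"] for m in messages).split()
    return " ".join(words[-limit:])


class RollingMemorySketch:
    def __init__(
        self,
        max_tokens: int,
        summarizer: Callable[[List[Message]], str] = keep_last_words,
    ) -> None:
        self.max_tokens = max_tokens
        self.summarizer = summarizer
        self.summary = ""
        self.recent: List[Message] = []

    def render(self) -> List[Message]:
        """Return the summary message (if any) followed by the recent turns."""
        head: List[Message] = []
        if self.summary:
            head = [{"role": "system", "content": f"{HEADER} {self.summary}"}]
        return head + list(self.recent)

    def total_tokens(self) -> int:
        return sum(count_words(m["content"]) for m in self.render())

    def add(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})
        # Fold the oldest turn into the summary until the history fits again.
        while self.total_tokens() > self.max_tokens and len(self.recent) > 1:
            oldest = self.recent.pop(0)
            previous = [{"role": "system", "content": self.summary}] if self.summary else []
            # Re-summarize the old summary together with the newly dropped turn.
            self.summary = self.summarizer(previous + [oldest])
```

Each fold passes the summarizer only the previous summary and one dropped turn, which is why the per-turn cost stays small compared with re-summarizing the whole history.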

The `fit()` function

fit(
    messages,            # list of {"role": ..., "content": ...} dicts
    max_tokens,          # the budget the result must fit within
    pin_system=True,     # never drop system messages
    model="gpt-4",       # used for token counting
    summarizer=None,     # your callable; defaults to a built-in no-LLM one
)

Returns a TrimResult:

| Attribute | Meaning |
| --- | --- |
| `.messages` | the trimmed conversation |
| `.tokens_before` / `.tokens_after` | token counts before/after |
| `.tokens_saved` | tokens removed |
| `.dropped_count` / `.kept_count` | original messages dropped / messages kept |
| `.fits` | is it within budget? |
| `.was_trimmed` | did anything get dropped? |
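
One plausible reading of how these attributes relate is sketched below with a stand-in dataclass. The class name, the max_tokens field, and the derivations of tokens_saved, fits, and was_trimmed are assumptions for illustration; the real TrimResult may compute them differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrimResultSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""

    messages: List[Dict[str, str]]
    tokens_before: int
    tokens_after: int
    dropped_count: int
    kept_count: int
    max_tokens: int  # assumed: the budget the result was trimmed against

    @property
    def tokens_saved(self) -> int:
        # Assumed: the difference between the two counts.
        return self.tokens_before - self.tokens_after

    @property
    def fits(self) -> bool:
        # Assumed: within budget means at or below max_tokens.
        return self.tokens_after <= self.max_tokens

    @property
    def was_trimmed(self) -> bool:
        # Assumed: trimmed means at least one original message was dropped.
        return self.dropped_count > 0


report = TrimResultSketch(
    messages=[],
    tokens_before=9000,
    tokens_after=3900,
    dropped_count=40,
    kept_count=12,
    max_tokens=4000,
)
print(report.tokens_saved, report.fits, report.was_trimmed)
```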

Run the demo & tests

pip install -e ".[dev]"
python examples/demo.py
python examples/try_it.py
pytest

Roadmap

- keep_relevant: keep the most relevant old turns, not just the newest (powered by the relevance engine from its sister library, contextfit)
- semantic de-duplication of repeated turns
- auto-detect a model's context window

License

MIT

This version: 0.4.0 (Jun 26, 2026)
