chatfit
Trim conversation history to fit an LLM token budget — without forgetting.
When a chat with an LLM gets long, you eventually blow past the model's context
window and the API errors out. chatfit trims the conversation down to a token
budget you choose. It keeps the system prompt and the most recent turns, and
condenses the older turns into a single summary so the model retains the
gist of earlier context instead of forgetting it.
contextfit packs your RAG chunks. chatfit packs your chat history.
- 🧠 Remembers, doesn't just delete: old turns become a summary
- 🪶 Tiny & dependency-free: pure Python, tiktoken optional
- 📌 Pins your system prompt so it's never dropped
- ✅ Always fits: even an oversized summary is truncated to the budget
- 📊 Tells you what happened: tokens before/after, messages dropped
Install
pip install chatfit # pure-Python word-count estimate
pip install "chatfit[tiktoken]" # accurate token counts
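The two install options differ in how tokens are counted. As a rough sketch of that trade-off (not chatfit's actual counter), a function can use tiktoken when it is importable and fall back to a word-based estimate otherwise. The name count_tokens and the ratio of roughly 0.75 English words per token are assumptions for illustration; per-message overhead is ignored.

```python
import math


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Hypothetical helper: exact count via tiktoken if possible, else an estimate."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken missing, model unknown, or encoding files unavailable:
        # assume roughly 0.75 English words per token.
        word_count = len(text.split())
        return math.ceil(word_count / 0.75)


print(count_tokens("Trim conversation history to fit an LLM token budget."))
```

The estimate is cheap and dependency-free but can drift from the model's real tokenizer, which is why an exact tokenizer is the better choice when the budget is tight.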
Quick start
from chatfit import fit
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi!"},
{"role": "assistant", "content": "Hello! How can I help?"},
# ... 50 more turns ...
]
result = fit(messages, max_tokens=4000)
send_to_llm(result.messages) # send_to_llm: your own API call; fits in 4000 tokens as counted by chatfit
print(result) # what got trimmed and why
How it works
- If the conversation already fits the budget → returned unchanged.
- Otherwise: keep the system prompt + the newest turns that fit.
- The older turns are condensed into one "[Summary of earlier conversation]" message so their gist is preserved.
- The result is guaranteed to fit within max_tokens.
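The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not chatfit's implementation: tokens are approximated by whitespace-separated words, the summarizer is any callable that takes the dropped messages and returns a string, the summary message's role is a guess, and the names fit_sketch and word_tokens are hypothetical. It also assumes the pinned system messages fit in the budget on their own.

```python
from collections.abc import Callable

Message = dict[str, str]
SUMMARY_PREFIX = "[Summary of earlier conversation]"


def word_tokens(message: Message) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(message["content"].split())


def fit_sketch(
    messages: list[Message],
    max_tokens: int,
    summarizer: Callable[[list[Message]], str],
) -> list[Message]:
    # 1. Already within budget: return the conversation unchanged.
    if sum(word_tokens(m) for m in messages) <= max_tokens:
        return list(messages)

    # 2. Pin system messages and spend what is left on the other turns.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(word_tokens(m) for m in system)

    # 3. Walk backwards, keeping the newest turns until one no longer fits.
    kept: list[Message] = []
    for m in reversed(turns):
        cost = word_tokens(m)
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]

    # 4. Condense the dropped turns into one summary message, truncated
    #    to the remaining budget so the result always fits.
    if dropped and budget > 0:
        words = f"{SUMMARY_PREFIX} {summarizer(dropped)}".split()[:budget]
        summary = {"role": "system", "content": " ".join(words)}  # role is an assumption
        return system + [summary] + kept
    return system + kept


conversation = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "one two three four five six seven eight"},
    {"role": "assistant", "content": "six seven eight"},
    {"role": "user", "content": "nine ten"},
]
print(fit_sketch(conversation, max_tokens=15, summarizer=lambda d: f"{len(d)} earlier turn(s)"))
```

Truncating the summary after choosing the kept turns favours verbatim recent context over summary detail; a real implementation might instead reserve room for the summary before selecting turns.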
Bring your own summarizer
chatfit never calls an LLM itself. By default it uses a no-LLM summarizer that
lists the topics the user raised. For real AI summaries, pass your own:
import openai
from chatfit import fit

def my_summarizer(dropped_messages):
    # Receives the dropped message dicts; returns the summary as a string.
    text = "\n".join(m["content"] for m in dropped_messages)
    return openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize:\n{text}"}],
    ).choices[0].message.content

result = fit(messages, max_tokens=4000, summarizer=my_summarizer)
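A summarizer does not have to call a model at all. The sketch below is a hypothetical offline summarizer (topic_summarizer is an illustrative name, not chatfit's built-in default) that keeps the first sentence of each dropped user message, capped at a few entries. It assumes, as in the example above, that the summarizer receives the dropped message dicts and returns a string.

```python
import re


def topic_summarizer(dropped_messages: list[dict[str, str]], max_topics: int = 5) -> str:
    """Hypothetical no-LLM summarizer: first sentence of each dropped user turn."""
    topics = []
    for m in dropped_messages:
        if m["role"] != "user":
            continue  # only the user's questions are treated as topics
        # Split after ., ! or ? followed by whitespace; keep the first sentence.
        first = re.split(r"(?<=[.!?])\s+", m["content"].strip(), maxsplit=1)[0]
        if first:
            topics.append(first)
    if not topics:
        return "No user topics in earlier turns."
    return "User asked about: " + "; ".join(topics[:max_topics])


print(topic_summarizer([
    {"role": "user", "content": "How do I sort a list? Also dicts."},
    {"role": "assistant", "content": "Use sorted()."},
    {"role": "user", "content": "What about tuples"},
]))
```

It would be passed the same way as my_summarizer, via summarizer=topic_summarizer. It costs nothing per call but only captures what the user asked, not what the assistant answered.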
ChatMemory — rolling memory for ongoing chats
fit() is one-shot. For a live conversation, use ChatMemory: you add()
turns as they happen and it keeps recent turns verbatim while incrementally
folding older ones into a single rolling summary — far cheaper than
re-summarizing from scratch every turn, and always within budget.
import openai
from chatfit import ChatMemory

mem = ChatMemory(max_tokens=2000, summarizer=my_llm_summarizer) # your summarizer callable
mem.set_system("You are a helpful assistant.")
mem.add_user("Hi!")
mem.add_assistant("Hello! How can I help?")
# ... many turns later ...
messages = mem.render() # always fits 2000 tokens; oldest turns summarized
response = openai.chat.completions.create(model="gpt-4", messages=messages)
The summary stays bounded (hierarchical): each fold re-summarizes the previous summary together with the newly dropped turn, so it never grows without limit.
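The incremental fold can be illustrated with a small class. This is a sketch under assumptions, not ChatMemory's implementation: RollingMemorySketch, add() with a role argument, and toy_fold are hypothetical names; tokens are approximated by word counts; the summarizer is assumed to take the previous summary plus the newly dropped turn and return the new summary; and the summary is word-truncated so render() stays within budget as long as the newest turn and the system prompt fit by themselves.

```python
from collections import deque
from collections.abc import Callable

Message = dict[str, str]
# Assumed summarizer shape: (previous_summary, dropped_turn) -> new_summary.
FoldSummarizer = Callable[[str, Message], str]


def word_count(text: str) -> int:
    # Crude token estimate: one token per whitespace-separated word.
    return len(text.split())


class RollingMemorySketch:
    """Hypothetical rolling memory; not chatfit's ChatMemory."""

    def __init__(self, max_tokens: int, summarizer: FoldSummarizer) -> None:
        self.max_tokens = max_tokens
        self.summarizer = summarizer
        self.system = ""
        self.summary = ""
        self.turns: deque[Message] = deque()

    def set_system(self, text: str) -> None:
        self.system = text
        self._fold()

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        self._fold()

    def _verbatim_tokens(self) -> int:
        # Tokens used by the system prompt plus the verbatim turns.
        return word_count(self.system) + sum(word_count(m["content"]) for m in self.turns)

    def _fold(self) -> None:
        # Move the oldest turn into the summary, one at a time, until
        # everything fits (always keeping at least the newest turn).
        while (
            self._verbatim_tokens() + word_count(self.summary) > self.max_tokens
            and len(self.turns) > 1
        ):
            oldest = self.turns.popleft()
            self.summary = self.summarizer(self.summary, oldest)
        # Bound the summary by truncating it to the remaining room.
        room = max(self.max_tokens - self._verbatim_tokens(), 0)
        self.summary = " ".join(self.summary.split()[:room])

    def render(self) -> list[Message]:
        out: list[Message] = []
        if self.system:
            out.append({"role": "system", "content": self.system})
        if self.summary:
            # The summary's role is an assumption of this sketch.
            out.append({"role": "system", "content": self.summary})
        out.extend(self.turns)
        return out


# Toy fold: append "role:first-word" of each dropped turn to the summary.
def toy_fold(previous: str, turn: Message) -> str:
    return f"{previous} {turn['role']}:{turn['content'].split()[0]}".strip()


mem = RollingMemorySketch(max_tokens=10, summarizer=toy_fold)
mem.set_system("Be brief.")
mem.add("user", "one two three")
mem.add("assistant", "four five six")
mem.add("user", "seven eight nine")
print(mem.render())
```

Each fold touches only the previous summary and one dropped turn, so the summarizer's input stays small no matter how long the conversation runs.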
The fit() function
fit(
messages, # list of {"role": ..., "content": ...} dicts
max_tokens, # the budget the result must fit within
pin_system=True, # never drop system messages
model="gpt-4", # used for token counting
summarizer=None, # your callable; defaults to a built-in no-LLM one
)
Returns a TrimResult:
| Attribute | Meaning |
|---|---|
| .messages | the trimmed conversation |
| .tokens_before / .tokens_after | token counts before/after |
| .tokens_saved | tokens removed |
| .dropped_count / .kept_count | original messages dropped / messages kept |
| .fits | is it within budget? |
| .was_trimmed | did anything get dropped? |
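To make the relationships between these attributes concrete, here is a hypothetical dataclass (TrimResultSketch, not chatfit's class). It assumes tokens_saved is tokens_before minus tokens_after, fits compares tokens_after with the budget, and was_trimmed means at least one original message was dropped; the max_tokens field is an addition for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrimResultSketch:
    """Hypothetical mirror of the attributes in the table above."""

    messages: list[dict[str, str]]
    tokens_before: int
    tokens_after: int
    dropped_count: int
    kept_count: int
    max_tokens: int  # assumed field so that `fits` can be derived

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def fits(self) -> bool:
        return self.tokens_after <= self.max_tokens

    @property
    def was_trimmed(self) -> bool:
        return self.dropped_count > 0


r = TrimResultSketch([], tokens_before=5200, tokens_after=3900,
                     dropped_count=12, kept_count=8, max_tokens=4000)
print(r.tokens_saved, r.fits, r.was_trimmed)
```

Deriving the last three values from stored counts keeps them from ever disagreeing with each other.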
Run the demo & tests
pip install -e ".[dev]"
python examples/demo.py
python examples/try_it.py
pytest
Roadmap
- keep_relevant: keep the most relevant old turns, not just the newest (powered by the relevance engine from its sister library, contextfit)
- semantic de-duplication of repeated turns
- auto-detect a model's context window
License
MIT
Download files
File details
Details for the file chatfit-0.4.0.tar.gz.
File metadata
- Download URL: chatfit-0.4.0.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b76350adbd69b3f7d5fb26e2288864fa58cd64b23abe9a25aa0cc755ef23f6b3 |
| MD5 | bc3332acae95a3d30210c5d6783f318a |
| BLAKE2b-256 | 611c68b1d8c1fa7ab644a30d6bd079462dc084f306690964f513d7eaf619320f |
File details
Details for the file chatfit-0.4.0-py3-none-any.whl.
File metadata
- Download URL: chatfit-0.4.0-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 711716a8b09432900f563febc5bfd93f76e5022e55657060087a63c48e7f210f |
| MD5 | 1c527d64904c3d200122f9c0cac7dace |
| BLAKE2b-256 | 474f9aadd99efa72de5507090d38e5a64f253c9b5dbcacf902bbd66abea38c5a |