Pluggable context window management strategies for LLM agents. Zero dependencies.
contexttrim
Pluggable context-window management for LLM agents.
When a conversation exceeds the model's token limit, most code drops the oldest messages (FIFO) — often dropping a critical system prompt while keeping ten redundant tool results. contexttrim lets you choose what gets dropped and why, with swappable strategies. Zero dependencies, no ML, no tokenizer required.
from contexttrim import ContextManager
from contexttrim.strategies import ImportanceWeighted

ctx = ContextManager(token_budget=8_000, strategy=ImportanceWeighted())
ctx.add({"role": "system", "content": "You are a helpful assistant."})
ctx.add({"role": "user", "content": "Find me flights to NYC."})
ctx.add({"role": "tool", "content": "<very long search result>"})

trimmed = ctx.fit()             # a new list that fits the budget
report = ctx.last_fit_report()  # what was dropped and why
print(report.dropped, report.tokens_used)
Why contexttrim?
Naive FIFO truncation throws away the wrong things. contexttrim gives you a ContextManager plus a set of strategies that make deliberate, inspectable decisions — and it's pure Python stdlib, so it adds nothing to your dependency tree.
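To make the failure mode concrete, here is a minimal sketch of naive FIFO truncation. It does not use contexttrim at all; `estimate_tokens` and `fifo_trim` are hypothetical helpers written for this illustration, and the budget is chosen so that the history is only slightly too long.

```python
def estimate_tokens(message):
    # Rough stand-in counter for this illustration: about 4 characters per token.
    return max(1, len(message["content"]) // 4)


def fifo_trim(messages, budget):
    """Drop from the front until the remaining messages fit the budget."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message goes first, whatever its role
    return kept


history = [
    {"role": "system", "content": "You are a travel agent. Never book without confirmation."},
    {"role": "user", "content": "Find me flights to NYC."},
    {"role": "tool", "content": "flight result " * 40},
    {"role": "tool", "content": "flight result " * 40},
]

trimmed = fifo_trim(history, budget=290)
print([m["role"] for m in trimmed])  # the system prompt is gone, both tool results remain
```

The history is only a few tokens over budget, yet the message that goes is the system prompt, while the two identical tool results survive.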
Installation
pip install contexttrim
Requires Python 3.9+. No other dependencies, ever.
Strategies
Import from contexttrim.strategies:
| Strategy | What it does |
|---|---|
| RecencyDrop | Drop the oldest messages first. Fast, simple, often wrong. |
| MiddleDrop | Drop from the middle, where models attend least ("lost in the middle"). Head and tail are preserved longest. |
| RoleWeighted | Score by role; drop lowest-scored first. system is pinned by default. |
| ImportanceWeighted | Score = role_weight × recency_decay^age ÷ (1 + length_penalty·tokens). Keeps short, recent, high-role messages. |
| ToolResultMerge | Merge redundant adjacent tool results (dedup), then truncate the largest if still over budget. No conversational context is dropped. |
| SemanticCluster | Drop the messages least topically relevant to the recent conversation, via TF-IDF cosine similarity (pure stdlib, no ML). system is pinned. |
from contexttrim.strategies import RoleWeighted

RoleWeighted(
    role_weights={"system": 10.0, "user": 2.0, "assistant": 1.0, "tool": 0.5},
    pin_roles=frozenset({"system"}),  # never dropped
)
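The ImportanceWeighted score from the table can be worked through by hand. The sketch below is a standalone illustration, not the library's code: the role weights are borrowed from the RoleWeighted example above, while `recency_decay=0.9`, `length_penalty=0.01`, and the reading of `age` as "number of messages newer than this one" are assumptions made here for illustration.

```python
ROLE_WEIGHTS = {"system": 10.0, "user": 2.0, "assistant": 1.0, "tool": 0.5}


def importance(role, age, tokens, recency_decay=0.9, length_penalty=0.01):
    """role_weight * recency_decay**age / (1 + length_penalty * tokens)."""
    # Assumption: roles missing from the table fall back to a weight of 1.0.
    role_weight = ROLE_WEIGHTS.get(role, 1.0)
    return role_weight * recency_decay ** age / (1 + length_penalty * tokens)


# (role, age, tokens): age 0 is the newest message.
examples = [("system", 2, 10), ("user", 1, 8), ("tool", 0, 2000)]
scores = {role: importance(role, age, tokens) for role, age, tokens in examples}

# Lowest score first, i.e. the order in which messages would be dropped.
for role, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{role:<9} {score:.3f}")
```

Under these assumed parameters the long tool result scores lowest even though it is the newest message, because the length penalty outweighs its recency.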
The fit report
Every fit() records what happened:
trimmed = ctx.fit()
report = ctx.last_fit_report()

report.kept           # the messages that survived
report.dropped        # list of Dropped(message, reason)
report.tokens_used    # total tokens of the kept messages
report.tokens_budget  # the budget you set
report.fits           # False only if pinned messages alone exceed the budget

for d in report.dropped:
    print(d.reason, "->", d.message["role"])
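For logging, it can help to aggregate the dropped entries rather than print them one by one. The following sketch relies only on the two documented fields, `message` and `reason`. The `summarize_drops` helper, the `Dropped` stand-in, and the reason strings are hypothetical and exist only so the example runs without the package.

```python
from collections import Counter, namedtuple

# Stand-in with the two documented fields; a real report supplies its own entries.
Dropped = namedtuple("Dropped", ["message", "reason"])


def summarize_drops(dropped):
    """Count dropped messages per (role, reason) pair."""
    return Counter((d.message.get("role", "unknown"), d.reason) for d in dropped)


# Hypothetical reasons, for illustration only.
sample = [
    Dropped({"role": "tool", "content": "result A"}, "lowest score"),
    Dropped({"role": "tool", "content": "result B"}, "lowest score"),
    Dropped({"role": "assistant", "content": "Searching now."}, "lowest score"),
]

summary = summarize_drops(sample)
for (role, reason), n in summary.most_common():
    print(f"{n} x {role}: {reason}")
```

In real use, `report.dropped` would be passed in place of `sample`.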
Token counting
By default, contexttrim estimates tokens with a deterministic ~4-characters-per-token heuristic — zero dependencies. Inject your own counter for exact counts (e.g. tiktoken):
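import tiktoken

from contexttrim import ContextManager
from contexttrim.strategies import ImportanceWeighted

enc = tiktoken.encoding_for_model("gpt-4o")

ctx = ContextManager(
    token_budget=8_000,
    strategy=ImportanceWeighted(),
    # Count tokens in the message content; missing or None content counts as empty.
    token_counter=lambda m: len(enc.encode(m.get("content", "") or "")),
)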
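For intuition about the default estimate, a characters-per-token heuristic could look roughly like the sketch below. It is an approximation written for this README, not the library's implementation: the name `estimate_tokens`, the rounding up, and the restriction to string content are all assumptions.

```python
import math


def estimate_tokens(message, chars_per_token=4):
    """Estimate tokens as ceil(len(content) / chars_per_token)."""
    # Assumption: content is a string (or missing/None); other shapes are not handled here.
    content = message.get("content", "") or ""
    return math.ceil(len(content) / chars_per_token)


msg = {"role": "user", "content": "Find me flights to NYC."}
print(len(msg["content"]), estimate_tokens(msg))  # 23 characters -> 6 estimated tokens
```

Because the estimate depends only on string length, it is deterministic and needs no tokenizer, at the cost of drifting from real token counts on code, non-English text, or long numbers.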
Message format
Messages are plain dicts in the common OpenAI/Anthropic shape — {"role": ..., "content": ...}. contexttrim never mutates them; fit() returns a new list. Non-string content (e.g. Anthropic content blocks) is serialized for token counting.
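As an illustration of what serializing non-string content for counting might involve, the sketch below turns a list of content blocks into text with `json.dumps` and checks that the original message is left untouched. The helper name `content_as_text`, the use of JSON, and the block values (such as the `call_1` id) are assumptions; the library's own serialization may differ.

```python
import copy
import json


def content_as_text(message):
    """Return the message content as a string suitable for token counting."""
    content = message.get("content", "")
    if isinstance(content, str):
        return content
    # Assumption: JSON is used here for illustration only.
    return json.dumps(content, ensure_ascii=False, sort_keys=True)


blocks_msg = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Searching flights."},
        {"type": "tool_use", "id": "call_1", "name": "search", "input": {"to": "NYC"}},
    ],
}

snapshot = copy.deepcopy(blocks_msg)
text = content_as_text(blocks_msg)

print(len(text))
print(blocks_msg == snapshot)  # the message itself was not modified
```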
Writing your own strategy
Subclass Strategy and return (kept, dropped):
from contexttrim import Strategy, Dropped


class DropAssistant(Strategy):
    def fit(self, messages, budget, count):
        kept, dropped = [], []
        for msg in messages:
            if msg.get("role") == "assistant":
                dropped.append(Dropped(msg, "assistant messages disabled"))
            else:
                kept.append(msg)
        return kept, dropped
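The example above ignores `budget` and `count`. A budget-aware strategy might look like the following sketch, which keeps every system message and then walks from newest to oldest, keeping each message that still fits. It assumes that `count(msg)` returns the token count of a single message and that `budget` is a token total; `KeepNewest` and its "over budget" reason are hypothetical. The fallback definitions exist only so the sketch runs without the package installed.

```python
try:
    from contexttrim import Strategy, Dropped
except ImportError:
    # Stand-ins so the sketch runs on its own.
    from collections import namedtuple

    Dropped = namedtuple("Dropped", ["message", "reason"])

    class Strategy:
        pass


class KeepNewest(Strategy):
    """Keep system messages, then each newer message that still fits."""

    def fit(self, messages, budget, count):
        # Assumption: count(msg) returns the token count of one message.
        used = sum(count(m) for m in messages if m.get("role") == "system")
        keep_ids = set()
        for i in range(len(messages) - 1, -1, -1):  # newest to oldest
            msg = messages[i]
            if msg.get("role") == "system":
                keep_ids.add(i)
                continue
            cost = count(msg)
            if used + cost <= budget:
                used += cost
                keep_ids.add(i)
        # Rebuild both lists in the original order; the inputs are never mutated.
        kept = [m for i, m in enumerate(messages) if i in keep_ids]
        dropped = [
            Dropped(m, "over budget")
            for i, m in enumerate(messages)
            if i not in keep_ids
        ]
        return kept, dropped


history = [
    {"role": "system", "content": "s" * 40},
    {"role": "user", "content": "u" * 400},
    {"role": "assistant", "content": "a" * 40},
]
kept, dropped = KeepNewest().fit(
    history, budget=25, count=lambda m: len(m["content"]) // 4
)
print([m["role"] for m in kept], [d.reason for d in dropped])
```

Note that this greedy walk can keep an older short message after skipping a newer long one; stopping at the first message that does not fit would be an equally reasonable design.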
Contributing
See CONTRIBUTING.md.
License
MIT — see LICENSE.
Part of the aenealabs AI agent toolkit.