Pluggable context window management strategies for LLM agents. Zero dependencies.

contexttrim

Pluggable context-window management for LLM agents.

When a conversation exceeds the model's token limit, most code drops the oldest messages first (FIFO). That often discards a critical system prompt while keeping ten redundant tool results. contexttrim lets you choose what gets dropped and why, using swappable strategies. It has zero dependencies, uses no ML, and requires no tokenizer.

from contexttrim import ContextManager
from contexttrim.strategies import ImportanceWeighted

ctx = ContextManager(token_budget=8_000, strategy=ImportanceWeighted())
ctx.add({"role": "system", "content": "You are a helpful assistant."})
ctx.add({"role": "user", "content": "Find me flights to NYC."})
ctx.add({"role": "tool", "content": "<very long search result>"})

trimmed = ctx.fit()              # a new list that fits the budget
report = ctx.last_fit_report()   # what was dropped and why
print(report.dropped, report.tokens_used)

Why contexttrim?

Naive FIFO truncation drops messages by age alone, regardless of role or relevance. contexttrim gives you a ContextManager plus a set of strategies that make deliberate, inspectable decisions. It is pure Python stdlib, so it adds nothing to your dependency tree.
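
To make the failure mode concrete, here is a minimal sketch of naive FIFO truncation. It does not use contexttrim at all; fifo_trim and estimate_tokens are hypothetical helpers written for this illustration, and the token estimate is a rough 4-characters-per-token guess.

```python
def estimate_tokens(message):
    """Rough stand-in counter: about 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def fifo_trim(messages, budget):
    """Drop messages from the front until the rest fits the budget."""
    kept = list(messages)  # copy so the caller's list is untouched
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message goes first, whatever its role
    return kept


history = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "Find me flights to NYC."},
    {"role": "tool", "content": "flight-result " * 40},
    {"role": "tool", "content": "flight-result " * 40},
]

trimmed = fifo_trim(history, budget=150)
print([m["role"] for m in trimmed])  # ['tool']
```

The system prompt and the user's question are both gone, while one of two identical tool results survives. That is the behavior the strategies below are meant to avoid.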

Installation

pip install contexttrim

Requires Python 3.9+. No other dependencies, ever.

Strategies

Import from contexttrim.strategies:

| Strategy | What it does |
| --- | --- |
| RecencyDrop | Drop the oldest messages first. Fast, simple, often wrong. |
| MiddleDrop | Drop from the middle, where models attend least ("lost in the middle"). Head and tail are preserved longest. |
| RoleWeighted | Score by role; drop lowest-scored first. system pinned by default. |
| ImportanceWeighted | role_weight × recency_decay^age ÷ (1 + length_penalty·tokens). Keeps short, recent, high-role messages. |
| ToolResultMerge | Merge redundant adjacent tool results (dedup), then truncate the largest if still over budget. No conversational context is dropped. |
| SemanticCluster | Drop messages least topically relevant to the recent conversation, via TF-IDF cosine similarity (pure stdlib, no ML). system pinned. |

from contexttrim.strategies import RoleWeighted

RoleWeighted(
    role_weights={"system": 10.0, "user": 2.0, "assistant": 1.0, "tool": 0.5},
    pin_roles=frozenset({"system"}),   # never dropped
)
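
The ImportanceWeighted formula in the table can be worked through by hand. The sketch below is a standalone illustration, not the library's code: the importance function is hypothetical, the role weights are borrowed from the RoleWeighted example above, and the recency_decay and length_penalty values are assumptions chosen only to show the shape of the score. It also assumes age counts how many messages are newer, with 0 for the newest.

```python
ROLE_WEIGHTS = {"system": 10.0, "user": 2.0, "assistant": 1.0, "tool": 0.5}


def importance(role, age, tokens, recency_decay=0.9, length_penalty=0.01):
    """role_weight * recency_decay**age / (1 + length_penalty * tokens)."""
    role_weight = ROLE_WEIGHTS.get(role, 1.0)  # unknown roles get a neutral weight
    return role_weight * recency_decay ** age / (1 + length_penalty * tokens)


# (role, age, tokens); age 0 is the newest message (assumed convention)
examples = [
    ("user", 0, 10),
    ("user", 5, 10),
    ("tool", 0, 10),
    ("tool", 0, 2000),
]
for role, age, tokens in examples:
    score = importance(role, age, tokens)
    print(f"{role:<5} age={age} tokens={tokens:<5} score={score:.3f}")
```

Under these assumed parameters, an older message scores lower than a newer one of the same role and length, and a very long tool result scores far below a short one, which matches the table's summary of "short, recent, high-role".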

The fit report

Every fit() records what happened:

trimmed = ctx.fit()
report = ctx.last_fit_report()

report.kept           # the messages that survived
report.dropped        # list of Dropped(message, reason)
report.tokens_used    # total tokens of the kept messages
report.tokens_budget  # the budget you set
report.fits           # False only if pinned messages alone exceed the budget

for d in report.dropped:
    print(d.reason, "->", d.message["role"])
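
For longer conversations it may be handier to aggregate the report than to print every entry. The following is a sketch only: summarize_drops is a hypothetical helper, the Dropped namedtuple is a stand-in so the snippet runs without contexttrim installed, and the reason strings are made up rather than taken from the library.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for contexttrim's Dropped, so the sketch runs alone.
Dropped = namedtuple("Dropped", ["message", "reason"])


def summarize_drops(dropped):
    """Count dropped messages per (role, reason) pair."""
    return Counter((d.message.get("role", "?"), d.reason) for d in dropped)


# Invented sample data; real reasons come from the strategy in use.
sample = [
    Dropped({"role": "tool", "content": "..."}, "lowest score"),
    Dropped({"role": "tool", "content": "..."}, "lowest score"),
    Dropped({"role": "assistant", "content": "..."}, "lowest score"),
]
for (role, reason), n in summarize_drops(sample).most_common():
    print(f"{n} x {role}: {reason}")
```

With a real report, passing report.dropped to the helper should work as long as each entry exposes message and reason, as the listing above suggests.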

Token counting

By default, contexttrim estimates tokens with a deterministic heuristic of roughly 4 characters per token, which keeps the package dependency-free. Inject your own counter for exact counts (e.g. tiktoken):

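import tiktoken

from contexttrim import ContextManager
from contexttrim.strategies import ImportanceWeighted

enc = tiktoken.encoding_for_model("gpt-4o")

ctx = ContextManager(
    token_budget=8_000,
    strategy=ImportanceWeighted(),
    # Called once per message; this counter assumes string (or missing) content.
    token_counter=lambda m: len(enc.encode(m.get("content", "") or "")),
)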
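
For comparison, a characters-per-token heuristic of the kind described above can be sketched in a few lines. This is an approximation written for illustration, not contexttrim's actual counter: estimate_tokens and its chars_per_token parameter are hypothetical, and rounding up is an assumption.

```python
import math


def estimate_tokens(message, chars_per_token=4):
    """Estimate tokens as ceil(len(content) / chars_per_token)."""
    content = message.get("content") or ""  # treat missing or None content as empty
    return math.ceil(len(str(content)) / chars_per_token)


msg = {"role": "user", "content": "Find me flights to NYC."}
print(estimate_tokens(msg))  # 23 characters -> 6
```

A function with this shape (one message in, one integer out) matches the token_counter callable shown in the tiktoken example.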

Message format

Messages are plain dicts in the common OpenAI/Anthropic shape — {"role": ..., "content": ...}. contexttrim never mutates them; fit() returns a new list. Non-string content (e.g. Anthropic content blocks) is serialized for token counting.
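
If you write your own token counter, non-string content needs to be turned into text first. The source does not say how contexttrim serializes it, so the sketch below simply assumes JSON; content_as_text is a hypothetical helper, not part of the package.

```python
import json


def content_as_text(message):
    """Return message content as a string suitable for token counting."""
    content = message.get("content")
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # Lists of content blocks and other structures: serialize deterministically.
    return json.dumps(content, sort_keys=True, ensure_ascii=False)


blocks = {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
print(content_as_text(blocks))  # [{"text": "Hello", "type": "text"}]
```

Serialized JSON includes keys and punctuation, so counts for block content will run somewhat higher than the visible text alone.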

Writing your own strategy

Subclass Strategy and return (kept, dropped):

from contexttrim import Strategy, Dropped

class DropAssistant(Strategy):
    """Drop every assistant message and keep everything else."""

    def fit(self, messages, budget, count):
        # budget and count are unused here: this strategy filters by role only.
        kept, dropped = [], []
        for msg in messages:
            if msg.get("role") == "assistant":
                dropped.append(Dropped(msg, "assistant messages disabled"))
            else:
                kept.append(msg)
        return kept, dropped
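
A strategy that actually uses budget and count might look like the sketch below. It is an illustration under stated assumptions: Strategy and Dropped are hypothetical stand-ins defined locally so the snippet runs without contexttrim installed, count is assumed to be a callable that takes one message and returns an integer, and KeepNewest is an invented name.

```python
from collections import namedtuple

# Hypothetical stand-ins so the sketch runs without contexttrim installed;
# in real use, import Strategy and Dropped from contexttrim instead.
Dropped = namedtuple("Dropped", ["message", "reason"])


class Strategy:
    def fit(self, messages, budget, count):
        raise NotImplementedError


class KeepNewest(Strategy):
    """Keep system messages, then the newest other messages that fit."""

    def fit(self, messages, budget, count):
        used = sum(count(m) for m in messages if m.get("role") == "system")
        keep_ids = set()
        # Walk newest to oldest and stop at the first message that does not fit.
        for i in range(len(messages) - 1, -1, -1):
            msg = messages[i]
            if msg.get("role") == "system":
                continue
            if used + count(msg) > budget:
                break
            used += count(msg)
            keep_ids.add(i)
        kept, dropped = [], []
        for i, msg in enumerate(messages):
            if msg.get("role") == "system" or i in keep_ids:
                kept.append(msg)  # original order is preserved
            else:
                dropped.append(Dropped(msg, "older than the budget allows"))
        return kept, dropped


messages = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
kept, dropped = KeepNewest().fit(
    messages, budget=30, count=lambda m: len(m["content"]) // 4
)
print([m["role"] for m in kept], len(dropped))  # ['system', 'assistant', 'user'] 1
```

Like the example above, the sketch returns new lists and leaves the input messages untouched.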

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.


Part of the aenealabs AI agent toolkit.
