contextcram
Fit anything into an LLM context window. A tiny, zero-dependency, priority-aware token-budget packer for RAG pipelines and agents.
Every RAG or agent app has the same problem: you have too much stuff — a system
prompt, chat history, retrieved documents, tool output — and a fixed token
budget. contextcram packs it all in by priority, truncating, trimming, or
dropping the least important pieces so the important ones always make it.
from contextcram import Packer
packer = Packer(budget=8000) # token budget
packer.add(system_prompt, priority="required") # never dropped
packer.add(chat_history, priority="high", strategy="trim") # drop oldest turns
packer.add(retrieved_docs, priority="medium", strategy="drop") # all-or-nothing
packer.add(tool_output, priority="low", strategy="truncate") # cut to fit
result = packer.fit()
print(result.text) # assembled, in-budget context
print(result.used_tokens) # e.g. 7840
print(result.dropped_names) # what didn't make the cut
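To make the idea of packing by priority concrete, here is a minimal standalone sketch. It is not contextcram's implementation: the `Item` class, the `pack` function, the `OverBudget` exception, and the word-count tokenizer are all hypothetical names invented for illustration. It also assumes only whole-item inclusion and that lower-priority items may fill leftover room after a larger item is skipped.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class OverBudget(Exception):
    """Raised when the required items alone do not fit (hypothetical name)."""


@dataclass
class Item:
    name: str
    text: str
    priority: int            # higher is kept first
    required: bool = False


def count_words(text: str) -> int:
    # Stand-in token counter: one token per whitespace-separated word.
    return len(text.split())


def pack(items: List[Item], budget: int,
         count: Callable[[str], int] = count_words) -> Tuple[List[str], List[str], int]:
    """Return (kept names, dropped names, tokens used)."""
    used = sum(count(i.text) for i in items if i.required)
    if used > budget:
        raise OverBudget(f"required items need {used} tokens, budget is {budget}")

    kept = {i.name for i in items if i.required}
    optional = [i for i in items if not i.required]
    # sorted() is stable, so items with equal priority keep insertion order.
    for item in sorted(optional, key=lambda i: i.priority, reverse=True):
        cost = count(item.text)
        if used + cost <= budget:
            kept.add(item.name)
            used += cost

    # Report both lists in the order the items were added.
    kept_names = [i.name for i in items if i.name in kept]
    dropped_names = [i.name for i in items if i.name not in kept]
    return kept_names, dropped_names, used


items = [
    Item("system", "you are a helpful assistant", 0, required=True),
    Item("history", "user asked about pricing yesterday", 30),
    Item("docs", "pricing page text " * 3, 20),
    Item("tool", "raw tool output here", 10),
]
kept, dropped, used = pack(items, budget=15)
print(kept, dropped, used)
```

In this sketch the nine-word docs item does not fit after the system prompt and history, so it is skipped and the smaller, lower-priority tool output takes the remaining room.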
Why
- Zero dependencies. Pure Python. Works out of the box with a fast characters-per-token heuristic; plug in tiktoken or any tokenizer when you need exact counts.
- Framework-agnostic. Use it with LangChain, LlamaIndex, the raw provider SDKs, or nothing at all.
- Priority-aware. You decide what survives a tight budget, not a blind truncate at the end.
- Observable. Every result tells you what was kept, truncated, and dropped.
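As a rough illustration of what a characters-per-token heuristic looks like, the sketch below estimates a count from string length alone. The function name `estimate_tokens` and the ratio of 4 characters per token are assumptions made for this example (4 is a common rule of thumb for English text); the ratio contextcram actually uses is not stated here.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from character length alone."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so a partial token still counts against the budget.
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Fit anything into an LLM context window."))
```

A heuristic like this is cheap but approximate, which is why an exact tokenizer is worth plugging in when the budget is tight.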
Installation
pip install contextcram
# optional: exact token counts via tiktoken
pip install "contextcram[tiktoken]"
Strategies
When an optional item doesn't fully fit, its strategy decides what happens:
| Strategy | Behavior |
|---|---|
| drop | Include the item whole, or not at all |
| truncate | Cut from the end, keeping the head (default) |
| truncate_head | Cut from the start, keeping the tail |
| trim | For list content: drop oldest segments first |
required items are always kept; if they alone exceed the budget, a
BudgetExceeded error is raised.
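The four strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `apply_strategy` and `count_words` are hypothetical helpers, tokens are approximated as whitespace-separated words, and `room` is the number of tokens still available for the item.

```python
from typing import List, Optional, Union

Content = Union[str, List[str]]
STRATEGIES = {"drop", "truncate", "truncate_head", "trim"}


def count_words(text: str) -> int:
    # Stand-in token counter: one token per whitespace-separated word.
    return len(text.split())


def apply_strategy(content: Content, strategy: str, room: int) -> Optional[Content]:
    """Return the part of `content` that fits in `room` tokens, or None."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if strategy == "trim":
        if isinstance(content, str):
            raise TypeError("trim expects a list of segments")
        segments = list(content)
        # Drop the oldest (first) segments until the rest fits.
        while segments and sum(count_words(s) for s in segments) > room:
            segments.pop(0)
        return segments or None

    if not isinstance(content, str):
        raise TypeError(f"{strategy} expects a string")
    words = content.split()
    if len(words) <= room:
        return content                      # already fits, keep unchanged
    if strategy == "drop" or room <= 0:
        return None                         # all-or-nothing, or no room left
    if strategy == "truncate":
        return " ".join(words[:room])       # keep the head
    return " ".join(words[-room:])          # truncate_head: keep the tail


print(apply_strategy("a b c d e", "truncate", 3))
print(apply_strategy("a b c d e", "truncate_head", 3))
print(apply_strategy(["one two", "three four", "five"], "trim", 3))
```

The explicit `room <= 0` check matters in Python because `words[-0:]` would return the whole list rather than an empty one.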
Model-aware budgets
Skip the magic number — set the budget from the model, and reserve room for the response in one go:
from contextcram import Packer
# 128k window for gpt-4o, holding back 2k tokens for the model's reply
packer = Packer(model="gpt-4o", reserve=2000)
print(packer.full_budget) # 128000
print(packer.budget) # 126000 (effective budget you pack into)
reserve is the easy way to avoid the classic "prompt fit, but no room left to
answer" failure. Unknown model? Pass budget= explicitly or register it:
from contextcram import register_model
register_model("my-internal-llm", 32000)
packer = Packer(model="my-internal-llm", reserve=1000)
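The arithmetic behind a model-aware budget is simple enough to sketch on its own. The registry dict and the `register_window` and `effective_budget` helpers below are hypothetical stand-ins written for this example, not contextcram internals; the window sizes are the ones quoted above.

```python
# Hypothetical registry of context-window sizes, in tokens.
MODEL_WINDOWS = {"gpt-4o": 128_000}


def register_window(model: str, window: int) -> None:
    """Record the context-window size for a model name."""
    if window <= 0:
        raise ValueError("window must be positive")
    MODEL_WINDOWS[model] = window


def effective_budget(model: str, reserve: int = 0) -> int:
    """Tokens available for the prompt after holding back `reserve` for the reply."""
    if model not in MODEL_WINDOWS:
        raise KeyError(f"unknown model {model!r}; register it or pass a budget")
    window = MODEL_WINDOWS[model]
    if not 0 <= reserve < window:
        raise ValueError("reserve must be non-negative and smaller than the window")
    return window - reserve


register_window("my-internal-llm", 32_000)
print(effective_budget("gpt-4o", reserve=2000))
print(effective_budget("my-internal-llm", reserve=1000))
```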
Exact token counts
from contextcram import Packer, tiktoken_tokenizer
packer = Packer(budget=8000, tokenizer=tiktoken_tokenizer("gpt-4o"))
Or wrap any tokenizer with CallableTokenizer(lambda s: len(my_encode(s))).
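For illustration, `my_encode` in the line above could be any function that turns a string into a sequence of tokens. The sketch below uses a simple stdlib regex split as a stand-in; it is an assumption for this example, not a real model tokenizer, and the counts it produces will differ from those of tiktoken.

```python
import re

# Words and individual punctuation marks each count as one piece.
_PIECE = re.compile(r"\w+|[^\w\s]")


def my_encode(text: str) -> list:
    """Toy encoder: split text into word and punctuation pieces."""
    return _PIECE.findall(text)


def count_tokens(text: str) -> int:
    # Same shape as the lambda above: string in, token count out.
    return len(my_encode(text))


print(count_tokens("Hello, world!"))

# With contextcram installed, this callable would be wrapped as shown above:
# CallableTokenizer(lambda s: len(my_encode(s)))
```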
Priorities
Use the named levels "required", "high", "medium", "low", or pass any
integer (higher is kept first):
packer.add(text, priority=42, strategy="truncate")
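One way to picture how named levels and integers can share a single ordering is to map each name to a number. The numeric values in this sketch are invented for illustration, and `priority_key` is a hypothetical helper; the values contextcram assigns to "high", "medium", and "low" are not stated here, so where an integer such as 42 lands relative to the named levels may differ in the library.

```python
from typing import Union

# Hypothetical numeric values for the named levels (assumption, not from the library).
NAMED_LEVELS = {"high": 300, "medium": 200, "low": 100}


def priority_key(priority: Union[str, int]) -> float:
    """Map a named level or an integer to a sortable number (higher is kept first)."""
    if priority == "required":
        return float("inf")                 # always sorts ahead of everything else
    if isinstance(priority, str):
        return float(NAMED_LEVELS[priority])  # KeyError for an unknown name
    return float(priority)


ordered = sorted(["low", 42, "required", "high", 250], key=priority_key, reverse=True)
print(ordered)
```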
Real-world usage
With LangChain
Pack a system prompt, retrieved docs, and chat history into a gpt-4o budget —
leaving room for the answer — then hand the result to the model:
from contextcram import Packer
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
llm = ChatOpenAI(model="gpt-4o")
docs = [d.page_content for d in retriever.invoke(question)]
history = [f"{m.type}: {m.content}" for m in memory.messages]
ctx = (
Packer(model="gpt-4o", reserve=1500) # room for the reply
.add(SYSTEM_PROMPT, priority="required") # never dropped
.add(history, priority="high", strategy="trim") # drop oldest turns
.add("\n\n".join(docs), priority="medium", strategy="drop") # all docs or none
.fit()
)
response = llm.invoke([SystemMessage(ctx.text), HumanMessage(question)])
With the raw Anthropic SDK
Tie reserve to max_tokens so the input can never crowd out the response:
import anthropic
from contextcram import Packer
client = anthropic.Anthropic()
REPLY_TOKENS = 4000
ctx = (
Packer(model="claude-opus-4-8", reserve=REPLY_TOKENS)
.add(SYSTEM_PROMPT, priority="required")
.add(chat_history, priority="high", strategy="trim")
.add(retrieved_docs, priority="medium", strategy="drop")
.fit()
)
msg = client.messages.create(
model="claude-opus-4-8",
max_tokens=REPLY_TOKENS, # matches reserve above
system=ctx.text,
messages=[{"role": "user", "content": question}],
)
print(f"packed {ctx.used_tokens} tokens; dropped {ctx.dropped_names}")
Alternatives
Priority-based context assembly isn't a new idea, and depending on your needs
one of these may fit better — contextcram deliberately trades features for
simplicity and zero dependencies:
| Library | Approach | When to prefer it over contextcram |
|---|---|---|
| Priompt / PriomptiPy | Component/JSX-style priority rendering | You want fine-grained, composable prompt components and don't mind a learning curve |
| Prompt Poet | YAML + Jinja2 templating with cache-aware, priority truncation | You need templating and production GPU prefix-cache optimization |
| LLMLingua | Model-based prompt compression | You want to shrink text rather than drop/truncate whole pieces |
Choose contextcram when you want a tiny, zero-dependency, framework-agnostic
helper with a 3-line API (Packer(...).add(...).fit()) that does one thing —
fit prioritized pieces into a budget — and gets out of your way.
Development
git clone https://github.com/Waelr1985/contextcram.git
cd contextcram
uv sync
uv run pytest
uv run ruff check .
uv run mypy
License
MIT