Fit anything into an LLM context window — a tiny, zero-dependency, priority-aware token-budget packer.

Maintainers

Waelr1985

contextcram

Fit anything into an LLM context window. A tiny, zero-dependency, priority-aware token-budget packer for RAG pipelines and agents.

Every RAG or agent app has the same problem: you have too much stuff — a system prompt, chat history, retrieved documents, tool output — and a fixed token budget. contextcram packs it all in by priority, truncating, trimming, or dropping the least important pieces so the important ones always make it.

from contextcram import Packer

packer = Packer(budget=8000)  # token budget

packer.add(system_prompt, priority="required")                 # never dropped
packer.add(chat_history, priority="high", strategy="trim")     # drop oldest turns
packer.add(retrieved_docs, priority="medium", strategy="drop") # all-or-nothing
packer.add(tool_output, priority="low", strategy="truncate")   # cut to fit

result = packer.fit()
print(result.text)            # assembled, in-budget context
print(result.used_tokens)     # e.g. 7840
print(result.dropped_names)   # what didn't make the cut

Why

- Zero dependencies. Pure Python. Works out of the box with a fast characters-per-token heuristic; plug in tiktoken or any tokenizer when you need exact counts.
- Framework-agnostic. Use it with LangChain, LlamaIndex, the raw provider SDKs, or nothing at all.
- Priority-aware. You decide what survives a tight budget, not a blind truncate at the end.
- Observable. Every result tells you what was kept, truncated, and dropped.
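
To make the characters-per-token heuristic concrete, here is a minimal sketch of such an estimator. It is not contextcram's implementation: the function name `estimate_tokens` and the ratio of 4 characters per token are assumptions chosen for illustration (4 is a common rule of thumb for English text).

```python
import math

# Assumption: about 4 characters per token. The ratio contextcram
# actually uses by default may differ.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from character length alone (no tokenizer needed)."""
    if not text:
        return 0
    # Round up so a short, non-empty string never counts as zero tokens.
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Fit anything into an LLM context window."))
```

An estimate like this is cheap but approximate, which is why an exact tokenizer is worth plugging in when the budget is tight.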

Installation

pip install contextcram
# optional: exact token counts via tiktoken
pip install "contextcram[tiktoken]"

Strategies

When an optional item doesn't fully fit, its strategy decides what happens:

| Strategy | Behavior |
| --- | --- |
| `drop` | Include the item whole, or not at all |
| `truncate` | Cut from the end, keeping the head (default) |
| `truncate_head` | Cut from the start, keeping the tail |
| `trim` | For list content: drop oldest segments first |

Items marked "required" are always kept; if they alone exceed the budget, a BudgetExceeded error is raised.
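
The four strategies can be sketched in plain Python as follows. This is an illustrative model of the behavior described above, not the library's code: `apply_strategy` is a hypothetical helper, and it counts one token per whitespace-separated word to keep the example self-contained.

```python
from typing import List, Optional, Union

Content = Union[str, List[str]]


def count_words(text: str) -> int:
    """Stand-in token counter: one token per whitespace-separated word."""
    return len(text.split())


def apply_strategy(content: Content, strategy: str, room: int) -> Optional[Content]:
    """Return the part of `content` that fits in `room` tokens, or None if dropped."""
    room = max(room, 0)

    if isinstance(content, list):
        if strategy != "trim":
            raise ValueError("this sketch only supports 'trim' for list content")
        kept = list(content)
        # Drop the oldest (first) segments until the remainder fits.
        while kept and sum(count_words(seg) for seg in kept) > room:
            kept.pop(0)
        return kept or None

    if count_words(content) <= room:
        return content  # fits whole, so no strategy is needed
    if strategy == "drop":
        return None  # all-or-nothing
    words = content.split()
    if strategy == "truncate":
        kept_words = words[:room]  # keep the head
    elif strategy == "truncate_head":
        kept_words = words[len(words) - room:]  # keep the tail
    else:
        raise ValueError(f"unsupported strategy for text: {strategy}")
    return " ".join(kept_words) or None


print(apply_strategy("a b c d e", "truncate", 3))
print(apply_strategy("a b c d e", "truncate_head", 3))
print(apply_strategy(["one two", "three four", "five"], "trim", 3))
```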

Model-aware budgets

Skip the magic number — set the budget from the model, and reserve room for the response in one go:

from contextcram import Packer

# 128k window for gpt-4o, holding back 2k tokens for the model's reply
packer = Packer(model="gpt-4o", reserve=2000)
print(packer.full_budget)  # 128000
print(packer.budget)       # 126000  (effective budget you pack into)

reserve is the easy way to avoid the classic "prompt fit, but no room left to answer" failure. Unknown model? Pass budget= explicitly or register it:

from contextcram import register_model

register_model("my-internal-llm", 32000)
packer = Packer(model="my-internal-llm", reserve=1000)
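
The arithmetic behind a model-aware budget is simply the context window minus the reserve. A rough sketch, assuming a plain dictionary as the model registry (the names `_CONTEXT_WINDOWS`, `register_window`, and `effective_budget` are hypothetical; the 128000 figure for gpt-4o is the one shown in the example above):

```python
from typing import Dict

# Hypothetical registry mapping model names to context-window sizes.
_CONTEXT_WINDOWS: Dict[str, int] = {"gpt-4o": 128_000}


def register_window(model: str, context_window: int) -> None:
    """Record the context-window size for a model the registry does not know."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    _CONTEXT_WINDOWS[model] = context_window


def effective_budget(model: str, reserve: int = 0) -> int:
    """Tokens left for packing after holding back `reserve` for the reply."""
    try:
        full = _CONTEXT_WINDOWS[model]
    except KeyError:
        raise KeyError(
            f"unknown model {model!r}; register it or pass an explicit budget"
        ) from None
    if not 0 <= reserve < full:
        raise ValueError("reserve must be non-negative and smaller than the window")
    return full - reserve


register_window("my-internal-llm", 32_000)
print(effective_budget("gpt-4o", reserve=2_000))
print(effective_budget("my-internal-llm", reserve=1_000))
```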

Exact token counts

from contextcram import Packer, tiktoken_tokenizer

packer = Packer(budget=8000, tokenizer=tiktoken_tokenizer("gpt-4o"))

Or wrap any tokenizer with CallableTokenizer(lambda s: len(my_encode(s))), where my_encode is your own encode function.
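
As an illustration of the shape such a callable takes (a string in, an integer count out), here is a toy encoder. Both `my_encode` and `count_pieces` are hypothetical stand-ins; a real setup would call the encode function of whichever tokenizer you use.

```python
import re
from typing import List

# Hypothetical encoder: splits text into word and punctuation pieces.
# Real tokenizers split differently; this only demonstrates the interface.
_PIECE = re.compile(r"\w+|[^\w\s]")


def my_encode(text: str) -> List[str]:
    """Return the list of pieces ("tokens") found in `text`."""
    return _PIECE.findall(text)


def count_pieces(text: str) -> int:
    """String in, token count out: the callable shape used in the line above."""
    return len(my_encode(text))


print(count_pieces("Hello, world!"))
```

A named function like `count_pieces` can stand in for the lambda, which helps when the counting logic grows beyond one expression.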

Priorities

Use the named levels "required", "high", "medium", "low", or pass any integer (higher is kept first):

packer.add(text, priority=42, strategy="truncate")
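
One way to picture how named levels and integers share a single ordering is the sketch below. The numeric values given to "low", "medium", and "high" are assumptions made for this example, as are the names `priority_key` and `packing_order`; contextcram's real mapping may differ.

```python
from typing import List, Tuple, Union

Priority = Union[str, int]

# Hypothetical numeric values for the named levels.
NAMED_LEVELS = {"low": 10, "medium": 50, "high": 90}


def priority_key(priority: Priority) -> float:
    """Map a named level or an integer onto one comparable scale."""
    if priority == "required":
        return float("inf")  # always ranks above any integer
    if isinstance(priority, str):
        return float(NAMED_LEVELS[priority])
    return float(priority)


def packing_order(items: List[Tuple[str, Priority]]) -> List[str]:
    """Return item names from most to least important (ties keep input order)."""
    ranked = sorted(items, key=lambda item: priority_key(item[1]), reverse=True)
    return [name for name, _ in ranked]


items = [("tool", "low"), ("docs", 42), ("system", "required"), ("history", "high")]
print(packing_order(items))
```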

Real-world usage

With LangChain

Pack a system prompt, retrieved docs, and chat history into a gpt-4o budget — leaving room for the answer — then hand the result to the model:

from contextcram import Packer
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOpenAI(model="gpt-4o")

docs = [d.page_content for d in retriever.invoke(question)]
history = [f"{m.type}: {m.content}" for m in memory.messages]

ctx = (
    Packer(model="gpt-4o", reserve=1500)                          # room for the reply
    .add(SYSTEM_PROMPT, priority="required")                      # never dropped
    .add(history, priority="high", strategy="trim")               # drop oldest turns
    .add("\n\n".join(docs), priority="medium", strategy="drop")   # whole docs only
    .fit()
)

response = llm.invoke([SystemMessage(ctx.text), HumanMessage(question)])

With the raw Anthropic SDK

Tie reserve to max_tokens so the input can never crowd out the response:

import anthropic
from contextcram import Packer

client = anthropic.Anthropic()
REPLY_TOKENS = 4000

ctx = (
    Packer(model="claude-opus-4-8", reserve=REPLY_TOKENS)
    .add(SYSTEM_PROMPT, priority="required")
    .add(chat_history, priority="high", strategy="trim")
    .add(retrieved_docs, priority="medium", strategy="drop")
    .fit()
)

msg = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=REPLY_TOKENS,        # matches reserve above
    system=ctx.text,
    messages=[{"role": "user", "content": question}],
)
print(f"packed {ctx.used_tokens} tokens; dropped {ctx.dropped_names}")

Alternatives

Priority-based context assembly isn't a new idea, and depending on your needs one of these may fit better — contextcram deliberately trades features for simplicity and zero dependencies:

| Library | Approach | When to prefer it over `contextcram` |
| --- | --- | --- |
| Priompt / PriomptiPy | Component/JSX-style priority rendering | You want fine-grained, composable prompt components and don't mind a learning curve |
| Prompt Poet | YAML + Jinja2 templating with cache-aware, priority truncation | You need templating and production GPU prefix-cache optimization |
| LLMLingua | Model-based prompt compression | You want to shrink text rather than drop/truncate whole pieces |

Choose contextcram when you want a tiny, zero-dependency, framework-agnostic helper with a 3-line API (Packer(...).add(...).fit()) that does one thing — fit prioritized pieces into a budget — and gets out of your way.

Development

git clone https://github.com/Waelr1985/contextcram.git
cd contextcram
uv sync
uv run pytest
uv run ruff check .
uv run mypy

License

MIT

Release history

- 0.2.0 (this version): Jun 15, 2026
- 0.1.0: Jun 15, 2026

Download files

Source Distribution

contextcram-0.2.0.tar.gz (12.8 kB)

Uploaded Jun 15, 2026 Source

Built Distribution

contextcram-0.2.0-py3-none-any.whl (11.3 kB)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file contextcram-0.2.0.tar.gz.

File metadata

Download URL: contextcram-0.2.0.tar.gz
Upload date: Jun 15, 2026
Size: 12.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextcram-0.2.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `055ee9fe48850478722af706f043044e971331db5fd4a17e123e85a5cd7a7b00` |
| MD5 | `08f7fc8d0e6f420031962058862f8665` |
| BLAKE2b-256 | `238780e9effad60d60a892dbaf1665f1610b56860f6d6d6576b9f0400e49b994` |

Provenance

The following attestation bundles were made for contextcram-0.2.0.tar.gz:

Publisher: publish.yml on Waelr1985/contextcram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: contextcram-0.2.0.tar.gz
- Subject digest: 055ee9fe48850478722af706f043044e971331db5fd4a17e123e85a5cd7a7b00
- Sigstore transparency entry: 1828501578
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: Waelr1985/contextcram@c80e7a87cee16f52c50d31142edb9791f0aded46
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Waelr1985
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c80e7a87cee16f52c50d31142edb9791f0aded46
- Trigger Event: release

File details

Details for the file contextcram-0.2.0-py3-none-any.whl.

File metadata

Download URL: contextcram-0.2.0-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 11.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextcram-0.2.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `a2cc24632d1bd8031be17f9326e1e7dfed3bc342f4fa87b6154c13455711d460` |
| MD5 | `9f83b3640b5b63d6cc603129ab9e5b7d` |
| BLAKE2b-256 | `a9cef6853cdf07f94a00b2cf46fc85e0fc18baf4bcebb4c89e6b8e764cd37f5d` |

Provenance

The following attestation bundles were made for contextcram-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Waelr1985/contextcram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: contextcram-0.2.0-py3-none-any.whl
- Subject digest: a2cc24632d1bd8031be17f9326e1e7dfed3bc342f4fa87b6154c13455711d460
- Sigstore transparency entry: 1828501807
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: Waelr1985/contextcram@c80e7a87cee16f52c50d31142edb9791f0aded46
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Waelr1985
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c80e7a87cee16f52c50d31142edb9791f0aded46
- Trigger Event: release
