Composable building blocks for LLM context engineering
Project description
LLM ContextKit
Composable building blocks for LLM context engineering.
The Problem
Every team building production GenAI applications ends up building the same context management infrastructure from scratch:
- How do you assemble multiple layers of context (system prompts, user context, conversation history, retrieved documents, tool results) into a single coherent payload?
- How do you manage token budgets so the context fits within model limits?
- How do you handle conversation history (sliding window, summarization, selective inclusion)?
- How do you debug what actually went into the context window when something goes wrong?
There's no focused, well-designed library that owns this layer cleanly. LangChain gives you orchestration, vector databases give you retrieval, but nobody owns the "context assembly and management" layer.
LLM ContextKit fills that gap.
Quick Start
pip install llm-contextkit
from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, HistoryLayer
# Create a token budget
budget = TokenBudget(total=4096, reserve_for_output=1000)
budget.allocate("system", 500, priority=10)
budget.allocate("history", 2500, priority=5)
# Assemble context
assembler = ContextAssembler(budget=budget)
assembler.add_layer(
SystemLayer(instructions="You are a helpful assistant.")
)
assembler.add_layer(
HistoryLayer(messages=conversation_history, strategy="sliding_window")
)
# Build for your LLM provider
messages = assembler.build_for_openai()
# Or: payload = assembler.build_for_anthropic()
# Debug what was built
print(assembler.inspect_pretty())
Key Features
- Token Budget Management — Allocate tokens across layers, automatic truncation by priority
- Composable Context Layers — System prompts, user context, history, RAG chunks, tool results
- Multiple History Strategies — Sliding window, summarization, selective inclusion
- Context Inspection & Debugging — See exactly what went into your context and why
- OpenAI + Anthropic Output Formats — Build once, deploy to any provider
- Zero Required Dependencies — Works with just Python stdlib; tiktoken optional for accurate counting
Examples
Basic Chatbot
from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, HistoryLayer
budget = TokenBudget(total=4096, reserve_for_output=1000)
budget.allocate("system", 500, priority=10)
budget.allocate("history", 2500, priority=5)
assembler = ContextAssembler(budget=budget)
assembler.add_layer(
SystemLayer(
instructions="You are a helpful customer support agent.",
few_shot_examples=[
{"input": "I can't log in", "output": "I'd be happy to help with your login issue..."}
]
)
)
assembler.add_layer(
HistoryLayer(
messages=conversation_history,
strategy="sliding_window",
strategy_config={"max_turns": 10}
)
)
messages = assembler.build_for_openai()
RAG Application
from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, UserContextLayer, HistoryLayer, RetrievedLayer
budget = TokenBudget(total=8000, reserve_for_output=1500)
budget.allocate("system", 500, priority=10)
budget.allocate("user_context", 300, priority=7)
budget.allocate("retrieved_docs", 3500, priority=6)
budget.allocate("history", 2000, priority=5)
assembler = ContextAssembler(budget=budget)
assembler.add_layer(SystemLayer(instructions="Answer based only on the provided documents."))
assembler.add_layer(UserContextLayer(context={"name": "Jane", "tier": "Enterprise"}))
assembler.add_layer(RetrievedLayer(chunks=retrieved_chunks, max_chunks=5))
assembler.add_layer(HistoryLayer(messages=history, strategy="sliding_window_with_summary"))
payload = assembler.build_for_anthropic()
Agentic Workflow
from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, HistoryLayer, RetrievedLayer, ToolResultsLayer
budget = TokenBudget(total=16000, reserve_for_output=2000)
budget.allocate("system", 800, priority=10)
budget.allocate("tool_results", 4000, priority=8)
budget.allocate("retrieved_docs", 4000, priority=6)
budget.allocate("history", 5000, priority=5)
assembler = ContextAssembler(budget=budget)
assembler.add_layer(SystemLayer(instructions="You are a research agent with tool access."))
assembler.add_layer(ToolResultsLayer(results=tool_outputs, include_inputs=True))
assembler.add_layer(RetrievedLayer(chunks=context_docs))
assembler.add_layer(HistoryLayer(messages=agent_history))
messages = assembler.build_for_openai()
print(assembler.inspect_pretty())
Context Inspection
from llm_contextkit import ContextInspector
inspector = ContextInspector(tokenizer="cl100k")
# Analyze any messages payload
report = inspector.analyze(messages)
print(report.pretty())
# Compare two payloads
diff = inspector.diff(before_messages, after_messages)
print(diff.pretty())
Context Layers
| Layer | Purpose | Default Priority |
|---|---|---|
SystemLayer |
System instructions + few-shot examples | 10 (highest) |
ToolResultsLayer |
Tool/API call results for agents | 8 |
UserContextLayer |
User metadata and session context | 7 |
RetrievedLayer |
RAG chunks with source/relevance metadata | 6 |
HistoryLayer |
Conversation history with strategies | 5 (lowest) |
Lower priority layers are truncated first when the context exceeds the budget.
History Strategies
# Keep last N turns
HistoryLayer(messages, strategy="sliding_window", strategy_config={"max_turns": 10})
# Summarize older turns, keep recent in full
HistoryLayer(messages, strategy="sliding_window_with_summary",
strategy_config={"max_recent_turns": 5, "summarizer": my_summarizer})
# Include only messages relevant to current query
HistoryLayer(messages, strategy="selective",
strategy_config={"query": user_query, "relevance_threshold": 0.5})
Formatters
from llm_contextkit.formatting import DefaultFormatter, XMLFormatter, MinimalFormatter
# Markdown-style (default)
assembler = ContextAssembler(budget, formatter=DefaultFormatter())
# XML tags (preferred by Claude)
assembler = ContextAssembler(budget, formatter=XMLFormatter())
# Minimal overhead for token-constrained scenarios
assembler = ContextAssembler(budget, formatter=MinimalFormatter())
API Reference
TokenBudget
TokenBudget(
total=4096, # Total context window
tokenizer="cl100k", # "cl100k", "o200k", "approximate", or callable
reserve_for_output=1000 # Reserve for model response
)
budget.allocate(layer_name, tokens, priority=0)
budget.count_tokens(text)
budget.summary()
ContextAssembler
ContextAssembler(budget, formatter=None)
assembler.add_layer(layer)
assembler.remove_layer(name)
assembler.build() # Returns generic dict
assembler.build_for_openai() # Returns OpenAI messages format
assembler.build_for_anthropic() # Returns Anthropic API format
assembler.inspect() # Returns build metadata
assembler.inspect_pretty() # Returns formatted summary
ContextInspector
ContextInspector(tokenizer="cl100k")
inspector.analyze(messages) # Returns InspectionReport
inspector.diff(before, after) # Returns InspectionDiff
inspector.trace(assembler) # Returns BuildTrace
Installation
# Basic install (uses approximate token counting)
pip install llm-contextkit
# With accurate OpenAI token counting
pip install llm-contextkit[tiktoken]
# Development install
pip install llm-contextkit[dev]
Design Philosophy
- Composable, not monolithic — Pick the pieces you need, no forced framework adoption
- Opinionated defaults, full override — Sensible defaults, everything configurable
- Model-agnostic — Works with OpenAI, Anthropic, open-source models, any LLM
- Observable by default — Every operation is inspectable and debuggable
- Library, not a service — No infrastructure dependency, just
pip install
What ContextKit Does NOT Do
- Retrieval / Embeddings — Use your vector database (Pinecone, Weaviate, Qdrant)
- LLM API calls — We assemble the context; you send it however you want
- Model-specific prompt tuning — Too opinionated, varies by model
- Authentication / Hosting — Service territory, not library territory
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Clone and install for development
git clone https://github.com/ajay555/llm-contextkit.git
cd contextkit
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=llm_contextkit --cov-report=term-missing
License
MIT License - see LICENSE for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llm_contextkit-0.1.1.tar.gz.
File metadata
- Download URL: llm_contextkit-0.1.1.tar.gz
- Upload date:
- Size: 52.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e93b82e3381cf35f8d326af60eaba35fe41d1f6dc5167e4c941b44d99a255d5a
|
|
| MD5 |
df33bab2cee1979bd200909f46158199
|
|
| BLAKE2b-256 |
0fee3c42e3b68b423a3b6f8fd7e672d656107d77c38ffa29f0bb01ca41cc07e6
|
Provenance
The following attestation bundles were made for llm_contextkit-0.1.1.tar.gz:
Publisher:
publish.yml on ajay555/llm-contextkit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llm_contextkit-0.1.1.tar.gz -
Subject digest:
e93b82e3381cf35f8d326af60eaba35fe41d1f6dc5167e4c941b44d99a255d5a - Sigstore transparency entry: 999864057
- Sigstore integration time:
-
Permalink:
ajay555/llm-contextkit@ef9d1b6371d2821b472c2a90c53d79e5ddc95285 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/ajay555
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ef9d1b6371d2821b472c2a90c53d79e5ddc95285 -
Trigger Event:
release
-
Statement type:
File details
Details for the file llm_contextkit-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llm_contextkit-0.1.1-py3-none-any.whl
- Upload date:
- Size: 40.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c4878d00ff5b858e80d1b6ffc247352fee67dfb140af9a15339bce03d2f62632
|
|
| MD5 |
3c6f0cf4a252d89c524e7c1448783516
|
|
| BLAKE2b-256 |
e857437ce094fb42c9bd91c666f93a9d1a82eb5e06e0120790d60ef5cef13888
|
Provenance
The following attestation bundles were made for llm_contextkit-0.1.1-py3-none-any.whl:
Publisher:
publish.yml on ajay555/llm-contextkit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llm_contextkit-0.1.1-py3-none-any.whl -
Subject digest:
c4878d00ff5b858e80d1b6ffc247352fee67dfb140af9a15339bce03d2f62632 - Sigstore transparency entry: 999864063
- Sigstore integration time:
-
Permalink:
ajay555/llm-contextkit@ef9d1b6371d2821b472c2a90c53d79e5ddc95285 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/ajay555
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ef9d1b6371d2821b472c2a90c53d79e5ddc95285 -
Trigger Event:
release
-
Statement type: