Skip to main content

Composable building blocks for LLM context engineering

Project description

LLM ContextKit

Composable building blocks for LLM context engineering.

PyPI version Python versions License: MIT Tests


The Problem

Every team building production GenAI applications ends up building the same context management infrastructure from scratch:

  • How do you assemble multiple layers of context (system prompts, user context, conversation history, retrieved documents, tool results) into a single coherent payload?
  • How do you manage token budgets so the context fits within model limits?
  • How do you handle conversation history (sliding window, summarization, selective inclusion)?
  • How do you debug what actually went into the context window when something goes wrong?

There's no focused, well-designed library that owns this layer cleanly. LangChain gives you orchestration, vector databases give you retrieval, but nobody owns the "context assembly and management" layer.

LLM ContextKit fills that gap.


Quick Start

pip install llm-contextkit
from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, HistoryLayer

# Create a token budget
budget = TokenBudget(total=4096, reserve_for_output=1000)
budget.allocate("system", 500, priority=10)
budget.allocate("history", 2500, priority=5)

# Assemble context
assembler = ContextAssembler(budget=budget)
assembler.add_layer(
    SystemLayer(instructions="You are a helpful assistant.")
)
assembler.add_layer(
    HistoryLayer(messages=conversation_history, strategy="sliding_window")
)

# Build for your LLM provider
messages = assembler.build_for_openai()
# Or: payload = assembler.build_for_anthropic()

# Debug what was built
print(assembler.inspect_pretty())

Key Features

  • Token Budget Management — Allocate tokens across layers, automatic truncation by priority
  • Composable Context Layers — System prompts, user context, history, RAG chunks, tool results
  • Multiple History Strategies — Sliding window, summarization, selective inclusion
  • Context Inspection & Debugging — See exactly what went into your context and why
  • OpenAI + Anthropic Output Formats — Build once, deploy to any provider
  • Zero Required Dependencies — Works with just Python stdlib; tiktoken optional for accurate counting

Examples

Basic Chatbot

from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, HistoryLayer

budget = TokenBudget(total=4096, reserve_for_output=1000)
budget.allocate("system", 500, priority=10)
budget.allocate("history", 2500, priority=5)

assembler = ContextAssembler(budget=budget)
assembler.add_layer(
    SystemLayer(
        instructions="You are a helpful customer support agent.",
        few_shot_examples=[
            {"input": "I can't log in", "output": "I'd be happy to help with your login issue..."}
        ]
    )
)
assembler.add_layer(
    HistoryLayer(
        messages=conversation_history,
        strategy="sliding_window",
        strategy_config={"max_turns": 10}
    )
)

messages = assembler.build_for_openai()

RAG Application

from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, UserContextLayer, HistoryLayer, RetrievedLayer

budget = TokenBudget(total=8000, reserve_for_output=1500)
budget.allocate("system", 500, priority=10)
budget.allocate("user_context", 300, priority=7)
budget.allocate("retrieved_docs", 3500, priority=6)
budget.allocate("history", 2000, priority=5)

assembler = ContextAssembler(budget=budget)
assembler.add_layer(SystemLayer(instructions="Answer based only on the provided documents."))
assembler.add_layer(UserContextLayer(context={"name": "Jane", "tier": "Enterprise"}))
assembler.add_layer(RetrievedLayer(chunks=retrieved_chunks, max_chunks=5))
assembler.add_layer(HistoryLayer(messages=history, strategy="sliding_window_with_summary"))

payload = assembler.build_for_anthropic()

Agentic Workflow

from llm_contextkit import ContextAssembler, TokenBudget
from llm_contextkit.layers import SystemLayer, HistoryLayer, RetrievedLayer, ToolResultsLayer

budget = TokenBudget(total=16000, reserve_for_output=2000)
budget.allocate("system", 800, priority=10)
budget.allocate("tool_results", 4000, priority=8)
budget.allocate("retrieved_docs", 4000, priority=6)
budget.allocate("history", 5000, priority=5)

assembler = ContextAssembler(budget=budget)
assembler.add_layer(SystemLayer(instructions="You are a research agent with tool access."))
assembler.add_layer(ToolResultsLayer(results=tool_outputs, include_inputs=True))
assembler.add_layer(RetrievedLayer(chunks=context_docs))
assembler.add_layer(HistoryLayer(messages=agent_history))

messages = assembler.build_for_openai()
print(assembler.inspect_pretty())

Context Inspection

from llm_contextkit import ContextInspector

inspector = ContextInspector(tokenizer="cl100k")

# Analyze any messages payload
report = inspector.analyze(messages)
print(report.pretty())

# Compare two payloads
diff = inspector.diff(before_messages, after_messages)
print(diff.pretty())

Context Layers

Layer Purpose Default Priority
SystemLayer System instructions + few-shot examples 10 (highest)
ToolResultsLayer Tool/API call results for agents 8
UserContextLayer User metadata and session context 7
RetrievedLayer RAG chunks with source/relevance metadata 6
HistoryLayer Conversation history with strategies 5 (lowest)

Lower priority layers are truncated first when the context exceeds the budget.


History Strategies

# Keep last N turns
HistoryLayer(messages, strategy="sliding_window", strategy_config={"max_turns": 10})

# Summarize older turns, keep recent in full
HistoryLayer(messages, strategy="sliding_window_with_summary",
             strategy_config={"max_recent_turns": 5, "summarizer": my_summarizer})

# Include only messages relevant to current query
HistoryLayer(messages, strategy="selective",
             strategy_config={"query": user_query, "relevance_threshold": 0.5})

Formatters

from llm_contextkit.formatting import DefaultFormatter, XMLFormatter, MinimalFormatter

# Markdown-style (default)
assembler = ContextAssembler(budget, formatter=DefaultFormatter())

# XML tags (preferred by Claude)
assembler = ContextAssembler(budget, formatter=XMLFormatter())

# Minimal overhead for token-constrained scenarios
assembler = ContextAssembler(budget, formatter=MinimalFormatter())

API Reference

TokenBudget

TokenBudget(
    total=4096,              # Total context window
    tokenizer="cl100k",      # "cl100k", "o200k", "approximate", or callable
    reserve_for_output=1000  # Reserve for model response
)
budget.allocate(layer_name, tokens, priority=0)
budget.count_tokens(text)
budget.summary()

ContextAssembler

ContextAssembler(budget, formatter=None)
assembler.add_layer(layer)
assembler.remove_layer(name)
assembler.build()                # Returns generic dict
assembler.build_for_openai()     # Returns OpenAI messages format
assembler.build_for_anthropic()  # Returns Anthropic API format
assembler.inspect()              # Returns build metadata
assembler.inspect_pretty()       # Returns formatted summary

ContextInspector

ContextInspector(tokenizer="cl100k")
inspector.analyze(messages)  # Returns InspectionReport
inspector.diff(before, after)  # Returns InspectionDiff
inspector.trace(assembler)  # Returns BuildTrace

Installation

# Basic install (uses approximate token counting)
pip install llm-contextkit

# With accurate OpenAI token counting
pip install llm-contextkit[tiktoken]

# Development install
pip install llm-contextkit[dev]

Design Philosophy

  • Composable, not monolithic — Pick the pieces you need, no forced framework adoption
  • Opinionated defaults, full override — Sensible defaults, everything configurable
  • Model-agnostic — Works with OpenAI, Anthropic, open-source models, any LLM
  • Observable by default — Every operation is inspectable and debuggable
  • Library, not a service — No infrastructure dependency, just pip install

What ContextKit Does NOT Do

  • Retrieval / Embeddings — Use your vector database (Pinecone, Weaviate, Qdrant)
  • LLM API calls — We assemble the context; you send it however you want
  • Model-specific prompt tuning — Too opinionated, varies by model
  • Authentication / Hosting — Service territory, not library territory

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Clone and install for development
git clone https://github.com/ajay555/llm-contextkit.git
cd contextkit
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=llm_contextkit --cov-report=term-missing

License

MIT License - see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_contextkit-0.1.1.tar.gz (52.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_contextkit-0.1.1-py3-none-any.whl (40.3 kB view details)

Uploaded Python 3

File details

Details for the file llm_contextkit-0.1.1.tar.gz.

File metadata

  • Download URL: llm_contextkit-0.1.1.tar.gz
  • Upload date:
  • Size: 52.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_contextkit-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e93b82e3381cf35f8d326af60eaba35fe41d1f6dc5167e4c941b44d99a255d5a
MD5 df33bab2cee1979bd200909f46158199
BLAKE2b-256 0fee3c42e3b68b423a3b6f8fd7e672d656107d77c38ffa29f0bb01ca41cc07e6

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_contextkit-0.1.1.tar.gz:

Publisher: publish.yml on ajay555/llm-contextkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llm_contextkit-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llm_contextkit-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 40.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_contextkit-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c4878d00ff5b858e80d1b6ffc247352fee67dfb140af9a15339bce03d2f62632
MD5 3c6f0cf4a252d89c524e7c1448783516
BLAKE2b-256 e857437ce094fb42c9bd91c666f93a9d1a82eb5e06e0120790d60ef5cef13888

See more details on using hashes here.

Provenance

The following attestation bundles were made for llm_contextkit-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ajay555/llm-contextkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page