openai-agents-context-compaction
Context compaction for the OpenAI Agents SDK Runner loop
Context compaction support for the OpenAI Agents SDK, enabling intelligent management of conversation history token size for multi-turn agent interactions.
Note: This package was created in response to openai/openai-agents-python#2244 to provide runner-level context compaction capabilities.
Problem
As agent conversations grow longer, token counts sent to the LLM increase, leading to:
- Higher costs – LLM pricing is token-based
- Degraded reasoning – Models can lose focus with very long contexts
- Context window limits – Models have maximum token limits
- Latency increases – Longer prompts take more time to process
Solution
This package extends the existing OpenAI Agents SDK Session protocol with local compaction strategies, making it provider-agnostic and independent of the OpenAI Responses API compaction endpoint.
⚠️ Early-stage alpha: The current release implements a minimal sliding window approach. Future releases will add token-aware, LLM-based, and pluggable strategies (see Roadmap).
Installation
pip install openai-agents-context-compaction
Optional: accurate token counting
By default, token counts are estimated using a simple ~4 chars/token heuristic. For accurate counts, install the tiktoken extra:
pip install 'openai-agents-context-compaction[tiktoken]'
First-run download: tiktoken downloads a small vocabulary file (~1 MB) from OpenAI's CDN on first use and caches it locally — subsequent runs are fully offline. If you use Docker, add both lines to your Dockerfile so the download happens at build time, not at runtime:
RUN pip install 'openai-agents-context-compaction[tiktoken]'
RUN python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"
Token counts are logged for observability. Token-budget compaction (keeping as many recent items as fit within N tokens) is also supported via the token_budget parameter — see Token-budget compaction below.
Usage
Wrap any existing session with LocalCompactionSession to add automatic compaction support:
import asyncio

from agents import Agent, Runner, SQLiteSession
from openai_agents_context_compaction import LocalCompactionSession

async def main():
    # Create your agent
    agent = Agent(name="Assistant", instructions="You are a helpful assistant.")

    # Wrap an existing session with compaction support
    underlying = SQLiteSession("conversation_123")
    session = LocalCompactionSession(underlying, window_size=30)

    # Use normally - compaction happens automatically when needed
    result = await Runner.run(agent, "Hello!", session=session)

asyncio.run(main())
- Compaction is boundary-aware – preserves function call pairs atomically
- limit parameter: also boundary-aware – not a simple tail slice; function call pairs are kept atomic even when using limit
- When both window_size and token_budget are set, compaction stops when either limit is reached
Token-budget compaction
Use token_budget to limit context by token count rather than (or in addition to) item count:
from openai_agents_context_compaction import LocalCompactionSession, TiktokenCounter
# Token-budget only — keeps the most recent items that fit within 8000 tokens
session = LocalCompactionSession(
    underlying,
    token_budget=8000,
    token_counter=TiktokenCounter(),  # accurate OpenAI counts
)

# Both constraints — compaction stops when either is exhausted
session = LocalCompactionSession(
    underlying,
    window_size=50,
    token_budget=8000,
    token_counter=TiktokenCounter(),
)
# Custom tokenizer (e.g. Anthropic) — adapt to your SDK version
import anthropic

client = anthropic.Anthropic()

def my_counter(text: str) -> int:
    response = client.beta.messages.count_tokens(
        model="claude-haiku-4-5-20251001",  # any valid model works
        messages=[{"role": "user", "content": text}],
    )
    return response.input_tokens

session = LocalCompactionSession(underlying, token_budget=8000, token_counter=my_counter)
Illustrative token_budget starting points (tune for your workload):
- Tight budget / small models: token_budget=4096
- Moderate: token_budget=16384
- Large models: token_budget=32768
The default token_counter uses ~4 chars/token (no dependencies). For accurate counts pass TiktokenCounter() (requires pip install 'openai-agents-context-compaction[tiktoken]').
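The ~4 chars/token heuristic can be sketched as follows (illustrative only — the package's actual default implementation may differ in details):

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

print(approx_token_count("Hello, world!"))  # 13 chars -> 3
```

This is cheap and dependency-free, but can be off by 2x or more for code, non-English text, or heavily punctuated content, which is why the tiktoken extra exists.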
Choosing window_size
window_size is measured in items, not tokens. Each conversation turn adds multiple items:
| Scenario | Items per turn | Guidance |
|---|---|---|
| Simple Q&A | 2 (user + assistant) | window_size=20 keeps ~10 exchanges |
| Single tool call | 4 (user + fc + fco + assistant) | window_size=20 keeps ~5 tool-using turns |
| Batch tool calls | 2n+2 (user + n×fc + n×fco + assistant) | 3 parallel tools = 8 items/turn |
Illustrative starting points (not recommendations — tune for your workload):
- Light tool usage: window_size=30–50
- Heavy tool usage: window_size=50–100
Example: If your agent typically calls 2 tools per turn, each turn produces ~6 items. With window_size=30, you retain roughly 5 recent exchanges.
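The arithmetic above can be captured in a small helper (hypothetical, not part of the package):

```python
def items_per_turn(tool_calls: int) -> int:
    # user message + n function_call + n function_call_output + assistant message
    return 2 * tool_calls + 2

def retained_exchanges(window_size: int, tool_calls: int) -> int:
    """Roughly how many recent exchanges survive compaction."""
    return window_size // items_per_turn(tool_calls)

print(retained_exchanges(30, 2))  # 30 // 6 = 5 exchanges
```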
Technical Note
The OpenAI Agents SDK stores session data in Responses API format. Tool calls appear as separate function_call and function_call_output items matched by call_id. This package handles this transparently.
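To illustrate what boundary-aware slicing means in practice, here is a minimal sketch — not the package's actual algorithm. Items are plain dicts in Responses API shape, and this variant drops an orphaned function_call_output at the cut point rather than widening the window:

```python
def boundary_aware_tail(items: list[dict], window_size: int) -> list[dict]:
    """Keep the last `window_size` items, then drop any leading
    function_call_output whose matching function_call (same call_id)
    was evicted, so no pair is ever split."""
    tail = items[-window_size:] if window_size < len(items) else list(items)
    call_ids = {i["call_id"] for i in tail if i.get("type") == "function_call"}
    while tail and tail[0].get("type") == "function_call_output" and tail[0]["call_id"] not in call_ids:
        tail.pop(0)
    return tail

history = [
    {"type": "message", "role": "user", "content": "hi"},
    {"type": "function_call", "call_id": "c1", "name": "lookup"},
    {"type": "function_call_output", "call_id": "c1", "output": "42"},
    {"type": "message", "role": "assistant", "content": "done"},
]
# A naive tail of 2 would keep the output but lose its function_call;
# the boundary-aware version drops the orphaned output instead.
print(boundary_aware_tail(history, 2))
```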
Performance Considerations
For very large sessions (thousands of items), compaction runs on every get_items() call. No in-process cache is kept — each call fetches from the underlying session to avoid stale reads when the session is backed by a shared database with concurrent writers. The compaction algorithm itself is O(n) where n is the total session size. If performance becomes a concern:
- Consider periodic session pruning at the storage layer
- Use a reasonable window_size that balances context retention with processing cost
Roadmap
| Feature | Status |
|---|---|
| Sliding window compaction | ✅ Implemented |
| Token-based limits | ✅ Implemented (token_budget parameter) |
| LLM-based summarization | 🟡 Planned |
| Write-time compaction | 🟡 Planned |
| Pluggable compaction policies | 🟡 Planned |
Write-time Compaction Caveats
| Aspect | Read-time (current) | Write-time |
|---|---|---|
| Full history | ✅ Preserved for audit/debug | ❌ Lost forever |
| Change window_size | ✅ Retroactive | ❌ Requires re-import |
| Storage | ❌ Unbounded growth | ✅ Bounded |
| Read cost | ❌ Compaction on every read | ✅ Cheap reads |
Write-time compaction caveat: function call pairs arrive in stages:
1. add_items([function_call]) — incomplete (no output yet)
2. add_items([function_call_output]) — now complete
If compaction runs at write-time, incomplete pairs may be dropped, breaking session integrity.
Recommendation: Use read-time compaction (current default) to guarantee atomic function call pairs unless storage size is critical. Write-time compaction requires redesign of add_items to handle incomplete pairs safely.
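To make the hazard concrete, here is an illustrative helper (not part of the package) showing the check a write-time compactor would need before it could run safely:

```python
def has_incomplete_pair(items: list[dict]) -> bool:
    """True if any function_call in `items` has no matching
    function_call_output yet (pairs are matched by call_id)."""
    calls = {i["call_id"] for i in items if i.get("type") == "function_call"}
    outputs = {i["call_id"] for i in items if i.get("type") == "function_call_output"}
    return bool(calls - outputs)

# Between the two add_items calls, the session tail looks like this.
# Compacting now could evict the function_call before its output arrives.
pending = [{"type": "function_call", "call_id": "c1", "name": "lookup"}]
print(has_incomplete_pair(pending))  # True — compaction must be deferred
```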
Future Considerations
The following ideas are documented for future reference. Build them when there's a concrete need:
| Feature | Add when... |
|---|---|
| Time-based window | Sessions span days and old context becomes stale |
| Importance scoring | Tool outputs vary wildly in value |
| Pluggable policy interface | Multiple policies need swapping |
| Role-based prioritization | Certain messages must never be evicted |
| Hybrid window | Last N items + always keep M most recent function call pairs |
A pluggable policy interface would look like:
from typing import Protocol

from agents.items import TResponseInputItem  # import path may vary by SDK version

class CompactionPolicy(Protocol):
    def compact(self, items: list[TResponseInputItem]) -> list[TResponseInputItem]:
        """Return compacted items. Must preserve function call pair atomicity."""
        ...
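A minimal policy satisfying such a protocol might look like this (a toy sketch using plain dicts so it runs without the SDK installed):

```python
from typing import Protocol

class CompactionPolicy(Protocol):
    def compact(self, items: list[dict]) -> list[dict]: ...

class KeepLastN:
    """Toy policy: keep the last n items. A real policy must also
    preserve function call pair atomicity, which this one ignores."""
    def __init__(self, n: int) -> None:
        self.n = n

    def compact(self, items: list[dict]) -> list[dict]:
        return items[-self.n:] if self.n < len(items) else list(items)

policy: CompactionPolicy = KeepLastN(3)
print(policy.compact([{"i": k} for k in range(5)]))  # [{'i': 2}, {'i': 3}, {'i': 4}]
```

Because CompactionPolicy is a structural Protocol, KeepLastN satisfies it without inheriting from it — any object with a matching compact method would plug in.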
Compatibility
Tested weekly against the latest OpenAI Agents SDK to ensure compatibility.
License
MIT