# token-overflow
Prevent LLM context window overflow. Count tokens, enforce budgets, and smart-truncate prompts before they silently fail.
## The Problem

LLM APIs silently truncate your input or return a 400 error when you exceed the context window, and you often don't find out until you get garbage output. token-overflow catches the overflow before the API call.
## Install

```bash
pip install token-overflow
```
## Quick Start

```python
from token_overflow import TokenGuard, fit

# Simple: fit text into a token budget
short = fit("very long document...", max_tokens=1000)

# Guard a full prompt
guard = TokenGuard(model="gpt-4")  # auto-detects the 8192-token limit

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": huge_document},
]

# Check before calling the API
if guard.would_overflow(messages):
    messages = guard.truncate(messages, strategy="tail")

# Or as a one-liner: messages guaranteed to fit
safe = guard.safe(messages)
```
## Token Counting

```python
from token_overflow import count_tokens, count_messages

# Count tokens in plain text
n = count_tokens("Hello world", model="gpt-4")  # 2

# Count tokens in chat messages
n = count_messages([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
], model="gpt-4")
```
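Counting chat messages is more than summing the content tokens: chat-formatted APIs add framing tokens around each message, which is why `count_messages` exists as a separate function. The sketch below illustrates that accounting under loud assumptions: a 4-token-per-message overhead plus 3 reply-priming tokens (figures commonly cited for gpt-4-era chat formats, not confirmed for this library), and a crude whitespace splitter standing in for a real tokenizer.

```python
# Illustration only, NOT the library's counter.
# Assumptions: 4 framing tokens per message, 3 tokens to prime the reply,
# and whitespace splitting as a stand-in tokenizer.
TOKENS_PER_MESSAGE = 4
REPLY_PRIMING = 3

def rough_count(text: str) -> int:
    """Crude stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def rough_count_messages(messages: list[dict]) -> int:
    """Sum content tokens plus per-message framing overhead."""
    total = REPLY_PRIMING
    for msg in messages:
        total += TOKENS_PER_MESSAGE + rough_count(msg["content"])
    return total
```

The point is simply that two short messages cost noticeably more than their raw text, so a plain `count_tokens` over concatenated contents would undercount.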
## Truncation Strategies

```python
guard = TokenGuard(model="gpt-4")

# Keep the tail (most recent messages), trim from the head
guard.truncate(messages, strategy="tail")

# Keep the head (system prompt), trim from the tail
guard.truncate(messages, strategy="head")

# Keep head and tail, trim the middle
guard.truncate(messages, strategy="middle")

# Custom reserve: keep 500 tokens free for the response
guard.truncate(messages, strategy="tail", reserve=500)
```
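To make the `tail` strategy and the `reserve` parameter concrete, here is a standalone sketch, not the library's implementation: it assumes oldest non-system messages are dropped first, the system prompt is always kept, and a whitespace word count stands in for a real tokenizer.

```python
# Sketch of "tail" truncation under stated assumptions;
# the real library counts actual tokens, not words.
def truncate_tail(messages, limit, reserve=0):
    """Drop oldest non-system messages until the total fits limit - reserve."""
    budget = limit - reserve
    count = lambda m: len(m["content"].split())
    kept = list(messages)  # don't mutate the caller's list
    while sum(count(m) for m in kept) > budget:
        for i, m in enumerate(kept):
            if m["role"] != "system":
                del kept[i]  # drop the oldest droppable message
                break
        else:
            break  # only system messages remain; nothing left to drop
    return kept
```

The reserve simply shrinks the effective budget so the model has room to answer; without it, a prompt that exactly fills the context window leaves zero tokens for the response.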
## Model Limits

```python
from token_overflow import get_limit

get_limit("gpt-4")            # 8192
get_limit("gpt-4-turbo")      # 128000
get_limit("gpt-3.5-turbo")    # 16385
get_limit("claude-3-opus")    # 200000
get_limit("claude-3-sonnet")  # 200000
```
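A limit on its own is only half the picture; the useful number is how much room is left for the response once the prompt and any reserve are accounted for. A self-contained sketch of that arithmetic, with a small hard-coded table standing in for the library's 30+ model limits (the entries mirror the values shown above):

```python
# Hypothetical mirror of a few of the library's limit entries.
MODEL_LIMITS = {
    "gpt-4": 8192,
    "gpt-4-turbo": 128000,
    "gpt-3.5-turbo": 16385,
    "claude-3-opus": 200000,
}

def remaining_budget(model: str, prompt_tokens: int, reserve: int = 0) -> int:
    """Tokens still available for the response after prompt and reserve."""
    limit = MODEL_LIMITS[model]
    return max(0, limit - prompt_tokens - reserve)
```

For example, an 8,000-token prompt on gpt-4 with a 100-token reserve leaves only 92 tokens of headroom, which is exactly the situation the guard is meant to flag.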
## Features

- **Zero config**: knows context limits for 30+ models
- **Multiple strategies**: head, tail, and middle truncation
- **Message-aware**: understands the chat message format and preserves system prompts
- **Reserve tokens**: leave room for the response
- **Fast counting**: uses tiktoken when available, falls back to an approximation
- **No required dependencies**: tiktoken is optional, for exact counts
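The approximation fallback mentioned above is typically a characters-per-token heuristic. A sketch using the common rule of thumb of roughly 4 characters per token for English text (an assumption; the library's actual ratio and rounding may differ):

```python
def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for when tiktoken is unavailable.

    English text averages about 4 characters per token, so this
    over- or under-counts on code, non-English text, and unusual
    formatting; it is a budget estimate, not an exact count.
    """
    return max(1, round(len(text) / chars_per_token))
```

Such a heuristic is why exact budgeting still benefits from installing tiktoken: a 10-20% estimation error on a near-full context window is enough to overflow anyway.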
## License
MIT
## File details

Details for the file `token_overflow-0.1.0.tar.gz`.

File metadata:

- Download URL: token_overflow-0.1.0.tar.gz
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 13179351749aeb535d3a9ce9a2bbf27d157af5876b2a69d21fe73f4fbfef1f7b |
| MD5 | 6baa2588e6cb0677f03448a80ffc91ef |
| BLAKE2b-256 | 7e31e594fe039421c2702258b0dd940bb67e35d2eab07517f30113afd9290b5e |
Details for the file `token_overflow-0.1.0-py3-none-any.whl`.

File metadata:

- Download URL: token_overflow-0.1.0-py3-none-any.whl
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 24b6ada14460c2a0d1aee1d5b567005b485e8ea6b46fe5c56eaa1ad062ca1d1c |
| MD5 | a8b1743be919a470b19770ffd7708f93 |
| BLAKE2b-256 | ec61d274b56aa813f1055e80a89dd7c880930fb138cd54cb212bf81fc4096c00 |