
token-overflow

Prevent LLM context window overflow. Count tokens, enforce budgets, and smart-truncate prompts before they silently fail.

The Problem

LLM APIs silently truncate input or raise an error when you exceed the context window. You don't find out until you get garbage output or a 400 response. token-overflow catches the overflow before the API call is ever made.
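The check itself is conceptually simple. A minimal sketch of the pre-flight pattern the library automates, using a hypothetical ~4-characters-per-token estimate in place of real tokenization (the names and the ratio here are illustrative, not the library's internals):

```python
# Sketch of a pre-flight overflow check, done before any API call.
# The 4-chars-per-token ratio is a rough English-text heuristic.

def estimate_tokens(text: str) -> int:
    """Cheap approximation: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits(prompt: str, context_limit: int, reserve: int = 0) -> bool:
    """True if the prompt leaves `reserve` tokens free for the response."""
    return estimate_tokens(prompt) + reserve <= context_limit

print(fits("word " * 100, context_limit=200, reserve=50))   # 125 + 50 <= 200: True
print(fits("word " * 1000, context_limit=200))              # ~1250 tokens: False
```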

Install

pip install token-overflow

Quick Start

from token_overflow import TokenGuard, fit

# Simple: fit text into a budget
short = fit("very long document...", max_tokens=1000)

# Guard a full prompt
guard = TokenGuard(model="gpt-4")  # auto-detects 8192 limit

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": huge_document},
]

# Check before calling API
if guard.would_overflow(messages):
    messages = guard.truncate(messages, strategy="tail")

# Or one-liner: safe messages guaranteed to fit
safe = guard.safe(messages)

Token Counting

from token_overflow import count_tokens, count_messages

# Count tokens in text
n = count_tokens("Hello world", model="gpt-4")  # 2

# Count tokens in chat messages
n = count_messages([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
], model="gpt-4")
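For intuition on why message counting differs from plain text counting: chat formats add a small per-message overhead on top of the content tokens (OpenAI's published accounting adds roughly 4 tokens per message plus ~3 to prime the reply; exact numbers vary by model). A sketch with a character-based stand-in for tiktoken, not the library's actual arithmetic:

```python
# Sketch of message-aware counting. Content tokens are approximated at
# ~4 chars per token; the overhead constants follow OpenAI's published
# chat accounting and vary by model.

TOKENS_PER_MESSAGE = 4   # role + formatting overhead per message
TOKENS_PER_REPLY = 3     # priming for the assistant's reply

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def approx_count_messages(messages: list[dict]) -> int:
    total = TOKENS_PER_REPLY
    for msg in messages:
        total += TOKENS_PER_MESSAGE + approx_tokens(msg["content"])
    return total

msgs = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]
print(approx_count_messages(msgs))  # 16 under this approximation
```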

Truncation Strategies

guard = TokenGuard(model="gpt-4")

# Keep the tail (most recent), trim from head
guard.truncate(messages, strategy="tail")

# Keep the head (system prompt), trim from tail
guard.truncate(messages, strategy="head")

# Keep head and tail, trim middle
guard.truncate(messages, strategy="middle")

# Custom reserve: keep 500 tokens for response
guard.truncate(messages, strategy="tail", reserve=500)
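As an illustration of what a "tail" strategy does (a hypothetical sketch, not the library's implementation): keep the system prompt, subtract the reserve, then walk backwards from the newest message and keep whatever still fits:

```python
# Hypothetical sketch of tail truncation: preserve system messages,
# then keep the newest messages that fit within limit - reserve.
# Token costs are approximated (~4 chars/token + 4 overhead per message).

def msg_cost(msg: dict) -> int:
    return 4 + max(1, len(msg["content"]) // 4)

def truncate_tail(messages: list[dict], limit: int, reserve: int = 0) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = limit - reserve - sum(msg_cost(m) for m in system)
    kept = []
    for m in reversed(rest):       # newest first
        cost = msg_cost(m)
        if cost > budget:
            break                  # stop once the next-oldest no longer fits
        budget -= cost
        kept.append(m)
    return system + kept[::-1]     # restore chronological order

history = [
    {"role": "system", "content": "sys"},
    {"role": "user", "content": "a" * 40},   # oldest, dropped first
    {"role": "user", "content": "b" * 40},
    {"role": "user", "content": "latest"},
]
trimmed = truncate_tail(history, limit=30, reserve=5)
```

"head" and "middle" follow the same shape: only the direction of the walk (and which end gets protected) changes.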

Model Limits

from token_overflow import get_limit

get_limit("gpt-4")           # 8192
get_limit("gpt-4-turbo")     # 128000
get_limit("gpt-3.5-turbo")   # 16385
get_limit("claude-3-opus")   # 200000
get_limit("claude-3-sonnet") # 200000
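One plausible way such a lookup can work (a hypothetical sketch, not the library's code): a table of known limits with longest-prefix fallback, so dated variants like "gpt-4-0613" resolve to their base model's limit. The `lookup_limit` name and the prefix rule are assumptions for illustration:

```python
# Hypothetical limit table with longest-prefix matching.
LIMITS = {
    "gpt-4-turbo": 128_000,
    "gpt-4": 8_192,
    "gpt-3.5-turbo": 16_385,
    "claude-3-opus": 200_000,
    "claude-3-sonnet": 200_000,
}

def lookup_limit(model: str) -> int:
    if model in LIMITS:
        return LIMITS[model]
    # fall back to the longest known prefix, e.g. "gpt-4-0613" -> "gpt-4"
    matches = [k for k in LIMITS if model.startswith(k)]
    if matches:
        return LIMITS[max(matches, key=len)]
    raise KeyError(f"unknown model: {model}")

print(lookup_limit("gpt-4-0613"))   # resolves via the "gpt-4" entry: 8192
```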

Features

  • Zero config: Knows context limits for 30+ models
  • Multiple strategies: head, tail, middle truncation
  • Message-aware: Understands chat message format, preserves system prompts
  • Reserve tokens: Leave room for the response
  • Fast counting: Uses tiktoken when available, falls back to approximation
  • No required dependencies: tiktoken is optional, enabling exact counts when installed
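The tiktoken fallback mentioned above is a common pattern: try the exact tokenizer, and degrade to a heuristic on ImportError. A hedged sketch (the ~4-chars-per-token ratio is a widely used rule of thumb for English text, not necessarily the library's exact formula):

```python
def approx_tokens(text: str) -> int:
    """Heuristic fallback: ~4 characters per token for English text."""
    return max(1, round(len(text) / 4))

def count(text: str, encoding_name: str = "cl100k_base") -> int:
    try:
        import tiktoken  # optional dependency: exact counts when installed
        return len(tiktoken.get_encoding(encoding_name).encode(text))
    except ImportError:
        return approx_tokens(text)  # graceful degradation without tiktoken
```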

License

MIT

