Make your AI agents leaner, faster, and cheaper — smart context management and token compression
agentslim 🪶
Make your AI agents leaner, faster, and cheaper.
agentslim is a zero-dependency Python toolkit that reduces token consumption in LLM-powered agents — without sacrificing reasoning quality.
Why?
Every token counts — literally. When building agents you routinely waste tokens on:
| Problem | Typical waste |
|---|---|
| Verbose JSON tool schemas | 200–800 tokens per request |
| Raw HTML web scrapes fed to the LLM | 60–80% noise |
| Naively truncated chat history | Lost context, broken reasoning |
| Sending entire source files to coding agents | 10× more than needed |
agentslim solves all four with one clean API.
Install
pip install agentslim
For accurate token counting (uses tiktoken under the hood):
pip install agentslim[tiktoken]
Quick Start
from agentslim import Compressor, AgentMemory, ToolMinifier, CodeContext
# 1 ── Compress any content before sending to LLM
c = Compressor()
slim = c.compress(raw_html_or_json_or_text) # auto-detects format
# 2 ── Smart context window with auto-summarization
mem = AgentMemory(max_tokens=6000)
mem.add("user", "Build me a FastAPI app")
mem.add("assistant", "Sure! Here's the plan...")
messages = mem.get_messages() # ready for openai.chat.completions.create()
# 3 ── Minify tool schemas
slim_tools = ToolMinifier.minify(my_tools) # shorter descriptions
hint_str = ToolMinifier.to_compact_str(my_tools) # one-liner per tool
# 4 ── Send only the relevant code chunk, not the whole file
snippet = CodeContext.extract_function("app.py", "handle_request")
outline = CodeContext.outline("app.py") # class/function map
Modules
🗜️ Compressor — Text / HTML / JSON compressor
Strips noise from content before it hits your LLM.
from agentslim import Compressor
from agentslim.compressor import CompressorConfig
# Defaults — safe for most use cases
c = Compressor()
# Fine-grained control
c = Compressor(config=CompressorConfig(
    strip_html=True,
    remove_decorative_html=True,   # drops <script>, <style>, <nav>, etc.
    collapse_whitespace=True,
    remove_filler_phrases=True,    # "Certainly! As an AI language model..."
    compact_json=True,
    remove_python_comments=False,  # keep comments by default
))
clean = c.compress(raw_content) # auto-detects JSON / HTML / text
clean = c.compress_html(html_string)
clean = c.compress_json(json_string)
clean = c.compress_text(plain_text)
clean = c.compress_code(source, language="python") # or "js" / "ts"
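To make the auto-detection idea concrete, here is a minimal standard-library sketch of one way it could work. It is not agentslim's implementation: `sketch_compress`, `collapse_whitespace`, and `NOISE_TAGS` are hypothetical names, and the detection heuristics are assumptions chosen for illustration.

```python
import json
import re
from html.parser import HTMLParser

# Tags whose contents this sketch treats as noise (an assumption).
NOISE_TAGS = {"script", "style", "nav"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside NOISE_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def sketch_compress(content):
    """Hypothetical stand-in for an auto-detecting compressor."""
    stripped = content.strip()
    # 1. JSON: re-serialize without indentation or separator padding.
    if stripped[:1] in ("{", "["):
        try:
            return json.dumps(json.loads(stripped), separators=(",", ":"))
        except ValueError:
            pass  # looked like JSON but did not parse; keep trying
    # 2. HTML: keep only the visible text outside the noise tags.
    if re.search(r"<[a-zA-Z][^>]*>", stripped):
        parser = _TextExtractor()
        parser.feed(stripped)
        parser.close()
        return collapse_whitespace(" ".join(parser.parts))
    # 3. Plain text: collapse whitespace only.
    return collapse_whitespace(stripped)


print(sketch_compress('{ "a": 1, "b": [1, 2] }'))  # {"a":1,"b":[1,2]}
print(sketch_compress("<p>Hello   <b>world</b></p><script>x()</script>"))  # Hello world
```

Trying JSON first is a deliberate choice in this sketch: a successful parse is unambiguous, while the HTML check is only a heuristic.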
Savings report:
from agentslim.utils import tokens_saved_report
report = tokens_saved_report(original, compressed, model="gpt-4o")
# {
# 'original_tokens': 1842,
# 'compressed_tokens': 612,
# 'tokens_saved': 1230,
# 'percent_saved': 66.8,
# 'cost_saved_usd': 0.003075
# }
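The arithmetic behind such a report is simple enough to sketch. The function below is a hypothetical illustration, not the library's code. The price of $2.50 per million input tokens is an assumption inferred from the example figures above (0.003075 / 1230 tokens).

```python
def sketch_savings_report(original_tokens, compressed_tokens, usd_per_million_input):
    """Hypothetical re-creation of the report fields from two token counts."""
    saved = original_tokens - compressed_tokens
    # Guard against division by zero for empty input.
    percent = 100.0 * saved / original_tokens if original_tokens else 0.0
    return {
        "original_tokens": original_tokens,
        "compressed_tokens": compressed_tokens,
        "tokens_saved": saved,
        "percent_saved": round(percent, 1),
        "cost_saved_usd": round(saved * usd_per_million_input / 1_000_000, 6),
    }


# Assumed price: $2.50 per 1M input tokens.
report = sketch_savings_report(1842, 612, usd_per_million_input=2.50)
print(report["tokens_saved"], report["percent_saved"])  # 1230 66.8
```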
🧠 AgentMemory — Smart sliding-window context manager
Instead of naively cutting old messages (which breaks reasoning), AgentMemory auto-summarizes the oldest messages into a compact system note.
from agentslim import AgentMemory
mem = AgentMemory(
    max_tokens=6000,     # soft limit on the active window
    archive_ratio=0.4,   # archive the oldest 40% when limit is hit
    summarize_fn=None,   # optional: plug in your LLM for better summaries
)
mem.add("system", "You are a helpful assistant.")
mem.add("user", "Hello!")
mem.add("assistant", "Hi! How can I help?")
messages = mem.get_messages() # list[dict] — pass directly to any OpenAI-compatible API
print(mem.stats())
# MemoryStats(active_messages=3, archived=0, active_tokens=24, ...)
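The archive-and-summarize behaviour described above can be sketched in a few lines of plain Python. This is an illustrative model only, not agentslim's implementation. `SketchMemory`, `rough_tokens`, and `naive_summary` are hypothetical names, the token estimate is a crude 4-characters-per-token assumption, and system messages are not pinned, so they can be archived like any other message.

```python
import math


def rough_tokens(text):
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, math.ceil(len(text) / 4))


def naive_summary(messages):
    # Placeholder summarizer: keeps the first 40 characters of each message.
    return " | ".join(f"{m['role']}: {m['content'][:40]}" for m in messages)


class SketchMemory:
    """Hypothetical sliding window that archives the oldest messages."""

    def __init__(self, max_tokens=6000, archive_ratio=0.4, summarize_fn=None):
        self.max_tokens = max_tokens
        self.archive_ratio = archive_ratio
        self.summarize_fn = summarize_fn or naive_summary
        self.active = []    # messages still sent verbatim
        self.archived = []  # messages folded into the summary
        self.summary = ""

    def _active_tokens(self):
        return sum(rough_tokens(m["content"]) for m in self.active)

    def add(self, role, content):
        self.active.append({"role": role, "content": content})
        # Archive until the window fits; the newest message is always kept.
        while self._active_tokens() > self.max_tokens and len(self.active) > 1:
            self._archive_oldest()

    def _archive_oldest(self):
        n = max(1, int(len(self.active) * self.archive_ratio))
        n = min(n, len(self.active) - 1)
        old, self.active = self.active[:n], self.active[n:]
        self.archived.extend(old)
        # Re-summarize everything archived so far into one note.
        self.summary = self.summarize_fn(self.archived)

    def get_messages(self):
        note = []
        if self.summary:
            note = [{"role": "system",
                     "content": f"Summary of earlier conversation: {self.summary}"}]
        return note + list(self.active)


mem = SketchMemory(max_tokens=10)
for i in range(5):
    mem.add("user", f"message number {i:05d}")  # 20 characters, about 5 tokens each
print(len(mem.active), len(mem.archived))  # 2 3
```

Note that this sketch does not count the summary note against the token limit, which a production version would likely need to do.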
With a real LLM summarizer:
import openai
def gpt_summarize(messages):
    history = "\n".join(f"{m.role}: {m.content}" for m in messages)
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize in 3 sentences."},
            {"role": "user", "content": history},
        ],
    )
    return resp.choices[0].message.content
mem = AgentMemory(max_tokens=8000, summarize_fn=gpt_summarize)
🛠️ ToolMinifier — Tool schema minifier
OpenAI function schemas are JSON-heavy. ToolMinifier cuts them down.
from agentslim import ToolMinifier
# Option A: minify but keep JSON format (for the API)
slim_tools = ToolMinifier.minify(tools, max_desc=80)
# Option B: ultra-compact one-liner hint for system prompts
print(ToolMinifier.to_compact_str(tools))
# get_weather(location:string, unit:string?) -> Any # Get current weather…
# send_email(to:string, subject:string, body:string) -> Any
# Option C: auto-generate schemas from Python functions
def search(query: str, max_results: int) -> str:
    """Search the web for real-time info."""
    ...
tools = ToolMinifier.from_python_functions(search)
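For readers curious how a schema can be derived from a plain function, the sketch below uses only `inspect` and `typing` from the standard library. It is an assumption about the general approach, not the code behind `from_python_functions`. `sketch_schema_from_function` and `_JSON_TYPES` are hypothetical names, and the default value on `max_results` is added here only to show how optional parameters fall out of the signature.

```python
import inspect
from typing import get_type_hints

# Minimal Python-to-JSON-Schema type mapping (an assumption; extend as needed).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema_from_function(fn):
    """Build an OpenAI-style tool schema from a function's signature."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the caller must supply it
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def search(query: str, max_results: int = 5) -> str:
    """Search the web for real-time info."""
    return ""


schema = sketch_schema_from_function(search)
print(schema["function"]["parameters"]["required"])  # ['query']
```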
| Format | Tokens (example) |
|---|---|
| Full verbose JSON | ~520 |
| minify() | ~310 |
| to_compact_str() | ~40 |
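The large drop for the compact form comes from discarding JSON structure entirely. A rough sketch of that transformation follows, written against the OpenAI tool schema layout. `sketch_compact_str` is a hypothetical name, and its output format only imitates the example shown earlier rather than reproducing the library's exact behaviour.

```python
def sketch_compact_str(tools, max_desc=40):
    """Render each tool schema as a one-line, signature-like hint."""
    lines = []
    for tool in tools:
        fn = tool.get("function", tool)  # accept wrapped or bare schemas
        params = fn.get("parameters", {})
        required = set(params.get("required", []))
        args = []
        for name, spec in params.get("properties", {}).items():
            optional = "" if name in required else "?"  # '?' marks optional args
            args.append(f"{name}:{spec.get('type', 'any')}{optional}")
        line = f"{fn['name']}({', '.join(args)}) -> Any"
        desc = fn.get("description", "")
        if desc:
            if len(desc) > max_desc:
                desc = desc[: max_desc - 1] + "…"  # truncate long descriptions
            line += f" # {desc}"
        lines.append(line)
    return "\n".join(lines)


example_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string"},
            },
            "required": ["location"],
        },
    },
}]

print(sketch_compact_str(example_tools))
# get_weather(location:string, unit:string?) -> Any # Get current weather
```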
📄 CodeContext — Code-aware chunk extractor
Don't send 500-line files to your coding agent — send only what it needs.
from agentslim import CodeContext
# Extract a single function (+ N lines of context)
snippet = CodeContext.extract_function("app.py", "process_payment", context_lines=3)
# Extract a class skeleton (signatures only)
skeleton = CodeContext.extract_class("service.py", "PaymentService", methods_only=True)
# Outline: class/function map of the whole file
outline = CodeContext.outline("app.py")
# ['class PaymentService (L12)', 'def charge (L28)', 'def refund (L45)']
# Folded view: function bodies replaced with '...'
folded = CodeContext.folded("large_module.py")
# Extract specific line range
chunk = CodeContext.extract_lines("app.py", start_line=120, end_line=145, context_lines=5)
| View | Tokens saved |
|---|---|
| Full source | 0% |
| Folded | ~55% |
| Outline only | ~85% |
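Views like the outline and the single-function extract can be built on Python's `ast` module. The sketch below shows one possible approach for Python sources. It is not agentslim's code: `sketch_outline` and `sketch_extract_function` are hypothetical names, and they take source text rather than a file path to keep the example self-contained.

```python
import ast

SAMPLE = '''class PaymentService:
    def charge(self, amount):
        total = amount * 2
        return total

    def refund(self, amount):
        return -amount
'''

_FUNC_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def sketch_outline(source):
    """List classes and functions with their line numbers, in file order."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append((node.lineno, f"class {node.name} (L{node.lineno})"))
        elif isinstance(node, _FUNC_NODES):
            entries.append((node.lineno, f"def {node.name} (L{node.lineno})"))
    return [label for _, label in sorted(entries)]


def sketch_extract_function(source, name):
    """Return the source lines of the first function called `name`, or None."""
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _FUNC_NODES) and node.name == name:
            # Start at the first decorator, if any, so the snippet is complete.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            return "\n".join(lines[start - 1 : node.end_lineno])
    return None


print(sketch_outline(SAMPLE))
# ['class PaymentService (L1)', 'def charge (L2)', 'def refund (L6)']
print(sketch_extract_function(SAMPLE, "refund"))
```

Slicing by line number instead of re-generating code from the tree preserves the original indentation and comments.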
📊 utils — Token counting & cost estimation
from agentslim.utils import count_tokens, estimate_cost
tokens = count_tokens("Hello, world!") # uses tiktoken if available
cost = estimate_cost(input_tokens=1000, output_tokens=200, model="gpt-4o")
# {'input_usd': 0.0025, 'output_usd': 0.002, 'total_usd': 0.0045}
Supported models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo,
claude-3-5-sonnet, claude-3-haiku, gemini-1.5-pro, gemini-1.5-flash.
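A sketch of how optional tiktoken support and cost estimation might fit together is shown below. The function names and the price table are hypothetical. The gpt-4o prices ($2.50 input, $10.00 output per million tokens) are assumptions inferred from the example output above, and real prices change over time.

```python
import math

# Assumed USD prices per 1M tokens, inferred from the example output above.
PRICES_PER_MILLION = {"gpt-4o": {"input": 2.50, "output": 10.00}}


def heuristic_tokens(text):
    # Fallback estimate (assumption): roughly 4 characters per token.
    return math.ceil(len(text) / 4)


def sketch_count_tokens(text, model="gpt-4o"):
    """Use tiktoken when it is installed, otherwise fall back to the heuristic."""
    try:
        import tiktoken
    except ImportError:
        return heuristic_tokens(text)
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Model unknown to tiktoken: fall back to a general-purpose encoding.
        enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))


def sketch_estimate_cost(input_tokens, output_tokens, model="gpt-4o"):
    """Split the estimated cost into input, output, and total USD."""
    price = PRICES_PER_MILLION[model]
    input_usd = input_tokens * price["input"] / 1_000_000
    output_usd = output_tokens * price["output"] / 1_000_000
    return {
        "input_usd": round(input_usd, 6),
        "output_usd": round(output_usd, 6),
        "total_usd": round(input_usd + output_usd, 6),
    }


cost = sketch_estimate_cost(input_tokens=1000, output_tokens=200)
print(cost)  # {'input_usd': 0.0025, 'output_usd': 0.002, 'total_usd': 0.0045}
```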
Compatibility
agentslim is framework-agnostic. It works with anything that accepts a list of {"role": ..., "content": ...} dicts:
- ✅ OpenAI Python SDK
- ✅ LangChain / LangGraph
- ✅ LlamaIndex
- ✅ Anthropic SDK
- ✅ Google Generative AI SDK
- ✅ Any custom agent framework
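One adaptation step may still be needed for some SDKs. Anthropic's Messages API, for example, takes the system prompt as a separate top-level `system` parameter rather than as a `system`-role entry in `messages`. A small helper like the hypothetical `split_system` below (not part of agentslim) could bridge that gap:

```python
def split_system(messages):
    """Separate system-role messages from the rest of the conversation.

    Returns (system_text, chat_messages), where system_text joins all
    system messages in their original order.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), chat


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]

system_text, chat = split_system(messages)
print(system_text)  # You are a helpful assistant.
print(len(chat))    # 2
```

The resulting `system_text` and `chat` values can then be passed as the `system` and `messages` arguments of such an API.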
Running tests
pip install -e ".[dev]"
pytest
Contributing
PRs and issues welcome! See CONTRIBUTING.md.
License
MIT © agentslim contributors
File details
Details for the file agentslim-0.1.0.tar.gz.
File metadata
- Download URL: agentslim-0.1.0.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8390ad7dc1c6659c601a3bc6f09efe805c6c4432a87eee9894158218c339d59c |
| MD5 | c13afed7053c2d09c141765392dad3e2 |
| BLAKE2b-256 | 6cb2a14ff51283f8c23ef8b1719df09f5fd4a56896e77cbc5475e86e3f88401c |
File details
Details for the file agentslim-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agentslim-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c58c1f014955ffe00f8a1bc9c645bf3522da2da509a311b7875ef680eea6657 |
| MD5 | 99c363b0ea5a699d87e11b8900277e93 |
| BLAKE2b-256 | 5918c30afbfbd00fa13a0fdd611f6c14ce46ace449f9da80c202b41c18f4eb35 |