
Make your AI agents leaner, faster, and cheaper — smart context management and token compression


agentslim 🪶

Make your AI agents leaner, faster, and cheaper.

agentslim is a zero-dependency Python toolkit that reduces token consumption in LLM-powered agents — without sacrificing reasoning quality.



Why?

Every token counts — literally. When building agents you routinely waste tokens on:

| Problem | Typical waste |
| --- | --- |
| Verbose JSON tool schemas | 200–800 tokens per request |
| Raw HTML web scrapes fed to the LLM | 60–80% noise |
| Naively truncated chat history | Lost context, broken reasoning |
| Sending entire source files to coding agents | 10× more than needed |

agentslim solves all four with one clean API.


Install

pip install agentslim

For accurate token counting (uses tiktoken under the hood):

pip install "agentslim[tiktoken]"   # quotes stop shells such as zsh from globbing the brackets

Quick Start

from agentslim import Compressor, AgentMemory, ToolMinifier, CodeContext

# 1 ── Compress any content before sending to LLM
c = Compressor()
slim = c.compress(raw_html_or_json_or_text)   # auto-detects format

# 2 ── Smart context window with auto-summarization
mem = AgentMemory(max_tokens=6000)
mem.add("user", "Build me a FastAPI app")
mem.add("assistant", "Sure! Here's the plan...")
messages = mem.get_messages()   # ready for openai.chat.completions.create()

# 3 ── Minify tool schemas
slim_tools = ToolMinifier.minify(my_tools)          # shorter descriptions
hint_str   = ToolMinifier.to_compact_str(my_tools)  # one-liner per tool

# 4 ── Send only the relevant code chunk, not the whole file
snippet = CodeContext.extract_function("app.py", "handle_request")
outline = CodeContext.outline("app.py")   # class/function map

Modules

🗜️ Compressor — Text / HTML / JSON compressor

Strips noise from content before it hits your LLM.

from agentslim import Compressor
from agentslim.compressor import CompressorConfig

# Defaults — safe for most use cases
c = Compressor()

# Fine-grained control
c = Compressor(config=CompressorConfig(
    strip_html=True,
    remove_decorative_html=True,   # drops <script>, <style>, <nav>, etc.
    collapse_whitespace=True,
    remove_filler_phrases=True,    # "Certainly! As an AI language model..."
    compact_json=True,
    remove_python_comments=False,  # keep comments by default
))

clean = c.compress(raw_content)          # auto-detects JSON / HTML / text
clean = c.compress_html(html_string)
clean = c.compress_json(json_string)
clean = c.compress_text(plain_text)
clean = c.compress_code(source, language="python")  # or "js" / "ts"
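To make the HTML path concrete, here is a minimal standard-library sketch of the kind of cleanup described above: drop the contents of noisy tags, keep the visible text, and collapse whitespace. It is an illustration only, not agentslim's implementation; `strip_html_sketch`, `SKIP_TAGS`, and `BLOCK_TAGS` are hypothetical names chosen for this example.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag sets for this sketch (not agentslim's actual lists).
SKIP_TAGS = {"script", "style", "nav"}          # contents dropped entirely
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}  # act as word breaks


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html_sketch(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


page = ("<html><style>p{color:red}</style><nav>Home | About</nav>"
        "<p>Hello,\n   <b>world</b>!</p><script>track()</script></html>")
print(strip_html_sketch(page))  # Hello, world!
```

Inline tags such as `<b>` add no spacing in this sketch, so words split by inline markup stay joined, while block-level tags count as word boundaries.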

Savings report:

from agentslim.utils import tokens_saved_report

report = tokens_saved_report(original, compressed, model="gpt-4o")
# {
#   'original_tokens': 1842,
#   'compressed_tokens': 612,
#   'tokens_saved': 1230,
#   'percent_saved': 66.8,
#   'cost_saved_usd': 0.003075
# }
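The figures in that report follow from simple arithmetic, which the sketch below reproduces. `savings_report_sketch` is a hypothetical helper, and the rate of $2.50 per million input tokens is an assumption inferred from the example numbers (1230 tokens saved, $0.003075), not a quoted price.

```python
def savings_report_sketch(original_tokens: int, compressed_tokens: int,
                          usd_per_million_input: float) -> dict:
    """Recompute the report fields from two token counts and an assumed rate."""
    saved = original_tokens - compressed_tokens
    percent = 100 * saved / original_tokens if original_tokens else 0.0
    return {
        "original_tokens": original_tokens,
        "compressed_tokens": compressed_tokens,
        "tokens_saved": saved,
        "percent_saved": round(percent, 1),
        "cost_saved_usd": round(saved * usd_per_million_input / 1_000_000, 6),
    }


# Assumed rate: 2.50 USD per 1M input tokens (inferred, see above).
report = savings_report_sketch(1842, 612, usd_per_million_input=2.50)
print(report["tokens_saved"], report["percent_saved"])  # 1230 66.8
```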

🧠 AgentMemory — Smart sliding-window context manager

Instead of naively cutting old messages (which breaks reasoning), AgentMemory auto-summarizes the oldest messages into a compact system note.

from agentslim import AgentMemory

mem = AgentMemory(
    max_tokens=6000,       # soft limit on the active window
    archive_ratio=0.4,     # archive the oldest 40% when limit is hit
    summarize_fn=None,     # optional: plug in your LLM for better summaries
)

mem.add("system", "You are a helpful assistant.")
mem.add("user", "Hello!")
mem.add("assistant", "Hi! How can I help?")

messages = mem.get_messages()   # list[dict] — pass directly to any OpenAI-compatible API
print(mem.stats())
# MemoryStats(active_messages=3, archived=0, active_tokens=24, ...)
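The archive-and-summarize idea can be sketched in a few lines of plain Python. Everything here is hypothetical: `TinyMemory`, `rough_tokens`, and the default summary format are stand-ins for illustration and do not mirror AgentMemory's internals. The sketch archives at most once per `add()`, which keeps `max_tokens` a soft limit, and it does not pin the original system prompt.

```python
from typing import Callable, Optional


def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


class TinyMemory:
    """Hypothetical sliding window; not agentslim's AgentMemory."""

    def __init__(self, max_tokens: int, archive_ratio: float = 0.4,
                 summarize_fn: Optional[Callable[[list], str]] = None):
        self.max_tokens = max_tokens
        self.archive_ratio = archive_ratio
        self.summarize_fn = summarize_fn or self._default_summary
        self.active = []      # messages still sent to the model
        self.archived = 0     # count of messages folded into summaries

    @staticmethod
    def _default_summary(messages: list) -> str:
        # Fallback without an LLM: first 40 characters of each message.
        return " | ".join(f"{m['role']}: {m['content'][:40]}" for m in messages)

    def _total(self) -> int:
        return sum(rough_tokens(m["content"]) for m in self.active)

    def add(self, role: str, content: str) -> None:
        self.active.append({"role": role, "content": content})
        # Archive at most once per add, and never the only message present.
        if self._total() > self.max_tokens and len(self.active) > 1:
            self._archive_oldest()

    def _archive_oldest(self) -> None:
        cut = max(1, int(len(self.active) * self.archive_ratio))
        old, self.active = self.active[:cut], self.active[cut:]
        note = "Summary of earlier turns: " + self.summarize_fn(old)
        self.active.insert(0, {"role": "system", "content": note})
        self.archived += len(old)

    def get_messages(self) -> list:
        return list(self.active)


demo = TinyMemory(max_tokens=20)
for i in range(5):
    demo.add("user", f"message {i} " + "x" * 30)
print(demo.get_messages()[0]["role"])  # system
```

Passing a `summarize_fn` swaps the truncation fallback for any callable that maps a list of messages to a string.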

With a real LLM summarizer:

import openai

def gpt_summarize(messages):
    history = "\n".join(f"{m.role}: {m.content}" for m in messages)
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize in 3 sentences."},
            {"role": "user",   "content": history},
        ],
    )
    return resp.choices[0].message.content

mem = AgentMemory(max_tokens=8000, summarize_fn=gpt_summarize)

🛠️ ToolMinifier — Tool schema minifier

OpenAI function schemas are JSON-heavy. ToolMinifier cuts them down.

from agentslim import ToolMinifier

# Option A: minify but keep JSON format (for the API)
slim_tools = ToolMinifier.minify(tools, max_desc=80)

# Option B: ultra-compact one-liner hint for system prompts
print(ToolMinifier.to_compact_str(tools))
# get_weather(location:string, unit:string?) -> Any  # Get current weather…
# send_email(to:string, subject:string, body:string) -> Any

# Option C: auto-generate schemas from Python functions
def search(query: str, max_results: int) -> str:
    """Search the web for real-time info."""
    ...

tools = ToolMinifier.from_python_functions(search)

| Format | Tokens (example) |
| --- | --- |
| Full verbose JSON | ~520 |
| minify() | ~310 |
| to_compact_str() | ~40 |
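For a sense of where the savings come from, the sketch below builds an OpenAI-style tool schema from a function signature and then renders it as a one-line hint. It assumes the common `{"type": "function", "function": {...}}` schema layout; `schema_from_function` and `compact_line` are hypothetical helpers, and their output format only approximates the ToolMinifier output shown above.

```python
import inspect

# Partial mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(fn) -> dict:
    """Build an OpenAI-style tool schema from a function's signature."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (inspect.getdoc(fn) or "").strip(),
            "parameters": {"type": "object", "properties": props,
                           "required": required},
        },
    }


def compact_line(tool: dict, max_desc: int = 40) -> str:
    """Render one schema as `name(arg:type, opt:type?)  # description`."""
    fn = tool["function"]
    params = fn["parameters"]
    required = set(params.get("required", []))
    args = ", ".join(
        f"{name}:{spec.get('type', 'any')}{'' if name in required else '?'}"
        for name, spec in params.get("properties", {}).items()
    )
    desc = fn.get("description", "")
    if len(desc) > max_desc:
        desc = desc[:max_desc].rstrip() + "…"
    return f"{fn['name']}({args})" + (f"  # {desc}" if desc else "")


def search(query: str, max_results: int = 5) -> str:
    """Search the web for real-time info."""
    return ""


print(compact_line(schema_from_function(search)))
# search(query:string, max_results:integer?)  # Search the web for real-time info.
```

The one-line form drops the nested JSON keys, braces, and quoting, which is where most of the tokens in a verbose schema go.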

📄 CodeContext — Code-aware chunk extractor

Don't send 500-line files to your coding agent — send only what it needs.

from agentslim import CodeContext

# Extract a single function (+ N lines of context)
snippet = CodeContext.extract_function("app.py", "process_payment", context_lines=3)

# Extract a class skeleton (signatures only)
skeleton = CodeContext.extract_class("service.py", "PaymentService", methods_only=True)

# Outline: class/function map of the whole file
outline = CodeContext.outline("app.py")
# ['class PaymentService (L12)', 'def charge (L28)', 'def refund (L45)']

# Folded view: function bodies replaced with '...'
folded = CodeContext.folded("large_module.py")

# Extract specific line range
chunk = CodeContext.extract_lines("app.py", start_line=120, end_line=145, context_lines=5)

| View | Tokens saved |
| --- | --- |
| Full source | 0% |
| Folded | ~55% |
| Outline only | ~85% |
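For Python sources, views like these can be derived from the standard `ast` module. The following is a minimal sketch under that assumption; `outline_sketch` and `extract_function_sketch` are hypothetical names, they take source text rather than a file path, and nothing here is confirmed to match how CodeContext works internally.

```python
import ast

SOURCE = '''\
class PaymentService:
    def charge(self, amount):
        return amount * 2

def refund(amount):
    """Refund a payment."""
    return -amount
'''


def outline_sketch(source: str) -> list:
    """List every class and function with its starting line number."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append((node.lineno, f"class {node.name} (L{node.lineno})"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entries.append((node.lineno, f"def {node.name} (L{node.lineno})"))
    # ast.walk is breadth-first, so sort by line to get file order.
    return [label for _, label in sorted(entries)]


def extract_function_sketch(source: str, name: str):
    """Return the source text of the first function called `name`, or None."""
    for node in ast.walk(ast.parse(source)):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            return ast.get_source_segment(source, node)
    return None


print(outline_sketch(SOURCE))
# ['class PaymentService (L1)', 'def charge (L2)', 'def refund (L5)']
print(extract_function_sketch(SOURCE, "refund"))
```

Working from the syntax tree rather than regular expressions means nested functions, decorators, and multi-line signatures are located correctly.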

📊 utils — Token counting & cost estimation

from agentslim.utils import count_tokens, estimate_cost

tokens = count_tokens("Hello, world!")  # uses tiktoken if available

cost = estimate_cost(input_tokens=1000, output_tokens=200, model="gpt-4o")
# {'input_usd': 0.0025, 'output_usd': 0.002, 'total_usd': 0.0045}

Supported models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, claude-3-5-sonnet, claude-3-haiku, gemini-1.5-pro, gemini-1.5-flash.
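The estimate_cost figures above are consistent with a per-million-token price table, as the sketch below shows. The rates in `ASSUMED_PRICES` are assumptions chosen only because they reproduce the example output; check your provider's current pricing before relying on them. `rough_count_tokens` illustrates one possible fallback when no tokenizer is installed and is likewise hypothetical.

```python
# USD per one million tokens as (input, output). ASSUMED rates that happen to
# reproduce the example figures; they are not an authoritative price list.
ASSUMED_PRICES = {"gpt-4o": (2.50, 10.00)}


def rough_count_tokens(text: str) -> int:
    """Heuristic fallback: about 4 characters per token, 0 for empty text."""
    return max(1, len(text) // 4) if text else 0


def estimate_cost_sketch(input_tokens: int, output_tokens: int, model: str) -> dict:
    in_rate, out_rate = ASSUMED_PRICES[model]
    input_usd = input_tokens * in_rate / 1_000_000
    output_usd = output_tokens * out_rate / 1_000_000
    return {
        "input_usd": round(input_usd, 6),
        "output_usd": round(output_usd, 6),
        "total_usd": round(input_usd + output_usd, 6),
    }


cost = estimate_cost_sketch(input_tokens=1000, output_tokens=200, model="gpt-4o")
print(cost)
```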


Compatibility

agentslim is framework-agnostic. It works with anything that accepts a list of {"role": ..., "content": ...} dicts:

  • ✅ OpenAI Python SDK
  • ✅ LangChain / LangGraph
  • ✅ LlamaIndex
  • ✅ Anthropic SDK
  • ✅ Google Generative AI SDK
  • ✅ Any custom agent framework
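Some SDKs take the system prompt outside the message list; the Anthropic Messages API, for example, accepts it as a separate `system` parameter. A small adapter along these lines may be needed in that case. `split_system_sketch` is a hypothetical helper and not part of agentslim.

```python
def split_system_sketch(messages: list) -> tuple:
    """Separate system messages from the conversational turns.

    Returns (system_text, turns), where system_text joins all system
    messages and turns keeps the remaining messages in order.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), turns


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]
system_text, turns = split_system_sketch(history)
print(system_text)   # You are a helpful assistant.
print(len(turns))    # 2
```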

Running tests

pip install -e ".[dev]"
pytest

Contributing

PRs and issues welcome! See CONTRIBUTING.md.


License

MIT © agentslim contributors
