Deterministic, type-aware reduction of agent tool outputs at the source. Cut LLM token cost without making the agent do less.

LeanContext

Trim the tool output your AI agent re-sends every turn. Keep the signal, drop the noise.

AI agents re-send every tool result (logs, JSON, diffs, stack traces, HTML) to the model on every turn, and most of that payload is redundancy you pay for again each time. LeanContext sits between your agent and the model and reduces those payloads to their signal. Reduction is deterministic, every result carries a fidelity score, and every path fails open, so a failed reduction returns the original text instead of breaking the agent.

from leancontext import reduce

@reduce
def search_logs(query: str) -> str:
    return run_log_search(query)   # ~10k tokens of logs in, ~1k out, error lines kept

See it

$ python bench.py
sample              kind          before   after  saved  fidelity
-----------------------------------------------------------------
log (incident)      log            52642     100   100%      100%
json (RAG chunks)   json            1862    1391    25%      100%
html (web fetch)    html            1672    1093    35%      100%
diff (patch)        diff             639      81    87%      100%
stacktrace          stacktrace       896      94    90%      100%
-----------------------------------------------------------------
TOTAL                              57711    2759    95%

Counts above use the built-in heuristic tokenizer (≈4 chars/token). Install the tiktoken extra for exact model token counts; the savings ratio is similar (~92% on this sample). The reduced text is identical either way.
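As a rough illustration of how the columns above relate, here is a minimal sketch of a 4-characters-per-token estimate and the "saved" fraction. The function names and the round-up behaviour are assumptions for illustration, not LeanContext's actual implementation.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token count: about one token per 4 characters (rounding up is an assumption)."""
    return math.ceil(len(text) / chars_per_token)


def fraction_saved(tokens_before: int, tokens_after: int) -> float:
    """Share of tokens removed by a reduction, in the range 0..1."""
    if tokens_before == 0:
        return 0.0
    return 1 - tokens_after / tokens_before


# Figures taken from the benchmark table above.
total = fraction_saved(57711, 2759)      # about 0.952, shown as 95%
incident_log = fraction_saved(52642, 100)  # about 0.998, shown as 100% after rounding
```

The second figure shows why the incident log row reads 100% even though 100 tokens remain: the table rounds to whole percentages.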

A real incident log, before and after:

# before  (902 lines)
2026-06-21T09:00:01Z INFO  [gateway] req id=a1 path="/v1/render" status=200 ms=12
... 900 near-identical INFO lines ...
2026-06-21T09:10:43Z FATAL [render] OOM killed worker=7 doc="deck-8842" root cause

# after
2026-06-21T09:00:01Z INFO  [gateway] req id=a1 path="/v1/render" status=200 ms=12   ⟪×900 similar⟫
2026-06-21T09:10:43Z FATAL [render] OOM killed worker=7 doc="deck-8842" root cause

The redundant lines collapse to a count. The FATAL line that explains the crash is kept intact.
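The collapse shown above can be approximated in a few lines. This is a minimal sketch, not the library's reducer: it assumes "near-identical" means "equal once runs of digits are masked", and it assumes a hypothetical keyword list for lines that must always be kept verbatim.

```python
import re

# Assumption: lines at these levels are signal and are never collapsed.
_KEEP = re.compile(r"\b(WARN|ERROR|FATAL|CRITICAL)\b")
# Assumption: digit runs (timestamps, ids, latencies) are the only varying parts.
_DIGITS = re.compile(r"\d+")


def collapse_log(text: str) -> str:
    """Collapse consecutive lines that share a template into the first line plus a count."""
    out: list[str] = []
    first = None      # first line of the current run
    key = None        # masked template of the current run
    extra = 0         # how many further lines matched the template

    def flush() -> None:
        nonlocal first, key, extra
        if first is not None:
            suffix = f"   ⟪×{extra} similar⟫" if extra else ""
            out.append(first + suffix)
        first, key, extra = None, None, 0

    for line in text.splitlines():
        if _KEEP.search(line):
            flush()
            out.append(line)  # kept verbatim
            continue
        masked = _DIGITS.sub("#", line)
        if masked == key:
            extra += 1
        else:
            flush()
            first, key = line, masked
    flush()
    return "\n".join(out)
```

The function is deterministic: the same input always yields the same output, because it depends only on the text and two fixed patterns.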

Why it works

The model API is the bulk of an agent's cost, and most of that is input tokens. A tool result added on one turn is re-sent on every later turn, so the bill grows with the length of the conversation, not just the work done. Those payloads are mostly repetition. LeanContext keeps the errors, anomalies, and identifiers, and collapses the rest.
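To make the growth concrete, here is a small worked example. It assumes a simplified billing model in which each turn re-sends all earlier tool results at full price, with no prompt-cache discount and no other message content; the function name is hypothetical.

```python
def cumulative_input_tokens(result_sizes: list[int]) -> int:
    """Total input tokens billed when turn i adds result_sizes[i] and every later turn re-sends it."""
    total = 0
    context = 0
    for size in result_sizes:
        context += size   # the new tool result joins the context
        total += context  # the whole context is sent on this turn
    return total


# Ten turns, each adding a 1,000-token tool result.
raw = cumulative_input_tokens([1000] * 10)     # 55,000 tokens billed
reduced = cumulative_input_tokens([100] * 10)  # 5,500 tokens billed
```

Ten results of 1,000 tokens cost 55,000 input tokens in total, not 10,000, because the first result is sent ten times, the second nine times, and so on. Reducing each result at the source shrinks every one of those re-sends.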

How it compares

                                  LeanContext   LLM-based compressor   Wire-level proxy
No model in the reduction path    ✓             ✗                      varies
Deterministic                     ✓                                    varies
Prompt-cache safe                 ✓             often ✗                often ✗
Type-aware (keeps error lines)    ✓
Fidelity score per reduction      ✓
Added latency / cost              none          a model call           a network hop

Install

pip install leancontext                  # core, standard library only
pip install "leancontext[integrations]"  # openai, anthropic, litellm, fastapi adapters
pip install "leancontext[otel]"          # OpenTelemetry metrics
pip install "leancontext[mcp]"           # MCP server
pip install "leancontext[tiktoken]"      # exact token counts (used automatically when present)

Use it

Three levels, one core. Every path fails open: if anything goes wrong, you get the original text back.

import leancontext

clean = leancontext.reduce(tool_output).text     # 1) manual

@leancontext.reduce                              # 2) decorator, one line per tool
def search_logs(q: str) -> str:
    ...

tools  = leancontext.wrap(tools)                 # 3) wrap all tools, or an SDK client
client = leancontext.wrap(openai_client)         #    (wrap_anthropic / wrap_gemini too)
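The fail-open guarantee can be pictured as a decorator that never lets a reduction error reach the agent. This is a hedged sketch for synchronous tools only; `fail_open` and the `reducer` argument are hypothetical names, not part of the LeanContext API.

```python
import functools
from typing import Any, Callable


def fail_open(reducer: Callable[[str], str]) -> Callable:
    """Wrap a tool so its string output is reduced, falling back to the raw output on any error."""

    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            raw = tool(*args, **kwargs)  # errors from the tool itself still propagate
            if not isinstance(raw, str):
                return raw               # only text payloads are candidates
            try:
                return reducer(raw)
            except Exception:
                return raw               # fail open: hand back the original text
        return wrapper

    return decorator


def broken_reducer(text: str) -> str:
    raise RuntimeError("reducer bug")


@fail_open(broken_reducer)
def search_logs(query: str) -> str:
    return f"results for {query}"
```

Here `search_logs("oom")` still returns the original string even though the reducer raises, which is the behaviour "fails open" describes.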

Every reduction is inspectable:

r = leancontext.reduce(tool_output)
r.text                            # what to send to the model
r.tokens_before, r.tokens_after
r.ratio                           # fraction saved
r.fidelity                        # 0..1 signal preserved

Integrations

Surface                          How
Decorator / tools                @leancontext.reduce, leancontext.wrap(tools)
OpenAI / Anthropic / Gemini SDK  wrap_openai(c), wrap_anthropic(c), wrap_gemini(c)
LiteLLM (proxy)                  callbacks: leancontext.integrations.litellm.proxy_handler_instance
LiteLLM (SDK)                    import leancontext.integrations.litellm as ll; ll.patch()
Standalone proxy                 from leancontext.integrations.proxy import create_app (OpenAI-compatible, any language)
Messages                         leancontext.reduce_messages(messages) (OpenAI, Anthropic, Gemini)
Telemetry                        import leancontext.integrations.otel as o; o.instrument()
Anthropic native                 wrap_anthropic_native(client, ...) composes with clear_tool_uses context editing
Frameworks                       LangChain, LangGraph, Agno via wrap(tools); any framework via @reduce on tool functions (sync or async)
MCP server                       python -m leancontext.integrations.mcp_server (reduce / expand / stats over stdio)

CI exercises OpenAI (chat + Responses), Anthropic, LiteLLM, the standalone proxy, OpenTelemetry, and the MCP server against the real packages. Message reduction for all formats (including Gemini) is unit-tested directly. The framework adapters (LangChain / LangGraph / Agno) and the SDK-level Gemini client wrapper are provided best-effort and are not yet covered in CI against the live SDKs.

Reducers

Kind        What it does
log         Collapse near-identical lines, keep every error, anomaly, and unique line verbatim
json        Factor repeated keys out once, lay values out columnar (near-lossless)
diff        Keep all change, hunk, and header lines, collapse unchanged context
stacktrace  Keep the exception and boundary frames, collapse the deep middle
html        Strip tags, scripts, and styles, keep visible text and links
table       Collapse whitespace-aligned command-line tables, keep header and data

Anything else, or any payload below the size, saving, or fidelity thresholds, passes through unchanged.
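To illustrate the json row, here is a minimal sketch of the columnar idea: keys that repeat in every record are written once, and each record becomes a row of values. The layout (`keys` / `rows`) and function names are assumptions for illustration; the library's actual output format may differ.

```python
import json


def to_columnar(records: list[dict]) -> dict:
    """Factor the shared keys out once and keep only the values per record."""
    keys = list(records[0]) if records else []
    if any(list(r) != keys for r in records):
        # Assumption: mixed shapes are left alone rather than forced into columns.
        raise ValueError("records do not share one key order")
    return {"keys": keys, "rows": [[r[k] for k in keys] for r in records]}


def from_columnar(table: dict) -> list[dict]:
    """Inverse of to_columnar, showing that no value is lost."""
    return [dict(zip(table["keys"], row)) for row in table["rows"]]


records = [{"id": i, "score": 0.5, "text": f"chunk {i}"} for i in range(20)]
original = json.dumps(records, separators=(",", ":"))
compact = json.dumps(to_columnar(records), separators=(",", ":"))
```

Because the transformation is invertible for uniform records, the saving comes purely from not repeating key names, which matches the modest 25% figure for JSON in the benchmark above.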

How it works

Each tool output flows through fail-open gates (hash, size check, type detection, the typed reducer, then a saving and fidelity check) and returns either the reduced text or the original. Results are cached by content hash, so a payload re-sent across turns is reduced only once. See docs/ARCHITECTURE.md for diagrams.
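The gate sequence described above could be sketched as follows. This is an illustration under stated assumptions, not the code in docs/ARCHITECTURE.md: `detect`, `reducers`, and `score` are hypothetical callables supplied by the caller, the size threshold is invented, savings are measured in characters, and the `min_saving` and `min_fidelity` defaults simply reuse the example values from the Configuration section.

```python
import hashlib
from typing import Callable

_cache: dict[str, str] = {}  # content hash -> final text (reduced or original)


def gated_reduce(
    text: str,
    detect: Callable[[str], str],
    reducers: dict[str, Callable[[str], str]],
    score: Callable[[str, str], float],
    min_chars: int = 2000,      # hypothetical size gate
    min_saving: float = 0.1,
    min_fidelity: float = 0.85,
) -> str:
    """Run the gates in order; any failure or missed threshold returns the original text."""
    try:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in _cache:
            return _cache[key]                   # re-sent payload: already decided
        result = text
        if len(text) >= min_chars:               # size gate
            reducer = reducers.get(detect(text)) # type detection
            if reducer is not None:
                candidate = reducer(text)        # typed reducer
                saving = 1 - len(candidate) / len(text)
                if saving >= min_saving and score(text, candidate) >= min_fidelity:
                    result = candidate           # saving and fidelity gates passed
        _cache[key] = result
        return result
    except Exception:
        return text                              # fail open
```

Caching the decision by content hash means the same payload always maps to the same text on every later turn, which is what keeps the reduced prefix stable.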

Cost and telemetry

from leancontext.cost import CostTracker

tracker = CostTracker(model="claude-sonnet-4-6").install()
# ... run your agent ...
tracker.report()    # {tokens_saved, usd_saved, ratio, cache_safe: True}
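The fields in that report are related by simple arithmetic. The sketch below shows one plausible way to derive them; the function name and the price argument are hypothetical, and real input prices depend on the provider and model.

```python
def savings_report(tokens_before: int, tokens_after: int, usd_per_million_input: float) -> dict:
    """Derive tokens_saved, usd_saved, and ratio for payloads sent once at the given input price."""
    tokens_saved = tokens_before - tokens_after
    return {
        "tokens_saved": tokens_saved,
        "usd_saved": tokens_saved * usd_per_million_input / 1_000_000,
        "ratio": tokens_saved / tokens_before if tokens_before else 0.0,
    }


# Benchmark totals from above, at a placeholder price of 3 USD per million input tokens.
report = savings_report(57711, 2759, usd_per_million_input=3.0)
```

This counts each payload once. In a multi-turn conversation the same saving recurs on every turn that re-sends the payload.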

Configuration

leancontext.disable()                         # global kill switch (or env LEANCONTEXT_DISABLED=1)
leancontext.reduce(x, min_saving=0.1, min_fidelity=0.85)
leancontext.on_reduction(callback)            # telemetry hook (composable)
leancontext.use_tiktoken("gpt-4o")            # force a specific model's tokenizer

Roadmap

Planned: CI-verified adapters for LangChain, LlamaIndex, CrewAI, and Agno; accurate provider tokenizers by default; and broader Anthropic native interop.

Contributing

Issues and PRs welcome. Run pytest. Reducers are pure functions, str -> (reduced, notes), and must be deterministic and value-preserving. See AGENTS.md for the design rules.
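For contributors, a reducer with the `str -> (reduced, notes)` shape might look like the sketch below. It is a simplified stand-in for a diff reducer, written for illustration only: the marker text and the wording of the notes are assumptions, and AGENTS.md remains the reference for the real rules.

```python
def reduce_diff(text: str) -> tuple[str, list[str]]:
    """Keep header, hunk, and change lines; collapse runs of unchanged context into a count."""
    out: list[str] = []
    run = 0        # length of the current run of context lines
    collapsed = 0  # total context lines collapsed so far

    def flush() -> None:
        nonlocal run, collapsed
        if run:
            out.append(f" ⟪{run} unchanged lines⟫")
            collapsed += run
            run = 0

    for line in text.splitlines():
        if line.startswith(" ") or line == "":
            run += 1          # unchanged context
        else:
            flush()
            out.append(line)  # headers, @@ hunks, + and - lines kept verbatim
    flush()
    notes = [f"collapsed {collapsed} context lines"] if collapsed else []
    return "\n".join(out), notes
```

The function reads no clock, environment, or random state, so calling it twice on the same input gives the same pair, which is the determinism the contract asks for.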

License

Apache-2.0
