Trim noisy logs, diffs, and chat transcripts before they become LLM tokens

Maintainers

headWw

Project description

toktrim

Trim noisy logs, diffs, and chat transcripts before they become LLM tokens.

What is it

toktrim is a small command-line tool that takes the kind of bulk text you tend to paste into an LLM — a 5,000-line CI log, a multi-file git diff, a long chat transcript — and rewrites it so the important parts stay and the noise drops out. The result is shorter, cheaper, and usually clearer for the model to read.

It is intentionally not a coding agent. It does one job: shrink context. You can pipe its output into your existing workflow, into a provider call (toktrim review, toktrim debug, toktrim ask), or wire it into Claude Code so trimming happens automatically.

The trims are lossy by design. Every mode preserves what matters most — errors, changed lines, the most recent turns — and drops what almost certainly does not — ANSI escape codes, hundreds of identical warnings, unchanged file regions, stale chat history. A bundled benchmark harness measures recall against fixed test cases so the trade-off is visible, not vibes-based.

Before and after

A 156-line build log with 150 repeated [WARN] lines and one real failure at the end:

[OK] build started
[OK] compile module a
[OK] compile module b
[WARN] deprecated symbol legacyFn at module 0
[WARN] deprecated symbol legacyFn at module 1
... 147 more warnings, identical except for the module number ...
[WARN] deprecated symbol legacyFn at module 149
[FAIL] TypeError: cannot access undefined.id
[FAIL] build aborted

Run through toktrim logs:

[OK] build started
[OK] compile module a
[OK] compile module b
[WARN] deprecated symbol legacyFn at module 0
... (head retained: 37 lines)
[76 log lines omitted]
... (tail retained: 40 lines, including:)
[FAIL] TypeError: cannot access undefined.id
[FAIL] build aborted

2,766 → 929 tokens. 66% saved. ANSI codes stripped, warning storm collapsed, both [FAIL] lines preserved verbatim. (Source: toktrim benchmark, fixture ansi_heavy_log, cl100k_base.)

Install

pip install toktrim                  # base install
pipx install toktrim                 # recommended for a CLI tool
pip install 'toktrim[openai]'        # add the OpenAI provider
pip install 'toktrim[anthropic]'     # add the Anthropic provider
pip install 'toktrim[openai,anthropic,bench,dashboard]'   # everything

For provider-backed commands, set the matching key:

export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."

Five-minute tour

The local transforms are pure: text in, smaller text out. No network, no API key needed.

`toktrim auto` — the one to remember

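Hand it a path (or pipe text on stdin). It detects whether the input is a log, a diff, or a transcript, and trims it accordingly. Inputs smaller than TOKTRIM_MIN_AUTO_TOKENS (500 tokens by default; see Configuration) pass through unchanged.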

toktrim auto /tmp/build.log
toktrim auto failing-test-output.txt --show-mode
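For intuition, here is a minimal Python sketch of how an auto-detector like this could route input. It is not toktrim's implementation: detect_mode, rough_token_count, the 4-bytes-per-token estimate, and the diff and speaker heuristics are all assumptions. Only the mode names and the TOKTRIM_MIN_AUTO_TOKENS floor come from this README.

```python
import os
import re

# Hypothetical heuristic: a transcript has lines like "User: ..." / "Assistant: ...".
SPEAKER_RE = re.compile(r"^(user|assistant|human|system)\s*:", re.IGNORECASE)


def rough_token_count(text: str) -> int:
    # Crude stand-in for a tokenizer: assume ~4 UTF-8 bytes per token.
    return len(text.encode("utf-8")) // 4


def detect_mode(text: str) -> str:
    # Read the floor on every call, mirroring "read at invocation time".
    floor = int(os.environ.get("TOKTRIM_MIN_AUTO_TOKENS", "500"))
    if rough_token_count(text) < floor:
        return "raw"  # small input: pass through untrimmed
    lines = text.splitlines()
    if any(line.startswith(("diff --git ", "@@ ")) for line in lines):
        return "diff"
    if sum(1 for line in lines if SPEAKER_RE.match(line)) >= 2:
        return "convo"
    return "logs"  # default: treat everything else as log output
```

Checking the size floor first keeps small inputs cheap: nothing is classified or rewritten unless trimming could plausibly pay off.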

`toktrim logs`

Strips ANSI escapes, collapses repeated lines, and keeps the head and tail of the input, replacing everything in between with a [N log lines omitted] marker.

pytest 2>&1 | toktrim logs
toktrim logs --input failing.log --max-lines 120
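The three steps above can be sketched in Python as follows. This is an illustrative approximation, not toktrim's code: trim_log, collapse_repeats, the "[N similar lines collapsed]" marker, the one-third head split, treating lines that differ only in their digits as repeats, and rescuing error-looking lines from the omitted middle are all assumptions (the last one is motivated by the claim that every mode preserves errors).

```python
import re
from itertools import groupby

ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")  # CSI sequences like "\x1b[31m"
DIGITS_RE = re.compile(r"\d+")
ERROR_RE = re.compile(r"error|fail|exception|traceback", re.IGNORECASE)


def collapse_repeats(lines):
    """Collapse runs of consecutive lines that differ only in their digits."""
    out = []
    for _, group in groupby(lines, key=lambda line: DIGITS_RE.sub("#", line)):
        run = list(group)
        out.append(run[0])  # keep the first line of each run verbatim
        if len(run) > 1:
            out.append(f"[{len(run) - 1} similar lines collapsed]")
    return out


def trim_log(text, max_lines=120):
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")
    lines = collapse_repeats([ANSI_RE.sub("", line) for line in text.splitlines()])
    if len(lines) <= max_lines:
        return "\n".join(lines)
    head_n = max_lines // 3
    tail_n = max_lines - head_n
    head, middle, tail = lines[:head_n], lines[head_n:-tail_n], lines[-tail_n:]
    # Error-looking lines survive even when they fall in the omitted middle.
    kept = [line for line in middle if ERROR_RE.search(line)]
    omitted = len(middle) - len(kept)
    return "\n".join(head + [f"[{omitted} log lines omitted]"] + kept + tail)
```

On the before-and-after log, this sketch would reduce the 150 [WARN] lines to one line plus a collapse marker and keep both [FAIL] lines.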

`toktrim diff`

Keeps file headers, hunk headers, and changed lines. Drops most surrounding context; a few context lines stay for readability, and --context-lines sets how many. Caps output at 12 hunks by default.

git diff | toktrim diff
toktrim diff --input patch.diff --context-lines 2
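A minimal two-pass Python sketch of this kind of diff trimming follows. The function names, and the mapping of context_lines and max_hunks onto --context-lines and the 12-hunk cap, are assumptions; toktrim's own parser may differ. Because context lines are dropped, the trimmed hunks no longer match the line counts in their @@ headers, so treat the sketch's output as something to read, not something to git apply.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def _near_changes(body, context_lines):
    """Keep changed lines plus up to `context_lines` unchanged lines on each side."""
    changed = [i for i, line in enumerate(body) if line.startswith(("+", "-"))]
    keep = {j for i in changed for j in range(i - context_lines, i + context_lines + 1)}
    return [line for i, line in enumerate(body) if i in keep]


def trim_diff(diff, context_lines=2, max_hunks=12):
    # Pass 1: split a `git diff` into files, each with header lines and hunks.
    files = []  # list of (header_lines, hunks); a hunk is [hunk_header, *body]
    for line in diff.splitlines():
        if line.startswith("diff --git ") or not files:
            files.append(([], []))
        header, hunks = files[-1]
        if HUNK_RE.match(line):
            hunks.append([line])
        elif hunks:
            hunks[-1].append(line)
        else:
            header.append(line)  # diff --git / index / --- / +++ lines

    # Pass 2: emit headers and trimmed hunks until the hunk cap is reached.
    out, emitted = [], 0
    total = sum(len(hunks) for _, hunks in files)
    for header, hunks in files:
        if emitted >= max_hunks:
            break
        out.extend(header)
        for hunk in hunks[: max_hunks - emitted]:
            out.append(hunk[0])
            out.extend(_near_changes(hunk[1:], context_lines))
            emitted += 1
    if total > emitted:
        out.append(f"[{total - emitted} more hunks omitted]")
    return "\n".join(out)
```

Parsing first and emitting second keeps the cap logic simple: files past the cap are skipped whole, so no file header is left dangling without its hunks.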

`toktrim convo`

Keeps the most recent turns of a transcript verbatim and converts older content into a short bulleted summary.

toktrim convo --input transcript.txt
cat transcript.txt | toktrim convo --recent-lines 30
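The README does not say how the older-content summary is built, so the Python sketch below uses a deliberately naive, local heuristic: one bullet per older "Speaker: text" turn, truncated to a fixed width. trim_convo, TURN_RE, the transcript format, and the truncation rule are hypothetical; recent_lines is assumed to mean what --recent-lines suggests.

```python
import re

# Assumed transcript format: each turn starts with "Speaker: ...".
TURN_RE = re.compile(r"^(\w+)\s*:\s*(.*)$")


def trim_convo(text: str, recent_lines: int = 30, width: int = 80) -> str:
    if recent_lines < 1:
        raise ValueError("recent_lines must be at least 1")
    lines = text.splitlines()
    if len(lines) <= recent_lines:
        return text  # short transcript: nothing to summarize
    older, recent = lines[:-recent_lines], lines[-recent_lines:]
    bullets = []
    for line in older:
        match = TURN_RE.match(line)
        if match:  # one bullet per older turn: speaker plus opening words
            speaker, body = match.groups()
            snippet = body if len(body) <= width else body[: width - 3] + "..."
            bullets.append(f"- {speaker}: {snippet}")
    return "\n".join(["Earlier conversation (summary):", *bullets, "", *recent])
```

Continuation lines of older turns (lines without a speaker prefix) are dropped entirely in this sketch, which is where most of its savings would come from.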

`toktrim stats`

Shows the savings without printing the trimmed text.

git diff | toktrim stats --mode diff
pytest 2>&1 | toktrim stats --mode logs --json

stats uses tiktoken if installed; otherwise it falls back to a byte-based estimate.
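A minimal Python sketch of that fallback, plus the savings arithmetic behind figures like "2,766 → 929 tokens, 66% saved". count_tokens and savings_pct are hypothetical helpers, and the 4-bytes-per-token fallback ratio is an assumption rather than toktrim's documented formula; tiktoken.get_encoding and Encoding.encode are real tiktoken APIs.

```python
def count_tokens(text: str) -> int:
    """Count cl100k_base tokens with tiktoken, else fall back to a byte estimate."""
    try:
        import tiktoken  # optional dependency
    except ImportError:
        # Assumed heuristic: roughly 4 UTF-8 bytes per token, rounded up.
        return -(-len(text.encode("utf-8")) // 4)
    enc = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats strings like "<|endoftext|>" as plain text.
    return len(enc.encode(text, disallowed_special=()))


def savings_pct(original_tokens: int, trimmed_tokens: int) -> float:
    if original_tokens == 0:
        return 0.0
    return 100.0 * (1 - trimmed_tokens / original_tokens)


print(f"{savings_pct(2766, 929):.0f}% saved")  # prints: 66% saved
```

Passing disallowed_special=() matters for logs and diffs: by default tiktoken raises on special-token strings, which arbitrary pasted text can contain.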

Sending to an LLM

Three opinionated developer verbs, each of which trims the input, sends it to a provider, and stores the result locally.

`toktrim review` — review a diff

git diff | toktrim review --provider openai --show-stats

Trims the diff first, then asks the model for a code review. Add --dry-run to see the request without sending.

`toktrim debug` — root-cause from a log

pytest 2>&1 | toktrim debug --provider openai --prompt "why is this failing?"

`toktrim ask` — anything else

toktrim ask --provider openai --prompt "summarize the tradeoffs of this approach"

Switch providers per call with --provider openai|anthropic. When --model is omitted, toktrim picks a sensible default per provider. Reasoning effort defaults are also provider-specific (Anthropic defaults to medium and enables adaptive thinking; OpenAI defaults to low).

There is also a low-level toktrim send primitive that requires you to spell out --mode and --prompt explicitly. It's the foundation the developer verbs are built on; reach for it if you want full control.

Local sessions and the dashboard

Every successful run stores a normalized message log and a metrics record under your user state directory (or ./.toktrim-state/ if that's read-only). Sessions persist across provider switches, so a conversation that started on OpenAI can continue on Anthropic without losing history.

toktrim sessions list
toktrim sessions show <session_id>
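The directory lookup described above (TOKTRIM_STATE_DIR, then $XDG_STATE_HOME/toktrim or ~/.local/state/toktrim, then ./.toktrim-state/) can be sketched in Python like this. resolve_state_dir is a hypothetical name, and applying the ./.toktrim-state/ fallback to an unwritable override as well is an assumption.

```python
import os
import tempfile
from pathlib import Path


def resolve_state_dir() -> Path:
    """Pick the state directory, falling back to ./.toktrim-state/ if unwritable."""
    override = os.environ.get("TOKTRIM_STATE_DIR")
    if override:
        candidate = Path(override).expanduser()
    else:
        xdg = os.environ.get("XDG_STATE_HOME")
        base = Path(xdg) if xdg else Path.home() / ".local" / "state"
        candidate = base / "toktrim"
    try:
        candidate.mkdir(parents=True, exist_ok=True)
        # mkdir succeeding on an existing directory does not prove it is writable.
        with tempfile.TemporaryFile(dir=candidate):
            pass
        return candidate
    except OSError:
        fallback = Path.cwd() / ".toktrim-state"
        fallback.mkdir(exist_ok=True)
        return fallback
```

Probing with a real temporary file, rather than checking permission bits, also catches read-only mounts.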

If you installed the dashboard extra, toktrim dashboard serves a small local web UI over the same data:

pip install 'toktrim[dashboard]'
toktrim dashboard          # http://127.0.0.1:8765

Claude Code integration

If you use Claude Code, toktrim ships an auto-invoking skill that trims logs and diffs on your behalf whenever you reference one. A PreToolUse(Read) hook also intercepts bulk files the agent produced itself (e.g., pdflatex > /tmp/x.log followed by Read), so trimming applies even when you don't reference the file in chat. A PostToolUse(Bash) hook goes further and trims verbose command output (pytest, builds, linters) in place, while leaving file reads, patches, and structured JSON byte-for-byte intact.

toktrim install-claude-code
toktrim doctor                  # verify the install
toktrim uninstall-claude-code   # reverse it (idempotent)

The integration is opt-in, narrowly scoped, and includes safeguards against prompt-injection abuse. See docs/CLAUDE_CODE.md for the full setup, per-Read enforcement details, tunable thresholds, the security model, and the --skill-mode lockdown.

Security

toktrim is a local CLI. The main attack surface is toktrim auto being pointed at a sensitive path by a malicious or prompt-injected caller. The tool refuses well-known sensitive patterns (SSH keys, AWS creds, .env files, Claude Code's own state, and more) after resolving symlinks, and --skill-mode pins the safety knobs so caller-supplied flags cannot widen them; TOKTRIM_AUTO_ALLOW_SENSITIVE is also ignored in that mode.

This is one layer, not a guarantee. Continue to protect true secrets with filesystem permissions (chmod 0600).
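To make "after symlink resolution" concrete, here is a minimal Python sketch of a path denylist check. The patterns, is_sensitive, and check_path are illustrative assumptions (the authoritative list lives in SECURITY.md), and the sketch treats any non-empty TOKTRIM_AUTO_ALLOW_SENSITIVE value as truthy.

```python
import fnmatch
import os

# Illustrative patterns only; toktrim's real denylist is documented in SECURITY.md.
SENSITIVE_PATTERNS = [
    "*/.ssh/*",
    "*/.aws/credentials",
    "*/.env",
    "*/.env.*",
    "*/.claude/*",
]


def is_sensitive(path: str) -> bool:
    # Resolve symlinks first, so a link like /tmp/notes.txt -> ~/.ssh/id_rsa is caught.
    real = os.path.realpath(os.path.expanduser(path))
    return any(fnmatch.fnmatch(real, pattern) for pattern in SENSITIVE_PATTERNS)


def check_path(path: str, skill_mode: bool) -> None:
    # The override is honored only outside skill mode, as the Configuration table says.
    allow = bool(os.environ.get("TOKTRIM_AUTO_ALLOW_SENSITIVE")) and not skill_mode
    if is_sensitive(path) and not allow:
        raise PermissionError(f"refusing to read sensitive path: {path}")
```

Matching the resolved path rather than the path as given is the important design choice: checking the name the caller supplied would let any innocuously named symlink slip through.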

For the full threat model, mitigation list, and how to report a vulnerability, see SECURITY.md.

Configuration

toktrim follows the standard CLI convention: per-user knobs are environment variables. Each is read at invocation time, so changing a value takes effect on the next call.

Variable	Default	Purpose
`TOKTRIM_MIN_AUTO_TOKENS`	`500`	Token-count floor below which `toktrim auto` selects `raw` mode and passes the input through untrimmed. Raise to skip more small inputs; lower (or set `0`) to trim aggressively.
`TOKTRIM_AUTO_ALLOW_SENSITIVE`	unset	Set to any truthy value to bypass the sensitive-path denylist. Honored only outside `--skill-mode`; the Claude Code skill ignores it.
`TOKTRIM_STATE_DIR`	`$XDG_STATE_HOME/toktrim` or `~/.local/state/toktrim`	Override where session history and run metrics are written.
`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`	unset	Required only for provider-backed commands (`send`, `review`, `debug`, `ask`).

Quick example — raise the floor for a single invocation:

git diff | TOKTRIM_MIN_AUTO_TOKENS=2000 toktrim auto --show-mode

Migration note (0.2.0): the TOKTRIM_MIN_AUTO_TOKENS default was raised from 200 to 500. Inputs in the 200–500 token range that previously trimmed will now route to raw (skip trim) by default. Set TOKTRIM_MIN_AUTO_TOKENS=200 to restore the prior behavior.

Quality benchmarks

Trims are lossy, so the project measures the trade-off explicitly. The bundled toktrim benchmark command runs every transform against fixture inputs and reports three metrics per fixture:

savings — token reduction vs the original input (using cl100k_base).
recall — fraction of declared must_retain facts that survive the trim.
prefix stability — how much of the trimmed output's token prefix stays identical when an elidable region of the input is mutated. Measures prompt-cache friendliness.
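The recall and prefix-stability definitions above can be sketched in Python as follows. The function names, matching facts as verbatim substrings, and normalizing the shared prefix by the longer token sequence are assumptions; see docs/benchmarks.md for how the real harness scores fixtures.

```python
def recall(trimmed: str, must_retain: list[str]) -> float:
    """Fraction of must-retain facts that appear verbatim in the trimmed output."""
    if not must_retain:
        return 1.0
    return sum(fact in trimmed for fact in must_retain) / len(must_retain)


def prefix_stability(tokens_a: list[int], tokens_b: list[int]) -> float:
    """Length of the shared token prefix, as a fraction of the longer sequence."""
    shared = 0
    for a, b in zip(tokens_a, tokens_b):
        if a != b:
            break
        shared += 1
    longest = max(len(tokens_a), len(tokens_b))
    return shared / longest if longest else 1.0
```

To use prefix_stability, one would trim the original input and a copy with an elidable region mutated, encode both outputs with cl100k_base, and compare the two token lists. A score near 1.0 means a provider's prompt cache could reuse almost the entire prefix.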

pip install 'toktrim[bench]'
toktrim benchmark
toktrim benchmark --fail-under 90 --fail-savings-under 50 --fail-prefix-under 80

The committed baseline lives in src/toktrim/benchmarks/baseline.json. See docs/benchmarks.md for how fixtures work and how to add new ones.

Roadmap

More task profiles such as fix, explain, and commit-msg
Provider adapters beyond OpenAI and Anthropic
Git-aware file scoping beyond raw diff input
Cache-stable prompt ordering for providers that support prompt caching
Benchmark fixtures for additional CI logs, stack traces, and multi-turn coding chats

License and contributing

MIT. Issues and pull requests welcome at github.com/headWw/toktrim — see CONTRIBUTING.md to get set up.

Release history

This version: 0.2.0, released Jun 22, 2026

Download files

Source Distribution

toktrim-0.2.0.tar.gz (253.2 kB)

Uploaded Jun 22, 2026 Source

Built Distribution

toktrim-0.2.0-py3-none-any.whl (184.7 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file toktrim-0.2.0.tar.gz.

File metadata

Download URL: toktrim-0.2.0.tar.gz
Upload date: Jun 22, 2026
Size: 253.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for toktrim-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3163046ea0d1890bbb2ec331edac9044e1bbb542cf5f56717cbac4eb0cd8b4b8`
MD5	`5986c429738eb98496187cb0d6bb556e`
BLAKE2b-256	`4e088da19d95e63cc9533a68b380d1980cab288eaad2926966c4e774742affd5`

Provenance

The following attestation bundles were made for toktrim-0.2.0.tar.gz:

Publisher: publish.yml on headWw/toktrim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: toktrim-0.2.0.tar.gz
- Subject digest: 3163046ea0d1890bbb2ec331edac9044e1bbb542cf5f56717cbac4eb0cd8b4b8
- Sigstore transparency entry: 1913286382
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: headWw/toktrim@ee985596ba88556e6f9de33e59721c8f01c14ebd
- Branch / Tag:
- Owner: https://github.com/headWw
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ee985596ba88556e6f9de33e59721c8f01c14ebd
- Trigger Event: release

File details

Details for the file toktrim-0.2.0-py3-none-any.whl.

File metadata

Download URL: toktrim-0.2.0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 184.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for toktrim-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8211bb6569f0567310d168c8634011c4ee1be02bd98ff4a35ff1f3238cdc6cce`
MD5	`2afef0754cf866785beb94880e9f29bf`
BLAKE2b-256	`3daf2c9b56156693c4553af1903544780aded6b8ce446244e583852bb294e024`

Provenance

The following attestation bundles were made for toktrim-0.2.0-py3-none-any.whl:

Publisher: publish.yml on headWw/toktrim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: toktrim-0.2.0-py3-none-any.whl
- Subject digest: 8211bb6569f0567310d168c8634011c4ee1be02bd98ff4a35ff1f3238cdc6cce
- Sigstore transparency entry: 1913286513
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: headWw/toktrim@ee985596ba88556e6f9de33e59721c8f01c14ebd
- Branch / Tag:
- Owner: https://github.com/headWw
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ee985596ba88556e6f9de33e59721c8f01c14ebd
- Trigger Event: release
