
Squeeze verbose LLM agent tool output down to only the relevant lines


Squeez

Squeeze out the juice, leave the pulp behind.


  • Tool output pruner for LLM coding agents
  • Pipe any tool output (pytest, grep, git log, npm build, kubectl, ...) through squeez with a task description, get back only the relevant lines
  • Fine-tuned Qwen 3.5 2B, 0.79 F1, ~91% compression
  • CLI pipe, Python library, or vLLM server

Existing context pruning tools (SWE-Pruner, Zilliz Semantic Highlight, Provence) are built for source code or document paragraphs. They don't handle the mixed, unstructured format of tool output (stack traces interleaved with passing tests, grep matches with context lines, build logs with timestamps). Squeez is trained specifically on 14 types of tool output from real SWE-bench workflows.

pip install squeez
python -m pytest tests/ -v 2>&1 | squeez "find the test failure related to authentication"

Example

Task: "Find the test failure related to authentication"

Before (45 lines, ~1,500 tokens):

$ python -m pytest tests/ -v
======================== test session starts ========================
platform linux -- Python 3.12.1, pytest-8.1.1
collected 10 items

tests/test_auth.py::test_login_valid PASSED
tests/test_auth.py::test_login_invalid PASSED
tests/test_auth.py::test_token_refresh FAILED
tests/test_auth.py::test_logout PASSED
tests/test_users.py::test_create_user PASSED
tests/test_users.py::test_delete_user PASSED
tests/test_users.py::test_list_users PASSED
tests/test_middleware.py::test_csrf_check PASSED
tests/test_middleware.py::test_rate_limit PASSED
tests/test_middleware.py::test_cors_headers PASSED

======================= FAILURES ================================
_____ test_token_refresh ________________________________________

    def test_token_refresh(self):
        token = self.client.get_token(expired=True)
>       refreshed = self.client.refresh(token)
E       AuthenticationError: Token refresh window expired
E       Expected: new token within 30m window
E       Got: rejection after 15m (timeout changed?)

tests/test_auth.py:47: AuthenticationError
================ short test summary info ========================
FAILED tests/test_auth.py::test_token_refresh
================== 1 failed, 9 passed ==========================

After (6 lines, ~200 tokens):

tests/test_auth.py::test_token_refresh FAILED

    def test_token_refresh(self):
        token = self.client.get_token(expired=True)
>       refreshed = self.client.refresh(token)
E       AuthenticationError: Token refresh window expired
E       Expected: new token within 30m window
E       Got: rejection after 15m (timeout changed?)

87% compression. Only the failing test and its traceback survive.
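The compression figure is simply the fraction of the input discarded. A quick check of the numbers above, assuming compression is measured over token counts:

```python
def compression_ratio(tokens_in: int, tokens_out: int) -> float:
    """Fraction of the input removed by pruning."""
    return 1 - tokens_out / tokens_in

# The example above: ~1,500 tokens in, ~200 tokens out
print(round(compression_ratio(1500, 200), 2))  # 0.87, i.e. 87%
```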

More examples

Filtering git log:

$ git log --oneline -25 | squeez "find the commit that changed the authentication timeout"

u6v7w8x Change auth timeout from 30m to 1h

Filtering build output:

$ npm run build 2>&1 | squeez "find the TypeScript error"

src/components/Auth.tsx(34,5): error TS2345: Argument of type 'string' is
  not assignable to parameter of type 'AuthToken'.

Filtering kubectl output:

$ kubectl describe pod api-server-7d4b | squeez "why is the pod failing"

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
  Warning  BackOff  3m (x5)  kubelet  Back-off restarting failed container

Results

Evaluated on 617 held-out test samples from SWE-bench, across 14 tool types:

Model                          Precision  Recall  F1      Compression
Squeez-2B                      0.8043     0.8624  0.7895  0.9150
Qwen 3.5 35B A3B (zero-shot)   0.7402     0.7498  0.7000  0.9177
Kimi K2 (zero-shot)            0.6128     0.5286  0.5344  0.9425
Qwen 3.5 2B (untrained)        0.4154     0.5299  0.4075  0.8197
BM25 (10%)                     0.1277     0.2172  0.1314  0.9036
Random (10%)                   0.0738     0.1009  0.0697  0.9067

Squeez-2B, at 2B parameters, outperforms a 35B MoE model run zero-shot and scores roughly 6x higher than BM25 on Span F1.
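Span F1 is, in essence, set overlap between predicted and gold line selections. A minimal per-sample version (the exact evaluation protocol may differ from this sketch):

```python
def line_f1(predicted: set[int], gold: set[int]) -> float:
    """F1 over selected line indices for a single sample."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # lines both selected and gold
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# 3 of 4 predicted lines are correct, 3 of 4 gold lines recovered
print(line_f1({3, 4, 5, 9}, {4, 5, 9, 10}))  # 0.75
```

A corpus-level score would then average this over all 617 test samples, which is why the reported F1 need not equal the harmonic mean of the reported precision and recall columns.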

Quick start

With vLLM (recommended)

pip install vllm
vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384

# Use from squeez CLI
pip install squeez
export SQUEEZ_SERVER_URL=http://localhost:8000/v1
cat output.txt | squeez "find the bug"

vLLM keeps the model warm in memory with batched inference and high throughput.

Local inference (no server)

pip install squeez

cat output.txt | squeez "Find the failing traceback block"
squeez "Fix the CSRF bug" --input-file output.txt

Note: Local mode loads the model on every call. Fine for one-off use, but for repeated calls (e.g. an agent piping every tool through squeez), use vLLM.

Any OpenAI-compatible API

Works with Groq, Together, or any OpenAI-compatible server. Set the URL, model name, and API key:

export SQUEEZ_SERVER_URL=https://api.groq.com/openai/v1
export SQUEEZ_SERVER_MODEL=squeez
export SQUEEZ_API_KEY=gsk_...

Python API

from squeez.inference.extractor import ToolOutputExtractor

# Default: loads KRLabsOrg/squeez-2b locally
extractor = ToolOutputExtractor()

# Or connect to a server
extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1")

filtered = extractor.extract(
    task="Find the referer validation block",
    tool_output=raw_output,
)

Use with Claude Code

Add to your CLAUDE.md:

Whenever you invoke a shell command, pipe its output through `squeez` and state exactly what you want to know.

Examples:
- `bun test 2>&1 | squeez "did the tests pass?"`
- `git log --oneline -50 | squeez "find the commit that broke CSRF"`
- `cat src/auth/middleware.py | squeez "find the referer validation logic"`

Do NOT use squeez when:
- You need exact, uncompressed output (e.g. writing a patch)
- The command is interactive

Works with other coding agents (Codex CLI, OpenCode, etc.) via their equivalent instruction files.


Advanced

Configuration

Resolved in order: CLI flags > environment variables > config file.

Config file is loaded from the first found: ./squeez.yaml, ./configs/default.yaml, ~/.config/squeez/config.yaml.

# squeez.yaml
server_url: "http://localhost:8000/v1"
# local_model_path: "./output/squeez_qwen"  # for local inference instead
# backend: null  # auto-detect; or "transformers", "vllm", "encoder"

Environment variables:

Variable             Description
SQUEEZ_SERVER_URL    Server URL (vLLM, Ollama, etc.)
SQUEEZ_LOCAL_MODEL   Path to local model directory
SQUEEZ_SERVER_MODEL  Model name on the server
SQUEEZ_API_KEY       API key (if needed)
SQUEEZ_BACKEND       Force backend: transformers, vllm, encoder

Encoder models

Squeez also supports encoder-based extraction (ModernBERT, etc.) as an alternative to the generative model. These are faster but less accurate.

Two encoder approaches:

  • Token encoder: per-token binary classification, aggregated per line via max-pool
  • Pooled encoder: single-pass encoder with line-level mean-pool classification

from squeez.inference.extractor import ToolOutputExtractor

extractor = ToolOutputExtractor(model_path="./output/squeez_encoder")
filtered = extractor.extract(task="Find the bug", tool_output=raw_output)
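The token encoder's max-pool aggregation (per-token scores collapsed to one score per line) can be sketched in plain Python, independent of any model:

```python
def line_scores_maxpool(token_scores: list[float],
                        token_line_ids: list[int],
                        num_lines: int) -> list[float]:
    """Aggregate per-token relevance scores into per-line scores via max-pool."""
    scores = [float("-inf")] * num_lines
    for score, line in zip(token_scores, token_line_ids):
        scores[line] = max(scores[line], score)
    return scores

# Three tokens on line 0, two on line 1
print(line_scores_maxpool([0.1, 0.9, 0.3, 0.2, 0.4], [0, 0, 0, 1, 1], 2))  # [0.9, 0.4]
```

A line is then kept when its pooled score clears a threshold; max-pooling means a single strongly relevant token is enough to keep its whole line.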

Standalone loading without squeez installed:

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("output/squeez_pooled", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("output/squeez_pooled")

result = model.process(
    task="Find the traceback",
    tool_output=open("output.log").read(),
    tokenizer=tokenizer,
)
print(result["highlighted_lines"])

Training

See TRAINING.md for full training and evaluation commands.

# Download dataset
python scripts/download_data.py

# Train generative model (Qwen 3.5 2B + LoRA)
squeez train --train-file data/train.jsonl --eval-file data/dev.jsonl

# Train token encoder
python -m squeez.encoder.train \
    --classifier-type token \
    --train-file data/encoder_train.jsonl \
    --eval-file data/encoder_dev.jsonl \
    --base-model answerdotai/ModernBERT-base \
    --output-dir output/squeez_encoder

# Evaluate
squeez eval --extractor-model output/squeez_qwen --eval-file data/test.jsonl

Dataset

Training data: KRLabsOrg/tool-output-extraction-swebench

Built from SWE-bench repositories. Each sample has:

  • query: a focused extraction request or agent subgoal
  • tool_output: raw tool output as seen by the agent
  • gold_spans: contiguous spans over the raw output

From this canonical format, Squeez derives generative SFT files and encoder training files.
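Deriving encoder training targets from the canonical format amounts to turning gold spans into per-line binary labels. A sketch, under the assumption that spans are inclusive (start_line, end_line) pairs over zero-indexed lines:

```python
def spans_to_line_labels(num_lines: int,
                         gold_spans: list[tuple[int, int]]) -> list[int]:
    """Mark each line 1 if it falls inside any gold span (inclusive), else 0."""
    labels = [0] * num_lines
    for start, end in gold_spans:
        for i in range(start, end + 1):
            labels[i] = 1
    return labels

# Spans covering lines 1-2 and line 4 of a 6-line output
print(spans_to_line_labels(6, [(1, 2), (4, 4)]))  # [0, 1, 1, 0, 1, 0]
```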

To regenerate from scratch:

python scripts/build_full_dataset.py \
    --output-dir data/v3 \
    --teacher-model openai/gpt-oss-120b \
    --teacher-base-url http://localhost:8000/v1

Citation

@software{kovacs2026squeez,
    title={Squeez: Compressing Tool Output for LLM Coding Agents},
    author={Adam Kovacs},
    year={2026},
    url={https://github.com/KRLabsOrg/squeez}
}

License

Apache 2.0
