Compress logs for LLM analysis (Rust-powered)
logzip (Rust)
Compress logs before sending them to an LLM. Powered by Rust & PyO3.
raw log → [logzip compress] → compressed text → LLM (Claude Code / Cursor / API)
Before / After
Raw Log (Uvicorn):
INFO: 127.0.0.1:45678 - "GET /api/v1/status HTTP/1.1" 200 OK
INFO: 127.0.0.1:45679 - "GET /api/v1/status HTTP/1.1" 200 OK
... (100 similar lines) ...
logzip output:
--- PREFIX ---
INFO: 127.0.0.1:
--- LEGEND ---
#0# = - "GET /api/v1/status HTTP/1.1" 200 OK
--- BODY ---
45678 #0#
45679 #0#
...
Typical savings: 52–58% on structured logs (systemd, uvicorn, docker).
Anomalies and unique lines stay uncompressed — visible at a glance in the BODY.
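To make the PREFIX / LEGEND / BODY layout concrete, here is a toy Python sketch that produces the same shape of output for the sample above. It is an illustration only and not logzip's actual algorithm: the function name `toy_compress`, the "everything after the first space is a candidate tail" heuristic, and the third sample line (a made-up 500 response) are all assumptions for the sake of the example.

```python
import os
from collections import Counter

def toy_compress(lines, min_count=2):
    """Toy PREFIX / LEGEND / BODY encoder (illustrative, not logzip's algorithm)."""
    # Longest prefix shared by every line, trimmed back so it does not
    # cut through the middle of a number or word (e.g. the port).
    prefix = os.path.commonprefix(lines)
    while prefix and prefix[-1].isalnum():
        prefix = prefix[:-1]
    rests = [line[len(prefix):] for line in lines]

    # Heuristic: everything after the first space is a candidate repeated tail.
    tails = Counter(rest.partition(" ")[2] for rest in rests)
    legend = {}
    for tail, count in tails.most_common():
        if tail and count >= min_count:
            legend[tail] = f"#{len(legend)}#"

    # Lines whose tail is in the legend are shortened; unique lines stay as-is.
    body = []
    for rest in rests:
        head, _, tail = rest.partition(" ")
        body.append(f"{head} {legend[tail]}" if tail in legend else rest)

    out = ["--- PREFIX ---", prefix, "--- LEGEND ---"]
    out += [f"{tag} = {tail}" for tail, tag in legend.items()]
    out += ["--- BODY ---", *body]
    return "\n".join(out)

sample = [
    'INFO: 127.0.0.1:45678 - "GET /api/v1/status HTTP/1.1" 200 OK',
    'INFO: 127.0.0.1:45679 - "GET /api/v1/status HTTP/1.1" 200 OK',
    # Made-up anomaly line, added to show that unique lines are left readable.
    'INFO: 127.0.0.1:45680 - "GET /api/v1/status HTTP/1.1" 500 Internal Server Error',
]
print(toy_compress(sample))
```

The repeated `200 OK` lines collapse to `<port> #0#`, while the single 500 line keeps its full text in the BODY, which is the property the real tool relies on to keep anomalies visible.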
Why use logzip? (RAG & LLM)
When working with logs in LLMs (Claude, GPT, RAG systems), you face two problems:
- Context Limit: Logs are huge. A 10MB log is ~2.5M tokens.
- Noise: Roughly 90% of a typical log is repeated INFO lines and identical requests that drown out the real error.
logzip is well-suited for RAG pipelines: it compresses the context before sending it to the model, saving money on tokens and increasing answer accuracy by highlighting anomalies.
Performance (7.96 MB Log, ~2M tokens)
Benchmarked on a real 7.96 MB production log.
logzip modes
| Mode | CLI | Time (ms) | Size (KB) | Saved (%) | Output type |
|---|---|---|---|---|---|
| fast | --quality fast | ~200 | ~4,900 | ~40% | text/LLM |
| balanced | --quality balanced | 404 | 3,928 | 52% | text/LLM |
| recursive ★ | --quality balanced --bpe-passes 2 | 418 | 3,404 | 58% | text/LLM |
| max | --quality max | 507 | 3,511 | 57% | text/LLM |
★ recursive (balanced + 2 BPE passes) beats max in both size and speed — recommended for production.
vs. binary compressors (for context)
| Tool | Time (ms) | Size (KB) | Saved (%) | LLM-readable? |
|---|---|---|---|---|
| lz4 | 6 | 1,280 | 84% | No |
| zstd (lvl 3) | 14 | 819 | 90% | No |
| zlib (lvl 6) | 69 | 840 | 90% | No |
| logzip (recursive) | 418 | 3,404 | 58% | Yes |
Binary compressors produce opaque binary blobs that LLMs cannot read. logzip gives up roughly 30 percentage points of savings (58% vs. 84–90%) in exchange for output that is fully human- and LLM-readable.
Token estimation: 1 token ≈ 4 characters (rough estimate for English-like logs).
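The "LLM-readable?" column can be checked with the Python standard library alone. The sketch below compresses a made-up batch of near-identical log lines with `zlib` at level 6 and tests whether the result still decodes as UTF-8 text. The sample data and the helper name `is_utf8_text` are assumptions for illustration; this is not the project's benchmark script.

```python
import zlib

def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Made-up sample: 100 near-identical Uvicorn-style lines.
raw = "\n".join(
    f'INFO: 127.0.0.1:{45678 + i} - "GET /api/v1/status HTTP/1.1" 200 OK'
    for i in range(100)
).encode("utf-8")

packed = zlib.compress(raw, 6)  # level 6, matching the zlib row in the table

print(f"raw: {len(raw)} bytes, zlib: {len(packed)} bytes")
print("raw is text:", is_utf8_text(raw))
print("zlib output is text:", is_utf8_text(packed))
print("first bytes of zlib output:", packed[:12])
```

The compressed stream is much smaller, but it is a byte sequence rather than text, so it cannot be pasted into a prompt as-is.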
Economic Impact
┌──────────────────────────────────────────────────────────┐
│ logzip Savings (7.96 MB Production Log) │
├──────────────────────────────────────────────────────────┤
│ Raw Size: 8,151 KB (~1,990,000 tokens) │
│ After balanced: 3,928 KB (~959,000 tokens, -52%) │
│ After recursive: 3,404 KB (~831,000 tokens, -58%) │
├──────────────────────────────────────────────────────────┤
│ Cost Before: $5.97 │
│ Cost After: $2.49 (Claude 3.5 Sonnet Input) │
│ LLM Efficiency: 2.4x larger context for the same price │
└──────────────────────────────────────────────────────────┘
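The figures in the box can be reproduced with a few lines of arithmetic. The sketch below uses the 4-characters-per-token estimate from the text and assumes an input price of $3.00 per million tokens, which is the rate implied by $5.97 for about 1.99M tokens. Actual tokenizer counts and current pricing may differ, so treat this as a rough estimate.

```python
CHARS_PER_TOKEN = 4           # rough estimate stated in the text
USD_PER_MILLION_INPUT = 3.00  # assumption: implied by $5.97 for ~1.99M tokens

def estimate_tokens(num_chars: int) -> float:
    """Approximate token count from a character count."""
    return num_chars / CHARS_PER_TOKEN

def estimate_cost(tokens: float) -> float:
    """Approximate input cost in USD for a given token count."""
    return tokens / 1_000_000 * USD_PER_MILLION_INPUT

# Token counts taken from the box above.
raw_tokens = 1_990_000
recursive_tokens = 831_000

before = estimate_cost(raw_tokens)
after = estimate_cost(recursive_tokens)
print(f"before: ${before:.2f}, after: ${after:.2f}, ratio: {before / after:.1f}x")
# prints: before: $5.97, after: $2.49, ratio: 2.4x
```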
Install
pip install logzip
CLI
# stdin → stdout (default mode)
logzip compress < app.log
# quality preset (fast|balanced|max)
logzip compress --quality balanced < app.log
# explicit BPE passes (overrides --quality default)
logzip compress --quality balanced --bpe-passes 3 < app.log
# with preamble (LLM decode instructions at the top)
logzip compress --preamble < app.log > compressed.txt
# save + show stats
logzip compress --stats -i app.log -o app.logzip
# explicit profile (otherwise auto-detected)
logzip compress --profile journalctl < /tmp/syslog.txt
# decompress
logzip decompress -i app.logzip
Python API
from logzip import compress, decompress
# compress
result = compress(raw_log_text)
print(result.render(with_preamble=True)) # → for LLM
print(result.stats_str()) # → for logs
# fine-grained control
result = compress(
    raw_log_text,
    max_legend_entries=128,  # legend size
    bpe_passes=2,            # recursive BPE passes (1–3)
    do_normalize=True,       # collapse timestamps, ANSI, IPs
    do_templates=True,       # structural template extraction
)
# decompress
original = decompress(result.render())
Through the eyes of an LLM
Unlike gzip/zstd which produce binary noise, logzip produces structured text. The model can reliably interpret the legend and reconstruct repeated patterns, allowing it to analyze the log directly in compressed form.
Input for LLM:
This is a compressed log. Rules:
#0# is replaced by GET /api/v1/status.
--- BODY ---
12:00:01 #0# 200 OK
12:00:02 #0# 500 ERR <-- Boom, anomaly!
The model instantly spots the 500 error without wading through thousands of identical successful requests.
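What "interpreting the legend" amounts to can be mimicked in a few lines of Python. The sketch below parses the simplified PREFIX / LEGEND / BODY layout shown in the Before / After section and expands each tag. The function name `toy_expand` is hypothetical, and the real format may carry more detail (for example, legend entries that reference other tags after extra BPE passes), so use the library's own `decompress` for real data.

```python
def toy_expand(text: str) -> list[str]:
    """Expand a simplified PREFIX / LEGEND / BODY block back into full lines."""
    section = None
    prefix = ""
    legend = {}
    body = []
    for line in text.splitlines():
        # Section markers look like "--- NAME ---".
        if line.startswith("--- ") and line.endswith(" ---"):
            section = line.strip("- ")
            continue
        if section == "PREFIX":
            prefix = line
        elif section == "LEGEND":
            tag, _, value = line.partition(" = ")
            legend[tag] = value
        elif section == "BODY":
            body.append(line)

    expanded = []
    for line in body:
        # Single substitution pass: enough when legend values contain no tags.
        for tag, value in legend.items():
            line = line.replace(tag, value)
        expanded.append(prefix + line)
    return expanded

compressed = """--- PREFIX ---
INFO: 127.0.0.1:
--- LEGEND ---
#0# = - "GET /api/v1/status HTTP/1.1" 200 OK
--- BODY ---
45678 #0#
45679 #0#"""

for line in toy_expand(compressed):
    print(line)
```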
Architecture & Safety
- Normalizer: Collapses ANSI, timestamps, IPs, and common prefixes.
- Frequency Analysis: Parallel n-gram counting using rayon.
- Greedy Legend: Optimized selection using a positional index (O(N)).
- Direct Replacement: Fast substitution without re-scanning.
- Recursive BPE: Second-pass compression on already-compressed text — finds repeated tag sequences for extra savings.
- Templates: Structural template extraction.
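The frequency-analysis and greedy-legend steps above can be sketched in plain Python. This is a deliberately naive, single-threaded illustration: it counts word n-grams, picks the phrase with the best estimated saving, substitutes it, and repeats. Unlike the pipeline described above, it re-counts on every round instead of using a positional index, and it does not run in parallel. All names (`count_ngrams`, `greedy_legend`, `expand`) and the gain formula are assumptions made for this example.

```python
from collections import Counter

def count_ngrams(lines, n_min=2, n_max=4):
    """Count space-separated word n-grams across all lines."""
    counts = Counter()
    for line in lines:
        words = line.split(" ")
        for n in range(n_min, n_max + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

def greedy_legend(lines, max_entries=4):
    """Greedily replace the most profitable repeated phrase with a short tag."""
    legend = {}
    for k in range(max_entries):
        tag = f"#{k}#"

        def gain(item):
            phrase, count = item
            # Characters saved in the body minus the cost of the legend line.
            return count * (len(phrase) - len(tag)) - (len(phrase) + len(tag) + 3)

        counts = count_ngrams(lines)
        if not counts:
            break
        best = max(counts.items(), key=gain)
        if gain(best) <= 0:
            break
        legend[tag] = best[0]
        lines = [line.replace(best[0], tag) for line in lines]
    return legend, lines

def expand(legend, lines):
    """Undo greedy_legend by expanding tags in reverse order of creation."""
    for tag in reversed(list(legend)):
        lines = [line.replace(tag, legend[tag]) for line in lines]
    return lines

original = [f'{45678 + i} - "GET /api/v1/status HTTP/1.1" 200 OK' for i in range(5)]
legend, packed = greedy_legend(original)
print(legend)
print(packed)
```

Because later rounds count n-grams over text that already contains tags, a later legend entry may reference an earlier tag. That is a rough analogue of the second-pass idea in the Recursive BPE step, and it is why `expand` walks the legend in reverse.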
Safety First
- Pure Rust: Core logic is 100% Rust.
- Zero unsafe: The codebase contains no unsafe blocks, ensuring memory safety within the Python runtime.
- Stress-tested: Handled multi-GB logs without memory leaks or crashes.
Reproducibility
Want to verify our benchmarks? Run the included script:
python benchmark.py
Roadmap / v2
- MCP server for Claude Code
- Suffix automaton for arbitrary repetition search
- Streaming mode for massive files