Compress logs for LLM analysis (Rust-powered)

logzip (Rust)

Python 3.9+ · License: MIT · Rust

Compress logs before sending them to an LLM. Powered by Rust and PyO3.

raw log → [logzip compress] → compressed text → LLM (Claude Code / Cursor / API)

Before / After

Raw Log (Uvicorn):

INFO: 127.0.0.1:45678 - "GET /api/v1/status HTTP/1.1" 200 OK
INFO: 127.0.0.1:45679 - "GET /api/v1/status HTTP/1.1" 200 OK
... (100 similar lines) ...

logzip output:

--- PREFIX ---
INFO: 127.0.0.1:
--- LEGEND ---
#0# = - "GET /api/v1/status HTTP/1.1" 200 OK
--- BODY ---
45678 #0#
45679 #0#
...

Typical savings: 52–58% on structured logs (systemd, uvicorn, docker).
Anomalies and unique lines stay uncompressed — visible at a glance in the BODY.
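
To make the PREFIX / LEGEND / BODY layout concrete, here is a minimal Python sketch that expands the sample output above back into full lines. It is not logzip's decoder: the function name `expand` is hypothetical, and the parsing rules (one prefix prepended to every body line, legend entries written as `tag = text`) are assumptions inferred only from the example shown here.

```python
def expand(compressed: str) -> list[str]:
    """Toy expander for the sample layout above (assumed rules, not logzip's own)."""
    section = None
    prefix = ""
    legend: dict[str, str] = {}
    lines: list[str] = []
    for line in compressed.splitlines():
        # Section markers look like "--- NAME ---".
        if line.startswith("--- ") and line.endswith(" ---"):
            section = line.strip("- ")
            continue
        if section == "PREFIX":
            prefix = line
        elif section == "LEGEND":
            # "#0# = some text" -> {"#0#": "some text"}
            tag, _, text = line.partition(" = ")
            legend[tag] = text
        elif section == "BODY":
            for tag, text in legend.items():
                line = line.replace(tag, text)
            lines.append(prefix + line)
    return lines


sample = """--- PREFIX ---
INFO: 127.0.0.1:
--- LEGEND ---
#0# = - "GET /api/v1/status HTTP/1.1" 200 OK
--- BODY ---
45678 #0#
45679 #0#"""

for restored in expand(sample):
    print(restored)
```

For real data, use `logzip decompress` or the `decompress` function rather than a hand-rolled parser like this one.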

Why use logzip? (RAG & LLM)

When working with logs in LLMs (Claude, GPT, RAG systems), you face two problems:

  1. Context Limit: Logs are huge. A 10 MB log is ~2.5M tokens.
  2. Noise: Most of a typical log (often around 90%) is repeated INFO lines and identical requests that drown out the real error.

logzip fits RAG pipelines well: it compresses the context before it reaches the model, which cuts token costs and can improve answer accuracy because anomalies stand out.


Performance (7.96 MB Log, ~2M tokens)

Benchmarked on a real 7.96 MB production log.

logzip modes

Mode       CLI                                 Time (ms)  Size (KB)  Saved (%)  Output type
fast       --quality fast                      ~200       ~4,900     ~40%       text/LLM
balanced   --quality balanced                  404        3,928      52%        text/LLM
recursive  --quality balanced --bpe-passes 2   418        3,404      58%        text/LLM
max        --quality max                       507        3,511      57%        text/LLM

recursive (balanced + 2 BPE passes) beats max in both size and speed — recommended for production.

vs. binary compressors (for context)

Tool                Time (ms)  Size (KB)  Saved (%)  LLM-readable?
lz4                 6          1,280      84%        No
zstd (lvl 3)        14         819        90%        No
zlib (lvl 6)        69         840        90%        No
logzip (recursive)  418        3,404      58%        Yes

Binary compressors produce opaque binary blobs that LLMs cannot read. logzip gives up roughly 30 percentage points of size reduction (58% vs. 84–90%) in exchange for output that is fully readable by humans and LLMs.

Token estimation: 1 token ≈ 4 characters (rough estimate for English-like logs).

Economic Impact

┌──────────────────────────────────────────────────────────┐
│  logzip Savings (7.96 MB Production Log)                 │
├──────────────────────────────────────────────────────────┤
│  Raw Size:        8,151 KB  (~1,990,000 tokens)          │
│  After balanced:  3,928 KB  (~959,000 tokens,  -52%)     │
│  After recursive: 3,404 KB  (~831,000 tokens,  -58%)     │
├──────────────────────────────────────────────────────────┤
│  Cost Before:     $5.97                                  │
│  Cost After:      $2.49      (Claude 3.5 Sonnet Input)   │
│  LLM Efficiency:  2.4x larger context for the same price │
└──────────────────────────────────────────────────────────┘
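
The figures in the box can be re-derived with a few lines of arithmetic. The sketch below is only a check of the numbers above: the function names are hypothetical, and the price of $3.00 per million input tokens is an assumption implied by the box itself ($5.97 for ~1,990,000 tokens), not a quoted price list.

```python
CHARS_PER_TOKEN = 4            # rough estimate stated above
USD_PER_MILLION_TOKENS = 3.00  # assumption: implied by $5.97 / ~1.99M tokens


def estimate_tokens(n_chars: int) -> int:
    """Approximate token count from a character count."""
    return round(n_chars / CHARS_PER_TOKEN)


def input_cost_usd(tokens: int) -> float:
    """Input cost at the assumed per-million-token price."""
    return tokens / 1_000_000 * USD_PER_MILLION_TOKENS


raw_kb, recursive_kb = 8_151, 3_404           # sizes from the tables above
raw_tokens = estimate_tokens(7_960_000)       # 7.96 MB treated as 7.96M characters
recursive_tokens = round(raw_tokens * recursive_kb / raw_kb)

saved_pct = (1 - recursive_kb / raw_kb) * 100
print(f"saved: {saved_pct:.0f}%")
print(f"cost before: ${input_cost_usd(raw_tokens):.2f}")
print(f"cost after:  ${input_cost_usd(recursive_tokens):.2f}")
print(f"context gain: {raw_tokens / recursive_tokens:.1f}x")
```

Swap in your own log size and your provider's current price to get an estimate for a different workload.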

Install

pip install logzip

CLI

# stdin → stdout (default mode)
logzip compress < app.log

# quality preset (fast|balanced|max)
logzip compress --quality balanced < app.log

# explicit BPE passes (overrides --quality default)
logzip compress --quality balanced --bpe-passes 3 < app.log

# with preamble (LLM decode instructions at the top)
logzip compress --preamble < app.log > compressed.txt

# save + show stats
logzip compress --stats -i app.log -o app.logzip

# explicit profile (otherwise auto-detected)
logzip compress --profile journalctl < /tmp/syslog.txt

# decompress
logzip decompress -i app.logzip
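
If the CLI is easier to deploy than the Python bindings, it can be driven from a script. The following is a sketch, not part of logzip: the helper names `build_compress_cmd` and `compress_via_cli` are hypothetical, and it uses only the flags shown above (`--quality`, `--bpe-passes`, `--preamble`) with the stdin → stdout mode.

```python
import subprocess
from typing import Optional


def build_compress_cmd(
    quality: str = "balanced",
    bpe_passes: Optional[int] = None,
    preamble: bool = False,
) -> list[str]:
    """Assemble a `logzip compress` command line from the flags listed above."""
    cmd = ["logzip", "compress", "--quality", quality]
    if bpe_passes is not None:
        cmd += ["--bpe-passes", str(bpe_passes)]
    if preamble:
        cmd.append("--preamble")
    return cmd


def compress_via_cli(raw_log_text: str, **options) -> str:
    """Pipe text through the CLI (stdin -> stdout); requires logzip on PATH."""
    proc = subprocess.run(
        build_compress_cmd(**options),
        input=raw_log_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return proc.stdout
```

For example, `compress_via_cli(text, bpe_passes=2, preamble=True)` would run the recursive configuration with decode instructions at the top.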

Python API

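from pathlib import Path

from logzip import compress, decompress

# load the log to compress (app.log is a placeholder path)
raw_log_text = Path("app.log").read_text(encoding="utf-8", errors="replace")

# compress
result = compress(raw_log_text)
print(result.render(with_preamble=True))   # → for LLM
print(result.stats_str())                  # → for logs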

# fine-grained control
result = compress(
    raw_log_text,
    max_legend_entries=128,   # legend size
    bpe_passes=2,             # recursive BPE passes (1–3)
    do_normalize=True,        # collapse timestamps, ANSI, IPs
    do_templates=True,        # structural template extraction
)

# decompress
original = decompress(result.render())

Through the eyes of an LLM

Unlike gzip/zstd, which produce binary noise, logzip produces structured text. The model can reliably interpret the legend and reconstruct repeated patterns, allowing it to analyze the log directly in compressed form.

Input for LLM:

This is a compressed log. Rules: #0# is replaced by GET /api/v1/status.

--- BODY ---
12:00:01 #0# 200 OK
12:00:02 #0# 500 ERR   <-- Boom, anomaly!

The model instantly spots the 500 error without wading through thousands of identical successful requests.
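
One way to wire this into a prompt is sketched below. The helpers `build_prompt` and `ask_about_log` are hypothetical names, the prompt wording is an assumption, and the only logzip calls used are the ones shown in the Python API section (`compress` with `bpe_passes`, and `render(with_preamble=True)`).

```python
def build_prompt(compressed_log: str, question: str) -> str:
    """Join a compressed log (ideally rendered with its preamble) and a question."""
    return f"{compressed_log.strip()}\n\nQuestion: {question.strip()}"


def ask_about_log(raw_log_text: str, question: str) -> str:
    """Compress a raw log and return the prompt text to send to a model."""
    # Imported lazily so build_prompt stays usable without logzip installed.
    from logzip import compress

    result = compress(raw_log_text, bpe_passes=2)
    return build_prompt(result.render(with_preamble=True), question)
```

The returned string can be passed to whichever LLM client you already use; sending it is deliberately left out of the sketch.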

Architecture & Safety

  1. Normalizer: Collapses ANSI, timestamps, IPs, and common prefixes.
  2. Frequency Analysis: Parallel n-gram counting using rayon.
  3. Greedy Legend: Optimized selection using a positional index (O(N)).
  4. Direct Replacement: Fast substitution without re-scanning.
  5. Recursive BPE: Second-pass compression on already-compressed text — finds repeated tag sequences for extra savings.
  6. Templates: Structural template extraction.
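
The frequency-analysis, legend, and replacement steps can be illustrated with a toy word-level version in Python. This is a simplified sketch under stated assumptions, not the Rust implementation: it counts fixed-length token n-grams, ranks them by characters covered, and substitutes tags in a single left-to-right pass. All function names are hypothetical, and it assumes no raw token already looks like a tag such as `#0#`.

```python
from collections import Counter


def count_ngrams(lines: list[str], n: int) -> Counter:
    """Count every run of n space-separated tokens across all lines."""
    counts: Counter = Counter()
    for line in lines:
        tokens = line.split(" ")
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def build_legend(lines, n=3, max_entries=4, min_count=2):
    """Greedy pick: keep the repeated n-grams that cover the most characters."""
    counts = count_ngrams(lines, n)
    ranked = sorted(
        (gram for gram, count in counts.items() if count >= min_count),
        key=lambda gram: (counts[gram] * len(" ".join(gram)), gram),
        reverse=True,
    )
    return {f"#{i}#": gram for i, gram in enumerate(ranked[:max_entries])}


def compress_line(line: str, legend: dict, n: int = 3) -> str:
    """Replace legend n-grams with their tags in one pass, without re-scanning."""
    by_gram = {gram: tag for tag, gram in legend.items()}
    tokens = line.split(" ")
    out, i = [], 0
    while i < len(tokens):
        tag = by_gram.get(tuple(tokens[i:i + n]))
        if tag is not None:
            out.append(tag)
            i += n  # skip the tokens the tag now stands for
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def expand_line(line: str, legend: dict) -> str:
    """Inverse of compress_line: swap each tag back for its tokens."""
    return " ".join(
        " ".join(legend[tok]) if tok in legend else tok for tok in line.split(" ")
    )


log = [
    "12:00:01 GET /api/v1/status 200 OK",
    "12:00:02 GET /api/v1/status 200 OK",
    "12:00:03 GET /api/v1/status 500 ERR",
]
legend = build_legend(log)
body = [compress_line(line, legend) for line in log]
print(legend)
print("\n".join(body))
```

In this toy run the line with the 500 error matches no legend entry and is left as-is, which mirrors how unique lines stay visible in the BODY.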

Safety First

  • Pure Rust: Core logic is 100% Rust.
  • Zero unsafe: The codebase contains no unsafe blocks, ensuring memory safety within the Python runtime.
  • Stress-tested: Handled multi-GB logs without memory leaks or crashes.

Reproducibility

Want to verify our benchmarks? Run the included script:

python benchmark.py

Roadmap / v2

  • MCP server for Claude Code
  • Suffix automaton for arbitrary repetition search
  • Streaming mode for massive files

Download files

Source Distribution

  • logzip-1.1.0.tar.gz (30.4 kB): Source

Built Distributions

  • logzip-1.1.0-cp39-abi3-win_amd64.whl (835.7 kB): CPython 3.9+, Windows x86-64
  • logzip-1.1.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (913.7 kB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • logzip-1.1.0-cp39-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB): CPython 3.9+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

File details

Details for the file logzip-1.1.0.tar.gz.

File metadata

  • Download URL: logzip-1.1.0.tar.gz
  • Upload date:
  • Size: 30.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for logzip-1.1.0.tar.gz
Algorithm Hash digest
SHA256 16c8d2a20308af2c5dc9d58dc55310bccac092d6955f11b36d03faaef42141fa
MD5 c82b9d81006a57907061eca8c1a19480
BLAKE2b-256 25daa6af20f94fbbc361293d265c16dae193ddf9368276876fa5496af62d731a

Provenance

The following attestation bundles were made for logzip-1.1.0.tar.gz:

Publisher: publish.yml on NailShakurov/logzip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

