
Pure in-memory ZPAQ compression for Python (real pybind11 bindings, prebuilt wheels, no C++ toolchain needed to install).


zpaq


Pure in-memory ZPAQ compression for Python. I made this because every other zpaq package on PyPI is just a wrapper around the zpaq executable: they shell out to the CLI as a subprocess, which means temp files for every operation, fork overhead, and the user needing zpaq.exe on their PATH in the first place. None of them are actual bindings. I wanted real bytes->bytes zpaq from Python that just works.

import zpaq

blob = zpaq.compress(b"hello world " * 1_000, level=3)
assert zpaq.decompress(blob) == b"hello world " * 1_000

It also ended up a lot faster than the official zpaq.exe itself: up to 6.6× faster on decompress and 6.3× faster on compress at 10 MB. It is multi-threaded in both directions, with a JIT-compiled predictor on x86_64, libsais for the suffix-array pass, and AVX2 auto-vectorization compile flags. The default threads=0 auto-scales across all CPU cores, at the cost of a slightly lower ratio (81.2 % vs the CLI's 84.2 % at 10 MB on 12 cores). Pass threads=1 if you want the absolute best ratio; at 10 MB it matches the CLI's 84.2 %:

blob = zpaq.compress(big_data, level=5, threads=1)   # max ratio, single block

For inputs with repeated content (logs, large text corpora, similar binaries, snapshots), pass dedup=True to get fragment-level deduplication: the input is split into ~64 KB content-defined chunks and identical chunks are stored once. The output is a JIDAC archive in the same format the zpaq a command produces, so it is fully extractable with zpaq x:

blob = zpaq.compress(repetitive_data, level=5, dedup=True)   # JIDAC archive
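As a rough illustration of what content-defined chunking plus fragment dedup means, here is a minimal pure-Python sketch. It is not zpaq's algorithm: it swaps in a Gear-style rolling hash (the kind FastCDC-like chunkers use) where zpaq has its own rolling hash, and cdc_chunks, dedup and restore are hypothetical helpers. Because cut points depend only on the last few bytes, repeated content yields repeated chunks, which are stored once and referenced by index.

```python
import hashlib
import random
from typing import Dict, List, Tuple

# Fixed pseudo-random "Gear" table. Illustrative only: zpaq uses its own
# rolling hash, not this one.
_rng = random.Random(2024)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, avg_bits: int = 16, min_size: int = 4096,
               max_size: int = 1 << 18) -> List[bytes]:
    """Cut wherever the low `avg_bits` bits of a Gear rolling hash are zero.

    With avg_bits=16 a cut point occurs roughly once per 64 KiB of input
    (on top of min_size). Bit k of the hash depends only on the last k + 1
    bytes, so cut points follow the content rather than absolute offsets.
    """
    mask = (1 << avg_bits) - 1
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i + 1 - start
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup(data: bytes, **chunk_opts) -> Tuple[List[bytes], List[int]]:
    """Store each distinct chunk once; return (unique_chunks, chunk_refs)."""
    index: Dict[bytes, int] = {}  # SHA-1 digest -> position in `unique`
    unique: List[bytes] = []
    refs: List[int] = []
    for chunk in cdc_chunks(data, **chunk_opts):
        key = hashlib.sha1(chunk).digest()
        if key not in index:
            index[key] = len(unique)
            unique.append(chunk)
        refs.append(index[key])
    return unique, refs


def restore(unique: List[bytes], refs: List[int]) -> bytes:
    """Rebuild the original input from the chunk list and references."""
    return b"".join(unique[i] for i in refs)


# Four copies of a 50 KB random block; small chunk sizes keep the demo quick.
block = bytes(random.Random(0).getrandbits(8) for _ in range(50_000))
data = block * 4
unique, refs = dedup(data, avg_bits=10, min_size=256, max_size=8192)
assert restore(unique, refs) == data
print(f"{len(data)} input bytes, {sum(map(len, unique))} stored")
```

The min_size/max_size bounds keep pathological inputs (long runs with no cut point, or very frequent ones) from producing absurd chunk sizes.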

Prebuilt wheels are provided for Windows / Linux / macOS (including Apple Silicon) across Python 3.9 through 3.13, so installing from a wheel never compiles anything. On Windows the wheel statically links the C and C++ runtimes, so there's no "Visual C++ Redistributable" requirement: if Python runs, zpaq works.

Performance


Benchmarks vs the official zpaq.exe -m5 (Ryzen-class 12-core x86_64, level 5). Both compress and decompress run block-parallel. The in-memory bindings ("mem", best thread count per row) win in both directions at every size tested, by 3.4× or more from ~1 MB up:

| workload | CLI compress | best mem compress | CLI decompress | best mem decompress |
|---|---|---|---|---|
| 40 KB text | 0.14 s | 0.13 s (1.1×) | 0.14 s | 0.12 s (1.2×) |
| 1 MB text | 2.28 s | 0.58 s (3.9×) | 2.23 s | 0.58 s (3.8×) |
| 10 MB text | 24.1 s | 3.83 s (6.3×) | 25.1 s | 3.83 s (6.6×) |
| 100 MB text | 252.5 s | 74.1 s (3.4×) | 250.7 s | 73.2 s (3.4×) |

Full breakdown by thread count below. zpaq.exe -m5 is the official zpaq.exe v7.15 invoked with -m5 (its times already include the -t0 default of two worker threads). zpaq.compress(t=N) is zpaq.compress(data, level=5, threads=N). Times are in seconds; ratio % is the share of bytes removed, i.e. (1 - compressed size / original size) × 100.
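A minimal sketch of how one table row could be measured. This is not the harness behind the numbers here; bench_row and speedup are hypothetical helpers, and zlib stands in so the snippet runs without zpaq installed. It spells out the ratio % definition and the speedup arithmetic used in the summary table.

```python
import time
import zlib
from typing import Callable, Dict


def bench_row(compress: Callable[[bytes], bytes],
              decompress: Callable[[bytes], bytes],
              data: bytes) -> Dict[str, float]:
    """Time one compress/decompress round trip and report the table's columns."""
    t0 = time.perf_counter()
    blob = compress(data)
    t1 = time.perf_counter()
    restored = decompress(blob)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError("round trip did not reproduce the input")
    return {
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
        # ratio % = share of bytes removed: (1 - compressed/original) * 100
        "ratio_pct": 100.0 * (1 - len(blob) / len(data)),
    }


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# zlib is a stand-in; with zpaq installed, pass e.g.
# lambda d: zpaq.compress(d, level=5, threads=4) and zpaq.decompress.
row = bench_row(zlib.compress, zlib.decompress, b"hello world " * 1_000)
print(row)
print(round(speedup(24.14, 3.83), 1))  # 10 MB compress, CLI vs t=12 -> 6.3
```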

40 KB text:

| algo | compress | decompress | ratio % |
|---|---|---|---|
| zpaq.exe -m5 | 0.14 s | 0.14 s | 71.5 % |
| zpaq.compress(t=1) | 0.13 s | 0.13 s | 73.5 % |
| zpaq.compress(t=0) | 0.13 s | 0.12 s | 73.5 % |

1 MB text:

| algo | compress | decompress | ratio % |
|---|---|---|---|
| zpaq.exe -m5 | 2.28 s | 2.23 s | 80.0 % |
| zpaq.compress(t=1) | 2.08 s | 2.12 s | 80.1 % |
| zpaq.compress(t=4) | 0.70 s | 0.75 s | 79.3 % |
| zpaq.compress(t=12) | 0.58 s | 0.58 s | 77.6 % |

10 MB text:

| algo | compress | decompress | ratio % |
|---|---|---|---|
| zpaq.exe -m5 | 24.14 s | 25.13 s | 84.2 % |
| zpaq.compress(t=1) | 20.92 s | 21.50 s | 84.2 % |
| zpaq.compress(t=4) | 6.72 s | 6.92 s | 82.8 % |
| zpaq.compress(t=12) | 3.83 s | 3.83 s | 81.2 % |

100 MB text:

| algo | compress | decompress | ratio % |
|---|---|---|---|
| zpaq.exe -m5 | 252.5 s | 250.7 s | 86.7 % |
| zpaq.compress(t=1) | 324.2 s | 85.9 s | 85.0 % |
| zpaq.compress(t=0) (12 cores) | 74.1 s | 73.2 s | 84.5 % |
| zpaq.compress(dedup=True) | 325.4 s | 120 s | 85.06 % |

Why this is faster than the official CLI

libzpaq's reference code runs the per-byte context-mixing predictor used at compression levels 3-5 through an interpreter. The official zpaq.exe on x86_64 replaces that interpreter with a JIT that translates the predictor bytecode into native machine code at archive-open time; that's where most of its speed comes from. This package's x86_64 wheels enable the same JIT path, plus a handful of additions the CLI doesn't have:

  • multi-threaded block compression via threads=N (the official CLI tops out at two cores by default)
  • multi-threaded block decompression: I scan the archive for ZPAQ locator-tag block boundaries up front, dispatch each block to a worker, and concatenate the outputs in order. libzpaq's decompress API is sequential, so this layer sits above it.
  • skip-checksum-by-default (verify=False): the per-block SHA-1 that zpaq.exe computes by default isn't free, and most "compress these bytes please" workflows don't need it
  • AVX2-enabled compile flags so the optimizer auto-vectorizes where it can (x86_64 wheels assume AVX2, which covers CPUs from 2013 onward; older CPUs need to build from the sdist instead)
  • libsais (Apache 2.0) for level-3 BWT suffix-array construction instead of the libdivsufsort-lite that libzpaq ships internally
  • a faster decompress path for archives produced by zpaq.compress that skips the JIDAC-aware per-segment buffering
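The block-parallel decompress idea can be sketched in pure Python. This is a hypothetical illustration, not the package's C++ implementation: BLOCK_START assumes the 13-byte locator tag and the "zPQ" block marker from the ZPAQ format specification, and decode_block stands for any callable that turns one tagged block into bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# 13-byte ZPAQ locator tag followed by the "zPQ" block-header marker
# (assumed from the ZPAQ format specification).
BLOCK_START = bytes.fromhex("376b5374a03183d38cb228b0d3") + b"zPQ"


def block_offsets(archive: bytes) -> List[int]:
    """Byte offsets at which a tagged block begins."""
    offsets = []
    pos = archive.find(BLOCK_START)
    while pos != -1:
        offsets.append(pos)
        pos = archive.find(BLOCK_START, pos + len(BLOCK_START))
    return offsets


def split_blocks(archive: bytes) -> List[bytes]:
    """Slice the archive into one piece per tag (bytes before the first tag are ignored)."""
    offsets = block_offsets(archive)
    ends = offsets[1:] + [len(archive)]
    return [archive[s:e] for s, e in zip(offsets, ends)]


def parallel_decode(archive: bytes,
                    decode_block: Callable[[bytes], bytes],
                    threads: int = 4) -> bytes:
    """Decode blocks concurrently and join the results in archive order."""
    blocks = split_blocks(archive)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, so concatenation is safe.
        return b"".join(pool.map(decode_block, blocks))


fake = BLOCK_START + b"abc" + BLOCK_START + b"defg"
print(parallel_decode(fake, lambda blk: blk[len(BLOCK_START):], threads=2))  # b'abcdefg'
```

Real speedup requires decode_block to release the GIL. For plain zpaq.compress output, something like lambda b: zpaq.decompress(b, threads=1) is a plausible candidate, since ZPAQ blocks are independent; this sketch does not handle JIDAC index segments. A raw byte scan could in principle match inside compressed data, which is very unlikely for a 16-byte pattern, but a robust scanner would validate each block header.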

Compress scales well with thread count: at 10 MB, 4 threads run about 3.1× faster than 1 thread and 12 threads about 5.5× faster. Compression ratio drops slightly as threads increase (more block boundaries = less context per block). At t=1 the ratio matches or beats the CLI up to 10 MB but trails it at 100 MB (85.0 % vs 86.7 %). For inputs with actual repetition, dedup=True can recover some of that gap; on the 100 MB text run it reached 85.06 %.

ARM / Apple Silicon wheels disable the x86-only JIT and AVX2 flags but still benefit from threading, libsais, and the fast decompress path.

API

zpaq.compress(
    data,                    # bytes-like
    level=5,                 # 0..5 (0=store, 5=strongest)
    threads=0,               # 0 (default) = auto-detect host CPU count, clamped
                             # by input size (64KB minimum chunk per worker).
                             # 1 = single-thread, deterministic, best ratio.
                             # N>1 = pin to exactly N workers.
    hints=False,             # If True, scan input for text/exe signatures + order-1
                             # redundancy, pass them to libzpaq via the method string.
                             # Slight overhead, can help ratio on mixed/binary data.
    verify=False,            # If True, compute & embed SHA-1 per segment.
                             # zpaq.exe also writes these by default; turning them off
                             # makes both us and zpaq.exe skip verification on extract.
    method=None,             # Optional raw libzpaq method-string override (e.g.
                             # "x4,4,1"). Overrides level/hints when set.
    dedup=False,             # If True, emit a JIDAC-format archive with fragment
                             # dedup. Output is `zpaq x`-extractable. Currently
                             # single-threaded encode; matches CLI ratio on repetitive
                             # inputs.
) -> bytes
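The threads=0 rule above reads as "use every CPU, but give each worker at least 64 KB of input". Below is a hypothetical reconstruction of that rule; the exact rounding, and whether 64KB means 64 000 or 65 536 bytes, are assumptions. It is consistent with the benchmark tables, where t=0 on 40 KB gives the same ratio as t=1.

```python
import os
from typing import Optional

# "64KB minimum chunk per worker"; 64 KiB is an assumption (could be 64 000).
MIN_CHUNK = 64 * 1024


def effective_threads(n_bytes: int, threads: int = 0,
                      cpu_count: Optional[int] = None) -> int:
    """Hypothetical model of how compress() picks its worker count."""
    if threads < 0:
        raise ValueError("threads must be >= 0")
    if threads > 0:
        return threads  # 1 = single-thread, N > 1 = exactly N workers
    cpus = cpu_count or os.cpu_count() or 1  # threads=0: all host CPUs...
    by_size = max(1, n_bytes // MIN_CHUNK)   # ...but >= 64 KB per worker
    return min(cpus, by_size)


print(effective_threads(40_000, cpu_count=12))      # 40 KB -> 1 worker
print(effective_threads(200_000, cpu_count=12))     # ~200 KB -> 3 workers
print(effective_threads(10_000_000, cpu_count=12))  # 10 MB -> 12 workers
```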

zpaq.decompress(
    data,                    # bytes-like ZPAQ stream
    verify=False,            # If True, recompute SHA-1 of each segment and compare
                             # to the one in the archive. Raises zpaq.Error on
                             # mismatch.
    threads=0,               # 0 (default) = auto-detect, clamped by block count.
                             # 1 = single-thread.
) -> bytes

zpaq.Error                   # Raised on libzpaq failures (corrupt stream, bad header)

Both compress and decompress release the GIL while libzpaq runs, so this plays well with threaded workloads.
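Because the GIL is released, a plain ThreadPoolExecutor is enough to compress many independent payloads at once. A minimal sketch; compress_many is a hypothetical helper, and zlib stands in so it runs without zpaq installed.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def compress_many(payloads: Iterable[bytes],
                  compress: Callable[[bytes], bytes],
                  workers: int = 4) -> List[bytes]:
    """Compress independent payloads concurrently, preserving input order.

    Threads only overlap if `compress` releases the GIL while it works,
    which this README states zpaq.compress does.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress, payloads))


docs = [f"record {i} ".encode() * 500 for i in range(8)]
blobs = compress_many(docs, zlib.compress)
# With zpaq installed, something like:
#   blobs = compress_many(docs, lambda d: zpaq.compress(d, level=5, threads=1))
```

Passing threads=1 to each call is a judgment call, not a documented recommendation: with the default threads=0 every call auto-detects all cores, so combining it with an outer pool could oversubscribe the CPU.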

CLI interoperability

zpaq.compress() emits the same on-disk format libzpaq itself writes, and zpaq.decompress() understands archives produced by zpaq a (filters out the JIDAC index/hash/info segments, follows the file's fragment-ID list, etc.). Tested on ten varied real files (1 KB to 25 MB, text/image/csv/jar/png/jpg/svg/exe/binary, levels 1-5):

| Direction | Result |
|---|---|
| zpaq.compress → zpaq.decompress | 10 / 10 byte-exact |
| zpaq.compress → zpaq x (official CLI) | 10 / 10 byte-exact |
| zpaq a (official CLI) → zpaq.decompress | 10 / 10 byte-exact |

import zpaq

# Pipe to the official CLI
with open("out.zpaq", "wb") as f:
    f.write(zpaq.compress(my_bytes, level=5))
# ... later, from any machine with the zpaq executable installed:
#   $ zpaq x out.zpaq

# Read an archive someone else produced with `zpaq a`
with open("their.zpaq", "rb") as f:
    file_bytes = zpaq.decompress(f.read())

For multi-file zpaq a archives, zpaq.decompress currently returns the concatenated bytes of every file in storage order. A per-segment iterator API for addressing files by name is on the v0.3 list.

Future work

Plenty of levers I haven't pulled yet; PRs welcome:

  • PGO (profile-guided optimization). Adding /GENPROFILE + /USEPROFILE to the MSVC build (and -fprofile-generate / -fprofile-use for gcc/clang) usually buys another 5-15% of speed. Skipped here because cibuildwheel doesn't expose a clean two-stage build hook yet.
  • AVX2 in the JIT predictor. AVX2 is on at the C++ compile-flag level. The JIT-emitted predictor already uses SSE2 SIMD (pmaddwd / paddd for the MIX dot-product, lines 4126-4170 in vendored libzpaq.cpp). Upgrading the JIT codegen to AVX2 256-bit ymm registers would handle 16 mixer inputs per iteration instead of 8, but real gain depends heavily on the per-method MIX m parameter, and the work is byte-level instruction re-encoding which is fragile. Skipped for now.
  • Parallel JIDAC encode. dedup=True is currently single-threaded; splitting the fragment-build pass across cores would speed up large dedup compresses.
  • Per-segment archive API. As mentioned above, let callers address individual files inside multi-file zpaq a archives by name.

License

Released under the same terms as the underlying libzpaq sources: public domain. See src/zpaq/vendor/COPYING.

The vendored libsais suffix-array library is Apache 2.0 (Ilya Grebnov). See src/zpaq/vendor/LICENSE-libsais.


Not affiliated with Matt Mahoney. libzpaq was released into the public domain by its original author; this Python package wraps those sources and is an independent community project.



Download files

Download the file for your platform.

Source Distribution

zpaq-0.3.3.tar.gz (155.5 kB)

Uploaded Source

Built Distribution


zpaq-0.3.3-cp311-cp311-win_amd64.whl (414.5 kB)

Uploaded for CPython 3.11, Windows x86-64

File details

Details for the file zpaq-0.3.3.tar.gz.

File metadata

  • Download URL: zpaq-0.3.3.tar.gz
  • Upload date:
  • Size: 155.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for zpaq-0.3.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4493303881da5f02ef8896139d385867386edec5ac6a5714edb13bdf5fc5a522 |
| MD5 | fbacc6183a64ff845344f8a93d435c02 |
| BLAKE2b-256 | f4f5e4f123e779057a493ac9fcaf645e58a729508b28fff9fd9d1f8605637cfe |


File details

Details for the file zpaq-0.3.3-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: zpaq-0.3.3-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 414.5 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for zpaq-0.3.3-cp311-cp311-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ed40eb9f18b6d5cbc510d272f37a5e8627de74c0b98402ddd8acd1eefdf05d1 |
| MD5 | 3d79a1bb3d5a39cf2b85cb791cf3ccd2 |
| BLAKE2b-256 | 82645939efd90ff19967fd96263cf9a72b80dc46535ea6133a54e1d4d28e68ea |

