Pure in-memory ZPAQ compression for Python (real pybind11 bindings, prebuilt wheels, no C++ toolchain needed to install).

These details have not been verified by PyPI

Project links

License
- Public Domain
Operating System
Programming Language
- C++
- Python :: 3
Topic
- System :: Archiving :: Compression

Project description

`zpaq`

Pure in-memory ZPAQ compression for Python: faster compression than the official zpaq CLI on x86_64 in the benchmarks below, same algorithm, prebuilt wheels for every modern Python on Windows / Linux / macOS.

import zpaq

blob = zpaq.compress(b"hello world " * 1_000, level=3)   # bytes -> bytes
assert zpaq.decompress(blob) == b"hello world " * 1_000

Why this exists

Every other ZPAQ binding on PyPI either shells out to the zpaq executable (which forces temp files and a subprocess) or ships only an sdist (which forces every user to have a working C++ toolchain). This package avoids both problems:

- It is a real pybind11 binding around libzpaq, the same C++ library the official zpaq CLI is built on. It implements libzpaq's abstract Reader/Writer interfaces with adapters that read from and write to bytes objects, with no filesystem detour.
- It is distributed as prebuilt wheels for Windows, Linux, and macOS (including Apple Silicon) across Python 3.8 through 3.13. Installing it never compiles anything.

On Windows, the wheel statically links the C and C++ runtimes so users don't need any "Visual C++ Redistributable" installed — if Python runs, zpaq works.

Performance

libzpaq's reference build interprets the per-byte context-mixing predictor at compression levels 3-5. The official zpaq.exe on x86_64 replaces that interpreter with a JIT that translates the predictor bytecode into native machine code at archive-open time. This package's x86_64 wheels enable the same JIT path, plus:

- multi-threaded block compression via threads=N (the official CLI tops out at 2 cores by default)
- skip-checksum-by-default (verify=False), since pure-data workflows rarely need the per-block SHA-1 that zpaq.exe always computes
- a libsais-backed suffix array constructor for level-3 BWT mode (Apache 2.0, several times faster than libzpaq's vendored libdivsufsort-lite)
- a faster decompress path for archives produced by zpaq.compress (avoids the JIDAC-aware per-segment buffering)
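Because checksums are skipped by default, a caller who wants an integrity check without passing verify=True can keep a digest next to the compressed blob. The following is a minimal sketch using only `hashlib`; the helper names `fingerprint` and `check_roundtrip` are hypothetical and are not part of this package.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-1 hex digest of the original, uncompressed bytes."""
    return hashlib.sha1(data).hexdigest()


def check_roundtrip(recovered: bytes, expected_digest: str) -> None:
    """Raise ValueError if the recovered bytes do not match the stored digest."""
    actual = hashlib.sha1(recovered).hexdigest()
    if actual != expected_digest:
        raise ValueError(
            f"SHA-1 mismatch: expected {expected_digest}, got {actual}"
        )


original = b"hello world " * 1_000
digest = fingerprint(original)      # store this alongside the compressed blob
check_roundtrip(original, digest)   # returns silently when the bytes match
```

In real use, `recovered` would be the output of zpaq.decompress and `digest` would have been saved at compression time.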

Benchmarks below are at compression level 5 (the strongest), run on a Ryzen-class 12-core x86_64 box. CLI is the official zpaq.exe v7.15 invoked with -m5 (the speeds shown include its -t0 default of two worker threads). In the tables, zpaq.compress(t=N) stands for zpaq.compress(data, level=5, threads=N). Times are in seconds; ratio is the percentage of bytes saved relative to the original size.
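The ratio column can be reproduced with a one-line formula. This sketch assumes that "ratio" means bytes saved as a percentage of the original size; the function name `savings_percent` and the sizes in the usage line are illustrative only, not measurements.

```python
def savings_percent(original_size: int, compressed_size: int) -> float:
    """Percentage of bytes saved: 0 means no reduction, 100 means everything removed."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size


# Illustrative sizes only: a 1,000,000-byte input stored in 200,000 bytes.
print(f"{savings_percent(1_000_000, 200_000):.1f} %")
```

With a real payload, pass `len(data)` and `len(zpaq.compress(data))` as the two arguments.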

40 KB text:

| tool | compress (s) | decompress (s) | ratio (%) |
|---|---|---|---|
| `zpaq.exe -m5` | 0.16 | 0.15 | 71.5 |
| `zpaq.compress(t=1)` | 0.11 | 0.11 | 73.4 |

1 MB text:

| tool | compress (s) | decompress (s) | ratio (%) |
|---|---|---|---|
| `zpaq.exe -m5` | 2.13 | 2.16 | 80.0 |
| `zpaq.compress(t=1)` | 1.97 | 2.03 | 80.1 |
| `zpaq.compress(t=4)` | 0.73 | 2.27 | 79.3 |
| `zpaq.compress(t=12)` | 0.49 | 2.31 | 77.6 |

10 MB text:

| tool | compress (s) | decompress (s) | ratio (%) |
|---|---|---|---|
| `zpaq.exe -m5` | 23.17 | 23.82 | 84.2 |
| `zpaq.compress(t=1)` | 21.07 | 22.23 | 84.2 |
| `zpaq.compress(t=4)` | 6.83 | 21.17 | 82.8 |
| `zpaq.compress(t=12)` | 3.91 | 21.37 | 81.2 |

125 MB text:

| tool | compress (s) | decompress (s) | ratio (%) |
|---|---|---|---|
| `zpaq.exe -m5` | 183.77 | 187.99 | 86.7 |
| `zpaq.compress(t=4)` | 101.40 | 296.47 | 85.8 |
| `zpaq.compress(t=8)` | 66.03 | 297.45 | 85.0 |
| `zpaq.compress(t=12)` | 55.42 | 289.96 | 84.5 |

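Compression time falls steadily as threads are added, though the scaling is sub-linear: on the 10 MB sample, 4 threads are about 3.1x faster than t=1 and 12 threads about 5.4x. Compression ratio drops slightly as threads increase (more block boundaries reduce per-block context size); the ratio for t=1 matches or beats the CLI on every sample where t=1 was measured (40 KB, 1 MB, 10 MB).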
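Speedup and parallel efficiency can be computed directly from the benchmark tables. The sketch below uses the 10 MB compression times listed above; the helper names `speedup` and `efficiency` are just for illustration.

```python
# Compression times in seconds for the 10 MB sample, copied from the table above.
times_10mb = {1: 21.07, 4: 6.83, 12: 3.91}


def speedup(times: dict, threads: int) -> float:
    """How many times faster than the single-thread run."""
    return times[1] / times[threads]


def efficiency(times: dict, threads: int) -> float:
    """Speedup divided by thread count; 1.0 would be perfectly linear scaling."""
    return speedup(times, threads) / threads


for n in sorted(times_10mb):
    print(
        f"t={n:>2}: speedup {speedup(times_10mb, n):.2f}x, "
        f"efficiency {efficiency(times_10mb, n):.0%}"
    )
```

An efficiency well below 100% at higher thread counts indicates diminishing returns per added thread, which is worth weighing against the small loss in ratio.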

Decompression is currently single-threaded for archives this package produces: libzpaq's decompression API doesn't expose a per-block worker model, and parallelizing it cleanly is on the v0.2 roadmap. On the small and medium samples, decompression is within about 7% of the CLI or faster; on the 125 MB sample the CLI's threaded extract pulls clearly ahead (about 188 s versus 290-297 s).

ARM / Apple Silicon wheels disable the x86-only JIT but still benefit from threading, libsais, and the fast decompress path.

API

zpaq.compress(
    data,                    # bytes-like
    level=5,                 # 0..5 (0=store, 5=strongest)
    threads=1,               # 1 (default) = single-thread.
                             # >1 = split input into N blocks compressed in parallel.
                             # 0  = auto-detect host CPU count.
                             # Inputs <64KB*threads are forced to single-thread.
    hints=False,             # If True, scan input for text/exe signatures and order-1
                             # redundancy, pass them to libzpaq via the method string.
                             # Slight overhead, helps ratio on some mixed/binary data.
    verify=False,            # If True, compute & embed SHA-1 per segment. zpaq.exe
                             # also writes these by default; turning them off makes
                             # both this package and zpaq.exe skip verification on
                             # extract, which is faster but won't catch corruption.
    method=None,             # Optional raw libzpaq method-string override (e.g. "x4,4,1"
                             # for custom predictor specs). Overrides level/hints when set.
) -> bytes

zpaq.decompress(
    data,                    # bytes-like ZPAQ stream
    verify=False,            # If True, recompute SHA-1 of each segment and compare to
                             # the one stored in the archive. Raises zpaq.Error on
                             # mismatch. Default off for speed.
) -> bytes

zpaq.Error                   # Raised on libzpaq failures (corrupt stream, bad header, etc.)
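The threads rules described in the comments above can be sketched as a small helper. This illustrates the documented behavior only and is not the package's actual code: the name `effective_threads` is hypothetical, and reading "64KB" as 64 * 1024 bytes is an assumption.

```python
import os

# Assumption: "64KB" in the API comments means 64 * 1024 bytes.
MIN_BYTES_PER_THREAD = 64 * 1024


def effective_threads(n_bytes: int, threads: int = 1) -> int:
    """Return the thread count the documented rules would end up using."""
    if threads < 0:
        raise ValueError("threads must be >= 0")
    if threads == 0:
        # 0 means auto-detect; os.cpu_count() can return None, so fall back to 1.
        threads = os.cpu_count() or 1
    if n_bytes < MIN_BYTES_PER_THREAD * threads:
        # Input too small to split usefully: forced to single-thread.
        return 1
    return threads


print(effective_threads(10 * 1024, threads=4))    # small input, single-thread
print(effective_threads(1024 * 1024, threads=4))  # large enough for 4 blocks
```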

Both compress and decompress release the GIL while libzpaq runs, so zpaq plays well with threaded workloads.
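One way to take advantage of the released GIL is to compress several independent payloads from a thread pool. The helper below is a sketch: `compress_many` is a hypothetical name, and `zlib.compress` is used as a stand-in so the example runs without this package installed. With the package available, zpaq.compress could be passed instead.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def compress_many(
    payloads: Sequence[bytes],
    compress_fn: Callable[[bytes], bytes],
    workers: int = 4,
) -> List[bytes]:
    """Compress each payload on a worker thread, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_fn, payloads))


# Eight independent 100 KB payloads, each filled with a different byte value.
payloads = [bytes([i]) * 100_000 for i in range(8)]

# zlib.compress is a stand-in here; swap in zpaq.compress when it is installed.
blobs = compress_many(payloads, zlib.compress)
assert [zlib.decompress(b) for b in blobs] == payloads
```

This pattern parallelizes across payloads, which is separate from the threads=N option that splits a single payload into blocks.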

Compatibility with the `zpaq` CLI

zpaq.compress() emits the same on-disk format libzpaq itself writes, and zpaq.decompress() understands archives produced by the journaling archiver's zpaq a command: it identifies the JIDAC index/hash/info segments, discards them, and strips each data segment's trailing fragment-size footer so the recovered bytes match the original file exactly.

Tested on ten varied real files (1 KB to 25 MB, text/image/csv/jar/png/jpg/svg/exe/binary, compression levels 1-5):

| Direction | Result |
|---|---|
| `zpaq.compress` → `zpaq.decompress` | 10 / 10 byte-exact |
| `zpaq.compress` → official `zpaq x` CLI | 10 / 10 byte-exact |
| official `zpaq a` CLI → `zpaq.decompress` | 10 / 10 byte-exact |

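import zpaq

my_bytes = b"example payload " * 1_000   # placeholder input for this example

# Write an archive that the official CLI can extract
with open("out.zpaq", "wb") as f:
    f.write(zpaq.compress(my_bytes, level=5))
# ...later, from any machine with the zpaq executable installed:
#   $ zpaq x out.zpaq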

# Read an archive that someone else produced with `zpaq a`
with open("their.zpaq", "rb") as f:
    file_bytes = zpaq.decompress(f.read())

When zpaq.decompress is fed a multi-file archive it returns the concatenated bytes of every file in the order the CLI stored them. A future release will expose a per-segment iterator API so individual files can be addressed by name.
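Until a per-segment API exists, a caller who already knows each file's size and the storage order (for example, from their own records) could slice the concatenated result manually. This is a sketch under that assumption; `split_concatenated` is a hypothetical helper, not part of this package, and the package itself does not supply the sizes.

```python
from typing import Dict


def split_concatenated(blob: bytes, sizes: Dict[str, int]) -> Dict[str, bytes]:
    """Slice a concatenated blob into named files.

    `sizes` maps file names to byte lengths, listed in storage order
    (dicts preserve insertion order in Python 3.7+).
    """
    if sum(sizes.values()) != len(blob):
        raise ValueError("sizes do not add up to the length of the blob")
    files: Dict[str, bytes] = {}
    offset = 0
    for name, size in sizes.items():
        files[name] = blob[offset:offset + size]
        offset += size
    return files


# Stand-in for the bytes zpaq.decompress would return for a two-file archive.
blob = b"first file" + b"second"
files = split_concatenated(blob, {"a.txt": 10, "b.txt": 6})
print(files["b.txt"])
```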

Future work

The current release leaves a few performance levers untouched; pull requests welcome:

- Profile-guided optimization (PGO). Adding /GENPROFILE and /USEPROFILE to the MSVC build (and the equivalents on gcc/clang) typically gains another 5-15%. Skipped here because cibuildwheel doesn't expose a clean two-stage build hook yet.
- AVX2 SIMD. libzpaq's predictor inner loop is small and serial, so hand-written SIMD would require a deeper rewrite rather than a quick drop-in optimization.
- Per-segment archive API. zpaq.decompress currently returns the concatenated bytes of every segment in a multi-file zpaq a archive. A future iterator API would let callers address individual files by name.

License

This package is released under the same terms as the underlying libzpaq sources: public domain. See src/zpaq/vendor/COPYING.

The vendored libsais suffix array library is Apache 2.0 (Ilya Grebnov). See src/zpaq/vendor/LICENSE-libsais.

Not affiliated with Matt Mahoney. libzpaq was released into the public domain by its original author; this Python package wraps those sources and is an independent community project.


Release history

0.3.5

May 19, 2026

0.3.4

May 18, 2026

0.3.3

May 18, 2026

0.3.2

May 18, 2026

0.3.1

May 18, 2026

0.3.0

May 18, 2026

0.2.9

May 18, 2026

0.2.8

May 18, 2026

0.2.7

May 18, 2026

0.2.6

May 18, 2026

0.2.5

May 18, 2026

0.2.4

May 18, 2026

0.2.3

May 18, 2026

0.2.2

May 18, 2026

0.2.1

May 18, 2026

0.2.0

May 18, 2026

0.1.2

May 18, 2026

0.1.1

May 18, 2026

This version

0.1.0

May 18, 2026

0.0.4

May 16, 2026

0.0.3

May 16, 2026

0.0.2

May 16, 2026

0.0.1

May 16, 2026

Download files

Source Distribution

zpaq-0.1.0.tar.gz (146.4 kB view details)

Uploaded May 18, 2026 Source

Built Distribution

zpaq-0.1.0-cp311-cp311-win_amd64.whl (391.9 kB view details)

Uploaded May 18, 2026 CPython 3.11Windows x86-64

File details

Details for the file zpaq-0.1.0.tar.gz.

File metadata

Download URL: zpaq-0.1.0.tar.gz
Upload date: May 18, 2026
Size: 146.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for zpaq-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ccfc3c74cb55324c863a0969722ee4df30062905ce3f3362350453c9665f912f`
MD5	`8fb046f7ee0fafeef2b114b0734abf19`
BLAKE2b-256	`1c28803014c006c3d1e53d2683ad9add44c5aaf340e36625843b619966284e63`

File details

Details for the file zpaq-0.1.0-cp311-cp311-win_amd64.whl.

File metadata

Download URL: zpaq-0.1.0-cp311-cp311-win_amd64.whl
Upload date: May 18, 2026
Size: 391.9 kB
Tags: CPython 3.11, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for zpaq-0.1.0-cp311-cp311-win_amd64.whl
Algorithm	Hash digest
SHA256	`d28c05b40b42036f0275b8f02b317b879309f9038f8e4e1e10bd7471099cb7e1`
MD5	`58d44fca717a0df543a3153a054d2670`
BLAKE2b-256	`79c75b750bbee5595096d579687bd10e72495c94b8d405ea174bd2abb1c4cb13`
