Hayazip

🚀 Blazing Fast, Multi-Threaded SIMD ZIP Library for Rust & Python

hayazip is an ultra-fast ZIP archive library designed from the ground up to leverage modern hardware capabilities. It combines memory-mapped I/O, SIMD-accelerated compression and decompression (via libdeflater), and thread-pool-based parallelism (via rayon) to accelerate both ZIP extraction and ZIP creation.

Features

  • Zero-Copy Parsers: Uses memmap2 to map the ZIP file directly into memory, skipping expensive kernel-to-user-space copies.
  • SIMD Optimized Compression and Decompression: Powered by libdeflater to leverage AVX2, AVX-512, or NEON depending on the architecture.
  • Multi-threaded ZIP Creation and Extraction: Uses rayon to process independent files in parallel.
  • Hardware-accelerated CRC32: Validates integrity using hardware instructions through crc32fast.
  • Low-footprint Archive Writing: Spools compressed members to temporary files instead of holding the full archive in memory.
  • Path-safe Extraction: Normalizes entry separators and rejects traversal, absolute, and drive-prefixed output paths before writing starts.
  • Archive Preflight: Validates every central-directory entry up front so callers can inspect safe output paths before extraction.
  • Cross-platform Python Bindings: Built with PyO3, with prebuilt wheels for CPython 3.8+ on Linux, macOS, and Windows.

Python Quick Start

Installation

You can install hayazip directly from PyPI with uv or pip. Prebuilt abi3 wheels are published for CPython 3.8+ on Linux, macOS, and Windows, and a source distribution is published as a fallback:

uv add hayazip
# or
pip install hayazip

Usage

Creating and extracting archives in Python is straightforward:

import hayazip

source_dir = "project_files"
archive_path = "project_files.zip"
output_dir = "extracted_files"

hayazip.create_zip(source_dir, archive_path)
hayazip.extract_zip(archive_path, output_dir)
print("Done!")

If you already have ZIP bytes in memory, you can preflight and extract them directly without a temporary file:

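import hayazip

# Hypothetical input file; a .pptx is a ZIP container, so any ZIP bytes work here.
with open("slides.pptx", "rb") as f:
    pptx_bytes = f.read()

entries = hayazip.preflight_zip_bytes(pptx_bytes)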
for entry in entries:
    print(entry["path"], entry["compress_type"])

hayazip.extract_zip_bytes(pptx_bytes, "workdir/unpacked")

Rust Quick Start

Add hayazip to your Cargo.toml:

[dependencies]
hayazip = "0.3.0"

Usage

use hayazip::{create_zip, extract, extract_from_bytes, preflight};

fn main() {
    let source_dir = "project_files";
    let archive_path = "project_files.zip";
    let output_dir = "extracted_files";

    create_zip(source_dir, archive_path).expect("Archive creation failed");

    if let Err(e) = extract(archive_path, output_dir) {
        eprintln!("Extraction failed: {}", e);
    } else {
        println!("Extraction successful!");
    }

    let safe_entries = preflight(archive_path).expect("Preflight failed");
    println!("{} entries validated", safe_entries.len());

    let archive_bytes = std::fs::read(archive_path).expect("read failed");
    extract_from_bytes(&archive_bytes, "extracted_from_bytes").expect("bytes extraction failed");
}

Extraction Safety

hayazip performs a metadata-only preflight before it creates files or directories. During that pass it:

  • normalizes separator variants to forward-slash archive paths,
  • rejects .., absolute paths, and Windows drive prefixes,
  • detects duplicate or conflicting output paths such as dir and dir/file.txt,
  • validates that each entry's local header and compressed payload are structurally readable.
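
The path checks listed above can be sketched in plain Python. This is an illustration of the technique only, not hayazip's implementation: the function names normalize_entry_path and preflight_paths are hypothetical, and the exact rules hayazip applies (for example, how it treats repeated directory entries) may differ.

```python
import re
from typing import Iterable, List

# Matches a Windows drive prefix such as "C:" at the start of a path.
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def normalize_entry_path(name: str) -> str:
    """Return a forward-slash relative path, or raise ValueError if unsafe."""
    path = name.replace("\\", "/")
    if path.startswith("/"):
        raise ValueError(f"absolute path: {name!r}")
    if _DRIVE_PREFIX.match(path):
        raise ValueError(f"drive-prefixed path: {name!r}")
    # Drop empty and "." components; a trailing slash disappears here.
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"path traversal: {name!r}")
    if not parts:
        raise ValueError(f"empty path: {name!r}")
    return "/".join(parts)


def preflight_paths(names: Iterable[str]) -> List[str]:
    """Validate every name before anything is written (assumed rule set)."""
    seen, files, dirs = set(), set(), set()
    safe = []
    for name in names:
        is_dir = name.replace("\\", "/").endswith("/")
        path = normalize_entry_path(name)
        if path in seen:
            raise ValueError(f"duplicate path: {path!r}")
        parts = path.split("/")
        parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
        # A file cannot also serve as a directory for another entry.
        if any(parent in files for parent in parents):
            raise ValueError(f"conflicting path: {path!r}")
        if not is_dir and path in dirs:
            raise ValueError(f"conflicting path: {path!r}")
        seen.add(path)
        dirs.update(parents)
        (dirs if is_dir else files).add(path)
        safe.append(path)
    return safe
```

With these assumed rules, preflight_paths(["dir/", "dir\\file.txt"]) returns the normalized list, while a name list containing both a file dir and dir/file.txt raises ValueError before any output would be created.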

Use preflight / preflight_bytes in Rust or preflight_zip / preflight_zip_bytes in Python if you want the validated path list without extracting yet.

Compression Method Support

Current extraction support:

  • 0 (Stored / no compression)
  • 8 (Deflate)

Current archive creation support:

  • Stored for directories, symlinks, empty files, and files where compression is not beneficial
  • Deflate for regular files when it reduces size
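
The "Deflate only when it reduces size" rule can be sketched as follows. This is a minimal illustration that uses Python's standard zlib module as a stand-in for libdeflater; the function name choose_method and the compression level are assumptions, not part of hayazip's API.

```python
import zlib
from typing import Tuple

STORED, DEFLATE = 0, 8  # ZIP compression method IDs


def choose_method(data: bytes, level: int = 6) -> Tuple[int, bytes]:
    """Return (method, payload), falling back to Stored when Deflate does not help."""
    if not data:
        return STORED, data  # empty files are always stored
    # wbits=-15 produces a raw DEFLATE stream, the form ZIP members use.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    deflated = compressor.compress(data) + compressor.flush()
    if len(deflated) < len(data):
        return DEFLATE, deflated
    return STORED, data
```

Repetitive input such as b"a" * 1000 comes back as Deflate, while input with no redundancy, such as bytes(range(256)), comes back unchanged as Stored.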

Currently unsupported for extraction and creation:

  • any other ZIP compression method, including Deflate64 (9), BZIP2 (12), LZMA (14), PPMd (98), and Zstandard (93)
  • encrypted ZIP entries
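
To find out in advance whether an archive falls outside this support matrix, the central directory can be inspected with Python's standard zipfile module. The sketch below does not use hayazip at all; the helper name unsupported_entries is hypothetical, and it checks only the method ID and the encryption flag bit.

```python
import io
import zipfile
from typing import List, Tuple

# Method 0 (Stored) and method 8 (Deflate), per the lists above.
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def unsupported_entries(archive_bytes: bytes) -> List[Tuple[str, str]]:
    """List (filename, reason) for entries that use an unsupported feature."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x1:  # bit 0 of the general-purpose flags: encrypted
                problems.append((info.filename, "encrypted"))
            elif info.compress_type not in SUPPORTED_METHODS:
                problems.append((info.filename, f"method {info.compress_type}"))
    return problems
```

An empty result means every entry uses Stored or Deflate and none is flagged as encrypted.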

Benchmarks

On modern CPUs, hayazip uses libdeflater for SIMD-accelerated DEFLATE and rayon for parallel file processing. Archive creation writes members with bounded worker parallelism and a temporary spool to keep memory usage predictable while still saturating multiple cores.
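
The combination of bounded worker parallelism and a temporary spool can be illustrated in Python. This is a rough sketch of the general pattern, not hayazip's Rust implementation: it uses a thread pool and zlib in place of rayon and libdeflater, the names compress_members and _deflate_to_spool are hypothetical, and it stops short of writing ZIP headers.

```python
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Dict, List, Tuple

# (name, CRC32 of the raw data, raw size, spool holding the compressed bytes)
Member = Tuple[str, int, int, BinaryIO]


def _deflate_to_spool(item: Tuple[str, bytes]) -> Member:
    """Compress one member into its own temporary file."""
    name, data = item
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE
    spool = tempfile.TemporaryFile()
    spool.write(compressor.compress(data))
    spool.write(compressor.flush())
    spool.seek(0)  # rewind so the caller can read the payload back
    return name, zlib.crc32(data), len(data), spool


def compress_members(members: Dict[str, bytes], max_workers: int = 4) -> List[Member]:
    """Compress members in parallel; max_workers bounds concurrent work."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the archive order stays stable.
        return list(pool.map(_deflate_to_spool, members.items()))
```

A writer would then copy each spool into the final archive in order and close it, so only the spools currently being filled or copied occupy memory buffers.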

Current Scope

create_zip is the only public write API today. A lower-level metadata-preserving writer for explicit entry order, timestamps, compression method, and external attributes is not exposed yet.

Build from Source (Python)

To compile from source and install into your local Python environment:

pip install maturin
maturin develop --release

License

MIT
