Blazing fast, multi-threaded SIMD ZIP library
Hayazip
🚀 Blazing Fast, Multi-Threaded SIMD ZIP Library for Rust & Python
hayazip is an ultra-fast ZIP archive library designed from the ground up to leverage modern hardware capabilities. It combines memory-mapped I/O, SIMD-accelerated compression and decompression (via libdeflater), and thread-pool-based parallelism (via rayon) to accelerate both ZIP extraction and ZIP creation.
Features
- Zero-Copy Parsers: Uses `memmap2` to map the ZIP file directly into memory, skipping expensive kernel-to-user-space copies.
- SIMD Optimized Compression and Decompression: Powered by `libdeflater` to leverage AVX2, AVX-512, or NEON depending on the architecture.
- Multi-threaded ZIP Creation and Extraction: Uses `rayon` to process independent files in parallel.
- Hardware-accelerated CRC32: Validates integrity using hardware instructions through `crc32fast`.
- Low-footprint Archive Writing: Spools compressed members to temporary files instead of holding the full archive in memory.
- Path-safe Extraction: Normalizes entry separators and rejects traversal, absolute, and drive-prefixed output paths before writing starts.
- Archive Preflight: Validates every central-directory entry up front so callers can inspect safe output paths before extraction.
- Cross-platform Python Bindings: Built with PyO3 for easy, out-of-the-box integration in any Python environment.
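To make the memory-mapping idea concrete, here is a minimal sketch in plain Python. It uses the standard-library `mmap` module rather than `memmap2`, and it is not hayazip's parser: the helper names `read_entry_count` and `demo_entry_count` are hypothetical. It maps an archive read-only, searches backwards for the end-of-central-directory signature, and reads the entry count from the mapped bytes. A real parser would also need to handle archive comments that happen to contain the signature, and ZIP64 archives.

```python
import mmap
import os
import struct
import tempfile
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"  # fixed 22-byte end-of-central-directory record
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def read_entry_count(path):
    """Return the entry count stored in the end-of-central-directory record."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Search backwards: the record sits near the end of the archive.
            pos = view.rfind(EOCD_SIGNATURE)
            if pos < 0 or pos + EOCD_SIZE > len(view):
                raise ValueError("end-of-central-directory record not found")
            # Unpack straight from the mapping, without reading the whole file.
            fields = struct.unpack_from(EOCD_FORMAT, view, pos)
            return fields[4]  # total number of central-directory entries


def demo_entry_count():
    """Build a three-entry archive with the standard library and count it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            for name in ("a.txt", "b.txt", "c.txt"):
                zf.writestr(name, name * 10)
        return read_entry_count(archive)
```

Only the pages that are actually touched get loaded, which is why this approach suits large archives where just the trailing metadata is needed first.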
Python Quick Start
Installation
You can install hayazip directly from PyPI with uv or pip. Prebuilt abi3 wheels are published for CPython 3.8+ on Linux, macOS, and Windows, and a source distribution is published as a fallback:
uv add hayazip
# or
pip install hayazip
Usage
Creating and extracting archives in Python is straightforward:
import hayazip
source_dir = "project_files"
archive_path = "project_files.zip"
output_dir = "extracted_files"
hayazip.create_zip(source_dir, archive_path)
hayazip.extract_zip(archive_path, output_dir)
print("Done!")
If you already have ZIP bytes in memory, you can preflight and extract them directly without a temporary file:
import hayazip
entries = hayazip.preflight_zip_bytes(pptx_bytes)
for entry in entries:
    print(entry["path"], entry["compress_type"])
hayazip.extract_zip_bytes(pptx_bytes, "workdir/unpacked")
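The snippet above assumes `pptx_bytes` already exists. As a rough sketch, the following builds ZIP bytes in memory with the standard-library `zipfile` module so there is something to pass to `preflight_zip_bytes`. The helper `build_zip_bytes` and the sample file names are made up for illustration, and the hayazip call is guarded so the rest of the sketch still runs when the package is not installed.

```python
import io
import zipfile


def build_zip_bytes(files):
    """Return a ZIP archive as bytes, built entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


# Hypothetical sample content, used only for this illustration.
archive_bytes = build_zip_bytes({
    "docs/readme.txt": b"hello " * 100,
    "data/empty.bin": b"",
})

try:
    import hayazip  # optional here; the rest of the sketch runs without it
except ImportError:
    hayazip = None

if hayazip is not None:
    # Same calls and dictionary keys as in the usage example above.
    for entry in hayazip.preflight_zip_bytes(archive_bytes):
        print(entry["path"], entry["compress_type"])
```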
Rust Quick Start
Add hayazip to your Cargo.toml:
[dependencies]
hayazip = "0.3.0"
Usage
use hayazip::{create_zip, extract, extract_from_bytes, preflight};
fn main() {
    let source_dir = "project_files";
    let archive_path = "project_files.zip";
    let output_dir = "extracted_files";

    // Create the archive from a directory.
    create_zip(source_dir, archive_path).expect("Archive creation failed");

    // Extract it, reporting any error instead of panicking.
    if let Err(e) = extract(archive_path, output_dir) {
        eprintln!("Extraction failed: {}", e);
    } else {
        println!("Extraction successful!");
    }

    // Validate entries without writing anything to disk.
    let safe_entries = preflight(archive_path).expect("Preflight failed");
    println!("{} entries validated", safe_entries.len());

    // Extract from an in-memory byte slice.
    let archive_bytes = std::fs::read(archive_path).expect("read failed");
    extract_from_bytes(&archive_bytes, "extracted_from_bytes").expect("bytes extraction failed");
}
Extraction Safety
hayazip performs a metadata-only preflight before it creates files or directories. During that pass it:
- normalizes separator variants to forward-slash archive paths,
- rejects `..`, absolute paths, and Windows drive prefixes,
- detects duplicate or conflicting output paths such as `dir` and `dir/file.txt`,
- validates that each entry's local header and compressed payload are structurally readable.
Use preflight / preflight_bytes in Rust or preflight_zip / preflight_zip_bytes in Python if you want the validated path list without extracting yet.
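The path checks listed above can be sketched in a few lines of Python. This illustrates the general technique only and is not hayazip's implementation: `normalize_entry_path` and `preflight_names` are hypothetical helpers, and the exact rules (for example, how duplicate directory entries are treated) are assumptions. The structural check of local headers and payloads is not covered.

```python
import re

_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def normalize_entry_path(name):
    """Return a safe forward-slash relative path, or raise ValueError."""
    path = name.replace("\\", "/")  # normalize separator variants
    if path.startswith("/"):
        raise ValueError(f"absolute path: {name!r}")
    if _DRIVE_PREFIX.match(path):
        raise ValueError(f"drive-prefixed path: {name!r}")
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"path traversal: {name!r}")
    if not parts:
        raise ValueError(f"empty path: {name!r}")
    return "/".join(parts)


def preflight_names(names):
    """Validate every name and reject duplicate or conflicting outputs."""
    seen, files, dirs = set(), set(), set()
    safe = []
    for name in names:
        is_dir = name.replace("\\", "/").endswith("/")
        path = normalize_entry_path(name)
        if path in seen:
            raise ValueError(f"duplicate output path: {path!r}")
        seen.add(path)
        parts = path.split("/")
        ancestors = ["/".join(parts[:i]) for i in range(1, len(parts))]
        # A file cannot also serve as a directory for another entry.
        if any(a in files for a in ancestors) or (not is_dir and path in dirs):
            raise ValueError(f"conflicting output path: {path!r}")
        (dirs if is_dir else files).add(path)
        dirs.update(ancestors)
        safe.append(path)
    return safe
```

With these rules, `preflight_names(["dir/", "dir/file.txt"])` returns both paths, while `preflight_names(["dir", "dir/file.txt"])` raises because `dir` would have to be a file and a directory at once. Running every check before any file is written means a bad entry late in the archive cannot leave a half-extracted tree behind.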
Compression Method Support
Current extraction support:
- `0` (Stored / no compression)
- `8` (Deflate)
Current archive creation support:
- `Stored` for directories, symlinks, empty files, and files where compression is not beneficial
- `Deflate` for regular files when it reduces size
Currently unsupported for extraction and creation:
- any other ZIP compression method, including `Deflate64` (9), `BZIP2` (12), `LZMA` (14), `PPMd` (98), and `Zstandard` (93)
- encrypted ZIP entries
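If you want to know ahead of time whether an archive stays within these limits, one option is to inspect its central directory with Python's standard-library `zipfile` module. The sketch below does not call hayazip; `unsupported_entries` and `demo_archive` are hypothetical helpers. It flags entries whose method is not `0` or `8`, and entries with the encryption bit set in the general-purpose flags.

```python
import io
import zipfile

SUPPORTED_METHODS = {0: "Stored", 8: "Deflate"}  # the two methods listed above
ENCRYPTED_FLAG = 0x1  # bit 0 of the general-purpose flags marks encryption


def unsupported_entries(archive_bytes):
    """Return (name, reason) pairs for entries outside the supported set."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.flag_bits & ENCRYPTED_FLAG:
                problems.append((info.filename, "encrypted"))
            elif info.compress_type not in SUPPORTED_METHODS:
                problems.append((info.filename, f"method {info.compress_type}"))
    return problems


def demo_archive():
    """Build a small archive that mixes supported and unsupported methods."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("stored.txt", b"abc", compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.txt", b"abc" * 50, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("bzip2.txt", b"abc" * 50, compress_type=zipfile.ZIP_BZIP2)
    return buffer.getvalue()
```

An empty result means every entry uses Stored or Deflate and none is marked encrypted.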
Benchmarks
On modern CPUs, hayazip uses libdeflater for SIMD-accelerated DEFLATE and rayon for parallel file processing. Archive creation writes members with bounded worker parallelism and a temporary spool to keep memory usage predictable while still saturating multiple cores.
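The write path described above (bounded workers plus a temporary spool) can be approximated in Python with the standard library. This is a sketch of the pattern under stated assumptions, not hayazip's code: it uses `zlib` instead of libdeflater and a thread pool instead of rayon, and `compress_member` and `compress_all` are hypothetical names. Each worker deflates one member, falls back to Stored when Deflate does not shrink the data, and writes the payload to a spool file so only metadata stays in memory.

```python
import os
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_member(name, data, spool_dir):
    """Deflate one member, spool the payload to disk, return its metadata."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE stream
    deflated = compressor.compress(data) + compressor.flush()
    # Fall back to Stored (0) when Deflate (8) does not make the payload smaller.
    method, payload = (8, deflated) if len(deflated) < len(data) else (0, data)
    fd, spool_path = tempfile.mkstemp(dir=spool_dir)
    with os.fdopen(fd, "wb") as spool:
        spool.write(payload)
    return {
        "name": name,
        "method": method,
        "crc32": zlib.crc32(data),
        "size": len(data),
        "compressed_size": len(payload),
        "spool_path": spool_path,
    }


def compress_all(members, spool_dir, max_workers=4):
    """Compress (name, data) pairs with at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(compress_member, n, d, spool_dir) for n, d in members]
        return [f.result() for f in futures]  # results keep input order
```

A final single-threaded pass could then stream each spool file into the archive in order and write the central directory from the collected metadata. Capping `max_workers` bounds how many payloads are in flight at once, which is what keeps memory use predictable.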
Current Scope
create_zip is the only public write API today. A lower-level metadata-preserving writer for explicit entry order, timestamps, compression method, and external attributes is not exposed yet.
Build from Source (Python)
To compile from source and install into your local Python environment:
pip install maturin
maturin develop --release
License
MIT
File details
Details for the file hayazip-0.3.0.tar.gz.
File metadata
- Download URL: hayazip-0.3.0.tar.gz
- Upload date:
- Size: 27.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca28b3bc47e6a6dcd655bc0a25005f695218e13a2ae2fb9678b6a7584a59834e |
| MD5 | 195a4c365a7678d194fff3953480b0a4 |
| BLAKE2b-256 | 3bb4a54a421122ac53ad540a31aea6212f1ebe169e07ea7b794ee38258d6f678 |
File details
Details for the file hayazip-0.3.0-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: hayazip-0.3.0-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 273.8 kB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e61e72055585ffba8cb2fa12bf02f870d3963e29b2c54a866d8523085725a27d |
| MD5 | 1d4cbe84763c2a5780ecf38fe13d0ce9 |
| BLAKE2b-256 | 0dd15064a92afbffcc6ec319766f16bc5808f8af4a57ee3c39c55714e8a9e340 |
File details
Details for the file hayazip-0.3.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: hayazip-0.3.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 447.8 kB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88bacc1f41dfcd8a4d5ffc54d3a9ecf92e7a7a389d4cc7a5dd2077950c93f7f1 |
| MD5 | 652ab05682d73b132545d9c7558b9b94 |
| BLAKE2b-256 | 7370f9aecd6c8f5873e7013017ae807de39e7afae40ab6c801bcfe51aa2a4463 |
File details
Details for the file hayazip-0.3.0-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.
File metadata
- Download URL: hayazip-0.3.0-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
- Upload date:
- Size: 768.4 kB
- Tags: CPython 3.8+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a3d2380566da3c8e44c149795dd7c6229072994922d0e324f40bc610dc793600 |
| MD5 | 80b81e02cb36652569652c8afb03f937 |
| BLAKE2b-256 | b84e9aa2bd6d7174716ff2fb1f49e4c474dd51e6099bce8975009aaa683d5f56 |