Async unzipping to prevent asyncio timeout errors and decrease the memory usage for bigger zip files

Project description

async-unzip

Asynchronous unzipping of big files with low memory usage in Python. It helps with unpacking large zip files (memory usage and buffer_size can be adjusted). It also helps prevent asyncio timeout errors, especially when many workers share the same CPU cores.

Fully tested on Python 3.7 through 3.14.

By default the extractor schedules up to 4 concurrent workers. Tune concurrency via the max_workers argument:

asyncio.run(unzip('archive.zip', path='output', max_workers=8))
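For intuition about what a worker cap means in asyncio, the sketch below bounds the number of coroutines in flight with asyncio.Semaphore. It is a generic illustration, not a description of async-unzip's internals: run_bounded, fake_extract and demo are hypothetical names, and the sleep stands in for real extraction work.

```python
import asyncio


async def run_bounded(jobs, max_workers=4):
    """Run coroutine factories with at most max_workers in flight.

    Returns (results, peak), where peak is the highest number of jobs
    that were running at the same time.
    """
    semaphore = asyncio.Semaphore(max_workers)
    state = {"active": 0, "peak": 0}

    async def guarded(job):
        async with semaphore:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            try:
                return await job()
            finally:
                state["active"] -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, state["peak"]


async def fake_extract(name):
    # Placeholder for per-member extraction work (hypothetical).
    await asyncio.sleep(0.01)
    return name


def demo(max_workers=2):
    # Six pretend archive members, processed at most max_workers at a time.
    jobs = [lambda n=n: fake_extract(f"member_{n}") for n in range(6)]
    return asyncio.run(run_bounded(jobs, max_workers=max_workers))
```

Raising the cap only helps while the extra workers have something to overlap, such as file I/O; the benchmark tables suggest the gain flattens quickly.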

When uvloop is installed, the event loop policy switches automatically to leverage its faster reactor.
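An optional-dependency switch of this kind is commonly written as a guarded import. The sketch below is an assumption about the general pattern, not the package's actual code; install_fast_loop_policy is a hypothetical name. Note that asyncio's policy API is deprecated in recent Python releases, so the call may emit a DeprecationWarning there.

```python
import asyncio


def install_fast_loop_policy():
    """Use uvloop's event loop policy when available.

    Returns the name of the backend that ends up active.
    """
    try:
        import uvloop  # optional dependency
    except ImportError:
        # uvloop is not installed: keep the default asyncio event loop.
        return "asyncio"
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return "uvloop"
```

Calling it once before asyncio.run() is enough; without uvloop the function leaves the default loop in place.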

When python-isal or zlib-ng is installed, async-unzip automatically switches to their faster zlib-compatible decompressors; otherwise it falls back to the standard library zlib.
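The fallback chain can be pictured as a series of guarded imports. This is a minimal sketch under assumptions: the preference order (python-isal first, then zlib-ng) and the helper names pick_inflate_backend and inflate_raw are illustrative, not taken from the package. Both optional libraries expose a zlib-style decompressobj, which is what makes them interchangeable here.

```python
import zlib


def pick_inflate_backend():
    """Return (name, module) for the first available zlib-compatible backend."""
    try:
        from isal import isal_zlib  # provided by python-isal
        return "python-isal", isal_zlib
    except ImportError:
        pass
    try:
        from zlib_ng import zlib_ng  # provided by zlib-ng
        return "zlib-ng", zlib_ng
    except ImportError:
        pass
    # Neither accelerator is installed: use the standard library.
    return "zlib", zlib


def inflate_raw(data, backend=None):
    """Decompress a raw DEFLATE stream, the format used by ZIP members."""
    module = backend or pick_inflate_backend()[1]
    # A negative wbits value tells the decompressor there is no zlib header.
    decompressor = module.decompressobj(-15)
    return decompressor.decompress(data) + decompressor.flush()
```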

Benchmarks

Numbers below were captured on an Apple Silicon macOS Sonoma machine (ARM64). Each measurement extracts into a fresh temporary directory and averages three runs.
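The timing part of that methodology can be sketched as follows. This is a hypothetical harness, not the script used for the published numbers: time_extraction and extract_sync are illustrative names, and CPU and RAM sampling are left out.

```python
import statistics
import tempfile
import time
import zipfile


def time_extraction(extract, archive_path, runs=3):
    """Call extract(archive_path, target_dir) `runs` times.

    Each run gets a fresh temporary directory that is deleted afterwards.
    Returns (mean_seconds, samples).
    """
    samples = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as target_dir:
            start = time.perf_counter()
            extract(archive_path, target_dir)
            samples.append(time.perf_counter() - start)
    return statistics.mean(samples), samples


def extract_sync(archive_path, target_dir):
    # Synchronous baseline using the standard library.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
```

To time async-unzip itself, pass a callable such as lambda a, d: asyncio.run(unzip(a, path=d, max_workers=4)) in place of extract_sync.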

Synthetic archive (tests/test_files/fixture_gamma.zip, 23.7 MB)

| Backend | Workers | Avg time (s) | CPU avg / max (%) | RAM avg / max (MB) |
| --- | --- | --- | --- | --- |
| zlib | 1 | 0.91 | 85.7 / 89.3 | 29.46 / 29.52 |
| zlib | 2 | 0.80 | 117.6 / 133.3 | 32.32 / 32.52 |
| zlib | 4 | 0.70 | 162.3 / 167.4 | 33.24 / 33.34 |
| zlib-ng | 1 | 1.00 | 81.7 / 87.7 | 29.44 / 29.59 |
| zlib-ng | 2 | 0.80 | 119.4 / 133.7 | 32.62 / 32.86 |
| zlib-ng | 4 | 0.82 | 134.4 / 168.3 | 33.59 / 33.70 |
| python-isal | 1 | 1.12 | 76.8 / 92.6 | 29.71 / 29.84 |
| python-isal | 2 | 0.91 | 112.4 / 132.0 | 33.01 / 33.11 |
| python-isal | 4 | 0.80 | 146.1 / 163.1 | 34.08 / 34.19 |

Real dataset (external ZIP, ≈1.10 GB)

| Backend | Workers | Avg time (s) | CPU avg / max (%) | RAM avg / max (MB) |
| --- | --- | --- | --- | --- |
| zlib | 1 | 9.49 | 81.2 / 98.4 | 75.21 / 79.25 |
| zlib | 2 | 8.84 | 87.6 / 126.2 | 78.88 / 79.38 |
| zlib | 4 | 8.56 | 90.2 / 128.1 | 84.87 / 84.94 |
| zlib-ng | 1 | 13.35 | 73.0 / 96.7 | 37.95 / 38.95 |
| zlib-ng | 2 | 13.15 | 84.1 / 120.1 | 205.45 / 243.17 |
| zlib-ng | 4 | 12.12 | 92.4 / 121.7 | 218.62 / 244.89 |
| python-isal | 1 | 20.00 | 95.8 / 100.0 | 37.58 / 38.33 |
| python-isal | 2 | 21.76 | 96.2 / 110.5 | 202.98 / 244.09 |
| python-isal | 4 | 22.00 | 96.2 / 112.5 | 217.48 / 246.03 |

The large archive is not part of this repository; download any similarly sized ZIP manually if you want to reproduce the numbers.

Synchronous zipfile.ZipFile.extractall() (same 1.10 GB dataset)

| Backend | Avg time (s) | Samples (s) |
| --- | --- | --- |
| zlib | 14.42 | 14.58, 14.53, 14.16 |
| zlib-ng | 14.94 | 14.98, 14.99, 14.83 |
| python-isal | 14.04 | 13.92, 14.24, 13.94 |

zipfile is single-threaded, so concurrency does not apply in this scenario.

Since version 0.3.6 the module no longer strictly requires aiofile or aiofiles, but it expects one of them to be installed for I/O operations. aiofile is recommended on Linux; remember to install the libaio system library (libaio1) as well, e.g. apt install -y libaio1 on Debian.
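To check which of the two packages is present in an environment before running an extraction, a small probe like the one below can help. It is a sketch with a hypothetical helper name (detect_async_file_backend); the preference order is an assumption and does not claim to mirror how async-unzip chooses.

```python
import importlib.util


def detect_async_file_backend(preferred=("aiofile", "aiofiles")):
    """Return the name of the first installed package from `preferred`, or None."""
    for name in preferred:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

A None result means neither package is installed, in which case installing one of them (pip install aiofile or pip install aiofiles) is the likely next step.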

from async_unzip.unzipper import unzip
import asyncio

asyncio.run(unzip('tests/test_files/fixture_beta.zip', path='some_dir'))

Download files

Source Distribution

async_unzip-0.5.1.tar.gz (7.5 kB view details)

Uploaded Source

Built Distribution

async_unzip-0.5.1-py3-none-any.whl (7.3 kB view details)

Uploaded Python 3

File details

Details for the file async_unzip-0.5.1.tar.gz.

File metadata

  • Download URL: async_unzip-0.5.1.tar.gz
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for async_unzip-0.5.1.tar.gz
Algorithm Hash digest
SHA256 2aef2d6559e6403f0d2b435798dabf7e50d4f501c24e4664b1da587dc4646b47
MD5 7d6e1ac8be353a18a1f159a3c11cb80b
BLAKE2b-256 7d3bd72680cfd3f3087760780b1405871b9b003d8d9dca0c7dc41037dd75a387
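A downloaded file can be checked against the published SHA256 digest with the standard library. The helper names below (sha256_of, matches_published_digest) are illustrative; the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_digest('async_unzip-0.5.1.tar.gz', '<SHA256 value from the table above>') should be true for an intact download.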

File details

Details for the file async_unzip-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: async_unzip-0.5.1-py3-none-any.whl
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for async_unzip-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c5c4d80069d3e2e50a4fe6bd178d2462aa444b9179b5d4cba8bb84a2182ab636
MD5 762ebd79c7cb31a9004273b40b99543f
BLAKE2b-256 f40c316b90ae8bcb0b2654d09ad8507e14bc0d674b31dc8fe9c6fe87602919ff
