A multi-threaded implementation of the Python gzip module

Project description

mgzip

mgzip uses a block-indexed GZIP file format to enable parallel compression and decompression. The implementation uses the 'FEXTRA' field to record the index of each compressed member. FEXTRA is defined in the official GZIP file format specification version 4.3, so the output is fully compatible with standard GZIP implementations.
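To illustrate how an index stored in FEXTRA can coexist with standard readers, here is a minimal sketch that uses only the standard library. The subfield ID b"ZZ" and the payload layout (one 32-bit member size) are assumptions chosen for illustration; they are not mgzip's actual on-disk layout.

```python
import gzip
import struct
import zlib


def make_member(data: bytes, subfield_id: bytes = b"ZZ") -> bytes:
    """Build one gzip member whose FEXTRA field stores the member's total size.

    The subfield ID and payload layout are hypothetical, not mgzip's format.
    """
    # Raw deflate stream (negative wbits = no zlib/gzip wrapper).
    compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(data) + compressor.flush()
    # Trailer: CRC32 of the uncompressed data, then its size modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    # Fixed header (10) + XLEN (2) + subfield header (4) + payload (4).
    member_size = 10 + 2 + 4 + 4 + len(body) + len(trailer)
    subfield = subfield_id + struct.pack("<H", 4) + struct.pack("<I", member_size)
    # ID1, ID2, CM=8 (deflate), FLG=0x04 (FEXTRA), MTIME=0, XFL=0, OS=255 (unknown).
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, 0, 0, 255)
    return header + struct.pack("<H", len(subfield)) + subfield + body + trailer


def member_offsets(blob: bytes) -> list:
    """Locate every member by hopping from header to header, without inflating."""
    offsets = []
    pos = 0
    while pos < len(blob):
        offsets.append(pos)
        # The 4-byte payload sits after the 10-byte header, XLEN, and subfield header.
        (size,) = struct.unpack_from("<I", blob, pos + 16)
        pos += size
    return offsets


first = make_member(b"a" * 1000)
second = make_member(b"b" * 500)
blob = first + second
# The standard gzip module skips FEXTRA and reads both members normally.
restored = gzip.decompress(blob)
```

Because each member's start is known from the index alone, a reader could hand different members to different threads before decompressing anything.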

On a 24-CPU machine, this module is ~25x faster for compression and ~7x faster for decompression (limited by I/O and the Python implementation).

In theory, compression and decompression speedup should scale linearly with the number of CPU cores. In practice, performance is limited by I/O and the programming language implementation.
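The reason block-wise compression parallelizes well can be sketched with the standard library alone: each block becomes an independent gzip member, and the members are concatenated in order. This is an illustrative sketch and not mgzip's implementation; the function name parallel_gzip and its parameters are hypothetical, and it assumes CPython's zlib bindings release the GIL while compressing, which is what lets threads run concurrently.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, block_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size blocks in a thread pool and join them as gzip members."""
    # Split the input into independent blocks; keep one empty block for empty input.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members line up with the original data.
        members = list(pool.map(gzip.compress, blocks))
    return b"".join(members)


payload = b"ACGT" * 500_000
compressed = parallel_gzip(payload, block_size=256 * 1024)
# A multi-member stream decompresses to the concatenation of its members.
roundtrip = gzip.decompress(compressed)
```

Smaller blocks give more parallelism but usually a slightly worse compression ratio, since each block is compressed without knowledge of the others.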

Usage

Use it the same way as the gzip module:

import mgzip

s = "a big string..."

# Use 8 threads to compress.
# thread=None or thread=0 means using all CPUs (the default).
# The compression block size is set to 200 MB.
with mgzip.open("test.txt.gz", "wt", thread=8, blocksize=2*10**8) as fw:
    fw.write(s)

with mgzip.open("test.txt.gz", "rt", thread=8) as fr:
    assert fr.read(len(s)) == s
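As a rough check of the compatibility claim, the sketch below writes a file with mgzip.open (using the same thread and blocksize arguments as above) and reads it back with the standard gzip module. The helper name roundtrip_via_stdlib is hypothetical, and the fallback to gzip.open when mgzip is not installed is only there so that the example runs anywhere.

```python
import gzip
import os
import tempfile

try:
    import mgzip  # third-party package; optional for this sketch
except ImportError:
    mgzip = None


def roundtrip_via_stdlib(text: str) -> str:
    """Write text with mgzip (if installed) and read it back with stdlib gzip."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.txt.gz")
        if mgzip is not None:
            # Same call shape as the usage example above.
            with mgzip.open(path, "wt", thread=2, blocksize=10**6) as fw:
                fw.write(text)
        else:
            # Fallback so the sketch still runs without mgzip installed.
            with gzip.open(path, "wt") as fw:
                fw.write(text)
        # The standard gzip reader should accept the file either way.
        with gzip.open(path, "rt") as fr:
            return fr.read()


sample = "a big string\n" * 1000
result = roundtrip_via_stdlib(sample)
```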

Performance

Compression:

[Figure: Compression Performance]

Decompression:

[Figure: Decompression Performance]

Benchmarked on a 24-core, 48-thread server (Xeon(R) CPU E5-2650 v4 @ 2.20GHz) with an 8.0 GB FASTQ text file.

The benchmark used the parameters thread=42 and blocksize=200000000.

Warning

This package only replaces the 'GzipFile' class and the 'open', 'compress', and 'decompress' functions of the standard gzip module. It is not well tested with other classes and functions.

As this is the first release version, some features are not yet supported, such as seek() and tell(). Contributions and improvements are appreciated.

Download files

Source Distribution

mgzip-0.2.2.tar.gz (13.3 kB)

Uploaded Source

Built Distribution

mgzip-0.2.2-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file mgzip-0.2.2.tar.gz.

File metadata

  • Download URL: mgzip-0.2.2.tar.gz
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mgzip-0.2.2.tar.gz
Algorithm Hash digest
SHA256 726bc2a7023f9564d4bf9d9a50f4aed54ea57f22d5b472645c9dcc747f169377
MD5 1fefa0d9d981821718c928d9324dd867
BLAKE2b-256 cf4ff374eb74009570fd1bc2029f89c4db4e4c33aa3f9342adefaa1831ff157e

File details

Details for the file mgzip-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: mgzip-0.2.2-py3-none-any.whl
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mgzip-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c42c924123def538c63c01230fee91d1bdf6aad5d6b89334daca581bf9af0ed3
MD5 daf3025bf697f822d97a9fe0c774da09
BLAKE2b-256 6c8a8d9daabb4c9984a9964c4bb244f21e35d73dd86148b4db03a921a6af1cf5
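
The digests listed above can be used to check a downloaded file. Here is a minimal sketch using hashlib; the helper names sha256_of_file and matches_listing are hypothetical, and the expected value is the SHA256 digest listed for the wheel.

```python
import hashlib

# SHA256 digest listed above for mgzip-0.2.2-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "c42c924123def538c63c01230fee91d1bdf6aad5d6b89334daca581bf9af0ed3"


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at end of file, when read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, expected: str = EXPECTED_WHEEL_SHA256) -> bool:
    """Compare a file's digest with the expected hex digest, ignoring case."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_listing on the path of the downloaded wheel returns True only if the file is byte-for-byte identical to the one whose digest is listed.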
