Compress and Decompress MSZIP data

pymszip

This library covers a niche use case: you have data that was compressed with the Windows API (CreateCompressor / Compress) using the MSZIP algorithm and want to decompress it without the Windows API (e.g. under Linux), or, conversely, you want to create compressed data that the Windows API can decompress.

Installation

pip install pymszip

Alternatively, install directly from GitHub:

pip install git+https://github.com/frereit/pymszip

Usage

import pymszip

compressed = pymszip.compress(b"Hello, world!")
decompressed = pymszip.decompress(compressed)

print(decompressed)

Goals and non-goals

This repo aims to provide full compatibility with the Windows API. This means that:

  • Any data compressed using the Windows API can be decompressed by pymszip
  • Any data compressed using pymszip can be decompressed by the Windows API

If you find data for which either of these isn't the case, please file an issue if you can!

However, this library does not aim to produce output identical to that of the Windows compressor. Data compressed with pymszip may differ byte-for-byte from the same data compressed with the Windows API. This is expected because the zlib parameters differ slightly, and it is not a problem as long as compatibility is preserved.

MSZIP format

The MSZIP compression format is a proprietary compression format developed by Microsoft. It is based on DEFLATE, the algorithm implemented by the zlib compression library.

Under the hood, MSZIP-compressed data consists of a 24-byte header followed by an arbitrary number of compressed chunks.

The header consists of 6 magic bytes (0a51e5c01800), followed by 1 CRC byte, 1 byte identifying the algorithm (02 for MSZIP), an 8-byte little-endian integer giving the decompressed size of the data, and another 8-byte little-endian integer giving the decompressed size of the first chunk.
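The header layout described above can be sketched with the standard struct module. This is a minimal illustration, not pymszip's actual code: the names MAGIC, HEADER and parse_header are hypothetical, and the CRC byte is returned as-is rather than verified, since how it is computed is not described here.

```python
import struct

# 6 magic bytes, 1 CRC byte, 1 algorithm byte, two 8-byte little-endian integers.
# "<" disables padding, so the struct is exactly 24 bytes long.
HEADER = struct.Struct("<6sBBQQ")
MAGIC = bytes.fromhex("0a51e5c01800")
ALGORITHM_MSZIP = 0x02


def parse_header(buf):
    """Split the 24-byte header into its fields (hypothetical helper)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer is shorter than the 24-byte header")
    magic, crc, algorithm, total_size, first_chunk_size = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("unexpected magic bytes")
    if algorithm != ALGORITHM_MSZIP:
        raise ValueError("not MSZIP data")
    return {
        "crc": crc,  # not verified in this sketch
        "total_size": total_size,
        "first_chunk_size": first_chunk_size,
    }
```

The compressed chunks start right after the header, at offset HEADER.size (24).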

Each chunk is prefixed with a 4-byte little-endian integer giving its size, followed by 2 magic bytes ("CK") and then a zlib-compressed stream. The size of the chunk counts the 2 magic bytes and the zlib stream, but not the 4-byte size field itself.
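To make the size accounting concrete, here is a small sketch that walks the chunks of a buffer. The function name iter_chunks and the default offset of 24 (the header length) are assumptions made for illustration.

```python
import struct


def iter_chunks(buf, offset=24):
    """Yield the compressed stream of each chunk (hypothetical helper).

    offset is where the first chunk starts; 24 skips the header.
    """
    while offset < len(buf):
        # 4-byte little-endian size; it covers "CK" plus the stream.
        (size,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        chunk = buf[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        if chunk[:2] != b"CK":
            raise ValueError("missing CK magic bytes")
        yield chunk[2:]
        offset += size
```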

To decompress data, each zlib-compressed stream is decompressed individually. However, each chunk must be given all previously decompressed data as the zdict used during decompression.
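The dictionary chaining can be sketched with the standard zlib module. This is an illustration under stated assumptions rather than pymszip's implementation: inflate_chunks is a hypothetical name, and wbits=-15 assumes the streams are raw DEFLATE without a zlib header. Only the last 32 KiB of history is passed, since that is the largest window zlib can reference.

```python
import zlib

WINDOW = 32768  # largest DEFLATE window; older history cannot be referenced


def inflate_chunks(streams):
    """Decompress chunk streams in order, chaining history as the zdict."""
    history = b""
    for stream in streams:
        if history:
            d = zlib.decompressobj(wbits=-15, zdict=history[-WINDOW:])
        else:
            # The first chunk has no preceding data, so no dictionary is set.
            d = zlib.decompressobj(wbits=-15)
        history += d.decompress(stream) + d.flush()
    return history
```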

To compress data, a similar process is followed. Experimentally, I came to the conclusion that memLevel must be set to 9, or Windows will in some cases be unable to decompress the compressed data.
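A rough sketch of the compression side, again using only the standard zlib module. The names deflate_chunks and CHUNK_SIZE are hypothetical, and the 32768-byte chunk size, raw DEFLATE (wbits=-15) and level 9 are assumptions; only memLevel=9 comes from the observation above. The sketch emits the framed chunks only and leaves out the 24-byte header.

```python
import struct
import zlib

CHUNK_SIZE = 32768  # assumed size of the uncompressed data in each chunk
WINDOW = 32768      # largest DEFLATE window


def deflate_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress data into size-prefixed "CK" chunks (header not included)."""
    out = []
    for start in range(0, len(data), chunk_size):
        # Previously seen plaintext becomes the dictionary for this chunk.
        history = data[max(0, start - WINDOW):start]
        kwargs = {"zdict": history} if history else {}
        c = zlib.compressobj(
            level=9,
            method=zlib.DEFLATED,
            wbits=-15,   # raw DEFLATE stream (assumption)
            memLevel=9,  # the setting found necessary for Windows
            strategy=zlib.Z_DEFAULT_STRATEGY,
            **kwargs,
        )
        stream = c.compress(data[start:start + chunk_size]) + c.flush()
        # The size field counts the 2 magic bytes plus the stream.
        out.append(struct.pack("<I", len(stream) + 2) + b"CK" + stream)
    return b"".join(out)
```

Output produced this way has not been checked against the Windows API; for real use, pymszip.compress is the tested path.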
