Compress and Decompress MSZIP data

pymszip

This library fills a very niche use case: you have data compressed with the Windows API (CreateCompressor / Compress) using the MSZIP algorithm and want to decompress it without the Windows API (e.g. under Linux), or, conversely, you want to create compressed data that the Windows API can decompress.

Installation

pip install pymszip

Alternatively, install directly from GitHub:

pip install git+https://github.com/frereit/pymszip

Usage

import pymszip

compressed = pymszip.compress(b"Hello, world!")
decompressed = pymszip.decompress(compressed)

print(decompressed)

Goals and non-goals

This repo aims to provide full compatibility with the Windows API. This means that:

  • Any data compressed using the Windows API can be decompressed by pymszip
  • Any data compressed using pymszip can be decompressed by the Windows API

If you find data for which either of these does not hold, please file an issue if you can!

However, this library does not aim to produce byte-identical output to the Windows compressor. Data compressed with pymszip may differ from the same data compressed with the Windows API. This is expected because the zlib parameters differ slightly, and it is not a problem as long as compatibility is preserved.

MSZIP format

The MSZIP compression format is a proprietary compression format developed by Microsoft, based on the zlib compression library.

Under the hood, MSZIP-compressed data consists of a 24-byte header followed by an arbitrary number of compressed chunks.

The header consists of 6 magic bytes (0a51e5c01800), followed by 1 CRC byte, 1 byte identifying the algorithm (02 for MSZIP), an 8-byte little-endian integer giving the decompressed size of the data, and another 8-byte little-endian integer giving the decompressed size of the first chunk.
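The layout described above can be sketched with the standard struct module. This is a minimal illustration, not pymszip's actual code: the function name parse_header and the dictionary keys are hypothetical, and the CRC byte is only read, not verified, because how it is computed is not described here.

```python
import struct

MAGIC = bytes.fromhex("0a51e5c01800")  # the 6 magic bytes quoted above
ALGORITHM_MSZIP = 0x02                 # algorithm identifier for MSZIP

# "<"  : little-endian, no padding
# "6s" : magic, "B": CRC byte, "B": algorithm id, "QQ": two unsigned 64-bit sizes
HEADER = struct.Struct("<6sBBQQ")      # 6 + 1 + 1 + 8 + 8 = 24 bytes


def parse_header(data: bytes) -> dict:
    """Split the 24-byte header into its fields (hypothetical helper)."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 24 bytes of header")
    magic, crc, algorithm, total_size, first_chunk_size = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    if algorithm != ALGORITHM_MSZIP:
        raise ValueError("not MSZIP data")
    return {
        "crc": crc,  # read but not checked in this sketch
        "uncompressed_size": total_size,
        "first_chunk_uncompressed_size": first_chunk_size,
    }
```

Everything after the first 24 bytes is then the sequence of chunks.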

Each chunk is prefixed with a 4-byte little-endian integer giving its size, then 2 magic bytes ("CK"), after which a zlib-compressed stream follows. The chunk size covers the 2 magic bytes and the zlib stream, but not the size field itself.
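As a rough sketch of this framing, the following generator walks the bytes that follow the header and yields each compressed stream. The name iter_chunks is hypothetical and chosen for illustration only.

```python
import struct

CHUNK_MAGIC = b"CK"


def iter_chunks(body: bytes):
    """Yield the compressed stream of each chunk in `body`.

    `body` is assumed to be everything after the 24-byte header.
    """
    offset = 0
    while offset < len(body):
        if offset + 4 > len(body):
            raise ValueError("truncated chunk size field")
        (size,) = struct.unpack_from("<I", body, offset)
        offset += 4  # the size field itself is not counted in `size`
        chunk = body[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        if chunk[:2] != CHUNK_MAGIC:
            raise ValueError("missing 'CK' chunk magic")
        yield chunk[2:]  # the compressed stream, without the magic bytes
        offset += size
```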

To decompress data, each zlib-compressed stream is decompressed individually. However, each chunk must be given all previously decompressed data as the zdict used during decompression.
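A minimal sketch of that loop using Python's zlib module is shown below. It is an illustration under stated assumptions rather than pymszip's implementation: the function name inflate_chunks is hypothetical, and the default wbits=-15 assumes each stream is raw DEFLATE without a zlib header, which the description above does not state. If the streams do carry a zlib header, pass wbits=15 instead.

```python
import zlib


def inflate_chunks(streams, wbits=-15):
    """Decompress chunk streams in order, feeding earlier output as the zdict.

    `streams` is an iterable of compressed streams, one per chunk.
    wbits=-15 (raw DEFLATE) is an assumption, not taken from the text above.
    """
    out = bytearray()
    for stream in streams:
        if out:
            # All previously decompressed data serves as the preset dictionary.
            d = zlib.decompressobj(wbits=wbits, zdict=bytes(out))
        else:
            # The first chunk has no history to refer back to.
            d = zlib.decompressobj(wbits=wbits)
        out += d.decompress(stream)
        out += d.flush()
    return bytes(out)
```

DEFLATE back-references reach at most 32 KiB, so only the tail of the history can actually be used. Passing the whole buffer, as here, is simple but copies more data than strictly necessary.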

To compress data, a similar process is followed. Experimentally, I found that memLevel must be set to 9, or Windows will fail to decompress the result in some cases.
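The compression side might look roughly like the sketch below, which produces only the framed chunks and omits the 24-byte header. Several details are assumptions made for illustration: the function name deflate_chunks, the 32 KiB chunk size, and wbits=-15 (raw DEFLATE) are not taken from the description above. Only memLevel=9 and the "CK" framing come from it.

```python
import struct
import zlib

CHUNK_SIZE = 32 * 1024  # assumed uncompressed bytes per chunk (hypothetical)


def deflate_chunks(data: bytes, chunk_size: int = CHUNK_SIZE, wbits: int = -15) -> bytes:
    """Compress `data` into size-prefixed "CK" chunks (header not included)."""
    framed = bytearray()
    for start in range(0, len(data), chunk_size):
        history = data[:start]  # everything compressed so far
        params = dict(
            level=zlib.Z_DEFAULT_COMPRESSION,
            method=zlib.DEFLATED,
            wbits=wbits,   # assumption: raw DEFLATE streams
            memLevel=9,    # the value found necessary for Windows compatibility
            strategy=zlib.Z_DEFAULT_STRATEGY,
        )
        if history:
            params["zdict"] = history  # mirror the dictionary used when decompressing
        c = zlib.compressobj(**params)
        stream = c.compress(data[start:start + chunk_size]) + c.flush()
        payload = b"CK" + stream
        # The size field counts the magic bytes and the stream, not itself.
        framed += struct.pack("<I", len(payload)) + payload
    return bytes(framed)
```

Output from a sketch like this still needs the header in front of it before the Windows API would accept it.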
