
UnZip for non-UTF8 encodings such as cp949, sjis, gbk, euc-kr, euc-jp, and gb2312


UnZip for non-UTF8 encoding

Extract ZIP files whose entry names are encoded in an MBCS (multi-byte character set), such as ZIP files created on MS Windows, especially in East Asian locales.

Major non-UTF8 encodings by language:

  • Korean: cp949, euc-kr
  • Japanese: sjis (shift_jis), cp932, euc-jp
  • Chinese: gbk, gb18030, gb2312, cp936, hkscs, big5, cp950
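If you are not sure which of these encodings an archive uses, one rough approach is to try candidate codecs in order and keep the first that decodes the raw name bytes without error. The sketch below does this with Python's standard codecs; guess_encoding and its candidate order are hypothetical and not part of unzipmbcs. The result is only a guess, because permissive codecs such as cp949 and gbk accept many byte sequences that were written in other encodings.

```python
from typing import Iterable, Optional

# Hypothetical helper, not part of unzipmbcs. The candidate order is an
# assumption: strict UTF-8 goes first because it rejects most legacy bytes,
# while permissive codecs such as cp949 or gbk accept many foreign sequences.
DEFAULT_CANDIDATES = ("utf-8", "cp949", "shift_jis", "gbk", "big5")


def guess_encoding(raw: bytes,
                   candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first codec that decodes raw without errors, else None."""
    for codec in candidates:
        try:
            raw.decode(codec)  # strict error handling raises on bad sequences
        except UnicodeDecodeError:
            continue
        return codec
    return None


print(guess_encoding("한글.txt".encode("cp949")))  # → cp949
```

Pure-ASCII names always come back as utf-8, which is harmless because ASCII bytes read the same in every listed encoding.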

Install

pip install unzipmbcs

CLI Usage

usage: unzipmbcs [-h] [-e ENCODING] [-p PASSWORD] cmd zipfile [target [target ...]]

unzip for non-UTF8 filenames in zip archive

positional arguments:
  cmd                   commands: l(list), x(extract)
  zipfile               .zip file to unzip
  target                file prefix to extract

optional arguments:
  -h, --help            show this help message and exit
  -e ENCODING, --encoding ENCODING
                        character encoding of filename in the .zip
  -p PASSWORD, --password PASSWORD
                        password for encrypted .zip
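The help text maps naturally onto Python's argparse. The sketch below is a hypothetical reconstruction, not the package's actual source: it assumes cmd accepts only l and x, and that -e defaults to utf-8 like the API functions below.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the unzipmbcs CLI, not its real source.
    parser = argparse.ArgumentParser(
        prog="unzipmbcs",
        description="unzip for non-UTF8 filenames in zip archive",
    )
    parser.add_argument("cmd", choices=["l", "x"], metavar="cmd",
                        help="commands: l(list), x(extract)")
    parser.add_argument("zipfile", help=".zip file to unzip")
    parser.add_argument("target", nargs="*", help="file prefix to extract")
    parser.add_argument("-e", "--encoding", default="utf-8",  # assumed default
                        help="character encoding of filename in the .zip")
    parser.add_argument("-p", "--password", default=None,
                        help="password for encrypted .zip")
    return parser


# Equivalent of: unzipmbcs -e cp949 x archive.zip docs/
args = build_parser().parse_args(["-e", "cp949", "x", "archive.zip", "docs/"])
print(args.cmd, args.encoding, args.target)  # → x cp949 ['docs/']
```

Any number of target prefixes may follow the archive name; with none, every entry is selected.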

API

listZip(filename, encoding='utf-8')

Return information about the files in the zip archive filename, decoding their names with the character encoding encoding.
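listZip's behaviour can be approximated with the standard zipfile module. This is a minimal sketch under stated assumptions, not unzipmbcs's implementation: the helper name list_zip and its (name, size) return format are hypothetical. It relies on the fact that Python 3's zipfile decodes names lacking the UTF-8 flag (general-purpose bit 0x800) as cp437, so re-encoding them as cp437 recovers the original bytes.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is stored as UTF-8


def list_zip(source: Union[str, BinaryIO],
             encoding: str = "utf-8") -> List[Tuple[str, int]]:
    """Hypothetical listZip-like helper: (decoded name, uncompressed size)."""
    entries = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                # zipfile decoded these bytes as cp437; recover the raw bytes
                # and decode them with the encoding the caller asserts.
                name = name.encode("cp437").decode(encoding)
            entries.append((name, info.file_size))
    return entries
```

Calling list_zip("archive.zip", "cp949") returns one (name, size) pair per entry; entries already flagged as UTF-8 are left untouched.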

extractZip(filename, encoding='utf-8', filters=None, password=None)

Extract the files in the zip archive filename into the current directory, assuming that the file names in the archive are encoded as encoding. If a filters list is provided, only files whose names start with one of its values are extracted. Use password to extract an encrypted zip archive.
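A comparable extraction routine, again a hypothetical sketch on top of the standard zipfile module rather than the package's code. The dest parameter, the returned list of names, and encoding the password with the same legacy encoding as the names are all assumptions. The sketch also refuses entries whose decoded names would escape the destination directory. Note that zipfile can only decrypt traditional PKWARE (ZipCrypto) encryption, not AES.

```python
import os
import shutil
import zipfile
from typing import BinaryIO, Iterable, List, Optional, Union

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is stored as UTF-8


def extract_zip(source: Union[str, BinaryIO],
                encoding: str = "utf-8",
                filters: Optional[Iterable[str]] = None,
                password: Optional[str] = None,
                dest: str = ".") -> List[str]:
    """Hypothetical extractZip-like helper; returns the names it wrote."""
    prefixes = list(filters) if filters else []
    # Assumption: password bytes use the same legacy encoding as the names.
    pwd = password.encode(encoding) if password else None
    root = os.path.realpath(dest)
    written = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                name = name.encode("cp437").decode(encoding)
            # Keep only entries whose decoded name starts with a filter prefix.
            if prefixes and not any(name.startswith(p) for p in prefixes):
                continue
            target = os.path.realpath(os.path.join(root, name))
            # Refuse names such as "../x" or "/etc/x" that escape dest.
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe entry name: {name!r}")
            if name.endswith("/"):  # directory entry
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info, pwd=pwd) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(name)
    return written
```

Filtering on the decoded name, rather than on zipfile's cp437 view of it, lets prefixes be written in ordinary Unicode, e.g. filters=["문서/"].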

fixZipFilename(filename, enc)

Convert filename, whose bytes were originally encoded as enc, into a correct Unicode string. Works for both Python 2 and 3.
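On Python 3 the fix amounts to undoing zipfile's cp437 decoding. The sketch below is an assumption about the approach, not the package's source; fix_zip_filename and entry_name are hypothetical names. The fix is applied only to entries without the UTF-8 flag, because names that zipfile already decoded as UTF-8 are correct and may not even be encodable as cp437.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11 of the ZIP entry header


def fix_zip_filename(filename: str, enc: str) -> str:
    """Hypothetical: re-decode a name that Python 3's zipfile read as cp437."""
    # cp437 maps all 256 byte values, so encoding back to cp437 recovers
    # the exact bytes stored in the archive; then decode them with enc.
    return filename.encode("cp437").decode(enc)


def entry_name(info: zipfile.ZipInfo, enc: str) -> str:
    """Hypothetical: apply the fix only when the entry is not UTF-8 flagged."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    return fix_zip_filename(info.filename, enc)


# Simulate what zipfile shows for a cp949-encoded name, then repair it.
mangled = "한글.txt".encode("cp949").decode("cp437")
print(fix_zip_filename(mangled, "cp949"))  # → 한글.txt
```

On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument (read mode only) that decodes such names directly, e.g. zipfile.ZipFile(path, metadata_encoding="cp949").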

Motivation

The .ZIP format, introduced with PKZIP, has been widely used, and much valuable data is archived as .zip files. In non-ASCII, non-Western environments, however, its file names cause trouble.

Because the ZIP format is old (it dates from 1989), its original specification did not define a character encoding for entry names; a UTF-8 flag was added later, but many archivers do not set it. As a result, many ZIP archives, especially those created on Windows, store entry names in a legacy local charset.
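The ZIP specification (PKWARE's APPNOTE.TXT) reserves general-purpose bit 11 (mask 0x800) to mark entry names stored as UTF-8; names without it are treated as IBM code page 437, which is how Python's zipfile decodes them. The hypothetical helper below, not part of unzipmbcs, lists entries that lack the flag yet contain non-ASCII bytes, i.e. the names most likely stored in a legacy local charset. Python's own zipfile sets the flag when writing non-ASCII names, so an archive it creates reports nothing.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is stored as UTF-8


def legacy_entries(source):
    """Hypothetical: names lacking the UTF-8 flag that contain non-ASCII bytes."""
    with zipfile.ZipFile(source) as zf:
        return [
            info.filename  # still in zipfile's cp437 view, i.e. mangled
            for info in zf.infolist()
            if not info.flag_bits & UTF8_FLAG and not info.filename.isascii()
        ]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("한글.txt", b"hi")  # zipfile sets the UTF-8 flag for this name
print(legacy_entries(buf))  # → []
```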

In modern Unicode-based environments and in global data processing, for example on Linux, this causes mangled file names, failed extractions, and poor portability.

This module aims to mitigate these problems.

Download files

Source Distribution

unzipmbcs-0.2.0.tar.gz (5.2 kB)

Built Distribution

unzipmbcs-0.2.0-py3-none-any.whl (5.2 kB)

File details

Details for the file unzipmbcs-0.2.0.tar.gz.

File metadata

  • Download URL: unzipmbcs-0.2.0.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for unzipmbcs-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       a5ab99716fa97aaa0ffad59178c72a2f9407fae7d487b4e4999cd9c2ef60bb4e
MD5          9aad029a93942e15ff1bdd19bb3e74e9
BLAKE2b-256  d46c46d6e2c0e8631746807b59b6215473513cabc53b29d445d9f30b91849afb

File details

Details for the file unzipmbcs-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: unzipmbcs-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for unzipmbcs-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       8c2397a9e6f2a5c64a5e5d0d834959abea70a819b69ffcfd492f3b45edbe6f5d
MD5          0cff4426ec250277eca01acdc0b2e393
BLAKE2b-256  ce603bbaec325d75c3f25e81e85f277ff33b05a8b13bc72c2b59d604fbc0f08c
