UnZip for non-UTF8 encoding such as cp949, sjis, gbk, euc-kr, euc-jp, and gb2312

These details have not been verified by PyPI

Project links

repository

Project description

UnZip for non-UTF8 encoding

Extract ZIP files whose file names are encoded in an MBCS (multi-byte character set), such as ZIP files created on MS Windows, especially in East Asian environments.

Major non-UTF8 encodings by language:

Korean: cp949, euc-kr
Japanese: sjis (shift_jis), cp932, euc-jp
Chinese: gbk, gb18030, gb2312, cp936, hkscs, big5, cp950
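
The names in the list above are also valid codec names in Python's standard library, which makes it easy to see why a wrong guess mangles a file name. The sketch below is an illustration only and is not part of unzipmbcs; canonical_names and mangle are hypothetical helpers introduced here.

```python
import codecs

# Encoding names from the list above, grouped by language.
ENCODINGS = {
    "Korean": ["cp949", "euc-kr"],
    "Japanese": ["sjis", "shift_jis", "cp932", "euc-jp"],
    "Chinese": ["gbk", "gb18030", "gb2312", "cp936", "hkscs", "big5", "cp950"],
}


def canonical_names(encodings=ENCODINGS):
    """Map each listed name to the canonical name of the Python codec behind it."""
    return {
        name: codecs.lookup(name).name
        for names in encodings.values()
        for name in names
    }


def mangle(name, real_encoding, assumed_encoding="cp437"):
    """Show how a name looks when its bytes are decoded with the wrong charset."""
    return name.encode(real_encoding).decode(assumed_encoding, errors="replace")


if __name__ == "__main__":
    for alias, canonical in canonical_names().items():
        print(f"{alias:10} -> {canonical}")
    # A Korean name stored as cp949 but read as cp437 comes out as mojibake.
    print(mangle("한글.txt", "cp949"))
```

Several names are aliases of one codec (for example sjis and shift_jis), so trying every entry in the list is rarely necessary.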

Install

pip install unzipmbcs

CLI Usage

usage: unzipmbcs [-h] [-e ENCODING] [-p PASSWORD] cmd zipfile [target [target ...]]

unzip for non-UTF8 filenames in zip archive

positional arguments:
  cmd                   commands: l(list), x(extract)
  zipfile               .zip file to unzip
  target                file prefix to extract

optional arguments:
  -h, --help            show this help message and exit
  -e ENCODING, --encoding ENCODING
                        character encoding of filename in the .zip
  -p PASSWORD, --password PASSWORD
                        password for encrypted .zip
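
To make the shape of the command line concrete, here is a minimal argparse sketch that accepts the same arguments as the help text above. It is not the package's actual source: build_parser is a hypothetical name, and both the 'utf-8' default for the encoding and the restriction of cmd to l and x are assumptions taken from the API defaults and the help text.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented unzipmbcs options (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="unzipmbcs",
        description="unzip for non-UTF8 filenames in zip archive",
    )
    parser.add_argument("cmd", choices=["l", "x"],
                        help="commands: l(list), x(extract)")
    parser.add_argument("zipfile", help=".zip file to unzip")
    parser.add_argument("target", nargs="*", help="file prefix to extract")
    # Assumption: the CLI default matches the API default of 'utf-8'.
    parser.add_argument("-e", "--encoding", default="utf-8",
                        help="character encoding of filename in the .zip")
    parser.add_argument("-p", "--password", default=None,
                        help="password for encrypted .zip")
    return parser


if __name__ == "__main__":
    # Equivalent to: unzipmbcs -e cp949 x archive.zip docs/
    args = build_parser().parse_args(["-e", "cp949", "x", "archive.zip", "docs/"])
    print(args.cmd, args.zipfile, args.encoding, args.target)
```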

API

listZip(filename, encoding='utf-8')

Return information about the files in the zip archive filename, decoding the entry names with the character encoding encoding.
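
The idea behind such a listing can be sketched with the standard zipfile module alone. This is an illustration under assumptions, not the library's implementation: list_zip_sketch and decode_entry_name are hypothetical names, and the tuple returned here is not necessarily what listZip returns.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the entry name is stored as UTF-8


def decode_entry_name(info, encoding):
    """Return the entry name decoded with `encoding` unless it is flagged as UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    # By default zipfile decodes unflagged names as cp437; encoding back to
    # cp437 recovers the raw bytes, which are then decoded properly.
    return info.filename.encode("cp437").decode(encoding)


def list_zip_sketch(path, encoding="utf-8"):
    """Return (name, size in bytes, modification time tuple) for each entry."""
    with zipfile.ZipFile(path) as zf:
        return [
            (decode_entry_name(info, encoding), info.file_size, info.date_time)
            for info in zf.infolist()
        ]
```

A call such as list_zip_sketch("archive.zip", "cp949") would then yield readable Korean names for an archive created on a Korean Windows system.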

extractZip(filename, encoding='utf-8', filters=None, password=None)

Extract the files in the zip archive filename into the current directory. The file names in the archive are assumed to be encoded as encoding. If filters is provided, only files whose names start with one of the values in the filters list are extracted. password is used for encrypted zip archives.
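
The behaviour described above can be approximated with the standard library as follows. This is a sketch under assumptions rather than the package's code: extract_zip_sketch is a hypothetical name, the dest parameter and the path-traversal check are additions for illustration, and the standard zipfile module can only decrypt the legacy ZIP encryption scheme.

```python
import os
import shutil
import zipfile


def extract_zip_sketch(path, encoding="utf-8", filters=None, password=None, dest="."):
    """Extract entries, decoding legacy names with `encoding`; return the names written."""
    pwd = password.encode("utf-8") if isinstance(password, str) else password
    dest_root = os.path.realpath(dest)
    extracted = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x800:  # name is already UTF-8
                name = info.filename
            else:  # undo zipfile's default cp437 decoding
                name = info.filename.encode("cp437").decode(encoding)
            # Keep only entries that start with one of the requested prefixes.
            if filters and not any(name.startswith(prefix) for prefix in filters):
                continue
            target = os.path.realpath(os.path.join(dest_root, name))
            # Refuse entries that would land outside the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe entry name: {name!r}")
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info, pwd=pwd) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(name)
    return extracted
```

Writing each entry by hand, instead of calling ZipFile.extractall, is what allows the corrected name to be used on disk.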

fixZipFilename(filename, enc)

Convert filename, which was originally encoded as enc, into a Unicode string. Works on both Python 2 and 3.
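
A minimal sketch of such a fix is shown here, assuming the name comes from the standard zipfile module. fix_zip_filename_sketch is a hypothetical name and not the package's own code. On Python 2, zipfile returns unflagged names as raw bytes; on Python 3 it returns them already decoded as cp437, so that decoding has to be undone first.

```python
def fix_zip_filename_sketch(filename, enc):
    """Return `filename` as a Unicode string, given its real encoding `enc`."""
    if isinstance(filename, bytes):
        # Raw bytes, as Python 2's zipfile gives for names without the UTF-8 flag.
        return filename.decode(enc)
    # Python 3: the str was produced by decoding the raw bytes as cp437.
    # cp437 maps every byte value, so encoding back recovers the original bytes.
    return filename.encode("cp437").decode(enc)


if __name__ == "__main__":
    raw = "日本語.txt".encode("shift_jis")
    mangled = raw.decode("cp437")  # what zipfile reports on Python 3
    print(fix_zip_filename_sketch(mangled, "sjis"))
```

Entries whose UTF-8 flag is set are already correct and should not be passed through this function.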

Motivation

The ZIP format, which originated with PKZIP, has been widely used, and much valuable data is archived as .zip files. In non-ASCII, non-Western environments, however, the file names cause trouble.

Because the ZIP format is so old (it dates from 1989), no standard character encoding for the file names of archive entries has been followed consistently. In practice, most entries are encoded in a legacy, local character set.

In modern Unicode-based environments, or in global data processing environments such as Linux, this leads to inconvenience, reduced portability, mangled file names, failed extractions, and so on.

This module may mitigate these inconveniences.

Release history

0.2.0: Jul 17, 2022 (this version)
0.1.2: Jul 15, 2022
0.1.1: Oct 4, 2016

Download files

Source Distribution

unzipmbcs-0.2.0.tar.gz (5.2 kB)

Uploaded Jul 17, 2022 Source

Built Distribution

unzipmbcs-0.2.0-py3-none-any.whl (5.2 kB)

Uploaded Jul 17, 2022 Python 3

File details

Details for the file unzipmbcs-0.2.0.tar.gz.

File metadata

Download URL: unzipmbcs-0.2.0.tar.gz
Upload date: Jul 17, 2022
Size: 5.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for unzipmbcs-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a5ab99716fa97aaa0ffad59178c72a2f9407fae7d487b4e4999cd9c2ef60bb4e`
MD5	`9aad029a93942e15ff1bdd19bb3e74e9`
BLAKE2b-256	`d46c46d6e2c0e8631746807b59b6215473513cabc53b29d445d9f30b91849afb`

File details

Details for the file unzipmbcs-0.2.0-py3-none-any.whl.

File metadata

Download URL: unzipmbcs-0.2.0-py3-none-any.whl
Upload date: Jul 17, 2022
Size: 5.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for unzipmbcs-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8c2397a9e6f2a5c64a5e5d0d834959abea70a819b69ffcfd492f3b45edbe6f5d`
MD5	`0cff4426ec250277eca01acdc0b2e393`
BLAKE2b-256	`ce603bbaec325d75c3f25e81e85f277ff33b05a8b13bc72c2b59d604fbc0f08c`
