Compress and decompress to and from `.zee` files!

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held in Dr. Mahdi Safarnejad-Boroujeni's Spring 2019 Data Structures and Algorithms course at Sharif University of Technology, in which I took first place. The code is pretty slow and leaves a lot of room for optimization, but it is quite readable, so it can be a useful educational resource for anyone starting out with compression algorithms.

The algorithm is a cocktail of classical compression algorithms, mixed and served for Unicode documents. It is built around the LZW algorithm, which creates a finite-size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.
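To make the dictionary-building step concrete, here is a minimal sketch of textbook LZW over Unicode characters with a capped dictionary. It is not ZCode's actual implementation: the function names, the `max_symbols` parameter, and the choice to seed the dictionary with the characters found in the input are all assumptions made for illustration.

```python
def lzw_compress(text, max_symbols=4096):
    """Textbook LZW: return (alphabet, codes) for a Unicode string."""
    # Seed the dictionary with every distinct character of the input.
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            codes.append(table[current])
            if len(table) < max_symbols:  # keep the dictionary finite
                table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decompress(alphabet, codes, max_symbols=4096):
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return ""
    table = list(alphabet)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        out.append(entry)
        if len(table) < max_symbols:
            table.append(previous + entry[0])
        previous = entry
    return "".join(out)
```

Both functions must be called with the same `max_symbols`, because the decompressor only stays in sync if it stops growing its dictionary at the same point as the compressor.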

Zee codes are heavily inspired by Huffman trees. In ordinary text, however, symbols are usually distributed much more uniformly than the geometric (or exponential) distribution that Huffman coding assumes in order to be effective, so variable-sized byte codes came out ahead of bit-level Huffman encodings in both implementation simplicity and performance. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!
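The idea of byte-aligned variable-length codes can be sketched as follows. This is a hypothetical scheme for illustration only, not the actual zee code layout: symbols are ranked by frequency, the 128 most frequent ranks take one byte, and the rest take two bytes with the high bit of the first byte set. The function names and the one-or-two-byte split are assumptions.

```python
from collections import Counter


def rank_by_frequency(codes):
    """Map each symbol code to a rank, with rank 0 for the most frequent."""
    ordered = [code for code, _ in Counter(codes).most_common()]
    return {code: rank for rank, code in enumerate(ordered)}


def encode_ranks(ranks):
    """Pack ranks into bytes: 1 byte below 128, otherwise 2 bytes."""
    out = bytearray()
    for rank in ranks:
        if rank < 0x80:
            out.append(rank)                 # 0xxxxxxx
        elif rank < 0x8000:
            out.append(0x80 | (rank >> 8))   # 1xxxxxxx, high 7 bits
            out.append(rank & 0xFF)          # low 8 bits
        else:
            raise ValueError("rank does not fit in two bytes")
    return bytes(out)


def decode_ranks(data):
    """Invert encode_ranks by inspecting the high bit of each lead byte."""
    ranks = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            ranks.append(first)
            i += 1
        else:
            ranks.append(((first & 0x7F) << 8) | data[i + 1])
            i += 2
    return ranks
```

Compared with bit-level Huffman codes, a scheme like this gives up some compression on skewed distributions but needs no bit shifting across byte boundaries, which is the trade-off described above.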

Installation

ZCode is available through pip and only requires Python 3.6 or higher.

pip install -U zcode

Usage

You can run the algorithm on any UTF-8 encoded file using the zcode command. By default, it decompresses files ending with a .zee extension and compresses all others into .zee files, but you can always override this behavior by providing optional arguments:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithm's buffer size for processing symbols (in bytes). It is set automatically based on your input file size, but you can change it as you wish. The code-size argument controls the maximum length of the coded bytes used when encoding symbols; it defaults to 2 and must be provided to the algorithm again upon decompression.
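If you want to drive the command from a script, the rules above can be sketched in Python. The helpers `default_action` and `zcode_argv` are hypothetical and not part of the zcode package; they only mirror the documented behavior and assemble an argument list from the flags shown above. Whether the real command matches the extension case-sensitively is an assumption.

```python
from pathlib import Path


def default_action(input_file):
    """Guess the action the way the usage text describes (assumed rule)."""
    return "decompress" if Path(input_file).suffix == ".zee" else "compress"


def zcode_argv(input_file, output=None, action=None,
               symbol_size=None, code_size=None):
    """Build an argument list for the zcode command; omitted options are skipped."""
    argv = ["zcode", str(input_file)]
    if output is not None:
        argv += ["--output", str(output)]
    if action is not None:
        argv += ["--action", action]
    if symbol_size is not None:
        argv += ["--symbol-size", str(symbol_size)]
    if code_size is not None:
        argv += ["--code-size", str(code_size)]
    return argv


# Compress with a non-default code size, then pass the same value back
# when decompressing, since the decompressor needs to be told about it.
compress_cmd = zcode_argv("book.txt", output="book.zee", code_size=3)
decompress_cmd = zcode_argv("book.zee", output="book.txt", code_size=3)
```

With zcode installed, either list can be handed to `subprocess.run(argv, check=True)`.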

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE


