
A lossless and near-lossless compression method optimized for numbers/tensors in the Foundation Models environment

Project description

ZipNN

Introduction

ZipNN is a lossless and near-lossless compression method optimized for numbers/tensors in the Foundation Models environment. It applies bit manipulation before vanilla compression and after vanilla decompression:

  • Byte Grouping - Groups similar bytes together (the first byte of every parameter, then the second byte, and so on). The default is a Byte Grouping of 4 (4 partitions).
  • Sign Bit - (On the way) - Moves the sign bit, since it holds high entropy.
  • Tunable Lossy Compression - Allows controlled inaccuracies in the parameters, under the assumption that much of the entropy in model weights is redundant, i.e., noise saved to disk.
  • Delta - (On the way) - Computes the difference between two inputs (for example, two models).
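The Byte Grouping idea above can be illustrated with a small standard-library sketch. This is not ZipNN's implementation: the function names are hypothetical, and zlib stands in for zstd only because it ships with Python. Group i holds byte i of every 4-byte FP32 value, each group is compressed separately, and decompression re-interleaves the groups.

```python
import struct
import zlib


def byte_group(data: bytes, partitions: int = 4) -> list[bytes]:
    """Split data into `partitions` streams: stream i holds byte i of every element."""
    if len(data) % partitions:
        raise ValueError("data length must be a multiple of the number of partitions")
    return [data[i::partitions] for i in range(partitions)]


def byte_ungroup(groups: list[bytes]) -> bytes:
    """Interleave the streams back into the original element-by-element byte order."""
    partitions = len(groups)
    out = bytearray(len(groups[0]) * partitions)
    for i, group in enumerate(groups):
        out[i::partitions] = group
    return bytes(out)


def compress_grouped(data: bytes, partitions: int = 4) -> list[bytes]:
    # Compress each byte group independently (zlib is a stand-in for zstd here).
    return [zlib.compress(group) for group in byte_group(data, partitions)]


def decompress_grouped(chunks: list[bytes]) -> bytes:
    return byte_ungroup([zlib.decompress(chunk) for chunk in chunks])


if __name__ == "__main__":
    # Synthetic FP32 "weights" packed as little-endian bytes.
    weights = [0.01 * ((i % 97) - 48) for i in range(4096)]
    raw = struct.pack(f"<{len(weights)}f", *weights)
    chunks = compress_grouped(raw)
    assert decompress_grouped(chunks) == raw
    print("grouped:", sum(len(c) for c in chunks), "bytes; ungrouped:", len(zlib.compress(raw)), "bytes")
```

Grouping helps because the exponent bytes of neighboring weights tend to be similar, so the group that holds them compresses well, while the low mantissa bytes look close to random.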

For more details, please see our paper: Lossless and Near-Lossless Compression for Foundation Models

Currently, ZipNN compression methods are implemented on CPUs, and GPU implementations are on the way.

zipnn package

ZipNN is a tool for compressing and decompressing data in byte, file, and Torch tensor formats. This repository includes implementations for compressing data into byte or file formats and decompressing it back to byte, file, or Torch tensor formats. The package supports several compression methods: zstd (the default), lz4, and snappy.


Installation

Install with pip:

pip install zipnn

For compression methods other than ZSTD:

  • For lz4 method: pip install lz4
  • For snappy method: pip install python-snappy

For compressing/decompressing PyTorch tensors:

pip install torch
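To check which of the optional dependencies above are importable in the current environment, a small helper like the following can be used. It is a hypothetical convenience, not part of zipnn; it assumes the import names `lz4` (from the lz4 package), `snappy` (from python-snappy), and `torch`.

```python
import importlib.util

# Optional back-end -> module import name (assumed mapping, see the lead-in above).
OPTIONAL_MODULES = {"lz4": "lz4", "snappy": "snappy", "torch": "torch"}


def available_backends() -> dict[str, bool]:
    """Report which optional dependencies can be imported, without importing them."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```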

Usage

Import zipnn

from zipnn import zipnn

Create an instance of the class:

zpn = zipnn.ZipNN(method='zstd')

Compression:

compressed_data = zpn.compress(example_string)

Decompression:

decompressed_data = zpn.decompress(compressed_data)

Example

from zipnn import zipnn

example_string = b"Example string for compression"

# Initialize the ZipNN class with the default configuration
# for Byte->Byte compression and Byte->Byte decompression
zpn = zipnn.ZipNN(method='zstd')

# Compress the byte string
compressed_data = zpn.compress(example_string)

# Decompress the byte string back
decompressed_data = zpn.decompress(compressed_data)

# Verify the result
print("Are the original and decompressed byte strings the same?", example_string == decompressed_data)
# Expected output: Are the original and decompressed byte strings the same? True

Configuration

The default configuration is Byte Grouping of 4 with vanilla ZSTD (running with 8 threads), and the input and output types are "byte". For more advanced configurations, see the following options:

  • method: Compression method; supports zstd, lz4, and snappy (default value = zstd).
  • delta_compressed_type: Type of delta compression if chosen (default value = None, supports byte and file).
  • bg_partitions: Number of partitions for Byte Grouping (default value = 4).
  • bg_compression_threshold: Compression threshold of Byte Grouping (default value = 0.99).
  • torch_dtype: If data that was not compressed from a torch tensor is decompressed as a torch tensor, its dtype must be specified (default value = None).
  • torch_shape: If data that was not compressed from a torch tensor is decompressed as a torch tensor, its shape must be specified (default value = None).
  • signbit_to_lsb: Flag for moving the sign bit to the LSB so that all exponent bits sit together in one byte in FP32 and BF16; supported only with lossy compression (default value = False).
  • lossy_compressed_type: Type of lossy compression, if used; only integer is supported (default value = None).
  • lossy_compressed_factor: Compression factor for lossy compression (default value = 27).
  • is_streaming: Streaming flag (default value = False, supports only file at the moment).
  • streaming_chunk_KB: Chunk size for streaming if is_streaming is True (default value = 1MB).
  • input_type: Input type; supports byte, torch, and file (default value = byte).
  • input_file: Path to the input file, if input_type is file.
  • compressed_ret_type: Return type of compression; supports byte and file (default value = byte).
  • compressed_file: Path to the compressed file, if compressed_ret_type is file.
  • decompressed_ret_type: Return type of decompression; supports byte, torch, and file (default value = byte).
  • decompressed_file: Path to the decompressed file, if decompressed_ret_type is file.
  • zstd_level: Compression level for zstd (default value = 3).
  • zstd_threads: Number of threads to be used for zstd compression (default value = 8).
  • lz4_compression_level: Compression level for lz4 (default value = 0).
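To illustrate what signbit_to_lsb refers to, the sketch below moves the FP32 sign bit to the least significant bit by rotating each 32-bit pattern left by one. After the rotation the 8 exponent bits fill the most significant byte, so Byte Grouping puts every exponent into a single group. The function names are hypothetical and this is only one plausible way to do it; ZipNN's own implementation may differ. BF16 works the same way with a 16-bit rotation.

```python
import struct


def signbit_to_lsb_fp32(values: list[float]) -> list[int]:
    """Return FP32 bit patterns rotated left by one bit (sign bit moved to bit 0)."""
    n = len(values)
    words = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
    # Rotate left within 32 bits: sign (bit 31) -> bit 0, exponent (bits 30..23) -> bits 31..24.
    return [((w << 1) & 0xFFFFFFFF) | (w >> 31) for w in words]


def lsb_to_signbit_fp32(words: list[int]) -> list[float]:
    """Undo the rotation and reinterpret the bit patterns as FP32 values."""
    restored = [(w >> 1) | ((w & 1) << 31) for w in words]
    n = len(restored)
    return list(struct.unpack(f"<{n}f", struct.pack(f"<{n}I", *restored)))


if __name__ == "__main__":
    words = signbit_to_lsb_fp32([1.0, -1.0])
    print([hex(w) for w in words])  # ['0x7f000000', '0x7f000001']
    # The top byte of each rotated word is the biased exponent (127 for +/-1.0).
    assert all((w >> 24) == 127 for w in words)
```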

Validation test

Run the tests covering byte/file input types, byte/file compression types, and byte/file decompression types:

python3 -m unittest discover -s tests/ -p test_suit.py

Support and Questions

We are excited to hear your feedback!

For issues and feature requests, please open a GitHub issue.

Contributing

We welcome and value all contributions to the project!



Download files


Source Distribution

zipnn-0.1.0.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

zipnn-0.1.0-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file zipnn-0.1.0.tar.gz.

File metadata

  • Download URL: zipnn-0.1.0.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for zipnn-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c7d9979895ca180120bce81f69bc92e523e9d21c25991089542122733eb11ac8
MD5 9a6c0fe5a1d7b3c9ade24a4ff2ce6eac
BLAKE2b-256 6d21e4a7c8463ecfa607cf0e6f732e571f445957a058c90444faff516e007f3a


File details

Details for the file zipnn-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: zipnn-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 4.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for zipnn-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 31f583c49f7846edf4d66f962f32bb5668310b3ba8067ab88ef223b127fbbec1
MD5 01c16a7f4103082d25571ad1b3761d9e
BLAKE2b-256 26f36a9f4a40e9ae060b94a179cc25886d6cd9946f5dc9b37c1b190dc54f9be2

