McQuic, a.k.a. Multi-codebook Quantizers for neural image compression

🥳Our paper will be presented at CVPR 2022!🥳

CVF Open Access | arXiv | BibTex | Demo

McQuic is a deep image compressor.

Features:

Solid performance and super-fast coding speed (See Reference Models).
Cross-platform support (Linux-64, Windows-64 and macOS-64, macOS-arm64).

Techs:

McQuic holds rich multi-codebooks to quantize visual features and restores images from these quantized features. Similar ideas are presented in SHA [1], VQ-VAE [2], VQ-GAN [3], etc. We summarize these as vectorized priors, and our method extends them to a unified multivariate Gaussian mixture to perform high-quality, low-latency image compression.

Figure 1. Operational diagrams of different methods.

Figure 2. Comparisons with traditional codecs on an image from Kodak dataset.

Quick Start
Reference Models
Train a New Model
Implement MCQ by yourself
Contribute to this Repository
To-do List
Detailed framework
References and License

Quick Start

It is easy to try our model with a GPU (or on CPU if you like). This quick guide shows how to compress an image and restore it.

Requirements

To run the model, your device needs to meet the following requirements.

Hardware
- a CUDA-enabled GPU (≥ 8GiB VRAM, Driver version ≥ 450.80.02)
- If you don't have a GPU, you can run models on CPU, which may be slower.
- ≥ 8GiB RAM
OS
- I've tested all features on Ubuntu. Other platforms should also work; if not, please file a bug.

Conda (Recommended)

Installing this package is easy if you have conda installed, e.g. Miniconda. I recommend installing it directly into a new virtual environment:

# Install a clean pytorch with CUDA support
conda create -n [ENV_NAME] pytorch torchvision cudatoolkit -c pytorch
# Install mcquic and other dependencies
conda install -n [ENV_NAME] mcquic -c xiaosu-zhu -c conda-forge
conda activate [ENV_NAME]

The commands above install packages with CUDA support. If you only want to run on CPU, use cpuonly instead of cudatoolkit in the first command.

Since there is currently no proper version of torchvision for Apple M1, you need to change the channel from pytorch to conda-forge in the first command.
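After installing, it can help to confirm that the new environment sees the packages and the command-line entry point. The following is a minimal sketch using only the Python standard library plus `torch.cuda.is_available()`; the function name `check_environment` is hypothetical and is not part of mcquic.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which pieces of the install are visible from this interpreter."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "torchvision": importlib.util.find_spec("torchvision") is not None,
        # The `mcquic` executable should be on PATH once the env is activated.
        "mcquic_cli": shutil.which("mcquic") is not None,
        "cuda": False,
    }
    if report["torch"]:
        try:
            import torch
            # False is expected if you installed with `cpuonly`.
            report["cuda"] = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            report["torch"] = False
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'yes' if found else 'no'}")
```

Run it inside the activated environment; a `no` next to `cuda` only matters if you intended to use a GPU.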

Compress images

mcquic --help
mcquic -qp 3 path/to/an/image path/to/output.mcq

Decompress images

# `-qp` is not needed here, since this argument is written to `output.mcq`.
mcquic path/to/output.mcq path/to/restored.png
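To compress a whole folder, the two commands above can be driven from a short script. This is a sketch under assumptions: it only wraps the command lines shown above (`mcquic -qp 3 INPUT OUTPUT` and `mcquic INPUT OUTPUT`), the helper names are hypothetical, and it only looks for `*.png` files.

```python
import subprocess
from pathlib import Path
from typing import List


def compress_command(image: Path, output: Path, qp: int = 3) -> List[str]:
    """Build the command line `mcquic -qp QP IMAGE OUTPUT`."""
    return ["mcquic", "-qp", str(qp), str(image), str(output)]


def decompress_command(binary: Path, restored: Path) -> List[str]:
    """Build the command line `mcquic BINARY RESTORED`."""
    return ["mcquic", str(binary), str(restored)]


def compress_folder(folder: Path, out_dir: Path, qp: int = 3,
                    dry_run: bool = False) -> List[List[str]]:
    """Compress every PNG in `folder`; with dry_run, only return the commands."""
    commands = []
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(folder.glob("*.png")):
        command = compress_command(image, out_dir / (image.stem + ".mcq"), qp)
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError if mcquic exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling `compress_folder(Path("images"), Path("compressed"), dry_run=True)` first lets you inspect the commands before anything is executed.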

Docker

I also build Docker images so that you can avoid environment issues.

Try the latest Docker image:

docker pull ghcr.io/xiaosu-zhu/mcquic:main

Install Manually (for dev)

This approach gives you full access to the repo for modification. A conda environment is still required, e.g. Miniconda.

Clone this repository

git clone https://github.com/xiaosu-zhu/McQuic.git && cd McQuic

Create a virtual environment named mcquic and install all packages with

./install.sh  # for POSIX with bash
.\install.ps1 # for Windows with Anaconda PowerShell

You should now be in the mcquic virtual environment. If not, activate it with conda activate mcquic.

Compress images

mcquic --help
mcquic -qp 3 assets/sample.png assets/compressed.mcq

Decompress images

# `-qp` is not needed here, since this argument is written to `compressed.mcq`.
mcquic assets/compressed.mcq assets/restored.png

Then check the outputs: assets/compressed.mcq and assets/restored.png.

(Optional) Install `NVIDIA/Apex`

NVIDIA/Apex is an additional package required for training. If you want to develop, contribute, or train a new model, please make sure NVIDIA/Apex is installed, using the following snippet.

git clone https://github.com/NVIDIA/apex && cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If you are using Docker images, this step is not necessary.

Please make sure you've installed it in the correct virtual environment.

For more information such as building toolchains, please refer to their repository.

Reference Models

I've released one pretrained model (sorry, I currently don't have many free GPUs). You can fetch it by specifying -qp [Model_NO]. The pretrained models are listed below (others TBA):

Model No.	Channel	M	K	Throughput (Encode/Decode)	Avg.BPP
-	-	-	-	-	-
3	128	2	[8192,2048,512]	25.45 Mpps / 22.03 Mpps	0.1277
-	-	-	-	-	-

The coding throughput was tested on an NVIDIA RTX 3090. Image file I/O, loading and other operations are not included in the test.

Mpps = Mega-pixels per second
BPP = Bits per pixel
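As a worked illustration of the two units, the sketch below computes BPP from a compressed file size and estimates coding time from a throughput figure. The function names and the 6144-byte file size are hypothetical; 768x512 is the resolution of a Kodak image, and 25.45 Mpps is the encode throughput from the table above.

```python
def bits_per_pixel(file_bytes: int, width: int, height: int) -> float:
    """BPP = total bits in the compressed file / number of pixels."""
    return file_bytes * 8 / (width * height)


def coding_seconds(width: int, height: int, mpps: float) -> float:
    """Time to code one image at a throughput given in mega-pixels per second."""
    return width * height / (mpps * 1e6)


if __name__ == "__main__":
    # A hypothetical 6144-byte .mcq file for a 768x512 image.
    print(bits_per_pixel(6144, 768, 512))   # 0.125
    # Encoding the same image at 25.45 Mpps takes roughly 15 ms,
    # excluding file I/O and model loading.
    print(coding_seconds(768, 512, 25.45))
```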

Train a New Model

Please ensure you've installed NVIDIA/Apex. The minimal and recommended system requirements for training are listed below.

Requirements

Minimal
- RAM ≥ 16GiB
- VRAM ≥ 12GiB
Recommended
- VRAM ≥ 24GiB
- Better if you have ≥4-way NVIDIA RTX 3090s or faster GPUs.

Configs

The configs folder provides an example config, example.yaml, for training models. Please check the specifications in configs/README.md.

Prepare a Dataset

Before training models, you need to prepare an image dataset. You are free to pick any images to form the dataset, as long as each image is at least 512x512.
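If you want to screen images before building the dataset, the size can be read from the file header without decoding the image. The sketch below handles PNG files only, using the standard PNG layout (8-byte signature, then the IHDR chunk holding width and height). The helper names are hypothetical, and it assumes the size requirement means both sides must be at least 512 pixels.

```python
import struct
from pathlib import Path
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_header(header: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16..24.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def png_large_enough(path: Path, minimum: int = 512) -> bool:
    """True if both sides of the PNG at `path` are at least `minimum` pixels."""
    with open(path, "rb") as fp:
        width, height = png_size_from_header(fp.read(24))
    return width >= minimum and height >= minimum
```

Other formats such as JPEG store their dimensions differently and would need their own check.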

To build a training dataset, put all images in a folder (sub-folders are allowed), then run

mcquic dataset --help
# mcquic dataset [PATH_OF_YOUR_IMAGE_FOLDER] [PATH_OF_OUTPUT_DATASET]
mcquic dataset train_images mcquic_dataset

to build an LMDB dataset for mcquic to read.

Then prepare a training config, e.g. configs/train.yaml, and don't forget to specify the dataset path.

# `configs/train.yaml`
...
trainSet: mcquic_dataset # path to the training dataset.
valSet: val_images # path to a folder of validation images.
savePath: saved # path to a folder to save checkpoints.
...

where trainSet and valSet can be any relative or absolute paths, and savePath is a folder for saving checkpoints and logs.

In this example, the final folder structure is shown below:

. # A nice folder
├─ 📂configs
│   ...
│   └── 📄train.yaml
├── 📄README.md # this readme
├── 📂saved # saved models appear here
├── 📂train_images # a lot of training images
│   ├── 📂ImageNet
│   |   ├── 📂folder1 # a lot of images
│   |   ├── 🖼️image1.png
│   |   ...
│   ├── 📂COCO
│   |   ├── 🖼️image1.png
│   |   ├── 🖼️image2.png
│   |   ...
|   ...
├── 📂mcquic_dataset # generated training dataset
|   ├── 📀data.mdb
|   ├── 📀lock.mdb
|   └── 📄metadata.json
└── 📂val_images # a lot of validation images
    ├── 🖼️image1.png
    ├── 🖼️image2.png
    ...
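Before starting a long training run, a quick check that the generated dataset folder contains the files shown in the tree above can save time. This is a minimal sketch; the helper name is hypothetical and the file list is taken only from the example layout, not from the mcquic source.

```python
from pathlib import Path
from typing import List

# File names as they appear under `mcquic_dataset` in the example layout.
EXPECTED_DATASET_FILES = ("data.mdb", "lock.mdb", "metadata.json")


def missing_dataset_files(dataset_dir: Path) -> List[str]:
    """Return the expected files that are absent from `dataset_dir`."""
    return [name for name in EXPECTED_DATASET_FILES
            if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files(Path("mcquic_dataset"))
    print("dataset looks complete" if not missing else f"missing: {missing}")
```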

Training

To train a new model, run

mcquic train --help
# mcquic train [PATH_TO_CONFIG]
mcquic train configs/train.yaml

and the saved model will be located in saved/mcquic_dataset/latest.

To resume an interrupted training, run

mcquic train -r

or

mcquic train -r configs/train.yaml

if you want to use an updated config (e.g. tuned learning rate, modified hyper-parameters) to resume training.

Test

You can use any saved checkpoint (usually located in the savePath above) to validate performance. For example:

mcquic validate --help
mcquic validate path/to/a/checkpoint path/to/images/folder path/to/final/model

The output "final model" is compatible with the main program mcquic, so you can use this local model directly to perform compression. Try:

mcquic --local assets/sample.png assets/compressed.mcq
# `--local` is not needed here, since this argument is written to `compressed.mcq`.
mcquic assets/compressed.mcq assets/restored.png

If you think your model is awesome, please don't hesitate to Contribute to this Repository!

Implement MCQ by yourself

A minimal implementation of the multi-codebook quantizer looks like this:

from typing import Tuple

import torch
import torch.nn.functional as F
from torch import Tensor, nn


class Quantizer(nn.Module):
    """
    Quantizer with `m` sub-codebooks,
        `k` codewords for each, and
        `n` total channels.
    Args:
        m (int): Number of sub-codebooks.
        k (int): Number of codewords for each sub-codebook.
        n (int): Number of channels of latent variables.
    """
    def __init__(self, m: int, k: int, n: int):
        super().__init__()
        # A codebook, channel `n -> n // m`.
        self._codebook = nn.Parameter(torch.empty(m, k, n // m))
        self._initParameters()

    def _initParameters(self):
        # Assumption: the initializer is not shown in this snippet,
        # so a standard normal init is used as a placeholder.
        nn.init.normal_(self._codebook)

    def forward(self, x: Tensor, t: float = 1.0) -> Tuple[Tensor, Tensor]:
        """
        Module forward.
        Args:
            x (Tensor): Latent variable with shape [b, n, h, w].
            t (float, 1.0): Temperature for Gumbel softmax.
        Return:
            Tensor: Quantized latent with shape [b, n, h, w].
            Tensor: Binary codes with shape [b, m, h, w].
        """
        b, _, h, w = x.shape
        # [b, m, d, h, w]
        x = x.reshape(b, len(self._codebook), -1, h, w)
        # [b, m, 1, h, w], square of x
        x2 = (x ** 2).sum(2, keepdim=True)
        # [m, k, 1, 1], square of codebook
        c2 = (self._codebook ** 2).sum(-1, keepdim=True)[..., None]
        # [b, m, d, h, w] * [m, k, d] -sum-> [b, m, k, h, w], dot product between x and codebook
        inter = torch.einsum("bmdhw,mkd->bmkhw", x, self._codebook)
        # [b, m, k, h, w], pairwise squared L2-distance
        distance = x2 + c2 - 2 * inter
        # [b, m, k, h, w], negated distance as logits, so nearer codewords are sampled more often
        sample = F.gumbel_softmax(-distance, t, hard=True, dim=2)
        # [b, m, d, h, w], use sample to find codewords
        quantized = torch.einsum("bmkhw,mkd->bmdhw", sample, self._codebook)
        # back to [b, n, h, w]
        quantized = quantized.reshape(b, -1, h, w)
        # [b, n, h, w], [b, m, h, w], quantizeds and binaries
        return quantized, sample.argmax(2)
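The lookup at the core of the quantizer can also be illustrated without PyTorch. The sketch below is a deterministic simplification, not the sampling-based method above: it splits one vector into `m` contiguous sub-vectors and picks the nearest codeword in each sub-codebook instead of sampling with Gumbel softmax. All names and the toy codebooks are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[List[float], List[int]]:
    """Quantize `x` with one sub-codebook per contiguous sub-vector."""
    m = len(codebooks)
    d = len(x) // m  # channels handled by each sub-codebook
    restored: List[float] = []
    codes: List[int] = []
    for i, codebook in enumerate(codebooks):
        sub = x[i * d:(i + 1) * d]
        # Index of the nearest codeword in this sub-codebook.
        index = min(range(len(codebook)),
                    key=lambda j: squared_distance(sub, codebook[j]))
        codes.append(index)
        restored.extend(codebook[index])
    return restored, codes


if __name__ == "__main__":
    # Two sub-codebooks (m = 2), two codewords each (k = 2), n = 4 channels.
    codebooks = [
        [[0.0, 0.0], [1.0, 0.0]],
        [[-1.0, 2.0], [1.0, 1.0]],
    ]
    restored, codes = quantize([0.9, 0.1, -1.2, 2.0], codebooks)
    print(restored)  # [1.0, 0.0, -1.0, 2.0]
    print(codes)     # [1, 0]
```

Only the integer codes need to be stored; the decoder rebuilds the vector by looking the codes up in the same codebooks.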

Contribute to this Repository

It would be great if you want to try out new ideas or add new functions 😊. You will need to install mcquic via Docker or manually (including the optional step). As with other git repos, please take a thorough look at the issue templates before raising issues or pull requests.

To-do List

Docker file and bash script
execute pattern: mcquic service
reference model: 1,2,4~4+
change compressor args

Detailed framework

Thanks for your attention!❤️ Here are the details from the paper.

Following previous works, we build the compression model as an autoencoder. The bottleneck of the encoder (analysis transform) outputs a small feature map, which is quantized by multi-codebook vector quantization rather than scalar quantization. Quantizers are cascaded to effectively estimate the latent distribution.

Figure 3. Left: Overall framework. Right: Structure of a quantizer.

The right part of the figure above shows the detailed structure of our proposed quantizer.

References and License

References

[1] Agustsson, Eirikur, et al. "Soft-to-hard vector quantization for end-to-end learning compressible representations." NeurIPS 2017.

[2] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." NeurIPS 2017.

[3] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." CVPR 2021.

Citation

To cite our paper, please use the following BibTeX:

@inproceedings{McQuic,
  author    = {Xiaosu Zhu and
               Jingkuan Song and
               Lianli Gao and
               Feng Zheng and
               Heng Tao Shen},
  title     = {Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression},
  booktitle = {CVPR},
  % pages     = {????--????}
  year      = {2022}
}

Copyright

Fonts:

Source Sans Pro. © 2010, 2012 Adobe Systems Incorporated, SIL Open Font License.
Flash Rogers 3D. © 2007 Iconian Fonts, donationware.
Cambria Math. © 2017 Microsoft Corporation. All rights reserved.
Times New Roman. © 2017 The Monotype Corporation. All Rights Reserved.
Caramel and Vanilla. © 2017 FOUND MY FONT LTD. All Rights Reserved.

Pictures:

kodim24.png by Alfons Rudolph, Kodak Image Dataset.
assets/sample.png by Ales Krivec, CLIC Professional valid set.

Third-party repos:

Repos	License
PyTorch	BSD-style
Torchvision	BSD-3-Clause
Apex	BSD-3-Clause
tqdm	MPLv2.0, MIT
Tensorboard	Apache-2.0
rich	MIT
python-lmdb	OpenLDAP Version 2.8
PyYAML	MIT
marshmallow	MIT
click	BSD-3-Clause
vlutils	Apache-2.0
MessagePack	Apache-2.0
pybind11	BSD-style
CompressAI	BSD 3-Clause Clear
Taming-transformer	MIT
marshmallow-jsonschema	MIT
json-schema-for-humans	Apache-2.0
CyclicLR	MIT
batch-transforms	MIT
pytorch-msssim	MIT
Streamlit	Apache-2.0

This repo is licensed under the Apache License, Version 2.0.
