RPQ-pytorch
Reverse Product Quantization (RPQ) of weights to reduce static memory usage.
Introduction
Product quantization is a method for reducing the memory requirements of vector similarity search. It reduces the memory footprint by splitting each vector into sub-vectors and replacing every sub-vector with the index of an entry in a codebook of 256 codes. Each sub-vector can then be stored as a single uint8 index instead of its full vector representation.
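As a rough illustration of the encoding step described above, here is a minimal PyTorch sketch. The sizes are made up, and the codebooks are sampled at random purely for illustration (an assumption; a real PQ index would normally fit them to the data, for example with k-means per sub-space).

```python
import torch

torch.manual_seed(0)

num_vectors, dim = 1000, 64
num_codebooks, num_codes = 8, 256          # 8 sub-vectors per vector, 256 codes each
sub_dim = dim // num_codebooks             # 8 dimensions per sub-vector

vectors = torch.randn(num_vectors, dim)

# Illustrative codebooks only: random here, normally learned from the data.
codebooks = torch.randn(num_codebooks, num_codes, sub_dim)

# Split each vector into sub-vectors: (num_codebooks, num_vectors, sub_dim).
sub_vectors = (
    vectors.view(num_vectors, num_codebooks, sub_dim).permute(1, 0, 2).contiguous()
)

# For every sub-vector, pick the nearest code in the matching codebook.
distances = torch.cdist(sub_vectors, codebooks)    # (num_codebooks, num_vectors, num_codes)
codes = distances.argmin(dim=-1).to(torch.uint8)   # ids 0..255 fit in one byte

full_bytes = vectors.numel() * vectors.element_size()   # float32 storage
code_bytes = codes.numel() * codes.element_size()       # uint8 storage
print(codes.shape, full_bytes, code_bytes)
```

The codebooks themselves still have to be stored, but their size does not grow with the number of vectors.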
If we reverse this process, we can dynamically spawn a larger set of vectors from a much smaller set of codebooks containing sub-vectors and a set of randomized uint8 indices, rather than having to persistently hold a much larger set of vectors. This can be used during the forward pass to expand/compile the weight just-in-time in order to perform the operations on the input.
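The expansion step can be sketched as follows. This is a minimal sketch under assumed tensor layouts, not the library's implementation: `expand_weight` is a hypothetical helper, and the shapes are kept small so the example runs quickly.

```python
import torch

def expand_weight(codebooks: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Build (num_vectors, num_codebooks * codebook_dim) vectors from code ids.

    codebooks: (num_codebooks, num_codes, codebook_dim) float tensor
    indices:   (num_codebooks, num_vectors) uint8 tensor of code ids
    """
    num_codebooks, _, codebook_dim = codebooks.shape
    num_vectors = indices.shape[1]
    # uint8 tensors are interpreted as masks when indexing, so cast to int64.
    idx = indices.long()
    book_ids = torch.arange(num_codebooks, device=codebooks.device).unsqueeze(1)
    sub_vectors = codebooks[book_ids, idx]   # (num_codebooks, num_vectors, codebook_dim)
    # Place the sub-vectors of each vector side by side.
    return sub_vectors.permute(1, 0, 2).reshape(num_vectors, num_codebooks * codebook_dim)

torch.manual_seed(0)
codebooks = torch.randn(4, 256, 8)                          # small set of sub-vectors
indices = torch.randint(0, 256, (4, 32)).to(torch.uint8)    # randomized code ids
weight = expand_weight(codebooks, indices)                  # (32, 32) expanded weight
print(weight.shape)
```

Only `codebooks` and `indices` need to be kept between calls; `weight` can be rebuilt whenever it is required.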
This creates a state for a model where the weights are "dormant" and are expanded to their active state just before use. This plays very well with methods like gradient checkpointing (and, similarly, inference), where we can unpack the weights again rather than storing them. In other words, the weights are part of the dynamic computational graph and can be forgotten and unpacked again whenever they are needed.
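To make the interaction with gradient checkpointing concrete, here is a small sketch using `torch.utils.checkpoint`. The function `dormant_linear` and the tensor layouts are assumptions made for illustration, and the `use_reentrant=False` argument assumes a reasonably recent PyTorch release.

```python
import torch
from torch.utils.checkpoint import checkpoint

def dormant_linear(x, codebooks, indices):
    # Expand the weight just in time; nothing holds on to it after this call.
    book_ids = torch.arange(codebooks.shape[0]).unsqueeze(1)
    sub_vectors = codebooks[book_ids, indices.long()]                    # (M, N, d)
    weight = sub_vectors.permute(1, 0, 2).reshape(indices.shape[1], -1)  # (N, M * d)
    return x @ weight.t()

torch.manual_seed(0)
codebooks = torch.randn(4, 256, 8, requires_grad=True)
indices = torch.randint(0, 256, (4, 16)).to(torch.uint8)
x = torch.randn(2, 32)

# Under checkpointing the expanded weight is dropped after the forward pass
# and rebuilt from the codebooks when the backward pass needs it.
y = checkpoint(dormant_linear, x, codebooks, indices, use_reentrant=False)
y.sum().backward()
print(y.shape, codebooks.grad.shape)
```

Gradients flow to the codebooks only; the uint8 indices stay fixed.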
However, this doesn't come for free. The indices draw from a set of shared codebooks, so the larger the weights, the greater the likelihood that the generated vectors will share sub-vectors. This can be mitigated by increasing the number of codebooks, but more testing is needed to determine the minimum number of codebooks each implementation requires.
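A back-of-the-envelope calculation gives a feel for how much sharing to expect. The sketch below assumes the uint8 indices are drawn uniformly at random, which is an assumption rather than something this README states; `sharing_stats` is a hypothetical helper.

```python
NUM_CODES = 256  # codes per codebook, addressable by one uint8 index

def sharing_stats(num_vectors: int, num_codebooks: int) -> dict:
    """Rough sharing statistics, assuming uniformly random indices."""
    return {
        # Average number of vectors that reuse any one code of a codebook.
        "vectors_per_code": num_vectors / NUM_CODES,
        # Share of a vector's dimensions covered by a single sub-vector slot.
        "slot_fraction": 1 / num_codebooks,
        # Chance that two given vectors match in every slot (full duplicates).
        "p_identical": (1 / NUM_CODES) ** num_codebooks,
    }

stats = sharing_stats(num_vectors=9216, num_codebooks=72)
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
```

With more codebooks, each shared sub-vector covers a smaller slice of the vector and full duplicates become less likely, although every code is still reused by `num_vectors / 256` vectors on average.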
For instance, in the Usage section we define an RPQOPT model (an OPT variant with RPQ weights) where the number of codebooks is set to the number of heads. This is chosen arbitrarily, but it works well because hidden_dim must be divisible by num_codebooks, and it is already divisible by the number of heads.
The effect of having a set of entangled vectors is unknown and would require rigorous testing with standard benchmarks for comparison. Intuitively, this would have a different outcome depending on the way the final weight structure is used. For a vector quantization module, it could be advantageous to have codes be entangled to avoid the issue of "dead codes" and increase codebook utilization.
Installation
pip install rpq-pytorch
Usage
Standalone Weights
A standalone module, RPQWeight, is available as an nn.Module wrapper that initializes a set of dynamic PQ weights and returns the expanded set of weight vectors.
from rpq.nn import RPQWeight
w = RPQWeight(num_codebooks=72, codebook_dim=128, num_vectors=9216)
print(w.codebooks.shape, w.indices.shape) # torch.Size([72, 256, 128]) torch.Size([72, 9216])
print(w().shape) # torch.Size([9216, 9216])
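The shapes printed above allow a quick estimate of the storage saved. The arithmetic below assumes float32 codebooks (4 bytes per value) and uint8 indices (1 byte per value); the dtype of the codebooks is an assumption.

```python
num_codebooks, num_codes, codebook_dim, num_vectors = 72, 256, 128, 9216

codebook_bytes = num_codebooks * num_codes * codebook_dim * 4   # float32 codebooks
index_bytes = num_codebooks * num_vectors * 1                   # uint8 indices
dormant_bytes = codebook_bytes + index_bytes

# Expanded weight: num_vectors rows of num_codebooks * codebook_dim float32 values.
expanded_bytes = num_vectors * (num_codebooks * codebook_dim) * 4

print(f"dormant:  {dormant_bytes / 2**20:.1f} MiB")
print(f"expanded: {expanded_bytes / 2**20:.1f} MiB")
print(f"ratio:    {expanded_bytes / dormant_bytes:.1f}x")
```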
Layers
A set of common layers is re-implemented with quantized weights. They follow the same usage as their torch.nn counterparts, with an extra num_codebooks argument for each layer. For each layer, out_features/num_embeddings must be divisible by num_codebooks.
import torch
from rpq.nn import RPQLinear
layer = RPQLinear(in_features=1024, out_features=1024, num_codebooks=16)
x = torch.randn(1, 1, 1024) # (b, n, d)
y = layer(x) # (1, 1, 1024)
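For intuition about what such a layer might hold internally, here is a hypothetical stand-in. `SketchRPQLinear` is not the library's `RPQLinear`; the weight layout (one expanded vector of length out_features per input feature) and the initialization scale are assumptions chosen so that the divisibility rule above applies to out_features.

```python
import torch
from torch import nn

class SketchRPQLinear(nn.Module):
    """Hypothetical stand-in for illustration, not the library's RPQLinear."""

    def __init__(self, in_features, out_features, num_codebooks, num_codes=256):
        super().__init__()
        if out_features % num_codebooks != 0:
            raise ValueError("out_features must be divisible by num_codebooks")
        codebook_dim = out_features // num_codebooks
        # Trainable sub-vectors shared by every row of the expanded weight.
        self.codebooks = nn.Parameter(
            torch.randn(num_codebooks, num_codes, codebook_dim) * 0.02
        )
        # Fixed random code ids, kept as a uint8 buffer (not trained).
        ids = torch.randint(0, num_codes, (num_codebooks, in_features))
        self.register_buffer("indices", ids.to(torch.uint8))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def expand(self):
        num_codebooks = self.codebooks.shape[0]
        book_ids = torch.arange(num_codebooks, device=self.codebooks.device).unsqueeze(1)
        sub_vectors = self.codebooks[book_ids, self.indices.long()]   # (M, in, d)
        return sub_vectors.permute(1, 0, 2).reshape(self.indices.shape[1], -1)  # (in, out)

    def forward(self, x):
        # The full weight exists only for the duration of this call.
        return x @ self.expand() + self.bias

layer = SketchRPQLinear(in_features=1024, out_features=1024, num_codebooks=16)
y = layer(torch.randn(1, 1, 1024))
print(y.shape)
```

Keeping the indices in a buffer rather than a parameter means they are saved with the module but never receive gradients.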
Layers implemented:
- RPQLinear
- RPQEmbedding*
- RPQConv1d
- RPQConv2d
- RPQConvTranspose2d
- RPQConvTranspose1d
- RPQConv3d
- RPQConvTranspose3d
- RPQBilinear
*Note: An embedding layer is a lookup table and is therefore very fast. As a result, expanding the weights for RPQEmbedding adds a lot of time to the operation, especially for a small number of tokens (10s of $\mu s$ -> 10s of ms).
Models
Using the layer implementations, we can implement models via drop-in replacement of their static weight counterparts.
RPQViT (ViT Giant)
from vit_pytorch import ViT
from rpq.models.rpqvit import RPQViT
from rpq.utils import model_size
# vit_giant_patch14_336
model = ViT(
    image_size=336,
    patch_size=14,
    num_classes=1000,
    dim=1280,
    depth=32,
    heads=16,
    mlp_dim=5120,
    dropout=0.1,
    emb_dropout=0.1
)
# rpqvit_giant_patch14_336
rpq_model = RPQViT(
    image_size=336,
    patch_size=14,
    num_classes=1000,
    dim=1280,
    depth=32,
    heads=16,
    mlp_dim=5120,
    dropout=0.1,
    emb_dropout=0.1
)
model_size(model)
model_size(rpq_model)
model size: 2252.157MB
model size: 361.429MB
This is approximately a 6x reduction in model size.
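A size figure of this kind can be computed by summing the bytes of every parameter and buffer in a module. The helper below is a hypothetical sketch, not necessarily how `rpq.utils.model_size` works, and its choice of 1 MB = 1024**2 bytes is an assumption.

```python
import torch
from torch import nn

def size_in_mb(model: nn.Module) -> float:
    """Total bytes of parameters and buffers, in MB (1 MB = 1024**2 bytes here)."""
    tensors = list(model.parameters()) + list(model.buffers())
    return sum(t.numel() * t.element_size() for t in tensors) / 1024**2

dense = nn.Linear(1024, 1024)   # static float32 weight and bias
print(f"model size: {size_in_mb(dense):.3f}MB")
```

Counting buffers matters for a design where code ids are stored as non-trainable tensors, since those would not show up under `parameters()`.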
RPQOPT (opt-66b)
import torch
from transformers.models.opt.modeling_opt import OPTConfig
from transformers import GPT2Tokenizer
from rpq.models.rpqopt import RPQOPTModel
from rpq.utils import model_size
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-66b")
conf = OPTConfig.from_pretrained("facebook/opt-66b")
rpq_model = RPQOPTModel(conf) # randomly initialized model
inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
with torch.no_grad():
    outputs = rpq_model(**inputs)
model_size(rpq_model)
model size: 5885.707MB
This is an RPQOPT-66b initialized at float32 precision. A static-weight version (standard OPT-66b) would be 264 GB in size, so this amounts to approximately a 44x reduction in size.
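The ratios quoted in this section can be reproduced with a few lines of arithmetic. The sketch assumes 66e9 parameters at 4 bytes each for the dense float32 model and treats 1 GB as 1000 MB; both are assumptions about how the figures were derived.

```python
# Sizes reported in this README, in MB.
vit_mb, rpq_vit_mb = 2252.157, 361.429
rpq_opt_mb = 5885.707

# Assumption: a dense OPT-66b has 66e9 float32 parameters (4 bytes each).
opt_gb = 66e9 * 4 / 1e9

vit_ratio = vit_mb / rpq_vit_mb
opt_ratio = opt_gb * 1000 / rpq_opt_mb   # treating 1 GB as 1000 MB

print(f"dense OPT-66b: {opt_gb:.0f} GB")
print(f"ViT reduction: {vit_ratio:.2f}x")
print(f"OPT reduction: {opt_ratio:.2f}x")
```

Under these assumptions the OPT ratio lands between 44x and 45x, in line with the approximate figure quoted above.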
Benchmarks
Because the weight matrix is entangled as a result of inheriting from a shared set of codebooks, the RPQ model variants need to be tested against the original models to characterize issues and tradeoffs with training stability, especially at scale. The results of those tests will be reported in the table below:
| Model | Config | Model Size | Dataset | Validation Accuracy | Epochs |
|---|---|---|---|---|---|
| ViT | vit_base_patch16_224 | 330MB | MNIST | TBD | 90 |
| RPQViT | vit_base_patch16_224 | 88MB | MNIST | TBD | 90 |
| ViT | vit_base_patch16_224 | 330MB | CIFAR10 | TBD | 90 |
| RPQViT | vit_base_patch16_224 | 88MB | CIFAR10 | TBD | 90 |
| ViT | vit_base_patch16_224 | 330MB | ImageNet | TBD | 90 |
| RPQViT | vit_base_patch16_224 | 88MB | ImageNet | TBD | 90 |
TODO
- Implement RPQConv1d layer
- Implement RPQConv2d layer
- Implement RPQConv3d layer
- Implement RPQConvTranspose1d layer
- Implement RPQConvTranspose2d layer
- Implement RPQConvTranspose3d layer
- Implement RPQBilinear layer
- Perform benchmarks with ViTs (ViT vs RPQViT)
- Perform benchmarks with LLMs (BERT, OPT, etc.)
- Explore methods of conversion from pre-trained static weights to dynamic RPQ weights