torch-pruning

Structural Pruning for Model Acceleration.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Towards Any Structural Pruning

[中文README | README in Chinese]

Torch-Pruning (TP) is a versatile library for Structural Network Pruning with the following features:

General-purpose Pruning Toolkit: TP enables structural pruning for a wide range of neural networks, including Large Language Models (LLMs), Diffusion Models, Vision Transformers, YOLOv7, YOLOv8, FasterRCNN, SSD, KeypointRCNN, MaskRCNN, ResNe(X)t, ConvNext, DenseNet, RegNet, FCN, DeepLab, etc. Unlike torch.nn.utils.prune, which zeroes out parameters through masking, Torch-Pruning uses a (non-deep) graph algorithm called DepGraph to physically remove parameters and channels.
Reproducible Performance Benchmark and Prunability Benchmark: Currently, TP can prune approximately 81/85 = 95.3% of the models from Torchvision 0.13.1. Try this Colab Demo for a quick start.
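
The difference between masking and physical removal can be sketched with plain PyTorch. The snippet below is only an illustration and does not use Torch-Pruning: the masked layer comes from torch.nn.utils.prune.ln_structured, while the "physically pruned" layer is built by hand from an arbitrary set of kept filters (the index list is an assumption made for the example).

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Masking: the weight tensor keeps its shape; pruned filters are only zeroed.
masked = nn.Conv2d(3, 8, kernel_size=3)
prune.ln_structured(masked, name="weight", amount=0.5, n=2, dim=0)
zero_filters = int((masked.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())

# Physical removal: build a smaller layer that keeps only the surviving filters.
dense = nn.Conv2d(3, 8, kernel_size=3)
keep = [0, 1, 3, 4, 5]  # drop filters 2, 6 and 7 (arbitrary choice for illustration)
slim = nn.Conv2d(3, len(keep), kernel_size=3)
with torch.no_grad():
    slim.weight.copy_(dense.weight[keep])
    slim.bias.copy_(dense.bias[keep])

x = torch.randn(1, 3, 16, 16)
print(masked.weight.shape, zero_filters)  # shape unchanged, some filters zeroed
print(slim.weight.shape, slim(x).shape)   # fewer filters, fewer output channels
```

Only the second variant reduces the size of the weight tensor and the number of output channels, which is why every layer consuming those channels has to be adjusted as well.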

For more technical details, please refer to our CVPR'23 paper:

DepGraph: Towards Any Structural Pruning
Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang

Update:

2023.05.20 :rocket: LLM-Pruner: On the Structural Pruning of Large Language Models [arXiv]
2023.05.19 Structural Pruning for Diffusion Models [arXiv]
2023.04.15 Pruning and Post-training for YOLOv7 / YOLOv8
2023.04.21 Join our Telegram or Wechat group for casual discussions:
- Telegram: https://t.me/+NwjbBDN2ao1lZjZl

Please do not hesitate to open a discussion or issue if you encounter any problems with the library or the paper.

Features:

Structural pruning for CNNs, Transformers, Detectors, Language Models and Diffusion Models. Please refer to the Prunability Benchmark.
High-level pruners: MagnitudePruner, BNScalePruner, GroupNormPruner, RandomPruner, etc.
Importance Criteria: L-p Norm, Taylor, Random, BNScaling, etc.
Dependency Graph for dependency modeling.
Supported modules: Linear, (Transposed) Conv, Normalization, PReLU, Embedding, MultiheadAttention, nn.Parameters and customized modules.
Supported operators: split, concatenation, skip connection, flatten, reshape, view, all element-wise ops, etc.
Low-level pruning functions
Benchmarks and tutorials
A resource list for practical structural pruning.
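
To make the importance criteria listed above more concrete, here is a minimal sketch of two of them for a single convolution, written in plain PyTorch. It is a simplified illustration under stated assumptions, not the library's implementation: the Taylor score uses one common first-order form, |sum(w * dL/dw)| per output channel, and the loss is a dummy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 4, kernel_size=3, bias=False)

# Magnitude (L-p norm) importance: one score per output channel.
w = conv.weight.detach()
l2_importance = w.flatten(1).norm(p=2, dim=1)

# First-order Taylor importance needs gradients, so run a backward pass first.
loss = conv(torch.randn(2, 3, 8, 8)).sum()  # dummy loss for illustration
loss.backward()
taylor_importance = (conv.weight * conv.weight.grad).detach().flatten(1).sum(dim=1).abs()

# The least important channels are the first candidates for removal.
prune_idxs = torch.argsort(l2_importance)[:2].tolist()
print(l2_importance, taylor_importance, prune_idxs)
```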

TODO List:

A strong baseline with bags of tricks from existing methods.
A benchmark for Torchvision compatibility (81/85=95.3%, :heavy_check_mark:) and timm compatibility.
Pruning from Scratch / at Initialization.
More high-level pruners like FisherPruner, GrowingReg, etc.
More Transformers like Vision Transformers (:heavy_check_mark:), Swin Transformers, PoolFormers.
Block/Layer/Depth Pruning
Pruning benchmarks for CIFAR, ImageNet and COCO.

Installation

Torch-Pruning is compatible with PyTorch 1.x and 2.x. PyTorch 1.12.1 is recommended!

pip install torch-pruning # v1.1.8

git clone https://github.com/VainF/Torch-Pruning.git

Quickstart

Here we provide a quick start for Torch-Pruning. More detailed explanations can be found in the tutorials.

0. How It Works

In structural pruning, a Group is defined as the minimal removable unit within deep networks. Each group consists of multiple interdependent layers that must be pruned simultaneously to preserve the integrity of the resulting structure. However, deep networks often exhibit complex dependencies among layers, which makes structural pruning challenging. This work addresses the challenge with an automated mechanism, DepGraph, for parameter grouping, which facilitates effortless pruning for a wide range of deep networks.
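
The grouping idea can be sketched in a few lines of pure Python. This is a toy model, not the DepGraph implementation: the edge list is a hand-written, hypothetical description of a small residual block, and a group is simply everything reachable from the pruning root.

```python
from collections import deque

# Hypothetical couplings: an edge means "pruning channels on one side
# forces the same channel indices to be pruned on the other side".
edges = [
    ("conv1.out", "bn1"),
    ("bn1", "add"),        # skip connection
    ("bn1", "conv2.in"),
    ("add", "bn2"),
    ("bn2", "conv2.out"),
    ("add", "conv3.in"),
    ("conv3.out", "bn3"),
    ("bn3", "fc.in"),
]


def build_adjacency(edge_list):
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def collect_group(adjacency, root):
    """Breadth-first search from the pruning root over coupled layers."""
    group, seen, queue = [root], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                group.append(neighbour)
                queue.append(neighbour)
    return group


adjacency = build_adjacency(edges)
group_a = collect_group(adjacency, "conv1.out")
group_b = collect_group(adjacency, "conv3.out")
print(group_a)
print(group_b)
```

Note that conv2.in and conv2.out are separate nodes: pruning a convolution's input channels does not by itself force its output channels to change, so the two only end up in the same group here because of the skip connection.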

1. A Minimal Example

import torch
from torchvision.models import resnet18
import torch_pruning as tp

model = resnet18(pretrained=True).eval()

# 1. build dependency graph for resnet18
DG = tp.DependencyGraph().build_dependency(model, example_inputs=torch.randn(1,3,224,224))

# 2. Specify the to-be-pruned channels. Here we prune those channels indexed by [2, 6, 9].
group = DG.get_pruning_group( model.conv1, tp.prune_conv_out_channels, idxs=[2, 6, 9] )

# 3. prune all grouped layers that are coupled with model.conv1 (included).
if DG.check_pruning_group(group): # avoid full pruning, i.e., channels=0.
    group.prune()
    
# 4. Save & Load
model.zero_grad() # We don't want to store gradient information
torch.save(model, 'model.pth') # without .state_dict
model = torch.load('model.pth') # load the model object

The above example demonstrates the fundamental pruning pipeline using DepGraph. The target layer model.conv1 is coupled with several other layers, all of which must be pruned simultaneously in structural pruning. Let's print the group and observe how a pruning operation "triggers" other ones. In the following output, A => B means that pruning operation A triggers pruning operation B. group[0] refers to the pruning root passed to DG.get_pruning_group.

--------------------------------
          Pruning Group
--------------------------------
[0] prune_out_channels on conv1 (Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)) => prune_out_channels on conv1 (Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)), idxs=[2, 6, 9] (Pruning Root)
[1] prune_out_channels on conv1 (Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)) => prune_out_channels on bn1 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), idxs=[2, 6, 9]
[2] prune_out_channels on bn1 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _ElementWiseOp_20(ReluBackward0), idxs=[2, 6, 9]
[3] prune_out_channels on _ElementWiseOp_20(ReluBackward0) => prune_out_channels on _ElementWiseOp_19(MaxPool2DWithIndicesBackward0), idxs=[2, 6, 9]
[4] prune_out_channels on _ElementWiseOp_19(MaxPool2DWithIndicesBackward0) => prune_out_channels on _ElementWiseOp_18(AddBackward0), idxs=[2, 6, 9]
[5] prune_out_channels on _ElementWiseOp_19(MaxPool2DWithIndicesBackward0) => prune_in_channels on layer1.0.conv1 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[6] prune_out_channels on _ElementWiseOp_18(AddBackward0) => prune_out_channels on layer1.0.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), idxs=[2, 6, 9]
[7] prune_out_channels on _ElementWiseOp_18(AddBackward0) => prune_out_channels on _ElementWiseOp_17(ReluBackward0), idxs=[2, 6, 9]
[8] prune_out_channels on _ElementWiseOp_17(ReluBackward0) => prune_out_channels on _ElementWiseOp_16(AddBackward0), idxs=[2, 6, 9]
[9] prune_out_channels on _ElementWiseOp_17(ReluBackward0) => prune_in_channels on layer1.1.conv1 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[10] prune_out_channels on _ElementWiseOp_16(AddBackward0) => prune_out_channels on layer1.1.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), idxs=[2, 6, 9]
[11] prune_out_channels on _ElementWiseOp_16(AddBackward0) => prune_out_channels on _ElementWiseOp_15(ReluBackward0), idxs=[2, 6, 9]
[12] prune_out_channels on _ElementWiseOp_15(ReluBackward0) => prune_in_channels on layer2.0.downsample.0 (Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)), idxs=[2, 6, 9]
[13] prune_out_channels on _ElementWiseOp_15(ReluBackward0) => prune_in_channels on layer2.0.conv1 (Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[14] prune_out_channels on layer1.1.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on layer1.1.conv2 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[15] prune_out_channels on layer1.0.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on layer1.0.conv2 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
--------------------------------

For more details about grouping, please refer to tutorials/2 - Exploring Dependency Groups

How to scan all groups (Advanced):

We can use DG.get_all_groups(ignored_layers, root_module_types) to scan all groups sequentially. Each group will begin with a layer that matches a type in the "root_module_types" parameter. Note that DG.get_all_groups is only responsible for grouping and does not have any knowledge or understanding of which parameters should be pruned. Therefore, it is necessary to specify the pruning idxs using group.prune(idxs=idxs).

import torch.nn as nn

for group in DG.get_all_groups(ignored_layers=[model.conv1], root_module_types=[nn.Conv2d, nn.Linear]):
    # handle groups in sequential order
    idxs = [2,4,6] # your pruning indices
    group.prune(idxs=idxs)
    print(group)

2. High-level Pruners

Leveraging the DependencyGraph, we developed several high-level pruners in this repository to facilitate effortless pruning. By specifying the desired channel sparsity, you can prune the entire model and fine-tune it using your own training code. For detailed information on this process, please refer to this tutorial, which shows how to implement a slimming pruner from scratch. Additionally, you can find more practical examples in benchmarks/main.py.

import torch
from torchvision.models import resnet18
import torch_pruning as tp

model = resnet18(pretrained=True)

# Importance criteria
example_inputs = torch.randn(1, 3, 224, 224)
imp = tp.importance.TaylorImportance()

ignored_layers = []
for m in model.modules():
    if isinstance(m, torch.nn.Linear) and m.out_features == 1000:
        ignored_layers.append(m) # DO NOT prune the final classifier!

iterative_steps = 5 # progressive pruning
pruner = tp.pruner.MagnitudePruner(
    model,
    example_inputs,
    importance=imp,
    iterative_steps=iterative_steps,
    ch_sparsity=0.5, # remove 50% channels, ResNet18 = {64, 128, 256, 512} => ResNet18_Half = {32, 64, 128, 256}
    ignored_layers=ignored_layers,
)

base_macs, base_nparams = tp.utils.count_ops_and_params(model, example_inputs)
for i in range(iterative_steps):
    if isinstance(imp, tp.importance.TaylorImportance):
        # Taylor expansion requires gradients for importance estimation
        loss = model(example_inputs).sum() # a dummy loss for TaylorImportance
        loss.backward() # before pruner.step()
    pruner.step()
    macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
    # finetune your model here
    # finetune(model)
    # ...
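
The effect of ch_sparsity and iterative_steps on layer widths can be worked out by hand. The sketch below assumes that the target sparsity grows linearly over the steps and that the number of kept channels is rounded down; both are assumptions made for illustration, and the library's actual scheduler and rounding may differ. The helper names are hypothetical.

```python
def linear_sparsity_schedule(ch_sparsity, iterative_steps):
    """Target sparsity after each step, assuming a linear ramp."""
    return [ch_sparsity * step / iterative_steps for step in range(1, iterative_steps + 1)]


def remaining_channels(channels, ch_sparsity, iterative_steps):
    """Channels left after each step, rounding the kept count down."""
    schedule = linear_sparsity_schedule(ch_sparsity, iterative_steps)
    return [int(channels * (1 - s)) for s in schedule]


for width in (64, 128, 256, 512):
    print(width, remaining_channels(width, ch_sparsity=0.5, iterative_steps=5))
```

Under these assumptions the final widths are half of the original ones, matching the {64, 128, 256, 512} => {32, 64, 128, 256} comment in the example above.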

Sparse Training

Some pruners, such as BNScalePruner and GroupNormPruner, require sparse training before pruning. This can be achieved by inserting a single line of code, pruner.regularize(model), into your training script. The pruner then updates the gradients of the trainable parameters, so the call must come after loss.backward() and before optimizer.step().

for epoch in range(epochs):
    model.train()
    for i, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        out = model(data)
        loss = F.cross_entropy(out, target)
        loss.backward()
        pruner.regularize(model) # <== for sparse learning
        optimizer.step()
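
As a rough idea of what such a regularization step can look like, the following sketch adds the gradient of an L1 penalty to the scale parameters of BatchNorm layers, in the style of network slimming. It is an assumption-laden illustration in plain PyTorch: l1_regularize_bn is a hypothetical helper, and the real pruner.regularize may do something different.

```python
import torch
import torch.nn as nn


def l1_regularize_bn(model, reg=1e-4):
    """Add the gradient of reg * |gamma| to every BatchNorm scale (hypothetical helper)."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(reg * torch.sign(m.weight.detach()))


torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU())
loss = model(torch.randn(2, 3, 8, 8)).mean()
loss.backward()

bn = model[1]
grad_before = bn.weight.grad.clone()
l1_regularize_bn(model, reg=1e-4)  # between backward() and optimizer.step()
grad_after = bn.weight.grad
print(grad_after - grad_before)
```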

Interactive Pruning (Advanced)

All high-level pruners support interactive pruning. Use pruner.step(interactive=True) to get all groups and interactively prune them by calling group.prune(). This feature is useful if you want to control/monitor the pruning process.

for i in range(iterative_steps):
    for group in pruner.step(interactive=True): # Warning: groups must be handled sequentially. Do not keep them as a list.
        print(group) 
        # do whatever you like with the group 
        dep, idxs = group[0] # get the idxs
        target_module = dep.target.module # get the root module
        pruning_fn = dep.handler # get the pruning function
       
        # Don't forget to prune the group
        group.prune()
          
        # group.prune(idxs=[0, 2, 6]) # It is even possible to change the pruning behaviour with the idxs parameter
    macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
    # finetune your model here
    # finetune(model)
    # ...

Group-level Pruning

With DepGraph, it is easy to design "group-level" criteria that estimate the importance of a whole group rather than a single layer. In Torch-Pruning, all pruners work at the group level.
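
A group-level criterion can be sketched as follows. The example assumes a group of just two coupled layers and simply sums their per-channel L2 norms along the shared channel dimension; this aggregation rule is an illustrative assumption rather than the formula used by the library's pruners.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A coupled pair: channel k of conv1's output feeds channel k of conv2's input.
conv1 = nn.Conv2d(3, 6, kernel_size=3, bias=False)
conv2 = nn.Conv2d(6, 8, kernel_size=3, bias=False)

# Per-channel L2 norms, both indexed by the shared channel dimension.
out_norm = conv1.weight.detach().flatten(1).norm(dim=1)                 # shape [6]
in_norm = conv2.weight.detach().transpose(0, 1).flatten(1).norm(dim=1)  # shape [6]

# Aggregate across the group, then pick the least important shared channels.
group_importance = out_norm + in_norm
n_pruned = 2
prune_idxs = sorted(torch.argsort(group_importance)[:n_pruned].tolist())
print(group_importance, prune_idxs)
```

The same index list is then applied to every layer in the group, which keeps the coupled dimensions consistent.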

3. Save & Load

The following script saves the whole model object (structure + weights) as 'model.pth'.

model.zero_grad() # We don't want to store gradient information
torch.save(model, 'model.pth') # without .state_dict
model = torch.load('model.pth') # load the pruned model
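
Saving the whole object matters because a pruned model no longer matches its original architecture. The small sketch below uses two hand-built linear layers as stand-ins for an unpruned and a pruned layer (no Torch-Pruning involved) and shows that a regular state_dict from the smaller layer cannot be loaded into the original one.

```python
import torch
import torch.nn as nn

original = nn.Linear(4, 8)
# Stand-in for a pruned layer: same type, but three output features removed.
pruned = nn.Linear(4, 5)

try:
    original.load_state_dict(pruned.state_dict())
    loaded = True
except RuntimeError as err:
    loaded = False
    print("load_state_dict failed:", str(err).splitlines()[0])
```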

Experimental Features: Re-create pruned models from unpruned ones using tp.state_dict and tp.load_state_dict.

# save the pruned state_dict, which includes both pruned parameters and modified attributes
state_dict = tp.state_dict(pruned_model) # the pruned model, e.g., a resnet-18-half
torch.save(state_dict, 'pruned.pth')

# create a new model, e.g. resnet18
new_model = resnet18().eval()

# load the pruned state_dict into the unpruned model.
loaded_state_dict = torch.load('pruned.pth', map_location='cpu')
tp.load_state_dict(new_model, state_dict=loaded_state_dict)
print(new_model) # This will be a pruned model.

Refer to tests/test_serialization.py for a ViT example. In that example, we prune the model and modify some attributes such as model.hidden_dims.

4. Low-level Pruning Functions

While it is possible to manually prune your model using low-level functions, this approach can be quite laborious, as it requires careful management of the associated dependencies. As a result, we recommend utilizing the aforementioned high-level pruners to streamline the pruning process.

tp.prune_conv_out_channels( model.conv1, idxs=[2,6,9] )

# fix the broken dependencies manually
tp.prune_batchnorm_out_channels( model.bn1, idxs=[2,6,9] )
tp.prune_conv_in_channels( model.layer2[0].conv1, idxs=[2,6,9] )
...
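
To see why the dependencies break, the following plain-PyTorch sketch mimics what such low-level functions conceptually do on a tiny Conv-BN-ReLU-Conv stack. The drop_* helpers are hypothetical and written only for this illustration; they are not the library's functions.

```python
import torch
import torch.nn as nn


def keep_indices(n, idxs):
    removed = set(idxs)
    return [i for i in range(n) if i not in removed]


def drop_conv_out(conv, idxs):
    keep = keep_indices(conv.out_channels, idxs)
    conv.weight = nn.Parameter(conv.weight.detach()[keep].clone())
    if conv.bias is not None:
        conv.bias = nn.Parameter(conv.bias.detach()[keep].clone())
    conv.out_channels = len(keep)


def drop_bn(bn, idxs):
    keep = keep_indices(bn.num_features, idxs)
    bn.weight = nn.Parameter(bn.weight.detach()[keep].clone())
    bn.bias = nn.Parameter(bn.bias.detach()[keep].clone())
    bn.running_mean = bn.running_mean[keep].clone()
    bn.running_var = bn.running_var[keep].clone()
    bn.num_features = len(keep)


def drop_conv_in(conv, idxs):
    keep = keep_indices(conv.in_channels, idxs)
    conv.weight = nn.Parameter(conv.weight.detach()[:, keep].clone())
    conv.in_channels = len(keep)


net = nn.Sequential(
    nn.Conv2d(3, 16, 3, bias=False), nn.BatchNorm2d(16), nn.ReLU(), nn.Conv2d(16, 8, 3)
).eval()
x = torch.randn(1, 3, 32, 32)
idxs = [2, 6, 9]

drop_conv_out(net[0], idxs)
try:
    net(x)
    broken = False
except RuntimeError:
    broken = True  # the BN layer and the next conv still expect 16 channels

# Fix the broken dependencies manually.
drop_bn(net[1], idxs)
drop_conv_in(net[3], idxs)
out = net(x)
print(broken, out.shape)
```

Even in this four-layer toy network, one pruning operation requires two follow-up operations; in a real model with skip connections the list grows quickly.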

The following pruning functions are available:

'prune_conv_out_channels',
'prune_conv_in_channels',
'prune_depthwise_conv_out_channels',
'prune_depthwise_conv_in_channels',
'prune_batchnorm_out_channels',
'prune_batchnorm_in_channels',
'prune_linear_out_channels',
'prune_linear_in_channels',
'prune_prelu_out_channels',
'prune_prelu_in_channels',
'prune_layernorm_out_channels',
'prune_layernorm_in_channels',
'prune_embedding_out_channels',
'prune_embedding_in_channels',
'prune_parameter_out_channels',
'prune_parameter_in_channels',
'prune_multihead_attention_out_channels',
'prune_multihead_attention_in_channels',
'prune_groupnorm_out_channels',
'prune_groupnorm_in_channels',
'prune_instancenorm_out_channels',
'prune_instancenorm_in_channels',

5. Customized Layers

Please refer to tests/test_customized_layer.py.

6. Benchmarks

Our results on {ResNet-56 / CIFAR-10 / 2.00x}

Method	Base (%)	Pruned (%)	$\Delta$ Acc (%)	Speed Up
NIPS [1]	-	-	-0.03	1.76x
Geometric [2]	93.59	93.26	-0.33	1.70x
Polar [3]	93.80	93.83	+0.03	1.88x
CP [4]	92.80	91.80	-1.00	2.00x
AMC [5]	92.80	91.90	-0.90	2.00x
HRank [6]	93.26	93.17	-0.09	2.00x
SFP [7]	93.59	93.36	-0.23	2.11x
ResRep [8]	93.71	93.71	+0.00	2.12x

Ours-L1	93.53	92.93	-0.60	2.12x
Ours-BN	93.53	93.29	-0.24	2.12x
Ours-Group	93.53	93.77	+0.24	2.13x

Please refer to benchmarks for more details.
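
For reading the table, the accuracy change is the pruned accuracy minus the base accuracy. The sketch below recomputes it for two rows and also shows a speed-up computed as a ratio of MAC counts; treating "Speed Up" as base MACs divided by pruned MACs is an assumption here, and the MAC values are hypothetical.

```python
def delta_acc(base, pruned):
    """Accuracy change in percentage points (pruned minus base)."""
    return round(pruned - base, 2)


def speed_up(base_macs, pruned_macs):
    """Theoretical speed-up, assumed to be the ratio of MAC counts."""
    return round(base_macs / pruned_macs, 2)


print(delta_acc(93.53, 92.93))  # Ours-L1 row
print(delta_acc(93.53, 93.29))  # Ours-BN row
print(speed_up(1.0e9, 0.5e9))   # hypothetical MAC counts
```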

7. Series of Works

LLM-Pruner: On the Structural Pruning of Large Language Models [Project] [arXiv]
Xinyin Ma, Gongfan Fang, Xinchao Wang

Structural Pruning for Diffusion Models [Project] [arXiv]
Gongfan Fang, Xinyin Ma, Xinchao Wang

Citation

@inproceedings{fang2023depgraph,
  title={Depgraph: Towards any structural pruning},
  author={Fang, Gongfan and Ma, Xinyin and Song, Mingli and Mi, Michael Bi and Wang, Xinchao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16091--16101},
  year={2023}
}

Release history

1.3.7

Feb 21, 2024

1.3.6

Dec 29, 2023

1.3.5

Dec 15, 2023

1.3.4

Dec 14, 2023

1.3.3

Dec 12, 2023

1.3.2

Nov 9, 2023

1.3.1

Oct 16, 2023

1.3.0

Oct 15, 2023

1.2.5

Sep 6, 2023

1.2.4

Aug 28, 2023

1.2.3

Aug 23, 2023

1.2.2

Aug 14, 2023

1.2.1

Jul 26, 2023

1.2.0

Jul 21, 2023

1.1.9

Jun 26, 2023

This version

1.1.8

May 26, 2023

1.1.7

May 18, 2023

1.1.6

Apr 14, 2023

1.1.5

Apr 9, 2023

1.1.4

Apr 8, 2023

1.1.3

Apr 3, 2023

1.1.2

Apr 1, 2023

1.1.1

Mar 31, 2023

1.1.0

Mar 29, 2023

1.0.0

Jan 3, 2023

0.2.8

Jul 6, 2022

0.2.7

Jul 30, 2021

0.2.6

Jul 8, 2021

0.2.5

Jun 7, 2021

0.2.4

Mar 5, 2021

0.2.1

Jul 2, 2020

0.2.0

Jul 2, 2020

0.1.5

Mar 10, 2020

0.1.4

Jan 9, 2020

0.1.3

Dec 18, 2019

0.1.2

Dec 18, 2019

0.1.1

Dec 17, 2019

0.1.0

Dec 16, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torch-pruning-1.1.8.tar.gz (49.5 kB)

Uploaded May 26, 2023 Source

Built Distribution

torch_pruning-1.1.8-py3-none-any.whl (39.6 kB)

Uploaded May 26, 2023 Python 3

Hashes for torch-pruning-1.1.8.tar.gz
Algorithm	Hash digest
SHA256	`52692e216b32db4f158b2d5edf4e74f91a646d351c7dce608fc1bbd9f9af7c17`
MD5	`c927ec1e794c7c4ab4744cf31fc4a2f7`
BLAKE2b-256	`14bc30e38693ab70dcae1673d69ac6a0c865d1ffed80bc8dda07a1bce4eafdca`

Hashes for torch_pruning-1.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1a180050dcb82f572d109163a066d0f9f6a0d88bffed4f0e2c424c698f9f1f5f`
MD5	`e7ac2cdd8ad5473a25e18d6f9a5bdc9c`
BLAKE2b-256	`dc338aeff75dd8030e07df47232a32f966db343389a61a1eea0a8cdbe3ff7224`