Tools for quickly building operator latency tables and for accurately predicting model latency (based on PyTorch and MNN)

Project description
1. Installation
MMT is used in both server-side and inference-side situations:
- On the server side, the operator list is generated according to the specified operator space, and the latency of a given model is predicted from the operator latency table.
- On the inference side, the operators in the list are benchmarked to obtain the operator latency table.
The server side must install both PyTorch and MNN (C++), while the inference side only needs MNN (C++).

Note: be sure to add the build folder generated by compiling MNN to your environment variables!

After configuring the above dependencies, install MMT:
pip install mmt-meter
2. Getting Started
2.1 Modify your models
For your custom model (layer), override __repr__() with a unique representation of its parameters, for example:
def __init__(self, ...):
    ...
    self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (in_channels, out_channels, stride, kernel)
    ...

def __repr__(self):
    return self.name
If the strings returned by __repr__() cannot distinguish operators of the same type created with different parameters, running errors or measurement errors are very likely!
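As a concrete sketch of this pattern, here is a minimal class with a parameter-unique __repr__(); plain Python is used for brevity, whereas in real code the class would subclass torch.nn.Module:

```python
class ResNetBasicBlock:  # in practice: class ResNetBasicBlock(torch.nn.Module)
    """Toy block whose __repr__ uniquely encodes its parameters."""

    def __init__(self, in_channels, out_channels, stride, kernel):
        # Encode every latency-relevant parameter in the name, so two
        # differently parameterized instances never look identical.
        self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (
            in_channels, out_channels, stride, kernel)

    def __repr__(self):
        return self.name

print(repr(ResNetBasicBlock(64, 128, 1, 3)))  # ResNetBasicBlock-64-128-1-3-
```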
2.2 Export the operators
Since version mmt 2.x, both description-file generation and functional generation are supported.
2.2.1 Method 1: Write an operator description file
The parameters that determine an operator's latency are (operator type, operator instantiation parameters, input shape). The operator space is described as follows:
resnet18:
  ResNetBasicBlock:
    in_channels: [64, 128, 256, 512]
    out_channels: [64, 128, 256, 512]
    stride: [1]
    kernel: [3, 5, 7]
    input_shape: [[1, 64, 112, 112], [1, 128, 56, 56], [1, 256, 28, 28], [1, 512, 14, 14]]
torch.nn:
  Conv2d:
    in_channels: [3]
    out_channels: [64]
    kernel_size: [7]
    stride: [2]
    padding: [3]
    input_shape: [[1, 3, 224, 224]]
  BatchNorm2d:
    num_features: [64]
    input_shape: [[1, 64, 112, 112]]
  ReLU:
    no_params: true
    input_shape: [[1, 64, 112, 112]]
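Conceptually, each entry above expands into the Cartesian product of its parameter lists. The sketch below illustrates that expansion idea on a hypothetical miniature space; it is not MMT's actual implementation:

```python
from itertools import product

# Hypothetical miniature operator space, in the spirit of the YAML above.
space = {
    "ResNetBasicBlock": {
        "in_channels": [64, 128],
        "out_channels": [64, 128],
        "stride": [1],
        "kernel": [3, 5],
        "input_shape": [[1, 64, 112, 112]],
    },
}

def expand(space):
    """Enumerate every (operator, parameter-dict) combination."""
    configs = []
    for op, params in space.items():
        keys = list(params)
        for values in product(*(params[k] for k in keys)):
            configs.append((op, dict(zip(keys, values))))
    return configs

configs = expand(space)
print(len(configs))  # 2 * 2 * 1 * 2 * 1 = 8 candidate operators
```

Listing only the combinations you actually need keeps the latency table small, which is exactly the redundancy issue Method 2 below addresses.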
Refer to the guide on how to describe your operator.
Then use the following command to create the operator list and export the operators to MNN format.
from mmt.converter import generate_ops_list
generate_ops_list("ops.yaml", "/path/ops_folder")
ops.yaml is the operator description file, and /path/ops_folder is the directory where the operators are saved; a corresponding meta.pkl will be generated in this directory to store the operators' metadata.
2.2.2 Method 2: Functional generation
Method 2 is very similar to Method 1, but operators are registered and generated directly with the mmt.register function. It supports registering the same operator type multiple times, which reduces the redundant operators caused by unnecessary parameter combinations (a drawback of Method 1). For example:
from mmt import register
import torch.nn as nn

fp = "./mbv3_ops"
reg = lambda ops, **kwargs: register(ops, fp, **kwargs)

reg(nn.Linear,
    in_features=[576, 1024],
    out_features=[1024, 1000],
    bias=[True],
    input_shape=[[1, 576], [1, 1024]],
)
The equivalent description file in Method 1 would be:

torch.nn:
  Linear:
    in_features: [576, 1024]
    out_features: [1024, 1000]
    bias: [True]
    input_shape: [[1, 576], [1, 1024]]
Running the file directly generates the corresponding operators. For more details, refer to the Example.
2.3 Record operator latencies on the deployment side and build the operator latency table
from mmt.meter import meter_ops
meter_ops("./ops", times=100)
./ops is the folder where the operators and meta.pkl are saved, and times is the number of repeated measurements. Running this program measures the latency of each operator and saves the operator latency table as ./ops/meta_latency.pkl. This file records the metadata and corresponding latency of all operators.
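meta_latency.pkl is an ordinary pickle file and can be inspected with the standard library. The record layout below is only an assumption for illustration, not MMT's documented schema:

```python
import os
import pickle
import tempfile

# Hypothetical records mimicking a latency table: operator metadata
# plus the measured latency in milliseconds.
records = [
    {"op": "Conv2d", "repr": "Conv2d-3-64-7-2-3-", "latency_ms": 1.80},
    {"op": "ReLU",   "repr": "ReLU-",              "latency_ms": 0.05},
]

path = os.path.join(tempfile.mkdtemp(), "meta_latency.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f)

# Loading the table back is a plain pickle.load.
with open(path, "rb") as f:
    table = pickle.load(f)

print(len(table), round(sum(r["latency_ms"] for r in table), 2))
```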
2.4 Predicting model latency on the server side
from mmt.parser import predict_latency
...
model = ResNet18()
pred_latency = predict_latency(model, path, [1, 3, 224, 224], verbose=False)
path is the path to meta_latency.pkl. Note that the shape of the input tensor must match the input_shape set in the operator description.
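The idea behind the prediction is a lookup-and-sum over the latency table, matching a model's layers by their __repr__() strings. The sketch below uses made-up repr strings and latency numbers for illustration:

```python
# Hypothetical latency table: repr string -> measured latency in ms.
table = {
    "Conv2d-3-64-7-2-3-": 1.8,
    "ResNetBasicBlock-64-64-1-3-": 2.1,
    "ResNetBasicBlock-64-128-1-3-": 3.4,
}

# Layers encountered while traversing a model, identified by __repr__().
layers = [
    "Conv2d-3-64-7-2-3-",
    "ResNetBasicBlock-64-64-1-3-",
    "ResNetBasicBlock-64-128-1-3-",
]

# Predicted model latency = sum of per-operator table lookups.
pred_latency = sum(table[r] for r in layers)
print(round(pred_latency, 1))  # 7.3
```

This is why a non-unique __repr__() is dangerous: two different operators that share a repr string would be looked up as the same table entry.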
3. Testing the prediction error of MMT
For details, refer to the MobileNetV3 test.
Model | Num | err (%) | Device
---|---|---|---
ResNet | 6561 | 2.6 | 40 Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
MobileNet | 200 | 4.3* | 40 Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz