Tools for quickly building operator latency tables and accurately predicting model latency (based on PyTorch and MNN)

Chinese version

1. Installation

MMT is used on both the server side and the inference side:

  • On the server side, the operator list is generated from the specified operator space, and the latency of a given model is predicted from the operator latency table.
  • On the inference side, the latency of each operator in the operator list is measured to build the operator latency table.

The server side requires both PyTorch and MNN (C++); the inference side requires MNN (C++).

Note: Be sure to add the build folder generated by compiling MNN to the environment variable!
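
A quick way to confirm that the build folder is reachable is to look up an executable from it by name. The sketch below assumes the environment variable in question is PATH and uses MNNConvert only as an example name; which executables MMT actually calls is not stated here, so adjust the list to whatever your build produced.

```python
import shutil


def find_missing_tools(tool_names, search_path=None):
    """Return the names in tool_names that cannot be found on the search path."""
    # search_path=None makes shutil.which fall back to the PATH environment variable.
    return [name for name in tool_names if shutil.which(name, path=search_path) is None]


if __name__ == "__main__":
    # "MNNConvert" is an assumed example; replace it with the executables in your build folder.
    missing = find_missing_tools(["MNNConvert"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```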

After configuring the above dependencies, install MMT:

pip install mnn-meter

2. Start

2.1 Modify your models

For each custom model (layer), override __repr__() so that it returns a unique representation of the parameters, for example:

    def __init__(self, ...):
        self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (in_channels, out_channels, stride, kernel)
    ...
    def __repr__(self):
        return self.name

If __repr__() returns the same string for operators of the same type that were created with different parameters, the operators cannot be told apart, which easily leads to runtime errors or wrong measurements!

See how to modify your model
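
Before exporting operators, it can help to check that no two parameter sets share a name. The following is a minimal sketch, not part of MMT: BasicBlockStub is a hypothetical plain-Python stand-in for a custom layer (a real one would subclass torch.nn.Module), and find_repr_collisions is an illustrative helper.

```python
from collections import defaultdict


class BasicBlockStub:
    """Plain-Python stand-in for a custom layer; a real one would subclass torch.nn.Module."""

    def __init__(self, in_channels, out_channels, stride, kernel):
        self.params = (in_channels, out_channels, stride, kernel)
        self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % self.params

    def __repr__(self):
        return self.name


def find_repr_collisions(layers):
    """Map each repr string shared by layers with different parameters to those parameters."""
    seen = defaultdict(set)
    for layer in layers:
        seen[repr(layer)].add(layer.params)
    return {name: sorted(params) for name, params in seen.items() if len(params) > 1}


layers = [BasicBlockStub(64, 64, 1, 3), BasicBlockStub(64, 128, 1, 3), BasicBlockStub(64, 64, 1, 3)]
collisions = find_repr_collisions(layers)
print(collisions)  # {} because every distinct parameter set has its own name
```

A non-empty result lists the names that need more parameters encoded in them.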

2.2 Write an operator description file

The latency of an operator is determined by its operator type, its instantiation parameters, and its input shape. The operator space needs to be expressed in the following way:

resnet18:
    ResNetBasicBlock:
        in_channels: [64, 128, 256, 512]
        out_channels: [64, 128, 256, 512]
        stride: [1]
        kernel: [3, 5, 7]
        input_shape: [[1, 64, 112, 112], [1, 128, 56, 56], [1, 256, 28, 28], [1, 512, 14, 14]]

torch.nn:
    Conv2d:
        in_channels: [3]
        out_channels: [64]
        kernel_size: [7]
        stride: [2]
        padding: [3]
        input_shape: [[1, 3, 224, 224]]

    BatchNorm2d:
        num_features: [64]
        input_shape: [[1, 64, 112, 112]]

    ReLU:
        no_params: true
        input_shape: [[1, 64, 112, 112]]

Refer to how to describe your operator
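
To get a feel for how large such an operator space is, the lists can be expanded by hand. This sketch assumes every list is combined as a full Cartesian product, which the description above does not confirm; expand_space is an illustrative helper, not an MMT function, and the filter on in_channels is only one possible consistency rule.

```python
from itertools import product

# Mirrors the ResNetBasicBlock entry of the description file above.
resnet_block_space = {
    "in_channels": [64, 128, 256, 512],
    "out_channels": [64, 128, 256, 512],
    "stride": [1],
    "kernel": [3, 5, 7],
    "input_shape": [[1, 64, 112, 112], [1, 128, 56, 56], [1, 256, 28, 28], [1, 512, 14, 14]],
}


def expand_space(space):
    """Yield one dict per combination of the listed values (full Cartesian product)."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(expand_space(resnet_block_space))
# Keep only combinations whose in_channels matches the channel dimension of the input.
consistent = [c for c in configs if c["in_channels"] == c["input_shape"][1]]
print(len(configs), len(consistent))  # 192 48
```

The count grows multiplicatively with every list, so keeping each list short keeps the measurement run on the device short as well.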

2.3 Create a list of operators and export the operators to MNN format

from core.converter import generate_ops_list
generate_ops_list("ops.yaml", "/path/ops_folder")

ops.yaml is the operator description file, and /path/ops_folder is the directory where the exported operators are saved. A corresponding meta.pkl is also generated in this directory to store the metadata of the operators.
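
The layout of meta.pkl is not documented here, so the sketch below only loads the file and reports its top-level type and size. describe_pickle is an illustrative helper, not part of MMT.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return its top-level type name and length (None if unsized)."""
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)  # only load pickle files that you created yourself
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

Calling describe_pickle("/path/ops_folder/meta.pkl") after generate_ops_list has run is a cheap way to confirm that the metadata was written.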

2.4 Measure operator latency on the deployment side and build the operator latency table

from core.meter import meter_ops
meter_ops("./ops", times=100)

ops is the folder that holds the operators and meta.pkl, and times is the number of repeated measurements. Running this program measures the latency of each operator and saves the operator latency table as ./ops/meta_latency.pkl. This file records the metadata and the corresponding latency of every operator.
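
The role of times can be illustrated with a generic timing loop. This is a sketch of repeated measurement in general, not MMT's implementation; average_latency_ms and its warmup parameter are assumptions introduced for illustration.

```python
import time


def average_latency_ms(run_once, times=100, warmup=10):
    """Call run_once repeatedly and return the mean wall-clock time per call in milliseconds."""
    for _ in range(warmup):  # warm-up calls are executed but not timed
        run_once()
    start = time.perf_counter()
    for _ in range(times):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / times


# Time a small stand-in workload; a real run would execute one exported operator instead.
latency = average_latency_ms(lambda: sum(range(10_000)), times=100)
print("mean latency: %.4f ms" % latency)
```

Averaging over more repetitions reduces the influence of scheduling noise at the cost of a longer run.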

2.5 Predict model latency on the server side

from core.parser import predict_latency
...
model = ResNet18()
pred_latency = predict_latency(model, path, [1, 3, 224, 224], verbose=False)

path is the path to meta_latency.pkl. Note that the shape of the input tensor must match an input_shape set in the operator description file.
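
One way to picture a table-based prediction is as a lookup per operator followed by a sum. The sketch below rests on that assumption; it is not confirmed as what predict_latency does, and the key format, operator names, and latency values are all made up for illustration.

```python
def predict_from_table(layers, latency_table):
    """Sum the measured latency of each (operator name, input shape) pair in a model."""
    total = 0.0
    for name, input_shape in layers:
        key = (name, tuple(input_shape))
        if key not in latency_table:
            raise KeyError("no latency measured for %s with input %s" % (name, input_shape))
        total += latency_table[key]
    return total


# Hypothetical table with made-up latencies in milliseconds.
table = {
    ("Conv2d-3-64-7-2-3", (1, 3, 224, 224)): 1.5,
    ("BatchNorm2d-64", (1, 64, 112, 112)): 0.25,
    ("ReLU", (1, 64, 112, 112)): 0.25,
}
model_layers = [
    ("Conv2d-3-64-7-2-3", [1, 3, 224, 224]),
    ("BatchNorm2d-64", [1, 64, 112, 112]),
    ("ReLU", [1, 64, 112, 112]),
]
print(predict_from_table(model_layers, table))  # 2.0
```

Under this view, an input shape that was never measured has no entry to look up, which is why the input tensor shape has to agree with the operator description.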

3. Test the prediction error of MMT

Coming soon.
