Tools for quickly building operator latency tables and for accurately predicting model latency (based on PyTorch and MNN)

Project description
1. Installation
MMT is used in both server-side and inference-side situations:
- On the server side, the operator list is generated according to the specified operator space, and the latency of a given model is predicted from the operator latency table.
- On the inference side, the operators in the list are benchmarked to obtain the operator latency table.
The server side must install both PyTorch and MNN (C++), while the inference side only needs MNN (C++).

Note: be sure to add the build folder generated by compiling MNN to your environment variables!

After configuring the above dependencies, install MMT:
pip install mmt-meter
2. Getting Started
2.1 Modify your models
For your custom model (layer), override __repr__() with a unique representation of its parameters, for example:
def __init__(self, ...):
    ...
    self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (in_channels, out_channels, stride, kernel)
    ...

def __repr__(self):
    return self.name
If the strings returned by __repr__() cannot distinguish operators of the same type created with different parameters, running errors or measurement errors are very likely!
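As a concrete sketch of this pattern, here is a minimal class with a parameter-unique __repr__(); plain Python is used for brevity, whereas in real code the class would subclass torch.nn.Module:

```python
class ResNetBasicBlock:  # in practice: class ResNetBasicBlock(torch.nn.Module)
    """Toy block whose __repr__ uniquely encodes its parameters."""

    def __init__(self, in_channels, out_channels, stride, kernel):
        # Encode every latency-relevant parameter in the name, so two
        # differently parameterized instances never look identical.
        self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (
            in_channels, out_channels, stride, kernel)

    def __repr__(self):
        return self.name

print(repr(ResNetBasicBlock(64, 128, 1, 3)))  # ResNetBasicBlock-64-128-1-3-
```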
2.2 Export the operators
Since version mmt 2.x, both description-file generation and functional generation are supported.
2.2.1 Method 1: Write an operator description file
The parameters that determine an operator's latency are (operator type, operator instantiation parameters, input shape). The operator space is described as follows:
resnet18:
  ResNetBasicBlock:
    in_channels: [64, 128, 256, 512]
    out_channels: [64, 128, 256, 512]
    stride: [1]
    kernel: [3, 5, 7]
    input_shape: [[1, 64, 112, 112], [1, 128, 56, 56], [1, 256, 28, 28], [1, 512, 14, 14]]
torch.nn:
  Conv2d:
    in_channels: [3]
    out_channels: [64]
    kernel_size: [7]
    stride: [2]
    padding: [3]
    input_shape: [[1, 3, 224, 224]]
  BatchNorm2d:
    num_features: [64]
    input_shape: [[1, 64, 112, 112]]
  ReLU:
    no_params: true
    input_shape: [[1, 64, 112, 112]]
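Conceptually, each entry above expands into the Cartesian product of its parameter lists. The sketch below illustrates that expansion idea on a hypothetical miniature space; it is not MMT's actual implementation:

```python
from itertools import product

# Hypothetical miniature operator space, in the spirit of the YAML above.
space = {
    "ResNetBasicBlock": {
        "in_channels": [64, 128],
        "out_channels": [64, 128],
        "stride": [1],
        "kernel": [3, 5],
        "input_shape": [[1, 64, 112, 112]],
    },
}

def expand(space):
    """Enumerate every (operator, parameter-dict) combination."""
    configs = []
    for op, params in space.items():
        keys = list(params)
        for values in product(*(params[k] for k in keys)):
            configs.append((op, dict(zip(keys, values))))
    return configs

configs = expand(space)
print(len(configs))  # 2 * 2 * 1 * 2 * 1 = 8 candidate operators
```

Listing only the combinations you actually need keeps the latency table small, which is exactly the redundancy issue Method 2 below addresses.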
Refer to the guide on how to describe your operator.
Then use the following command to create the operator list and export the operators to MNN format.
from mmt.converter import generate_ops_list
generate_ops_list("ops.yaml", "/path/ops_folder")
ops.yaml is the operator description file, and /path/ops_folder is the directory where the operators are saved; a corresponding meta.pkl will be generated in this directory to store the operators' metadata.
2.2.2 Method 2: Functional generation
Method 2 is very similar to Method 1, but operators are registered and generated directly with the mmt.register function. It supports registering the same operator type multiple times, which reduces the redundant operators caused by unnecessary parameter combinations (a drawback of Method 1). For example:
from mmt import register
import torch.nn as nn

fp = "./mbv3_ops"
reg = lambda ops, **kwargs: register(ops, fp, **kwargs)

reg(nn.Linear,
    in_features=[576, 1024],
    out_features=[1024, 1000],
    bias=[True],
    input_shape=[[1, 576], [1, 1024]],
)
The equivalent description file in Method 1 would be:

torch.nn:
  Linear:
    in_features: [576, 1024]
    out_features: [1024, 1000]
    bias: [True]
    input_shape: [[1, 576], [1, 1024]]
Running the file directly generates the corresponding operators. For more details, refer to the Example.
2.3 Record operator latencies on the deployment side and build the operator latency table
from mmt.meter import meter_ops
meter_ops("./ops", times=100)
./ops is the folder where the operators and meta.pkl are saved, and times is the number of repeated measurements. Running this program measures the latency of each operator and saves the operator latency table as ./ops/meta_latency.pkl. This file records the metadata and corresponding latency of all operators.
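meta_latency.pkl is an ordinary pickle file and can be inspected with the standard library. The record layout below is only an assumption for illustration, not MMT's documented schema:

```python
import os
import pickle
import tempfile

# Hypothetical records mimicking a latency table: operator metadata
# plus the measured latency in milliseconds.
records = [
    {"op": "Conv2d", "repr": "Conv2d-3-64-7-2-3-", "latency_ms": 1.80},
    {"op": "ReLU",   "repr": "ReLU-",              "latency_ms": 0.05},
]

path = os.path.join(tempfile.mkdtemp(), "meta_latency.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f)

# Loading the table back is a plain pickle.load.
with open(path, "rb") as f:
    table = pickle.load(f)

print(len(table), round(sum(r["latency_ms"] for r in table), 2))
```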
2.4 Predicting model latency on the server side
from mmt.parser import predict_latency
...
model = ResNet18()
pred_latency = predict_latency(model, path, [1, 3, 224, 224], verbose=False)
path is the path to meta_latency.pkl. Note that the shape of the input tensor must match the input_shape set in the operator description.
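The idea behind the prediction is a lookup-and-sum over the latency table, matching a model's layers by their __repr__() strings. The sketch below uses made-up repr strings and latency numbers for illustration:

```python
# Hypothetical latency table: repr string -> measured latency in ms.
table = {
    "Conv2d-3-64-7-2-3-": 1.8,
    "ResNetBasicBlock-64-64-1-3-": 2.1,
    "ResNetBasicBlock-64-128-1-3-": 3.4,
}

# Layers encountered while traversing a model, identified by __repr__().
layers = [
    "Conv2d-3-64-7-2-3-",
    "ResNetBasicBlock-64-64-1-3-",
    "ResNetBasicBlock-64-128-1-3-",
]

# Predicted model latency = sum of per-operator table lookups.
pred_latency = sum(table[r] for r in layers)
print(round(pred_latency, 1))  # 7.3
```

This is why a non-unique __repr__() is dangerous: two different operators that share a repr string would be looked up as the same table entry.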
3. Testing the prediction error of MMT
For details, refer to the MobileNetV3 test.
Model | Num | err (%) | Device
---|---|---|---
ResNet | 6561 | 2.6 | 40 Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
MobileNet | 200 | 4.3* | 40 Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz