Lorien: A Hyper-Automated Tuning System for Tensor Operators
Lorien is a system built on top of TVM to explore and benchmark the best schedule configs of TOPI schedules at scale.
Although TVM already has TOPI (TVM Operator Inventory), which provides the algorithms and schedules for commonly used operators such as conv2d and dense, one challenge makes TOPI hard to improve efficiently.
The best schedule configs for TOPI are stored in TopHub, a set of JSON files hosted on GitHub. However, this approach has the following problems:
- Storing all schedules in a single text file limits accessibility and scalability: AutoTVM has to load an entire JSON file every time just to find one schedule config for a workload.
- The coverage of workloads and platforms is insufficient in the current version. For example, the latest TopHub covers only 690 workloads (e.g., conv2d and depthwise conv2d) and 5 GPU models for the CUDA backend.
- Compared to TVM, which receives several commits every day, TopHub is updated infrequently. As a result, some schedule configs are out of date and can no longer achieve good performance.
Since it is impractical to use TVM CI to benchmark performance for every pull request, we need a separate system to regularly benchmark and update the stored schedule configs.
Command-Line Interface and Example Usages
The system has a complete CLI with hierarchical commands. All command arguments can also be
specified in a YAML config file and expanded on the command line with the "@" prefix.
See the following examples for CLI usages, and
configs/samples for example configurations.
Note that the complete description of each command can be retrieved with the help flag:
python3 -m lorien <commands> -h
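The "@" expansion can be illustrated with a short sketch. Note that `expand_at_args` is a hypothetical helper, not Lorien's actual implementation, and for simplicity it only handles flat `key: value` lines using the standard library (Lorien itself parses full YAML):

```python
def expand_at_args(argv):
    """Expand "@file.yaml" arguments into "--key value" CLI options.

    Simplified sketch: handles only flat "key: value" lines, skipping
    blanks and comments. Nested YAML (e.g., the rpc section) is not
    supported here.
    """
    expanded = []
    for arg in argv:
        if not arg.startswith("@"):
            expanded.append(arg)
            continue
        with open(arg[1:]) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # Split at the first colon: "ntrial: 3000" -> --ntrial 3000
                key, _, value = line.partition(":")
                expanded.append("--" + key.strip())
                if value.strip():
                    expanded.append(value.strip())
    return expanded
```

With a file containing `ntrial: 3000`, the argument list `["tune", "@tune.yaml"]` would expand to `["tune", "--ntrial", "3000"]` before being handed to the argument parser.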
- Extract workloads from a Gluon CV model.
python3 -m lorien generate extract gcv --model alexnet --target llvm
- Extract workloads from a TF model.
python3 -m lorien generate extract tf --model ./mobilenet.pb --target llvm
- Extract workloads from a Gluon CV model and mutate them to generate new workloads.
python3 -m lorien generate mutate modelzoo rules.yaml --model alexnet --target llvm
- Tune workloads with RPC servers.
  # tune.yaml
  rpc:
    llvm -mcpu=skylake-avx512:
      - localhost:18871
  db:
    endpoint_url: http://localhost:10020
  log-s3-bucket: saved-tuning-logs
  ntrial: 3000
python3 -m lorien tune @tune.yaml @gcv_workloads_llvm.yaml
Amazon DynamoDB (local or AWS): DynamoDB is used for storing and maintaining the tuned schedules. When invoking the tuning, you can either 1) launch a local instance and specify its endpoint URL (e.g.,
--db "endpoint_url: http://<your IP>:8000"), or 2) use the AWS-hosted service: configure the AWS CLI on your machine and specify the region name (e.g.,
--db "region_name: us-west-1").
AWS S3 (optional): S3 is used to store the full tuning logs (the JSON files generated by AutoTVM). This is optional: if you do not specify
--log-s3-bucket bucket_name, the full tuning logs will not be uploaded; only the best schedule config will be submitted to DynamoDB.
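Putting the storage options together, the relevant part of a tuning config might look like the fragment below (the endpoint, region, and bucket values are placeholders; choose either the local endpoint or an AWS region, not both):

```yaml
# Storage options in tune.yaml
db:
  endpoint_url: http://localhost:8000   # local DynamoDB instance
# db:
#   region_name: us-west-1              # or: AWS-hosted DynamoDB
log-s3-bucket: saved-tuning-logs        # optional; omit to skip log upload
```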