
Easy-to-use UI for automatically sparsifying neural networks and creating sparsification recipes for better inference performance and a smaller footprint


Overview

Sparsify is an easy-to-use UI tool that simplifies the deep learning model optimization process so you can rapidly achieve the best combination of size, speed, and accuracy. Informed by industry research, Sparsify sparsifies and benchmarks models for ML practitioners, including ML engineers and operators, who need to deploy performant deep learning models fast and at scale. Sparsify visualizes your model's performance potential, including a sliding scale between performance and recovery, ultimately cutting the model sparsification process from weeks to minutes.

This repository contains the package to launch Sparsify locally, where you can create projects to load and sparsify your deep learning models. When you are done, you can export sparsification recipes to integrate into your training workflow.

Sparsification

Sparsification is the process of taking a trained deep learning model and removing redundant information from the over-precise and over-parameterized network, resulting in a faster and smaller model. Sparsification techniques are all-encompassing, ranging from inducing sparsity with pruning and quantization to exploiting naturally occurring sparsity with activation sparsity or Winograd/FFT. When implemented correctly, these techniques produce significantly faster and smaller models with limited to no effect on the baseline metrics. For example, pruning plus quantization can give over 7x improvements in performance while recovering nearly the same baseline accuracy.
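To make the two most common techniques concrete, here is a minimal pure-Python sketch of unstructured magnitude pruning followed by symmetric int8 quantization of a single list of weights. It is illustrative only: the function names are hypothetical, and real tooling such as SparseML works on framework tensors with scheduled, training-aware updates rather than a one-off pass like this.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Return a copy with the smallest-magnitude `sparsity` fraction set to 0.0."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization onto the integer range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Zeros stay exactly zero, so the sparsity pattern survives quantization.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2, -0.02, 0.5]
    sparse_w = magnitude_prune(w, sparsity=0.5)
    print(sparse_w)  # [0.9, 0.0, 0.3, 0.0, -0.7, 0.0, 0.0, 0.5]
    q, scale = quantize_int8(sparse_w)
    print(q, scale)
```

Pruning half of the eight weights zeroes the four smallest magnitudes; quantization then stores each remaining weight as a small integer plus one shared scale, which is where the size and speed savings come from.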

The Deep Sparse product suite builds on top of sparsification, enabling you to easily apply these techniques to your datasets and models using recipe-driven approaches. Recipes encode the instructions for sparsifying a model in a simple, easily editable format.
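As a rough illustration of what a recipe captures, the sketch below encodes a hypothetical pruning schedule as a plain Python dict and computes the target sparsity at a given epoch with a cubic ramp, a schedule commonly used for gradual magnitude pruning. The keys and the function are assumptions made for illustration; this is not the actual SparseML recipe format.

```python
from typing import Any, Dict

# Hypothetical, simplified recipe for illustration only; the key names are
# assumptions and do not follow the real SparseML recipe format.
RECIPE: Dict[str, Any] = {
    "pruning": {
        "init_sparsity": 0.05,
        "final_sparsity": 0.85,
        "start_epoch": 0.0,
        "end_epoch": 8.0,
    }
}


def sparsity_at(epoch: float, init_sparsity: float, final_sparsity: float,
                start_epoch: float, end_epoch: float) -> float:
    """Target sparsity at `epoch` under a cubic gradual-pruning ramp."""
    if epoch < start_epoch:
        return 0.0  # pruning has not started yet
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    # Sparsity rises quickly early on and flattens out as it nears the target.
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3


if __name__ == "__main__":
    for epoch in range(0, 10, 2):
        print(epoch, round(sparsity_at(epoch, **RECIPE["pruning"]), 3))
```

Changing final_sparsity or end_epoch in the dict reshapes the whole schedule without touching any training code, which is the kind of easy editing recipes are meant to allow.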

  • Download a sparsification recipe and sparsified model from the SparseZoo.
  • Alternatively, create a recipe for your model using Sparsify.
  • Apply your recipe with only a few lines of code using SparseML.
  • Finally, for GPU-level performance on CPUs, deploy your sparse-quantized model with the DeepSparse Engine.

[Figure: Full Deep Sparse product flow]

Quick Tour

The package installs a console script entry point, sparsify, which lets you interact with Sparsify from your console/terminal.

Note that in some environments the console scripts cannot be installed properly. If this happens on your system and the sparsify command is not available, scripts/main.py may be used in its place. Documentation is provided in the script file.
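When the console script is missing, a small launcher can pick the fallback automatically. The sketch below is a hypothetical helper, not part of the Sparsify package; it assumes the repository has been cloned to repo_root so that scripts/main.py exists there, and it passes no extra arguments because the script's options are documented in the script file itself.

```python
import os
import shutil
import sys
from typing import Callable, List, Optional


def launch_command(repo_root: str = ".",
                   which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a command that starts Sparsify, preferring the console script."""
    if which("sparsify") is not None:
        return ["sparsify"]
    # Fallback: run scripts/main.py from a cloned repository (assumed location).
    return [sys.executable, os.path.join(repo_root, "scripts", "main.py")]
```

For example, subprocess.run(launch_command("/path/to/sparsify"), check=True) starts the server with whichever entry point is available; the which parameter exists only so the lookup can be swapped out in tests.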

To launch Sparsify locally, open up a console or terminal window and enter the following:

sparsify

The Sparsify server will start running locally on the machine and can be accessed through a web browser. By default, Sparsify starts on host:port 0.0.0.0:5543, which listens on all network interfaces. Therefore, after starting Sparsify with the default command, you can enter the following into a web browser to begin using Sparsify: http://0.0.0.0:5543 (or, on the same machine, http://localhost:5543).

If you are running Sparsify on a different server from the one hosting your web browser, substitute that server's IP address for 0.0.0.0. Additionally, confirm that the server's networking rules allow access to port 5543.
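Before debugging the browser side, you can check from the client machine whether the port is reachable at all. This is a minimal sketch using Python's standard socket module; the host you pass is up to you (the server's IP address, or 127.0.0.1 when running locally), and a False result means either Sparsify is not running or the networking rules block the port.

```python
import socket


def port_is_reachable(host: str, port: int = 5543, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_reachable("192.0.2.10") checks the default port 5543 on a remote server at that (placeholder) address.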

If everything is configured correctly, visiting http://0.0.0.0:5543 in a web browser loads the Sparsify home page.


A quick-start flow is given below. For a more in-depth read, check out the Sparsify documentation.

New Project

To begin sparsifying a model, you must first create a new project. The New Project button is located in the lower right of Sparsify's home screen. Clicking it displays the create project popup.


Sparsify currently accepts only models in the ONNX format. To easily convert to ONNX from common ML frameworks, see the SparseML repository.

To create a project, use one of the following flows:

  • Upload your model file through the browser by clicking on Click to browse.
  • Download your model file through a public URL by filling in the field Remote Path or URL.
  • Move your model file from an accessible file location on the server by filling in the field Remote Path or URL.
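The Remote Path or URL field accepts either a public URL or a path on the server. The sketch below shows one hypothetical way a script could tell the two apart and pre-check the .onnx extension before creating a project; it is not Sparsify's own logic, and the extension check is only a convenience, not a validation of the model format.

```python
import os
from urllib.parse import urlparse


def classify_model_source(value: str) -> str:
    """Return 'url' for http(s) links, otherwise 'server_path' (hypothetical labels)."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "server_path"


def has_onnx_extension(value: str) -> bool:
    """Convenience check on the file name only; it does not validate the model."""
    # For URLs, ignore query strings such as ?download=1 and look at the path.
    path = urlparse(value).path if classify_model_source(value) == "url" else value
    return os.path.splitext(path)[1].lower() == ".onnx"
```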

Continue through the popup, filling in the requested information to finish creating the project.

Analyzing a Model

After the project is created, the model's sensitivity analyses are shown under Performance Profiles and Loss Profiles in the left navigation.

The profiles show the effects that different types of algorithms, and different degrees of those algorithms, have on both the model's inference speed and its baseline loss.
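For intuition about what a loss profile measures, the sketch below computes a crude, hypothetical per-layer sensitivity proxy: the fraction of each layer's squared L2 norm that magnitude pruning removes at several sparsity levels. This is an assumption-laden stand-in, not Sparsify's actual analysis, and it says nothing about inference speed, which requires benchmarking the model.

```python
from typing import Dict, List


def magnitude_loss_proxy(weights: List[float], sparsity: float) -> float:
    """Fraction of a layer's squared L2 norm removed by magnitude pruning."""
    total = sum(w * w for w in weights)
    if total == 0.0:
        return 0.0
    k = int(round(sparsity * len(weights)))
    removed = sorted(w * w for w in weights)[:k]
    return sum(removed) / total


def sensitivity_profile(layers: Dict[str, List[float]],
                        levels: List[float]) -> Dict[str, List[float]]:
    """Proxy value for every (layer, sparsity level) pair; higher = more sensitive."""
    return {name: [magnitude_loss_proxy(w, s) for s in levels]
            for name, w in layers.items()}


if __name__ == "__main__":
    layers = {"conv1": [1.0, 1.0, 1.0, 1.0], "fc": [4.0, 0.1, 0.1, 0.1]}
    print(sensitivity_profile(layers, [0.0, 0.5, 0.75]))
```

In this toy input every weight of conv1 matters equally, so pruning removes a proportional share of its norm, while fc concentrates its norm in one large weight and is far less sensitive to pruning.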

[Figure: Performance Profiles]

[Figure: Loss Profiles]


Optimizing a Model

Click Optimization in the left navigation, or the Start Optimizing button on the analysis pages, to begin sparsifying your model. Clicking either displays the sparsification creation popup.


Fill in the information as required in the popup. Once you are done, Sparsify's autoML algorithms choose the best settings they can find for optimizing your model. The resulting recipe is displayed along with estimated metrics for the optimized model, and can be further edited if desired.
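Sparsify's autoML algorithm is not described here, so the following is only a simple stand-in: a greedy threshold heuristic that, for each layer, picks the highest sparsity level whose (hypothetical) loss proxy stays within a budget. The profile numbers in the usage example are made up for illustration.

```python
from typing import Dict, List


def pick_layer_sparsities(profile: Dict[str, List[float]], levels: List[float],
                          max_proxy_loss: float) -> Dict[str, float]:
    """For each layer, pick the highest sparsity whose proxy loss is within budget."""
    chosen: Dict[str, float] = {}
    for name, losses in profile.items():
        best = 0.0
        for level, loss in zip(levels, losses):
            if loss <= max_proxy_loss:
                best = max(best, level)
        chosen[name] = best
    return chosen


if __name__ == "__main__":
    levels = [0.0, 0.5, 0.75, 0.9]
    profile = {"conv1": [0.0, 0.1, 0.3, 0.6], "fc": [0.0, 0.01, 0.02, 0.05]}
    print(pick_layer_sparsities(profile, levels, max_proxy_loss=0.1))
    # {'conv1': 0.5, 'fc': 0.9}
```

The insensitive layer ends up far sparser than the sensitive one, which mirrors the performance/recovery trade-off the profiles expose.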


Exporting a Recipe

Currently, Sparsify focuses on training-aware methods, which allow much better loss recovery for a given target performance. A future release will add the option of one-shot sparsification with limited to no retraining.

Because the recipe is created with training-aware algorithms, it must be exported and included in your original training pipeline using SparseML. For most training workflows, SparseML enables this with only a few lines of code.
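Training-aware sparsification keeps training after weights are removed so the remaining weights can compensate. The toy sketch below illustrates the idea with masked fine-tuning of a three-weight linear model by stochastic gradient descent: after every update the pruning mask is re-applied, so pruned weights stay exactly zero while the others converge to the target. It is not SparseML's API; the function name is hypothetical, and in practice SparseML applies the exported recipe inside your framework's own training loop.

```python
from typing import List


def train_with_mask(weights: List[float], mask: List[int], xs: List[List[float]],
                    ys: List[float], lr: float = 0.05, epochs: int = 300) -> List[float]:
    """Fit y = w . x by SGD while keeping masked-out (pruned) weights at zero."""
    w = [wi * m for wi, m in zip(weights, mask)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of 0.5 * err**2 with respect to w_i is err * x_i.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            # Re-apply the mask so pruned connections stay exactly zero.
            w = [wi * m for wi, m in zip(w, mask)]
    return w


if __name__ == "__main__":
    xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    ys = [2.0 * x[0] + 3.0 * x[2] for x in xs]  # target ignores the pruned input
    print(train_with_mask([0.5, 0.7, 0.1], [1, 0, 1], xs, ys))
```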

On the optimization page, click the Export button in the bottom right to open the export popup.


In the upper right of the popup, select the framework the model was originally trained in. Then either copy or download the recipe for use with SparseML. The popup also provides sample SparseML code for integrating the exported sparsification recipe.

Installation

This repository is tested on Python 3.6+, Linux/Debian systems, and Chrome 87+. It is recommended to install in a virtual environment to keep your system in order.

Install with pip using:

pip install sparsify

Then, if you would like to explore any of the scripts, clone the repository and install any additional dependencies as required.

From the initial screen, click the New Project button to:

  1. Upload an ONNX file of your deep learning model to a new project
  2. Profile the effects of sparsifying your model on loss and performance
  3. Create an automatic sparsification recipe and edit it as desired
  4. Export the recipe and integrate it into your current training flow

Projects are saved locally and listed in the left navigation bar of the initial screen for easy access. You can create one or more projects for your analysis.

Resources and Learning More

Contributing

We appreciate contributions to the code, examples, and documentation as well as bug reports and feature requests! Learn how here.

Join the Community

For user help or questions about Sparsify, use our GitHub Discussions. Everyone is welcome!

You can get the latest news, webinar and event invites, research papers, and other ML performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please email us at learnmore@neuralmagic.com or fill out this form.

License

The project is licensed under the Apache License Version 2.0.

Release History

Official builds are hosted on PyPI.

Additionally, more information can be found via GitHub Releases.

Citation

Find this project useful in your research or other communications? Please consider citing:

@InProceedings{
    pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research}, 
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}, 
    abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.} 
}
@misc{
    singh2020woodfisher,
    title={WoodFisher: Efficient Second-Order Approximation for Neural Network Compression}, 
    author={Sidak Pal Singh and Dan Alistarh},
    year={2020},
    eprint={2004.14340},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for sparsify, version 0.1.1:
  • sparsify-0.1.1-py3-none-any.whl (2.9 MB; file type: Wheel; Python version: py3)
