SuperGradients

Easily train or fine-tune SOTA computer vision models with one open source training library

Fill out our 4-question quick survey! We will raffle free SuperGradients swag among participants -> Fill Survey


Website | Why Use SG? | User Guide | Docs | Getting Started Notebooks | Transfer Learning | Pretrained Models | Community | License | Deci Platform

SuperGradients

Introduction

Welcome to SuperGradients, a free, open-source training library for PyTorch-based deep learning models. SuperGradients allows you to train or fine-tune SOTA pre-trained models for all the most commonly applied computer vision tasks with just one training library. We currently support object detection, image classification and semantic segmentation for videos and images.

Docs and full user guide

Why use SuperGradients?

Built-in SOTA Models

Easily load and fine-tune production-ready, pre-trained SOTA models that incorporate best practices and validated hyper-parameters for achieving best-in-class accuracy.

Easily Reproduce our Results

Why do all the grunt work if we already did it for you? Leverage tested and proven recipes and code examples for a wide range of computer vision models, created by our team of deep learning experts. Easily configure your own hyperparameters or use our plug-and-play hyperparameters for training, dataset, and architecture.

Production Readiness and Ease of Integration

All SuperGradients models are production-ready in the sense that they are compatible with deployment tools such as TensorRT (Nvidia) and OpenVINO (Intel) and can easily be taken to production. With a few lines of code you can easily integrate the models into your codebase.
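
In practice, the usual route to TensorRT or OpenVINO is through ONNX. As a minimal sketch (plain PyTorch, not a SuperGradients-specific API), assuming model is a trained torch.nn.Module expecting a 224x224 RGB input:

import torch

model.eval()                               # switch to inference mode before export
dummy_input = torch.randn(1, 3, 224, 224)  # example input used to trace the graph
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["input"], output_names=["output"],
)
# model.onnx can then be consumed by TensorRT (e.g. trtexec) or OpenVINO's Model Optimizer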

What's New

  • 【06/09/2022】 PP-LiteSeg - new pre-trained checkpoints for Cityscapes with SOTA mIoU scores (~1.5% above paper)🎯
  • 【07/08/2022】 DDRNet23 - new pre-trained checkpoints and recipes for Cityscapes with SOTA mIoU scores (~1% above paper)🎯
  • 【27/07/2022】 YOLOX models (object detection) - recipes and pre-trained checkpoints.
  • 【07/07/2022】 SSD Lite MobileNet V2, V1 - training recipes and pre-trained checkpoints on COCO - tailored for edge devices! 📱
  • 【07/07/2022】 STDC - new pre-trained checkpoints and recipes for Cityscapes with super SOTA mIoU scores (~2.5% above paper)🎯
  • 【16/06/2022】 ResNet50 - new pre-trained checkpoint and recipe for ImageNet top-1 score of 81.9 💪
  • 【09/06/2022】 ViT models (Vision Transformer) - Training recipes and pre-trained checkpoints (ViT, BEiT).
  • 【09/06/2022】 Knowledge Distillation support - training module and notebook.
  • 【06/04/2022】 Integration with professional tools - Weights and Biases and DagsHub.
  • 【09/03/2022】 New quick start and transfer learning example notebooks for Semantic Segmentation.
  • 【07/02/2022】 We added RegSeg recipes and pre-trained models to our Semantic Segmentation models.
  • 【01/02/2022】 We added issue templates for feature requests and bug reporting.
  • 【20/01/2022】 STDC family - new recipes added with even higher mIoU💪

Check out the SG full release notes.

Coming soon

  • PP-LiteSeg recipes for Cityscapes with SOTA mIoU scores (~1.5% above paper)🎯
  • Single-class detectors (recipes, pre-trained checkpoints) for edge device deployment.
  • Single-class segmentation (recipes, pre-trained checkpoints) for edge device deployment.
  • QAT capabilities (Quantization Aware Training).
  • DALI implementation.
  • Integration with more professional tools.
  • Improved pre-trained checkpoints and recipes (DDRNet, ResNet, RegSeg, etc.).

Table of Contents

Getting Started

Start Training with Just 1 Command Line

The simplest and most straightforward way to start training SOTA models with SuperGradients' reproducible recipes. Just define your dataset path and where you want your checkpoints to be saved, and you are good to go from your terminal!

python -m super_gradients.train_from_recipe --config-name=imagenet_regnetY architecture=regnetY800 dataset_interface.data_dir=<YOUR_Imagenet_LOCAL_PATH> ckpt_root_dir=<CHECKPOINT_DIRECTORY>

Quickly Load Pre-Trained Weights for Your Desired Model with SOTA Performance

Want to try our pre-trained models on your machine? Import SuperGradients, initialize your trainer, and load your desired architecture and pre-trained weights from our SOTA model zoo.

# The pretrained_weights argument will load a pre-trained architecture on the provided dataset
# This is an example of loading COCO-2017 pre-trained weights for a YOLOX Nano object detection model

from super_gradients.training import SgModel

trainer = SgModel(experiment_name="yoloxn_coco_experiment", ckpt_root_dir=<CHECKPOINT_DIRECTORY>)
trainer.build_model(architecture="yolox_n", arch_params={"pretrained_weights": "coco", "num_classes": 80})
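
From here, a quick sanity check is to run a forward pass with the loaded network. The snippet below is only a sketch: it assumes the underlying torch module is exposed as trainer.net (an internal attribute that may differ between versions), and it skips detection post-processing such as NMS.

import torch

trainer.net.eval()                    # assumption: the torch module is exposed as trainer.net
image = torch.randn(1, 3, 640, 640)   # dummy batch at YOLOX's 640x640 input resolution
with torch.no_grad():
    raw_outputs = trainer.net(image)  # raw network outputs; NMS/decoding not shown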

Quick Start Notebook - Classification

Get started with our quick start notebook for image classification tasks on Google Colab for a quick and easy start using free GPU hardware.

Classification Quick Start in Google Colab | Download notebook | View source on GitHub


Quick Start Notebook - Semantic Segmentation

Get started with our quick start notebook for semantic segmentation tasks on Google Colab for a quick and easy start using free GPU hardware.

Segmentation Quick Start in Google Colab | Download notebook | View source on GitHub


Transfer Learning

Transfer Learning with SG Notebook - Semantic Segmentation

Learn more about SuperGradients' transfer learning and fine-tuning abilities with our example notebook on Google Colab, which fine-tunes a Cityscapes pre-trained RegSeg48 model on a sub-dataset of Supervisely. It is an easy-to-follow tutorial using free GPU hardware.

Segmentation Transfer Learning in Google Colab | Download notebook | View source on GitHub


Knowledge Distillation Training

Knowledge Distillation Training Quick Start with SG Notebook - ResNet18 example

Knowledge distillation is a training technique that uses a large model (the teacher) to improve the performance of a smaller model (the student). Learn more about SuperGradients' knowledge distillation training with our example notebook on Google Colab, which trains a ResNet18 student on CIFAR10 using a pre-trained BEiT base teacher. It is an easy-to-follow tutorial using free GPU hardware.
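
For intuition, the classic distillation objective (Hinton et al.) blends the usual hard-label loss with a KL-divergence term between temperature-softened teacher and student logits. The following is a generic PyTorch sketch of that idea, not SuperGradients' internal implementation:

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soften both distributions with temperature T and compare them with KL divergence
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes match the hard loss, as in the paper
    hard = F.cross_entropy(student_logits, labels)  # standard loss on ground-truth labels
    return alpha * soft + (1 - alpha) * hard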

KD Training in Google Colab | Download notebook | View source on GitHub


Installation Methods

Prerequisites

General requirements
To train on Nvidia GPUs

Quick Installation

Install the stable version using PyPI

See on PyPI

pip install super-gradients

That's it!

Install using GitHub
pip install git+https://github.com/Deci-AI/super-gradients.git@stable

Computer Vision Models - Pretrained Checkpoints

Pretrained Classification PyTorch Checkpoints

| Model | Dataset | Resolution | Top-1 | Top-5 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
|---|---|---|---|---|---|---|---|---|---|
| ViT base | ImageNet21K | 224x224 | 84.15 | - | 4.46ms | 4.60ms | - * | - | 57.22ms |
| ViT large | ImageNet21K | 224x224 | 85.64 | - | 12.81ms | 13.19ms | - * | - | 187.22ms |
| BEiT | ImageNet21K | 224x224 | - | - | - | - | - * | - | - |
| EfficientNet B0 | ImageNet | 224x224 | 77.62 | 93.49 | 0.93ms | 1.38ms | - * | - | 3.44ms |
| RegNet Y200 | ImageNet | 224x224 | 70.88 | 89.35 | 0.63ms | 1.08ms | 2.16ms | 2.47ms | 2.06ms |
| RegNet Y400 | ImageNet | 224x224 | 74.74 | 91.46 | 0.80ms | 1.25ms | 2.62ms | 2.91ms | 2.87ms |
| RegNet Y600 | ImageNet | 224x224 | 76.18 | 92.34 | 0.77ms | 1.22ms | 2.64ms | 2.93ms | 2.39ms |
| RegNet Y800 | ImageNet | 224x224 | 77.07 | 93.26 | 0.74ms | 1.19ms | 2.77ms | 3.04ms | 2.81ms |
| ResNet 18 | ImageNet | 224x224 | 70.6 | 89.64 | 0.52ms | 0.95ms | 2.01ms | 2.30ms | 4.56ms |
| ResNet 34 | ImageNet | 224x224 | 74.13 | 91.7 | 0.92ms | 1.34ms | 3.57ms | 3.87ms | 7.64ms |
| ResNet 50 | ImageNet | 224x224 | 81.91 | 93.0 | 1.03ms | 1.44ms | 4.78ms | 5.10ms | 9.25ms |
| MobileNet V3_large (150 epochs) | ImageNet | 224x224 | 73.79 | 91.54 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
| MobileNet V3_large (300 epochs) | ImageNet | 224x224 | 74.52 | 91.92 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
| MobileNet V3_small | ImageNet | 224x224 | 67.45 | 87.47 | 0.55ms | 0.96ms | 2.01ms * | 2.35ms | 1.06ms |
| MobileNet V2_w1 | ImageNet | 224x224 | 73.08 | 91.1 | 0.46ms | 0.89ms | 1.65ms * | 1.90ms | 1.56ms |

NOTE:

  • Latency (HW)* - Hardware performance (not including IO)
  • Latency (Production)** - Production Performance (including IO)
  • Performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1 (a representative command is shown after these notes)
  • Performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
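
A measurement of this kind is typically taken with TensorRT's trtexec tool on an exported ONNX model. The exact benchmarking setup used for these tables is not documented here, so the following is only a representative command (--fp16 enables half precision; trtexec ships with the TensorRT distribution):

trtexec --onnx=model.onnx --fp16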

Pretrained Object Detection PyTorch Checkpoints

| Model | Dataset | Resolution | mAP(val) 0.5:0.95 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
|---|---|---|---|---|---|---|---|---|
| SSD lite MobileNet v2 | COCO | 320x320 | 21.5 | 0.77ms | 1.40ms | 5.28ms | 6.44ms | 4.13ms |
| SSD lite MobileNet v1 | COCO | 320x320 | 24.3 | 1.55ms | 2.84ms | 8.07ms | 9.14ms | 22.76ms |
| YOLOX nano | COCO | 640x640 | 26.77 | 2.47ms | 4.09ms | 11.49ms | 12.97ms | - |
| YOLOX tiny | COCO | 640x640 | 37.18 | 3.16ms | 4.61ms | 15.23ms | 19.24ms | - |
| YOLOX small | COCO | 640x640 | 40.47 | 3.58ms | 4.94ms | 18.88ms | 22.48ms | - |
| YOLOX medium | COCO | 640x640 | 46.4 | 6.40ms | 7.65ms | 39.22ms | 44.5ms | - |
| YOLOX large | COCO | 640x640 | 49.25 | 10.07ms | 11.12ms | 68.73ms | 77.01ms | - |

NOTE:

  • Latency (HW)* - Hardware performance (not including IO)
  • Latency (Production)** - Production Performance (including IO)
  • Latency performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
  • Latency performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1

Pretrained Semantic Segmentation PyTorch Checkpoints

| Model | Dataset | Resolution | mIoU | Latency b1 T4 | Latency b1 T4, including IO |
|---|---|---|---|---|---|
| PP-LiteSeg B50 | Cityscapes | 512x1024 | 76.48 | 4.18ms | 31.22ms |
| PP-LiteSeg B75 | Cityscapes | 768x1536 | 78.52 | 6.84ms | 33.69ms |
| PP-LiteSeg T50 | Cityscapes | 512x1024 | 74.92 | 3.26ms | 30.33ms |
| PP-LiteSeg T75 | Cityscapes | 768x1536 | 77.56 | 5.20ms | 32.28ms |
| DDRNet 23 slim | Cityscapes | 1024x2048 | 78.01 | 5.74ms | 32.01ms |
| DDRNet 23 | Cityscapes | 1024x2048 | 80.26 | 12.74ms | 39.01ms |
| STDC 1-Seg50 | Cityscapes | 512x1024 | 75.11 | 3.34ms | 30.12ms |
| STDC 1-Seg75 | Cityscapes | 768x1536 | 77.8 | 5.53ms | 32.49ms |
| STDC 2-Seg50 | Cityscapes | 512x1024 | 76.44 | 4.12ms | 30.94ms |
| STDC 2-Seg75 | Cityscapes | 768x1536 | 78.93 | 6.95ms | 33.89ms |
| RegSeg (exp48) | Cityscapes | 1024x2048 | 78.15 | 12.03ms | 38.91ms |
| Larger RegSeg (exp53) | Cityscapes | 1024x2048 | 79.2 | 22.00ms | 48.96ms |

NOTE: Performance measured on T4 GPU with TensorRT, using FP16 precision and batch size 1 (latency), and not including IO

NOTE: For resolutions below 1024x2048 we first resize the input to the inference resolution, then resize the predictions back to 1024x2048. The resizing time is included in the measurements, so the practical input size is 1024x2048.
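
In code, that evaluation protocol looks roughly like the sketch below (plain PyTorch; model and image are placeholder names, not SuperGradients APIs):

import torch
import torch.nn.functional as F

# Assumption: `model` is a segmentation network, `image` is a 1x3x1024x2048 float tensor
inference_size = (512, 1024)  # e.g. the "Seg50" inference resolution
x = F.interpolate(image, size=inference_size, mode="bilinear", align_corners=False)
logits = model(x)             # per-class logits at the inference resolution
logits = F.interpolate(logits, size=(1024, 2048), mode="bilinear", align_corners=False)
prediction = logits.argmax(dim=1)  # class labels at the full 1024x2048 resolution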

Implemented Model Architectures

Image Classification

Object Detection

Semantic Segmentation

Training Recipes

We defined recipes to ensure that anyone can reproduce our results in the simplest way possible.

Setup

To run recipes you first need to clone the super-gradients repository:

git clone https://github.com/Deci-AI/super-gradients

You then need to move to the root of the cloned project (where "requirements.txt" and "setup.py" are located) and install super-gradients:

pip install -e .

Finally, append super-gradients to the Python path (replace "YOUR-LOCAL-PATH" with the path to the downloaded repo):

export PYTHONPATH=$PYTHONPATH:<YOUR-LOCAL-PATH>/super-gradients/

How to run a recipe

The recipes are defined in .yaml format, and we use the hydra library to allow you to easily customize the parameters. The basic syntax is as follows:

python src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=<CONFIG-NAME> dataset_params.data_dir=<PATH-TO-DATASET>

But in most cases you will want to train on multiple GPUs using this syntax:

python -m torch.distributed.launch --nproc_per_node=<N-NODES> src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=<CONFIG-NAME> dataset_params.data_dir=<PATH-TO-DATASET>

Note: this script needs to be launched from the root folder of super_gradients.
Note: if you stored your dataset in the path specified by the recipe, you can drop "dataset_params.data_dir=".

Explore our recipes

You can find all of our recipes here. You will find information about the performance of a recipe as well as the command to execute it in the header of its config file.

Example: Training YoloX Small on COCO 2017, using 8 GPUs

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_s dataset_params.data_dir=/home/coco2017

List of commands

All the commands to launch the recipes described here are listed below. Please make sure to set "dataset_params.data_dir=<PATH-TO-DATASET>" if you did not store the dataset in the path specified by the recipe (as shown in the example above).

- Classification

Cifar10

resnet

python src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cifar10_resnet +experiment_name=cifar10

ImageNet

efficientnet

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_efficientnet

mobilenetv2

python -m torch.distributed.launch --nproc_per_node=2 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_mobilenetv2

mobilenetv3 small

python -m torch.distributed.launch --nproc_per_node=2 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_mobilenetv3_small

mobilenetv3 large

python -m torch.distributed.launch --nproc_per_node=2 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_mobilenetv3_large

regnetY200

python src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_regnetY architecture=regnetY200

regnetY400

python src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_regnetY architecture=regnetY400

regnetY600

python src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_regnetY architecture=regnetY600

regnetY800

python src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_regnetY architecture=regnetY800

repvgg

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_repvgg

resnet50

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_resnet50

resnet50_kd

python -m torch.distributed.launch --nproc_per_node=8  src/super_gradients/examples/train_from_kd_recipe_example/train_from_kd_recipe.py --config-name=imagenet_resnet50_kd

vit_base

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_vit_base

vit_large

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=imagenet_vit_large

- Detection

Coco2017

ssd_lite_mobilenet_v2

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_ssd_lite_mobilenet_v2

yolox_n

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_n

yolox_t

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_t

yolox_s

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_s

yolox_m

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_m

yolox_l

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_l

yolox_x

python -m torch.distributed.launch --nproc_per_node=8 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=coco2017_yolox architecture=yolox_x

- Segmentation

Cityscapes

DDRNet23

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_ddrnet

DDRNet23-Slim

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_ddrnet architecture=ddrnet_23_slim

RegSeg48

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_regseg48

STDC1-Seg50

python -m torch.distributed.launch --nproc_per_node=2 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_stdc_seg50

STDC2-Seg50

python -m torch.distributed.launch --nproc_per_node=2 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_stdc_seg50 architecture=stdc2_seg

STDC1-Seg75

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_stdc_seg75

STDC2-Seg75

python -m torch.distributed.launch --nproc_per_node=4 src/super_gradients/examples/train_from_recipe_example/train_from_recipe.py --config-name=cityscapes_stdc_seg75 external_checkpoint_path=<stdc2-backbone-pretrained-path> architecture=stdc2_seg

Documentation

Check SuperGradients Docs for full documentation, user guide, and examples.

Contributing

To learn about making a contribution to SuperGradients, please see our Contribution page.

Our awesome contributors:


Made with contrib.rocks.

Citation

If you are using the SuperGradients library or benchmarks in your research, please cite the SuperGradients deep learning training library.

Community

If you want to be a part of SuperGradients' growing community, hear about all the exciting news and updates, need help, want to request advanced features, or want to file a bug or issue report, we would love to welcome you aboard!

  • Slack is the place to ask questions about SuperGradients and get support. Click here to join our Slack.

  • To report a bug, file an issue on GitHub.

  • Join the SG Newsletter to stay up to date with new features and models, important announcements, and upcoming events.

  • For a short meeting with us, use this link and choose your preferred time.

License

This project is released under the Apache 2.0 license.


Deci Platform

Deci Platform is our end-to-end platform for building, optimizing, and deploying deep learning models to production.

Sign up for our FREE Community Tier to enjoy immediate improvement in throughput, latency, memory footprint and model size.

Features:

  • Automatically compile and quantize your models with just a few clicks (TensorRT, OpenVINO).
  • Gain up to 10X improvement in throughput, latency, memory and model size.
  • Easily benchmark your models’ performance on different hardware and batch sizes.
  • Invite co-workers to collaborate on models and communicate your progress.
  • Deci supports all common frameworks and hardware, from Intel CPUs to Nvidia GPUs and Jetsons.

Sign up for Deci Platform for free here
