Unified Semi-Supervised Learning Benchmark

Project description

USB: A Unified Semi-supervised learning Benchmark for CV, NLP, and Audio Classification
Paper · Benchmark · Demo · Docs · Issue · Blog · Blog (Chinese) · Video · Video (Chinese)

Table of Contents
  1. News and Updates
  2. Introduction
  3. Getting Started
  4. Usage
  5. Benchmark Results
  6. Model Zoo
  7. Community
  8. License
  9. Acknowledgments

News and Updates

  • [10/16/2022] Dataset download link and process instructions released! [Datasets]

  • [10/13/2022] We have finished the camera-ready version with updated [Results]. [Openreview]

  • [10/06/2022] Training logs and results of USB have been updated! Available datasets will be uploaded soon. [Logs] [Results]

  • [09/17/2022] The USB paper has been accepted by the NeurIPS 2022 Datasets and Benchmarks Track! [Openreview]

  • [08/21/2022] USB has been released!

Introduction

USB is a PyTorch-based Python package for Semi-Supervised Learning (SSL). It is easy to use and extend, affordable to small groups, and comprehensive for developing and evaluating SSL algorithms. USB provides implementations of 14 SSL algorithms based on Consistency Regularization, and 15 evaluation tasks from the CV, NLP, and Audio domains.
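As a rough illustration of what "consistency regularization" means, the sketch below computes a FixMatch-style unlabeled loss in plain Python: the prediction on a weakly augmented view becomes a pseudo-label, and the prediction on a strongly augmented view is penalized for disagreeing with it. This is not USB's implementation; the function names and the 0.95 confidence threshold are illustrative assumptions.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(weak_logits, strong_logits, threshold=0.95):
    """Cross-entropy between the weak-view pseudo-label and the strong-view
    prediction. Returns 0.0 when the weak prediction is not confident enough.
    The threshold value is an assumption chosen for illustration."""
    weak_probs = softmax(weak_logits)
    confidence = max(weak_probs)
    if confidence < threshold:
        return 0.0  # sample is masked out of the unlabeled loss
    pseudo_label = weak_probs.index(confidence)
    strong_probs = softmax(strong_logits)
    return -math.log(strong_probs[pseudo_label])
```

The loss is small when both views agree on a confident class and large when the strong view contradicts the pseudo-label; low-confidence samples contribute nothing.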

Code Structure

(back to top)

Getting Started

This is an example of how to set up USB locally. To get a local copy up and running, follow these simple example steps.

Prerequisites

USB is built on PyTorch, with torchvision, torchaudio, and transformers.

To install the required packages, you can create a conda environment:

conda create --name usb python=3.8
conda activate usb

Then use pip to install the required packages:

pip install -r requirements.txt
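Once the requirements are installed, a quick sanity check can confirm that the packages named above are visible to the active interpreter. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of USB.

```python
import importlib.util

# Packages named in the Prerequisites paragraph above.
REQUIRED = ("torch", "torchvision", "torchaudio", "transformers")

def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that each package can be located; it does not verify versions or GPU support.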

Installation

We provide semilearn, a Python package for USB, for users who want to quickly start training/testing the supported SSL algorithms on their own data:

pip install semilearn

(back to top)

Development

You can also develop your own SSL algorithm and evaluate it by cloning USB:

git clone https://github.com/microsoft/Semi-supervised-learning.git

(back to top)

Prepare Datasets

Detailed instructions for downloading and processing the datasets are given in Dataset Download. Please follow them to download the datasets before running or developing algorithms.

(back to top)

Usage

USB is easy to use and extend. Going through the examples below will help you become familiar with USB for quick use, for evaluating an existing SSL algorithm on your own dataset, or for developing new SSL algorithms.

Quick Start with USB package

Please see Installation to install USB first. Colab tutorials are also provided.

Start with Docker

Step1: Check your environment

You need to install Docker and the NVIDIA driver first. To use a GPU in a Docker container, you also need to install nvidia-docker2 (Installation Guide). Then check your CUDA version via nvidia-smi.

Step2: Clone the project

git clone https://github.com/microsoft/Semi-supervised-learning.git

Step3: Build the Docker image

Before building the image, you may need to modify the Dockerfile according to your CUDA version. The CUDA version we use is 11.6. You can change the base image tag according to this site. You also need to change the --extra-index-url according to your CUDA version in order to install the correct version of PyTorch. You can look up the URL on the PyTorch website.
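To make the CUDA-to-URL step concrete, here is a small sketch that derives a candidate --extra-index-url from a CUDA version string. It assumes the commonly used PyTorch wheel index layout (https://download.pytorch.org/whl/cuXYZ); the helper name is hypothetical, and not every CUDA version has published wheels, so the result should still be confirmed on the PyTorch website.

```python
def pytorch_index_url(cuda_version):
    """Map a CUDA version such as '11.6' to a candidate PyTorch wheel index URL.

    Assumption: the index follows the 'cu<major><minor>' naming pattern.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"expected a version like '11.6', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"

if __name__ == "__main__":
    # 11.6 is the CUDA version mentioned above.
    print(pytorch_index_url("11.6"))
```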

Use this command to build the image

cd Semi-supervised-learning && docker build -t semilearn .

Job done. You can use the image you just built for your own project. Don't forget to pass the --gpus argument (for example, --gpus all) to docker run when you want to use a GPU in a container.

Training

Here is an example of training FixMatch on CIFAR-100 with 200 labels. To train other supported algorithms (on other datasets with different label settings), specify the corresponding config file:

python train.py --c config/usb_cv/fixmatch/fixmatch_cifar100_200_0.yaml
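When launching several runs, it can help to build the config path programmatically. The sketch below reproduces the path used in the command above; the naming pattern (algorithm, dataset, number of labels, then a trailing index read here as a seed) is inferred from that single example and is an assumption, as is the helper name config_path.

```python
from pathlib import PurePosixPath

def config_path(algorithm, dataset, num_labels, seed, domain="usb_cv", root="config"):
    """Build a config path of the form
    <root>/<domain>/<algorithm>/<algorithm>_<dataset>_<num_labels>_<seed>.yaml.

    The pattern is inferred from one example path and may not hold for every config.
    """
    name = f"{algorithm}_{dataset}_{num_labels}_{seed}.yaml"
    return str(PurePosixPath(root) / domain / algorithm / name)

if __name__ == "__main__":
    print(config_path("fixmatch", "cifar100", 200, 0))
```

Check that the generated file actually exists in the cloned repository before passing it to train.py.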

Evaluation

After training, you can check the evaluation performance in the training logs, or run the evaluation script:

python eval.py --dataset cifar100 --num_classes 100 --load_path /PATH/TO/CHECKPOINT
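To evaluate several checkpoints in a row, the command above can be assembled as an argument list. This is a minimal sketch that uses only the three flags shown in the command; the helper name eval_command is hypothetical, and the checkpoint path is the same placeholder as above.

```python
import shlex
import sys

def eval_command(dataset, num_classes, load_path, script="eval.py"):
    """Return the argument list for one evaluation run (flags as in the command above)."""
    return [
        sys.executable, script,
        "--dataset", dataset,
        "--num_classes", str(num_classes),
        "--load_path", load_path,
    ]

if __name__ == "__main__":
    cmd = eval_command("cifar100", 100, "/PATH/TO/CHECKPOINT")
    print(shlex.join(cmd))  # show the command without running it
```

To actually run the evaluation from the repository root, pass the list to subprocess.run(cmd, check=True).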

Develop

Check the development documentation for creating your own SSL algorithm!

For more examples, please refer to the Documentation

(back to top)

Benchmark Results

Please refer to Results for benchmark results on different tasks.

(back to top)

Model Zoo

TODO: add pre-trained models.

(back to top)

TODO

  • Finish Readme
  • Updating SUPPORT.MD with content about this project's support experience
  • Multi-language Support
    • Chinese

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

If you have a suggestion that would make USB better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the project
  2. Create your branch (git checkout -b your_name/your_branch)
  3. Commit your changes (git commit -m 'Add some features')
  4. Push to the branch (git push origin your_name/your_branch)
  5. Open a Pull Request

(back to top)

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Community and Contact

The USB community is maintained by:

(back to top)

Citing USB

Please cite us if you find USB helpful for your project/paper:

@inproceedings{usb2022,
  doi = {10.48550/ARXIV.2208.07204},
  url = {https://arxiv.org/abs/2208.07204},
  author = {Wang, Yidong and Chen, Hao and Fan, Yue and Sun, Wang and Tao, Ran and Hou, Wenxin and Wang, Renjie and Yang, Linyi and Zhou, Zhi and Guo, Lan-Zhe and Qi, Heli and Wu, Zhen and Li, Yu-Feng and Nakamura, Satoshi and Ye, Wei and Savvides, Marios and Raj, Bhiksha and Shinozaki, Takahiro and Schiele, Bernt and Wang, Jindong and Xie, Xing and Zhang, Yue},
  title = {USB: A Unified Semi-supervised Learning Benchmark for Classification},
  booktitle = {Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year = {2022}
}

Acknowledgments

We thank the following projects, which served as references when creating USB:

(back to top)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

semilearn-0.3.1a0.tar.gz (117.5 kB view details)

Uploaded Source

Built Distribution

semilearn-0.3.1a0-py3-none-any.whl (180.1 kB view details)

Uploaded Python 3

File details

Details for the file semilearn-0.3.1a0.tar.gz.

File metadata

  • Download URL: semilearn-0.3.1a0.tar.gz
  • Upload date:
  • Size: 117.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.5

File hashes

Hashes for semilearn-0.3.1a0.tar.gz
Algorithm Hash digest
SHA256 faf08234f62b55b9d0fdfef6f738842dcfc5c58503731d6dff93346960588254
MD5 beca195fbbf74b8fcff7d674bcd6c9dd
BLAKE2b-256 b7dd546b6f851edd7ab1577a7676595d5e23a99b0e6e9f84f87e8615f55bd255

See more details on using hashes here.

File details

Details for the file semilearn-0.3.1a0-py3-none-any.whl.

File metadata

  • Download URL: semilearn-0.3.1a0-py3-none-any.whl
  • Upload date:
  • Size: 180.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.5

File hashes

Hashes for semilearn-0.3.1a0-py3-none-any.whl
Algorithm Hash digest
SHA256 d55891fd1f8d6831e64d443ee47f16035948d025b378da173fc2e75ff3f12151
MD5 ea4765c7e37f8b065e0763d2034b2425
BLAKE2b-256 6accb94b2b2a49ce1924c41851ee2851e73b877bef016ed49dd6d2fe5e3399b1

See more details on using hashes here.
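The SHA256 digests listed above can be checked against a downloaded file with the standard library. The sketch below is one way to do it; the helper names sha256_of and matches are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()

if __name__ == "__main__":
    # Digest copied from the listing above for the source distribution.
    expected = "faf08234f62b55b9d0fdfef6f738842dcfc5c58503731d6dff93346960588254"
    try:
        print("hash ok:", matches("semilearn-0.3.1a0.tar.gz", expected))
    except FileNotFoundError:
        print("file not found; download it first")
```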
