Automatic Deep Learning, towards fully automated multi-label classification for image, video, text, speech, tabular data.

1. NeurIPS AutoDL Challenge 1st Place Solution

This is the 1st place solution for the AutoDL Challenge@NeurIPS. The competition rules can be found at AutoDL Competition.

1.0.1. Motivation

Real-world problems share a set of common and difficult obstacles: limited resources (CPU/memory), skewed data, hand-crafted features, model selection, tuning of network architecture details, sensitivity of pre-trained models, sensitivity to hyperparameters, and so on. How can they be solved comprehensively and efficiently?

1.0.2. Solution

AutoDL concentrates on developing generic algorithms for multi-label classification problems in ANY modality (image, video, speech, text and tabular data) without ANY human intervention. With as little as ten seconds of training, our solution achieved SOTA performance on all 24 offline datasets and 15 online datasets, beating a number of top players in the world.

1.2. Features

  • Full-AutoML/AutoDL: Fully automated deep learning without ANY human intervention, covering the whole pipeline.
  • Generic & Universal: Supports data of ANY modality (image, video, speech, text, tabular) and ANY classification problem, including binary, multi-class and multi-label problems.
  • SOTA: Winning solution of the AutoDL challenge, combining traditional machine learning models with deep learning model backbones.
  • Out-of-the-Box: You can use the solution without any additional setup.
  • Fast: You can train your model in as little as ten seconds and get highly competitive performance.
  • Real-time: You can get performance feedback (AUC score) in real time.
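
The AUC score mentioned in the last bullet can be illustrated with a small, dependency-free sketch. It computes a plain ROC AUC per label and averages over labels; the challenge's exact scoring procedure may differ, and the function names `binary_auc` and `mean_auc` are hypothetical helpers, not part of AutoDL.

```python
def binary_auc(labels, scores):
    """Return the ROC AUC for one label column, or None if only one class is present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(y_true, y_score):
    """Average the per-label AUC over all label columns that contain both classes."""
    aucs = []
    for column in range(len(y_true[0])):
        auc = binary_auc([row[column] for row in y_true],
                         [row[column] for row in y_score])
        if auc is not None:
            aucs.append(auc)
    return sum(aucs) / len(aucs) if aucs else None


# Toy multi-label example: 4 samples, 2 labels (illustrative values only).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.3], [0.4, 0.6]]
print(mean_auc(y_true, y_score))  # prints 0.875
```

The pairwise count is quadratic in the number of samples, which is fine for a demonstration; a rank-based implementation would be preferable on large datasets.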

1.3. Evaluation

  • Feedback-phase leaderboard: DeepWisdom Top 1, average rank 1.2, won 4 out of 5 datasets.

  • Final-phase leaderboard visualization: DeepWisdom Top 1, average rank 1.2, won 7 out of 10 datasets.

1.4. Installation

This repo is tested on Python 3.6+, PyTorch 1.0.0+ and TensorFlow 2.0.

You should install AutoDL in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Create a virtual environment with the version of Python you're going to use and activate it.

Now, if you want to use AutoDL, you can install it with pip.

1.4.1. With pip

AutoDL can be installed using pip as follows (the second command pins a specific version):

pip install autodl-gpu
pip install autodl-gpu==1.0.0
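
To confirm which version ended up in the active environment, a minimal sketch along these lines may help. It assumes Python 3.8+ (for `importlib.metadata`) and uses the distribution name `autodl-gpu` from the pip command; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package="autodl-gpu"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("autodl-gpu is not installed in this environment")
else:
    print("autodl-gpu version:", version)
```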

1.5. Quick Tour

1.5.1. Run local test tour

See Quick Tour - Run local test tour.

1.5.2. Tour of Image Classification

See Quick Tour - Image Classification Demo.

1.5.3. Tour of Video Classification

See Quick Tour - Video Classification Demo.

1.5.4. Tour of Speech Classification

See Quick Tour - Speech Classification Demo.

1.5.5. Tour of Text Classification

See Quick Tour - Text Classification Demo.

1.5.6. Tour of Tabular Classification

See Quick Tour - Tabular Classification Demo.

1.6. Public Datasets

1.6.1. Optional: Download public datasets

python download_public_datasets.py

1.6.2. Public datasets sample info

#   Name      Type     Domain    Size    Source       Data (w/o test labels)  Test labels
1   Munster   Image    HWR       18 MB   MNIST        munster.data            munster.solution
2   City      Image    Objects   128 MB  Cifar-10     city.data               city.solution
3   Chucky    Image    Objects   128 MB  Cifar-100    chucky.data             chucky.solution
4   Pedro     Image    People    377 MB  PA-100K      pedro.data              pedro.solution
5   Decal     Image    Aerial    73 MB   NWPU VHR-10  decal.data              decal.solution
6   Hammer    Image    Medical   111 MB  Ham10000     hammer.data             hammer.solution
7   Kreatur   Video    Action    469 MB  KTH          kreatur.data            kreatur.solution
8   Kreatur3  Video    Action    588 MB  KTH          kreatur3.data           kreatur3.solution
9   Kraut     Video    Action    1.9 GB  KTH          kraut.data              kraut.solution
10  Katze     Video    Action    1.9 GB  KTH          katze.data              katze.solution
11  data01    Speech   Speaker   1.8 GB  --           data01.data             data01.solution
12  data02    Speech   Emotion   53 MB   --           data02.data             data02.solution
13  data03    Speech   Accent    1.8 GB  --           data03.data             data03.solution
14  data04    Speech   Genre     469 MB  --           data04.data             data04.solution
15  data05    Speech   Language  208 MB  --           data05.data             data05.solution
16  O1        Text     Comments  828 KB  --           O1.data                 O1.solution
17  O2        Text     Emotion   25 MB   --           O2.data                 O2.solution
18  O3        Text     News      88 MB   --           O3.data                 O3.solution
19  O4        Text     Spam      87 MB   --           O4.data                 O4.solution
20  O5        Text     News      14 MB   --           O5.data                 O5.solution
21  Adult     Tabular  Census    2 MB    Adult        adult.data              adult.solution
22  Dilbert   Tabular  --        162 MB  --           dilbert.data            dilbert.solution
23  Digits    Tabular  HWR       137 MB  MNIST        digits.data             digits.solution
24  Madeline  Tabular  --        2.6 MB  --           madeline.data           madeline.solution
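
After downloading, it can be useful to check that every dataset has both of its parts. The sketch below relies only on the `<name>.data` / `<name>.solution` naming shown in the table; the download location `AutoDL_public_data` and the helper `find_datasets` are assumptions for illustration, so adjust the path to wherever the download script places the files.

```python
from pathlib import Path


def find_datasets(root):
    """Map dataset name -> True if a matching <name>.solution sits next to <name>.data."""
    root = Path(root)
    if not root.is_dir():
        return {}  # nothing downloaded yet, or wrong path
    status = {}
    # Search recursively, since datasets may live in per-dataset subdirectories.
    for data_path in sorted(root.rglob("*.data")):
        solution_path = data_path.with_suffix(".solution")
        status[data_path.stem] = solution_path.exists()
    return status


# "AutoDL_public_data" is a hypothetical download location; adjust it to your setup.
for name, complete in find_datasets("AutoDL_public_data").items():
    print(name, "ok" if complete else "missing .solution")
```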

1.7. Usage for AutoDL local development and testing

  1. Git clone the repo.

cd <path_to_your_directory>
git clone https://github.com/DeepWisdom/AutoDL.git

  2. Prepare pretrained models. Download the model speech_model.h5 and put it in the AutoDL_sample_code_submission/at_speech/pretrained_models/ directory.

  3. Optional: use Docker to run in the exact same environment as on the challenge platform.

    • CPU
    cd path/to/autodl/
    docker run -it -v "$(pwd):/app/codalab" -p 8888:8888 evariste/autodl:cpu-latest
    
    • GPU
    nvidia-docker run -it -v "$(pwd):/app/codalab" -p 8888:8888 evariste/autodl:gpu-latest
    
  4. Prepare sample datasets: use the toy data in AutoDL_sample_data or download new datasets.

  5. Run local test.

python run_local_test.py

The full usage is

python run_local_test.py -dataset_dir='AutoDL_sample_data/miniciao' -code_dir='AutoDL_sample_code_submission'

Then you can view the real-time feedback with a learning curve by opening the HTML page in AutoDL_scoring_output/.

See the official AutoDL Challenge starting_kit for details.
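
To run the local test on several datasets in a row, the command above can be driven from Python. This is a minimal sketch that assumes it is run from the repository root and uses only the `-dataset_dir` and `-code_dir` flags shown above; `build_command` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys


def build_command(dataset_dir, code_dir="AutoDL_sample_code_submission"):
    """Build the argument list for one run_local_test.py invocation."""
    # No shell quoting is needed because the arguments are passed as a list.
    return [
        sys.executable,
        "run_local_test.py",
        f"-dataset_dir={dataset_dir}",
        f"-code_dir={code_dir}",
    ]


def run_all(dataset_dirs):
    """Run the local test on each dataset in turn and collect the exit codes."""
    results = {}
    for dataset_dir in dataset_dirs:
        completed = subprocess.run(build_command(dataset_dir))
        results[dataset_dir] = completed.returncode
    return results


if __name__ == "__main__":
    # Add more dataset directories to this list as needed.
    print(run_all(["AutoDL_sample_data/miniciao"]))
```

A non-zero exit code in the result marks a dataset whose run failed, which makes it easy to spot problems before opening the per-run HTML report.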

1.8. Contributing

Feel free to dive in! Open an issue or submit PRs.

1.9. Contact us

1.10. Join the Community

Scan the QR code and join the AutoDL community!

1.11. License

Apache License 2.0
