A codebase for active learning built on top of pycls.

TorchAL codebase

Source code for our CVPR 2022 Paper: Towards Robust and Reproducible Active Learning Using Neural Networks

Figure 1. Comparisons of AL methods on CIFAR10 (top) and CIFAR100 (bottom) for different initial labeled sets L0, L1, ..., L4. The mean accuracy for the base model (at 10% labeled data) is noted at the bottom of each subplot. The model is trained 5 times with different random initialization seeds; for the first seed we use AutoML to tune the hyper-parameters, and we re-use these hyper-parameters for the other 4 seeds.

Abstract

Active learning (AL) is a promising ML paradigm that has the potential to parse through large unlabeled data and help reduce annotation cost in domains where labeling data can be prohibitive. Recently proposed neural network based AL methods use different heuristics to accomplish this goal. In this study, we demonstrate that under identical experimental settings, different types of AL algorithms (uncertainty based, diversity based, and committee based) produce an inconsistent gain over random sampling baseline. Through a variety of experiments, controlling for sources of stochasticity, we show that variance in performance metrics achieved by AL algorithms can lead to results that are not consistent with the previously reported results. We also found that under strong regularization, AL methods show marginal or no advantage over the random sampling baseline under a variety of experimental conditions. Finally, we conclude with a set of recommendations on how to assess the results using a new AL algorithm to ensure results are reproducible and robust under changes in experimental conditions. We share our codes to facilitate AL evaluations. We believe our findings and recommendations will help advance reproducible research in AL using neural networks.

What is TorchAL?

TL;DR: An Active Learning framework built on top of pycls.

TorchAL is an evaluation toolkit that aims to advance reproducible research in deep active learning. We currently implement state-of-the-art active learning (AL) algorithms. Our toolkit extends the widely used pycls codebase to AL settings.

Features of TorchAL

We report strong random baselines across widely used architectures and datasets.
Our baselines are well-trained using AutoML, which helps reduce the bias introduced by choosing sub-optimal hyper-parameters.
To make AL results reproducible, we release the training and validation index sets so that future AL methods can train on exactly the same labeled sets that we used to report our strong baselines.
To get familiar with the codebase, we recommend that interested users go through the notebooks.

AutoML in Active Learning

During AL iterations we observed that the labeled set changes, and therefore so does its class distribution. Thus, in contrast to contemporary AL methods, which fix the training hyper-parameters at the start of AL, we tune the training hyper-parameters using AutoML in every AL cycle. To facilitate this, we use optuna to perform a random search over 50 trials for each AL cycle.
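
The per-cycle tuning described above can be sketched as follows. This is a minimal illustration of random search only: it uses the Python standard library rather than optuna, and the search space, the `toy_validation_accuracy` objective, and all function names are hypothetical stand-ins, not the toolkit's actual code.

```python
import math
import random


def sample_hyperparams(rng):
    """Draw one configuration from a hypothetical search space."""
    return {
        "lr": 10 ** rng.uniform(-4, -1),            # log-uniform in [1e-4, 1e-1]
        "weight_decay": 10 ** rng.uniform(-6, -3),  # log-uniform in [1e-6, 1e-3]
        "batch_size": rng.choice([32, 64, 128, 256]),
    }


def toy_validation_accuracy(params):
    """Stand-in for: train on the current labeled set, score on the validation set."""
    # Peaks at lr = 1e-2 and weight_decay = 1e-4; purely illustrative.
    lr_term = (math.log10(params["lr"]) + 2) ** 2
    wd_term = (math.log10(params["weight_decay"]) + 4) ** 2
    return 1.0 / (1.0 + lr_term + wd_term)


def random_search(objective, num_trials, seed):
    """Evaluate num_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = sample_hyperparams(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# One search per AL cycle, because the labeled set changes between cycles.
for al_cycle in range(3):
    params, score = random_search(toy_validation_accuracy, num_trials=50, seed=al_cycle)
    print(al_cycle, params, round(score, 3))
```

In a real run the objective would retrain the classifier on the labeled set of the current cycle, which is why the search is repeated instead of being done once up front.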

Requirements

For creating a conda environment, kindly refer to conda_env.yaml
For installing dependencies via pip, kindly refer to requirements.txt

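NOTE: In either case, we have to download the dataset indexes and follow the steps under "tools/train_al.py: Dataset index sets". The /raw/ URL below fetches the archive itself; the corresponding /blob/ URL returns the GitHub HTML page instead.

wget https://github.com/PrateekMunjal/torchal/raw/master/dataset_indexes.zip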

Installation

From source

git clone https://github.com/PrateekMunjal/TorchAL
cd TorchAL
python setup.py install

From pip

pip install torchal

Dataset indexes and Pretrained models

Dataset and active set indexes: Click here to download
Pretrained CIFAR models trained on 10% data: Click here to download

We recommend that interested readers check out the notebook on ensuring reproducible active sets. Link to Notebook
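
To illustrate why fixed index sets matter, here is a minimal sketch of a seeded, disjoint split into labeled, validation, and unlabeled indexes. It assumes that each index set simply holds integer positions into the official training split; the function name and the 10%/10% fractions are hypothetical, and for reproducing our numbers the released index files should be used instead of a fresh split.

```python
import random


def make_partition(num_train, labeled_frac, val_frac, seed):
    """Split range(num_train) into disjoint lSet, valSet and uSet index lists."""
    rng = random.Random(seed)
    indexes = list(range(num_train))
    rng.shuffle(indexes)
    num_labeled = round(num_train * labeled_frac)
    num_val = round(num_train * val_frac)
    l_set = sorted(indexes[:num_labeled])
    val_set = sorted(indexes[num_labeled:num_labeled + num_val])
    u_set = sorted(indexes[num_labeled + num_val:])
    return l_set, val_set, u_set


# Hypothetical setting: 50k training points, 10% labeled, 10% validation.
l_set, val_set, u_set = make_partition(50_000, 0.10, 0.10, seed=1)
print(len(l_set), len(val_set), len(u_set))

# The same seed always yields the same sets, so every AL method starts
# from an identical labeled set.
assert make_partition(50_000, 0.10, 0.10, seed=1) == (l_set, val_set, u_set)
```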

AL algorithms implemented

Uncertainty
Coreset
BALD
DBAL
VAAL
QBC
Random Baseline
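
As a rough illustration of how two of the query strategies listed above differ, the sketch below contrasts a generic entropy-based uncertainty query with the random baseline. It is not TorchAL's implementation: the function names are hypothetical, and whether the toolkit scores uncertainty by entropy or by another measure is not assumed here.

```python
import math
import random


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_query(u_set, predictions, budget):
    """Pick the `budget` unlabeled indexes with the highest predictive entropy."""
    ranked = sorted(u_set, key=lambda idx: entropy(predictions[idx]), reverse=True)
    return ranked[:budget]


def random_query(u_set, budget, seed):
    """Random baseline: pick `budget` unlabeled indexes uniformly at random."""
    return random.Random(seed).sample(u_set, budget)


# Hypothetical softmax outputs of the current model on four unlabeled points.
predictions = {
    10: [0.98, 0.01, 0.01],  # confident prediction
    11: [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    12: [0.70, 0.20, 0.10],
    13: [0.50, 0.49, 0.01],
}
u_set = [10, 11, 12, 13]

print(uncertainty_query(u_set, predictions, budget=2))  # [11, 12]
print(random_query(u_set, budget=2, seed=0))
```

In both cases the selected indexes would be moved from the unlabeled set to the labeled set before the next AL cycle.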

Experimental Settings

NOTE: Below, 5% means 5 percent of the full training dataset. For example, the official CIFAR10 training split has 50k datapoints, so with 5% we have 2500 datapoints in our initial labeled set.

Different budget sizes: 5%, 10%
Different validation set sizes: 2%, 5%, 10%
Effect of regularization techniques
We share notebooks demonstrating such different experimental settings.
We recommend that interested readers follow the summary of experiments presented here.
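
The percentages above translate into absolute set sizes as in the following small sketch. The helper name `subset_size` is hypothetical; the 50k training split size is the CIFAR10 figure quoted in the note.

```python
def subset_size(num_train, percent):
    """Number of datapoints that `percent` of the training split corresponds to."""
    return num_train * percent // 100


CIFAR_TRAIN_SIZE = 50_000  # official CIFAR10 training split, as stated in the note

for percent in (2, 5, 10):
    print(f"{percent}% -> {subset_size(CIFAR_TRAIN_SIZE, percent)} datapoints")
```

For instance, a 5% budget corresponds to the budget_size value of 2500 and a 10% budget to 5000.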

Examples

Run the random baseline without strong regularization


pythonExec=$1

# Adjust this directory and the /raid/... paths below to your own checkout and data location
cd /raid/shadab/prateek/newcode

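# script params
port=5035
# AL sampling function. NOTE: this value selects uncertainty sampling;
# for the random baseline, set it to the toolkit's random-sampling option.
sampling_fn=uncertainty
lSet_partition=1
base_seed=1
num_GPU=2
al_iterations=4 #7 #4
num_aml_trials=3 #50
budget_size=5000 #2500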

dataset=CIFAR10
init_partition=10
step_partition=10
clf_epochs=5 #150
num_classes=10

log_iter=40

#Data arguments
train_dir=/raid/shadab/prateek/newcode/data/$dataset/train-$dataset/
test_dir=/raid/shadab/prateek/newcode/data/$dataset/test-$dataset/
lSetPath=/raid/shadab/prateek/newcode/data/$dataset/partition_$lSet_partition/lSet_$dataset.npy
uSetPath=/raid/shadab/prateek/newcode/data/$dataset/partition_$lSet_partition/uSet_$dataset.npy
valSetPath=/raid/shadab/prateek/newcode/data/$dataset/partition_$lSet_partition/valSet_$dataset.npy

#for lSet 1
out_dir=/raid/shadab/prateek/newcode/results 

# for other lSet Exps
# out_dir=/raid/shadab/prateek/newcode/results_lSetPartitions

#model_types: (i) wide_resnet_50 (ii) wide_resnet_28_10 (iii) wide_resnet_28_2

model_style=vgg_style
model_type=vgg #resnet_shake_shake
model_depth=16 #26

export CUDA_VISIBLE_DEVICES=0,1

$pythonExec tools/main_aml.py --n_GPU $num_GPU \
--port $port --sampling_fn $sampling_fn --lSet_partition $lSet_partition \
--seed_id $base_seed \
--init_partition $init_partition --step_partition $step_partition \
--dataset $dataset --budget_size $budget_size \
--out_dir $out_dir \
--num_aml_trials $num_aml_trials --num_classes $num_classes \
--al_max_iter $al_iterations \
--model_type $model_type --model_depth $model_depth \
--clf_epochs $clf_epochs \
--eval_period 1 --checkpoint_period 1 \
--lSetPath $lSetPath --uSetPath $uSetPath --valSetPath $valSetPath \
--train_dir $train_dir --test_dir $test_dir \
--dropout_iterations 25 \
--cfg configs/$dataset/$model_style/$model_type/R-18_4gpu_unreg.yaml \
--vaal_z_dim 32 --vaal_vae_bs 64 --vaal_epochs 15 \
--vaal_vae_lr 5e-4 --vaal_disc_lr 5e-4 --vaal_beta 1.0 --vaal_adv_param 1.0 \

Usage: Assuming the above script is saved as run.sh, we can run it by passing the Python executable as the first argument:

sh run.sh `which python`

Run the random baseline with strong regularization

To enable strong regularization, we only need to add a few more switches to the above script.

swa_lr=5e-4
swa_freq=50
swa_epochs=5 #50

...
--rand_aug --swa_mode --swa_freq $swa_freq --swa_lr $swa_lr \
--swa_epochs $swa_epochs --swa_iter 0 \
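
The swa_* switches refer to stochastic weight averaging (SWA), which keeps a running average of the model weights collected at regular intervals. The sketch below shows only that averaging step in plain Python. It assumes swa_freq is counted in training iterations and uses made-up weight values; it is a generic illustration, not the toolkit's training loop.

```python
def update_swa(swa_weights, new_weights, num_averaged):
    """Fold one more weight snapshot into the running average."""
    return [
        avg + (w - avg) / (num_averaged + 1)
        for avg, w in zip(swa_weights, new_weights)
    ]


swa_freq = 50            # take a snapshot every 50 iterations (assumed unit)
swa_weights = [0.0, 0.0]  # running average, overwritten by the first snapshot
num_averaged = 0

for iteration in range(1, 201):
    # Stand-in for the weights after one optimizer step (hypothetical values).
    weights = [iteration / 100.0, 1.0 - iteration / 100.0]
    if iteration % swa_freq == 0:
        swa_weights = update_swa(swa_weights, weights, num_averaged)
        num_averaged += 1

# Snapshots were taken at iterations 50, 100, 150 and 200.
print(num_averaged, swa_weights)
```

The averaged weights, rather than the final iterate, are then used for evaluation.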

Citing TorchAL

If you use TorchAL, please consider citing:

@inproceedings{Munjal2022TorchAL,
    title={Towards Robust and Reproducible Active Learning Using Neural Networks}, 
    author={Prateek Munjal and Nasir Hayat and Munawar Hayat and Jamshid Sourati 
            and Shadab Khan},
    booktitle={CVPR},
    year={2022}
}

Acknowledgement

This repository is built on top of other open-source repositories, including pycls. Thanks to their authors for their wonderful work.

Contact

If you have any question about this project, please feel free to contact prateekmunjal31@gmail.com or skhan.shadab@gmail.com.

Release history

0.0.2 (this version): Apr 21, 2022
0.0.1: Apr 3, 2022
