
MAE - Masked Autoencoder (An Updated PyTorch Implementation for Single GPU with 4GB Memory)

Project description

Masked Autoencoders: An Updated PyTorch Implementation for Single GPU with 4GB Memory

This is an updated PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners, aimed at consumer-GPU users for learning purposes (a minimal sketch of the paper's core masking idea follows the list below).

  • Updated to the latest PyTorch and timm
  • Uses Imagenette as the default dataset, so you can start training on a consumer GPU and debug the code immediately without downloading the huge ImageNet
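
Since this re-implementation is meant for learning, here is a minimal sketch of the paper's core trick: randomly mask out 75% of the patch tokens and encode only the visible remainder. It mirrors the upstream random_masking logic; tensor names are illustrative.

import torch

def random_masking(x, mask_ratio=0.75):
    # x: (N, L, D) batch of patch embeddings. Keep a random
    # (1 - mask_ratio) subset per sample; the decoder later
    # reconstructs the rest.
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=x.device)       # per-patch random scores
    ids_shuffle = torch.argsort(noise, dim=1)       # lowest scores are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)
    ids_keep = ids_shuffle[:, :len_keep]
    x_masked = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    # Binary mask per patch: 0 = kept, 1 = masked (drives the loss).
    mask = torch.ones(N, L, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_masked, mask, ids_restore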

GitHub Repo: https://github.com/henrywoo/mae/

Commands to train the model:

pip install maskedautoencoder

git clone https://github.com/henrywoo/mae/
cd mae
pip install -r requirements.txt
bash run.sh

[Screenshot: training on a laptop with a 4 GB GPU]

One-line change to replace Imagenette with ImageNet-1K:

Replace

dataset_train = get_cv_dataset(path=DS_PATH_IMAGENETTE, transform=transform_train, name="full_size")

with

dataset_train = get_cv_dataset(path=DS_PATH_IMAGENET1K, transform=transform_train)
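
For context, here is a hedged sketch of how the resulting dataset typically feeds the training loop. The import path for get_cv_dataset and the DS_PATH_* constants is an assumption (they come from this repo's data utilities and may live elsewhere):

import torch
from torchvision import transforms
# Assumed import path; the helper may live elsewhere in this repo.
from hiq.cv_torch import get_cv_dataset, DS_PATH_IMAGENETTE

transform_train = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

dataset_train = get_cv_dataset(path=DS_PATH_IMAGENETTE,
                               transform=transform_train, name="full_size")
data_loader_train = torch.utils.data.DataLoader(
    dataset_train, batch_size=64, shuffle=True,
    num_workers=4, pin_memory=True, drop_last=True)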

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code
  • Pre-training code

Visualization demo

Run the interactive visualization demo in a Colab notebook (no GPU is needed).
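
If you prefer to visualize locally, here is a minimal sketch in the spirit of the demo notebook (tensor names and the helper are illustrative, not the notebook's actual code): given a normalized original image, its masked version, and the MAE reconstruction, plot them side by side.

import matplotlib.pyplot as plt
import torch

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225])

def show_triplet(original, masked, reconstruction):
    # Each input: a (3, H, W) ImageNet-normalized tensor.
    titles = ["original", "masked", "reconstruction"]
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, img, title in zip(axes, [original, masked, reconstruction], titles):
        img = img.permute(1, 2, 0) * IMAGENET_STD + IMAGENET_MEAN  # un-normalize
        ax.imshow(torch.clip(img, 0, 1).numpy())
        ax.set_title(title)
        ax.axis("off")
    plt.show()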

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

                         ViT-Base   ViT-Large   ViT-Huge
pre-trained checkpoint   download   download    download
md5                      8cad7c     b8b06e      9bdbb0

Fine-tuning instructions are in FINETUNE.md.
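
Before fine-tuning, you will typically verify a downloaded checkpoint and load it into a backbone. A hedged sketch, assuming the upstream MAE convention that checkpoints store weights under a "model" key (the file name here is illustrative):

import hashlib
import timm
import torch

ckpt_path = "mae_pretrain_vit_base.pth"   # illustrative file name

# Check the download against the md5 prefix from the table above.
md5 = hashlib.md5(open(ckpt_path, "rb").read()).hexdigest()
assert md5.startswith("8cad7c"), f"unexpected md5: {md5}"

checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)

# Build a matching ViT-Base; the classification head is trained from
# scratch during fine-tuning, so head keys are expected to be missing.
model = timm.create_model("vit_base_patch16_224", num_classes=1000)
msg = model.load_state_dict(state_dict, strict=False)
print("missing keys:", msg.missing_keys)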

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

                                   ViT-B   ViT-L   ViT-H   ViT-H448   prev best
ImageNet-1K (no external data)     83.6    85.9    86.9    87.8       87.1

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

ImageNet-Corruption (error rate)   51.7    41.8    33.8    36.8       42.5
ImageNet-Adversarial               35.9    57.1    68.2    76.7       35.8
ImageNet-Rendition                 48.3    59.9    64.4    66.5       48.7
ImageNet-Sketch                    34.5    45.3    49.6    50.9       36.0

The following are transfer-learning results from fine-tuning the pre-trained MAE on the target dataset:

iNaturalist 2017                   70.5    75.7    79.3    83.4       75.4
iNaturalist 2018                   75.4    80.1    83.0    86.8       81.2
iNaturalist 2019                   80.5    83.4    85.7    88.3       84.1
Places205                          63.9    65.8    65.9    66.8       66.0
Places365                          57.9    59.4    59.8    60.3       58.0

Pre-training

Pre-training instructions are in PRETRAIN.md.
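
For orientation before opening PRETRAIN.md, here is a minimal sketch of one pre-training epoch, assuming the model follows the upstream MAE API in which forward(imgs, mask_ratio) returns (loss, pred, mask):

import torch

def pretrain_one_epoch(model, data_loader, optimizer, device="cuda"):
    model.train()
    for imgs, _labels in data_loader:      # labels are unused in pre-training
        imgs = imgs.to(device, non_blocking=True)
        # The model masks 75% of the patches and computes the pixel
        # reconstruction loss over the masked patches only.
        loss, _pred, _mask = model(imgs, mask_ratio=0.75)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()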

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.




Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distribution


maskedautoencoder-0.0.1-py3-none-any.whl (36.0 kB), uploaded for Python 3.

File details

Details for the file maskedautoencoder-0.0.1-py3-none-any.whl.

File hashes

Hashes for maskedautoencoder-0.0.1-py3-none-any.whl:

Algorithm     Hash digest
SHA256        e93d3921d7bdff8f66b95108797442c05a05e8e36d06856919f0ffb6cd94d329
MD5           847468ea9b9e77b2ac56d7ab1ccd9e95
BLAKE2b-256   6f7641741bd1350cc42aacd6ce3e9fc8b962a201eb88fde8a53c5def130b4393

