
A Self-Supervised Learning Library



AK_SSL: A Self-Supervised Learning Library



📒 Table of Contents

  • 📍 Overview
  • ✍️ Self Supervised Learning
  • 🔎 Supported Methods
  • 📦 Installation
  • 💡 Tutorial
  • 📊 Benchmarks
  • 📜 References Used
  • 💯 License
  • 🤝 Collaborators

📍 Overview

Welcome to the Self-Supervised Learning Library! This repository hosts a collection of tools and implementations for self-supervised learning. Self-supervised learning is a powerful paradigm that leverages unlabeled data to pre-train models, which can then be fine-tuned on specific tasks with smaller labeled datasets. This library aims to provide researchers and practitioners with a comprehensive set of tools to experiment with, learn, and apply self-supervised learning techniques effectively. This project was our assignment during the summer apprenticeship and our final project in the newly established Intelligent and Learning System (ILS) laboratory at the University of Isfahan.


✍️ Self Supervised Learning

Self-supervised learning is a subfield of machine learning where models are trained to predict certain aspects of the input data without relying on manual labeling. This approach has gained significant attention due to its ability to leverage large amounts of unlabeled data, which is often easier to obtain than fully annotated datasets. This library provides implementations of various self-supervised techniques, allowing you to experiment with and apply these methods in your own projects.


🔎 Supported Methods

Vision Models

  • BarlowTwins
  • BYOL
  • DINO
  • MoCo v2
  • MoCo v3
  • SimCLR v1
  • SimCLR v2
  • SimSiam
  • SwAV

Multimodal Models

  • CLIP
  • ALBEF
  • SLIP
  • VSE
  • SimVLM
  • UNITER

📦 Installation

You can install AK_SSL and its dependencies from PyPI with:

pip install AK-SSL

We strongly recommend installing AK_SSL in a dedicated virtualenv to avoid conflicts with your system packages.


💡 Tutorial

With AK_SSL, you can use recent self-supervised learning techniques while keeping the full capabilities of PyTorch. You can explore diverse backbones, models, and optimizers within a framework designed for ease of use.

Initializing the Trainer for Vision Models

You can import the Trainer class from the AK_SSL library and start using it right away.

from AK_SSL.vision import Trainer

Now, let's initialize the self-supervised trainer with our chosen method, backbone, dataset, and other configurations.

trainer = Trainer(
    method="barlowtwins",           # training method as string (BarlowTwins, BYOL, DINO, MoCov2, MoCov3, SimCLR, SimSiam, SwAV)
    backbone=backbone,              # backbone architecture as torch.nn.Module
    feature_size=feature_size,      # size of the extracted features as integer
    image_size=32,                  # dataset image size as integer
    save_dir="./save_for_report/",  # directory to save training checkpoints and Tensorboard logs as string
    checkpoint_interval=50,         # interval (in epochs) for saving checkpoints as integer
    reload_checkpoint=False,        # reload a previously saved checkpoint as boolean
    verbose=True,                   # enable verbose output for training progress as a boolean
    **kwargs                        # other arguments 
)
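The `backbone` and `feature_size` names passed to the Trainer above are not defined in that snippet. The following is a minimal sketch of one way to prepare them with plain PyTorch. The small CNN is a hypothetical stand-in (not part of AK_SSL), and it assumes the Trainer expects the backbone to return one flat feature vector of length `feature_size` per image.

```python
import torch
from torch import nn

IMAGE_SIZE = 32  # matches image_size=32 in the Trainer call above

# Hypothetical stand-in backbone: a small CNN that ends in global pooling
# and flatten, so each image maps to one flat feature vector.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Infer feature_size from a dummy forward pass instead of hard-coding it.
backbone.eval()
with torch.no_grad():
    dummy = torch.zeros(2, 3, IMAGE_SIZE, IMAGE_SIZE)
    feature_size = backbone(dummy).shape[1]
backbone.train()
```

Inferring the size from a forward pass keeps `feature_size` correct if the backbone is swapped for a different architecture.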

Note: The use of **kwargs can differ between methods, depending on the specific method, loss function, transformation, and other factors. If you are utilizing any of the objectives listed below, you must provide their arguments during the initialization of the Trainer class.

  • SimCLR Transformation
      color_jitter_strength     # a float to set the strength of the color jitter
      use_blur                  # a boolean to specify whether to apply blur augmentation
      mean                      # a float to specify the mean values for each channel
      std                       # a float to specify the standard deviation values for each channel
    
  • BarlowTwins
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
        hidden_dim              # an integer to specify dimensionality of the hidden layers in the neural network
        moving_average_decay    # a float to specify decay rate for moving averages during training
      
    • Loss
        lambda_param            # a float to control the balance between the main loss and the orthogonality loss
      
  • DINO
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
        hidden_dim              # an integer to specify dimensionality of the hidden layers in the projection head neural network
        bottleneck_dim          # an integer to specify dimensionality of the bottleneck layer in the student network
        temp_student            # a float to specify temperature parameter for the student's logits
        temp_teacher            # a float to specify temperature parameter for the teacher's logits
        norm_last_layer         # a boolean to specify whether to normalize the last layer of the network
        momentum_teacher        # a float to control momentum coefficient for updating the teacher network
        num_crops               # an integer to determine the number of augmentations applied to each input image
        use_bn_in_head          # a boolean to specify whether to use batch normalization in the projection head
      
    • Loss
        center_momentum         # a float to control momentum coefficient for updating the center of cluster assignments
      
  • MoCo v2
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
        K                       # an integer to specify number of negative samples per positive sample in the contrastive loss
        m                       # a float to control momentum coefficient for updating the moving-average encoder
      
    • Loss
        temperature             # a float to control the temperature for the contrastive loss function
      
  • MoCo v3
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
        hidden_dim              # an integer to specify dimensionality of the hidden layers in the projection head neural network
        moving_average_decay    # a float to specify decay rate for moving averages during training
      
    • Loss
        temperature             # a float to control the temperature for the contrastive loss function
      
  • SimCLR
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
        projection_num_layers   # an integer to specify the number of layers in the projection head (1: SimCLR v1, 2: SimCLR v2)
        projection_batch_norm   # a boolean to indicate whether to use batch normalization in the projection head
      
    • Loss
        temperature             # a float to control the temperature for the contrastive loss function
      
  • SimSiam
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
      
    • Loss
        eps                     # a float to control the stability of the loss function
      
  • SwAV
    • Method
        projection_dim          # an integer to specify dimensionality of the projection head
        hidden_dim              # an integer to specify dimensionality of the hidden layers in the projection head neural network
        epsilon                 # a float to control numerical stability in the algorithm
        sinkhorn_iterations     # an integer to specify the number of iterations in the Sinkhorn-Knopp algorithm
        num_prototypes          # an integer to specify the number of prototypes or clusters for contrastive learning
        queue_length            # an integer to specify the length of the queue for maintaining negative samples
        use_the_queue           # a boolean to indicate whether to use the queue for negative samples
        num_crops               # an integer to determine the number of augmentations applied to each input image
      
    • Loss
        temperature             # a float to control the temperature for the contrastive loss function
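
Because the accepted keywords differ per method, it can help to collect them in a dictionary and check the names before unpacking them into the Trainer. The sketch below is an illustration only: `DOCUMENTED_KWARGS` and `check_kwargs` are hypothetical helpers (not part of AK_SSL), the table covers a subset of the methods listed above, and the numeric values are placeholders rather than recommended settings.

```python
# Keyword names copied from the list above (subset of the vision methods).
DOCUMENTED_KWARGS = {
    "barlowtwins": {"projection_dim", "hidden_dim", "moving_average_decay", "lambda_param"},
    "mocov2": {"projection_dim", "K", "m", "temperature"},
    "simclr": {"projection_dim", "projection_num_layers", "projection_batch_norm", "temperature"},
    "simsiam": {"projection_dim", "eps"},
}


def check_kwargs(method, kwargs):
    """Return kwargs unchanged, or raise if a name is not documented for the method."""
    allowed = DOCUMENTED_KWARGS[method.lower()]
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        raise ValueError(f"{method}: undocumented keyword(s) {unknown}")
    return kwargs


# Placeholder values for a SimCLR v2 style projection head (2 layers).
simclr_kwargs = check_kwargs(
    "simclr",
    {"projection_dim": 128, "projection_num_layers": 2, "temperature": 0.5},
)

# The checked dictionary would then be unpacked into the Trainer, for example:
# trainer = Trainer(method="simclr", backbone=backbone,
#                   feature_size=feature_size, **simclr_kwargs)
```

A misspelled or misplaced keyword, such as passing `temperature` to SimSiam, then fails early with a `ValueError` instead of surfacing later during training.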
      

Initializing the Trainer for Multimodal Models

You can import the Trainer class from the AK_SSL library and start using it right away.

from AK_SSL.multimodal import Trainer

Now, let's initialize the self-supervised trainer with our chosen method, backbone, dataset, and other configurations.

trainer = Trainer(
    method="clip",                  # training method as string (CLIP, ALBEF, SLIP, SimVLM, UNITER, VSE)
    image_encoder=img_encoder,      # vision model to extract image features as nn.Module
    text_encoder=txt_encoder,       # text model to extract text features as nn.Module
    mixed_precision_training=True,  # whether to use mixed precision training or not as boolean
    save_dir="./save_for_report/",  # directory to save training checkpoints and Tensorboard logs as string
    checkpoint_interval=50,         # interval (in epochs) for saving checkpoints as integer
    reload_checkpoint=False,        # reload a previously saved checkpoint as boolean
    verbose=True,                   # enable verbose output for training progress as a boolean
    **kwargs                        # other arguments 
)

Note: The use of **kwargs can differ between methods, depending on the specific method, loss function, transformation, and other factors. If you are utilizing any of the objectives listed below, you must provide their arguments during the initialization of the Trainer class.

  • CLIP
      image_feature_dim         # Dimension of the image features as integer
      text_feature_dim          # Dimension of the text features as integer
      embed_dim                 # Dimension of the embeddings as integer
      init_tau                  # Initial value of tau as float
      init_b                    # Initial value of b as float
    
  • ALBEF
      mlm_probability           # Masked language modeling probability as float
      embed_dim                 # Dimension of the embeddings as integer
      vision_width              # Vision encoder output width as integer
      temp                      # Temperature parameter as float
      queue_size                # Queue size as integer
      momentum                  # Momentum parameter as float
    
  • SimVLM
      transformer_encoder       # Transformer encoder for vision and text embeddings as nn.Module
      transformer_decoder       # Transformer decoder for embeddings as nn.Module
      vocab_size                # Size of the vocabulary as integer
      feature_dim               # Dimension of the features as integer
      max_seq_len               # Maximum sequence length as integer
      max_trunc_txt_len         # Maximum truncated text length as integer
      prefix_txt_len            # Prefix text length as integer
      target_txt_len            # Target text length as integer
      pad_idx                   # Padding index as integer
      image_resolution          # Image resolution as integer
      patch_size                # Patch size as integer
      num_channels              # Number of channels as integer
    
  • SLIP
      mlp_dim                   # Dimension of the MLP as integer
      vision_feature_dim        # Dimension of the vision features as integer
      transformer_feature_dim   # Dimension of the transformer features as integer
      embed_dim                 # Dimension of the embeddings as integer
    
  • UNITER
      pooler                          # pooler as nn.Module
      encoder                         # transformer encoder as nn.Module
      num_answer                      # number of answer classes as integer
      hidden_size                     # hidden size as integer
      attention_probs_dropout_prob    # dropout rate as float
      initializer_range               # initializer range as float
    
  • VSE
      margin                   # Margin for contrastive loss as float
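
To make the role of a `margin` parameter concrete, here is a minimal pure-Python sketch of the sum-of-hinges ranking loss commonly used for visual-semantic embeddings. It is a generic illustration, not AK_SSL's implementation: the function name and the default margin are assumptions, and the library's VSE loss may differ (for example, by using only the hardest negative).

```python
def hinge_ranking_loss(sim, margin=0.2):
    """Sum-of-hinges ranking loss over a square image-text similarity matrix.

    sim[i][j] is the similarity between image i and caption j; matching
    pairs sit on the diagonal.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if i == j:
                continue
            # Caption j acts as a negative for image i.
            total += max(0.0, margin + sim[i][j] - pos)
            # Image j acts as a negative for caption i.
            total += max(0.0, margin + sim[j][i] - pos)
    return total


similarities = [[0.9, 0.1],
                [0.2, 0.8]]
small_margin_loss = hinge_ranking_loss(similarities, margin=0.2)
large_margin_loss = hinge_ranking_loss(similarities, margin=1.0)
```

With the small margin every matching pair already beats its negatives by more than 0.2, so the loss is zero; a larger margin demands a wider gap and produces a positive loss.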
    

Training the Self-Supervised Model for Vision Models

Then, we'll train the self-supervised model using the specified parameters.

trainer.train(
    dataset=train_dataset,   # training dataset as torch.utils.data.Dataset
    batch_size=256,          # the number of training examples used in each iteration as integer
    start_epoch=1,           # the starting epoch for training as integer (if 'reload_checkpoint' is True, training starts from the latest checkpoint epoch)
    epochs=100,              # the total number of training epochs as integer
    optimizer="Adam",        # the optimization algorithm used for training as string (Adam, SGD, or AdamW)
    weight_decay=1e-6,       # a regularization term to prevent overfitting by penalizing large weights as float
    learning_rate=1e-3,      # the learning rate for the optimizer as float
)
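
As a rough guide to what these settings imply, the sketch below works out the number of iterations per epoch and the epochs at which checkpoints would be written. Both helpers are hypothetical, and the rules they encode are assumptions: whether the library drops the last partial batch, and the exact epochs at which it saves, are not stated here. The 50,000 figure is the size of the CIFAR10 training split.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of optimizer steps needed to see the dataset once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def checkpoint_epochs(start_epoch, epochs, checkpoint_interval):
    """Epochs in [start_epoch, epochs] that are multiples of the interval (assumed rule)."""
    return [e for e in range(start_epoch, epochs + 1) if e % checkpoint_interval == 0]


steps = iterations_per_epoch(50_000, 256)   # CIFAR10 training split, batch_size=256
saved = checkpoint_epochs(1, 100, 50)       # checkpoint_interval=50, epochs=100
```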

Training the Self-Supervised Model for Multimodal Models

Then, we'll train the self-supervised model using the specified parameters.

trainer.train(
    dataset=train_dataset,   # training dataset as torch.utils.data.Dataset
    batch_size=256,          # the number of training examples used in each iteration as integer
    start_epoch=1,           # the starting epoch for training as integer (if 'reload_checkpoint' is True, training starts from the latest checkpoint epoch)
    epochs=100,              # the total number of training epochs as integer
    optimizer="Adam",        # the optimization algorithm used for training as string (Adam, SGD, or AdamW)
    weight_decay=1e-6,       # a regularization term to prevent overfitting by penalizing large weights as float
    learning_rate=1e-3,      # the learning rate for the optimizer as float
)

Evaluating the Vision Self-Supervised Models

This step measures how well the pre-trained model performs on a labeled dataset, using either linear evaluation or fine-tuning.

trainer.evaluate(
    train_dataset=train_dataset,      # to specify the training dataset as torch.utils.data.Dataset
    test_dataset=test_dataset,        # to specify the testing dataset as torch.utils.data.Dataset
    eval_method="linear",             # the evaluation method to use as string (linear or finetune)
    top_k=1,                          # the number of top-k predictions to consider during evaluation as integer
    epochs=100,                       # the number of evaluation epochs as integer
    optimizer='Adam',                 # the optimization algorithm used during evaluation as string (Adam, SGD, or AdamW)
    weight_decay=1e-6,                # a regularization term applied during evaluation to prevent overfitting as float
    learning_rate=1e-3,               # the learning rate for the optimizer during evaluation as float
    batch_size=256,                   # the batch size used for evaluation as integer
    fine_tuning_data_proportion=1,    # the proportion of training data to use during evaluation as float in range of (0.0, 1.0]
)
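
For readers unfamiliar with the `top_k` argument, the following pure-Python sketch shows how top-k accuracy is usually computed: a prediction counts as correct when the true label is among the k highest-scoring classes. The function is a hypothetical illustration, not the code AK_SSL runs internally.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three samples, three classes; each row holds one score per class.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Here only the first sample is correct at k=1, while the second also counts at k=2 because its true class has the second-highest score.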

Getting the Vision Self-Supervised Model Backbone

In case you want to use the pre-trained network in your own downstream task, you need to define a downstream task model. This model should include the self-supervised model backbone as one of its components. Here's an example of how to define a simple downstream model class:

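import torch.nn as nn


class DownstreamNet(nn.Module):
    def __init__(self, backbone, **kwargs):
        super().__init__()
        self.backbone = backbone  # pre-trained self-supervised backbone

        # You can define your downstream task model here

    def forward(self, x):
        x = self.backbone(x)  # extract features with the backbone
        # ... apply your downstream layers to x here
        return x


downstream_model = DownstreamNet(trainer.get_backbone())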
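
As a more concrete variant of the class above, here is a sketch of a linear classifier on top of a frozen backbone, which mirrors the idea behind linear evaluation. `LinearProbe` and the stand-in backbone are hypothetical names used only for illustration; in practice the backbone would come from `trainer.get_backbone()`, and the feature size and number of classes depend on your setup.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """Backbone followed by a single linear classification layer."""

    def __init__(self, backbone, feature_size, num_classes, freeze_backbone=True):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feature_size, num_classes)
        if freeze_backbone:
            # Keep the pre-trained weights fixed; only the head is trained.
            for param in self.backbone.parameters():
                param.requires_grad = False

    def forward(self, x):
        return self.head(self.backbone(x))


# Stand-in backbone for illustration only (flat 64-dimensional features).
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
model = LinearProbe(stand_in, feature_size=64, num_classes=10)

logits = model(torch.zeros(4, 3, 32, 32))
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Passing `freeze_backbone=False` leaves all parameters trainable, which corresponds to fine-tuning. When the backbone contains batch normalization layers, also consider calling `model.backbone.eval()` during frozen training so their running statistics stay fixed.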

Loading Self-Supervised Model Checkpoint

To load a previously saved checkpoint into the network:

path = 'YOUR CHECKPOINT PATH'
trainer.load_checkpoint(path)

Saving Self-Supervised Model Backbone

To save the model backbone:

trainer.save_backbone()

That's it! You've successfully trained and evaluated a self-supervised model using the AK_SSL Python library. You can further customize and experiment with different self-supervised methods, backbones, and hyperparameters to suit your specific tasks. You can find the description of the Trainer class and its methods using Python's built-in help function.


📊 Benchmarks

We executed models and obtained results on the CIFAR10 dataset, with plans to expand our experimentation to other datasets. Please note that hyperparameters were not optimized for maximum accuracy.

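| Method      | Backbone | Batch Size | Epochs | Optimizer | Learning Rate | Weight Decay | Linear Top1 | Fine-tune Top1 | Download Backbone | Download Full Checkpoint |
|-------------|----------|------------|--------|-----------|---------------|--------------|-------------|----------------|-------------------|--------------------------|
| BarlowTwins | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 70.92%      | 79.50%         | Link              | Link                     |
| BYOL        | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 71.06%      | 71.04%         |                   |                          |
| DINO        | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 9.91%       | 9.76%          |                   |                          |
| MoCo v2     | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 70.08%      | 78.71%         | Link              | Link                     |
| MoCo v3     | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 59.98%      | 74.20%         | Link              | Link                     |
| SimCLR v1   | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 73.09%      | 72.75%         | Link              | Link                     |
| SimCLR v2   | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 73.07%      | 81.52%         |                   |                          |
| SimSiam     | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 19.77%      | 70.77%         | Link              | Link                     |
| SwAV        | Resnet18 | 256        | 800    | Adam      | 1e-3          | 1e-6         | 33.36%      | 74.14%         |                   |                          |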

📜 References Used

In the development of this project, we have drawn inspiration and utilized code, libraries, and resources from various sources. We would like to acknowledge and express our gratitude to the following references and their respective authors:

These references have played a crucial role in enhancing the functionality and quality of our project. We extend our thanks to the authors and contributors of these resources for their valuable work.


💯 License

This project is licensed under the MIT License.


🤝 Collaborators

By:

Thanks to Dr. Peyman Adibi and Dr. Hossein Karshenas for their invaluable guidance and support throughout this project.
