
Extracting image features from state-of-the-art neural networks for Computer Vision made easy

Project description

Model collection

Features can be extracted for all models in torchvision, each of the CORnet versions, and both CLIP variants (clip-ViT and clip-RN). For the correct abbreviations of torchvision models, have a look here. For the correct abbreviations of CORnet models, look here. To separate the string cornet from its variant (e.g., s, z), use a hyphen instead of an underscore (e.g., cornet-s, cornet-z).

Examples: alexnet, resnet50, resnet101, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn, cornet-s, clip-ViT

Environment Setup

Make sure you have a recent Python version (>= 3.7) and install PyTorch 1.7.1. Note that PyTorch 1.7.1 requires CUDA 10.2 or above if you want to extract network activations on a GPU. However, the code already runs reasonably fast on a strong CPU (e.g., Intel i7 or i9). Run the following pip command in your terminal.

$ pip install thingsvision

If you want to extract network activations for THINGS, you have to download files from the parent repository (i.e., this repo). Simply download the shell script get_files.sh and execute it as follows (the script downloads and moves the files for you):

$ wget https://raw.githubusercontent.com/ViCCo-Group/THINGSvision/master/get_files.sh (Linux)
$ curl -O https://raw.githubusercontent.com/ViCCo-Group/THINGSvision/master/get_files.sh (macOS)
$ bash get_files.sh

Execute the following line to install PyTorch 1.7.1 together with a matching CUDA toolkit via conda (not necessary, but perhaps desirable):

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0

Replace cudatoolkit=11.0 above with the appropriate CUDA version on your machine (e.g., 10.2) or cpuonly when installing on a machine without a GPU.
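
To check (optionally) that the installed PyTorch build can see your GPU, you can run a quick sanity check in Python; this uses only PyTorch itself, not thingsvision:

import torch

# Prints the PyTorch version, the CUDA version the build was compiled against
# (None for CPU-only builds), and whether a GPU is currently visible.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())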

Extract features at a specific layer of a state-of-the-art torchvision, CORnet, or CLIP model

Example call for AlexNet:

import torch
import thingsvision.vision as vision

model_name = 'alexnet'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, transforms = vision.load_model(model_name, pretrained=True, model_path=None, device=device)
module_name = vision.show_model(model, model_name)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

show_model displays the architecture above and then prompts you to enter the part of the model for which you would like to extract features (e.g., "features.10"); the entered string is returned as module_name.

dl = vision.load_dl(root='./images/', out_path=f'./{model_name}/{module_name}/features', batch_size=64, transforms=transforms)
features, targets, predictions = vision.extract_features(model, dl, module_name, batch_size=64, flatten_acts=True, device=device, return_predictions=True)

vision.save_features(features, f'./{model_name}/{module_name}/features', '.npy')
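
The saved features can later be reloaded with NumPy. A minimal sketch, assuming save_features wrote the array to a file named features.npy inside the out_path directory (the exact filename may differ; adjust the path to whatever was actually written):

import numpy as np

# Hypothetical filename; check the out_path directory for the actual file.
features = np.load(f'./{model_name}/{module_name}/features/features.npy')
print(features.shape)  # (n_images, n_units) when flatten_acts=True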

Example call for CLIP:

import torch
import thingsvision.vision as vision

model_name = 'clip-ViT'
module_name = 'visual'

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model, transforms = vision.load_model(model_name, pretrained=True, model_path=None, device=device)
dl = vision.load_dl(root='./images/', out_path=f'./{model_name}/{module_name}/features', batch_size=64, transforms=transforms)
features, targets = vision.extract_features(model, dl, module_name, batch_size=64, flatten_acts=False, device=device, clip=True, return_predictions=False)

features = vision.center_features(features)

vision.save_features(features, f'./{model_name}/{module_name}/features', '.npy')
vision.save_targets(targets, f'./{model_name}/{module_name}/targets', '.npy')

Example call for CORnet:

import torch
import thingsvision.vision as vision

model_name = 'cornet-s'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model, transforms = vision.load_model(model_name, pretrained=True, model_path=None, device=device)
module_name = vision.show_model(model, model_name)

Sequential(
  (V1): Sequential(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (nonlin1): ReLU(inplace=True)
    (pool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (norm2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (nonlin2): ReLU(inplace=True)
    (output): Identity()
  )
  (V2): CORblock_S(
    (conv_input): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (skip): Conv2d(128, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (norm_skip): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin1): ReLU(inplace=True)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (nonlin2): ReLU(inplace=True)
    (conv3): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin3): ReLU(inplace=True)
    (output): Identity()
    (norm1_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (V4): CORblock_S(
    (conv_input): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (skip): Conv2d(256, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (norm_skip): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin1): ReLU(inplace=True)
    (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (nonlin2): ReLU(inplace=True)
    (conv3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin3): ReLU(inplace=True)
    (output): Identity()
    (norm1_0): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_0): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (IT): CORblock_S(
    (conv_input): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (skip): Conv2d(512, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (norm_skip): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin1): ReLU(inplace=True)
    (conv2): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (nonlin2): ReLU(inplace=True)
    (conv3): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin3): ReLU(inplace=True)
    (output): Identity()
    (norm1_0): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_0): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): Sequential(
    (avgpool): AdaptiveAvgPool2d(output_size=1)
    (flatten): Flatten()
    (linear): Linear(in_features=512, out_features=1000, bias=True)
    (output): Identity()
  )
)

As above, show_model prompts you to enter the part of the model for which you would like to extract features (e.g., "decoder.flatten").

dl = vision.load_dl(root='./images/', out_path=f'./{model_name}/{module_name}/features', batch_size=64, transforms=transforms)
features, targets = vision.extract_features(model, dl, module_name, batch_size=64, flatten_acts=False, device=device, return_predictions=False)

features = vision.center_features(features)
features = vision.normalize_features(features)

vision.save_features(features, f'./{model_name}/{module_name}/features', '.npy')

Predictions and ImageNet classes

Do you want to get model predictions alongside activations for free, and subsequently convert them into ImageNet classes? Simply set the return_predictions argument to True and use the get_imagenet_classes helper (the function works for both synset and class files). Note that this is, unfortunately, not (yet) possible for CLIP models due to their multi-modality and different training objectives.

features, targets, predictions = vision.extract_features(model, dl, module_name, batch_size, flatten_acts=False, device=device, return_predictions=True)
imagenet_classes = vision.get_imagenet_classes(PATH='./data/imagenet1000_classes.txt', predictions=predictions)

Model comparison

To compare object representations extracted from specified models and layers against each other, pass a List[str] of model names and module names and perform the following operation:

clip_list = [n.startswith('clip') for n in model_names]

correlations = vision.compare_models(
                                     root=root,
                                     out_path=out_path,
                                     model_names=model_names,
                                     module_names=module_names,
                                     pretrained=True,
                                     batch_size=batch_size,
                                     flatten_acts=True,
                                     clip=clip_list,
                                     save_features=True,
                                     dissimilarity='correlation',
                                     correlation='pearson',
                                    )

The function returns a correlation matrix in the form of a Pandas dataframe whose rows and columns correspond to the names of the models in model_names. The cell elements are the correlation coefficients for each model combination. The dataframe can subsequently be converted into a heatmap with matplotlib or seaborn. We will release clear and concise documentation as soon as possible. Until then, we recommend looking at Section 3.2.3 in the bioRxiv preprint.
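
As a quick illustration (using seaborn and matplotlib, which are not required by thingsvision), the returned dataframe can be rendered as a heatmap like this:

import matplotlib.pyplot as plt
import seaborn as sns

# correlations is the Pandas dataframe returned by vision.compare_models
ax = sns.heatmap(correlations, annot=True, cmap='viridis', vmin=-1.0, vmax=1.0)
ax.set_title('Model comparison (Pearson correlations)')
plt.tight_layout()
plt.show()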

IMPORTANT NOTES:

  1. Image data will automatically be converted into a ready-to-use dataset class, and subsequently wrapped with a PyTorch mini-batch dataloader to make neural activation extraction more efficient.

  2. If you happen to use the THINGS image database, make sure to correctly unzip all zip files (sorted from A-Z), and have all object directories stored in the parent directory ./images/ (e.g., ./images/object_xy/) as well as the things_concept.tsv file stored in the ./data/ folder. bash get_files.sh does the latter for you. Images, however, must be downloaded from the THINGS database.

  3. Features can be extracted at every layer for all torchvision, CORnet and CLIP models.

  4. If you happen to be interested in an ensemble of feature maps, as introduced in this recent COLING 2020 paper, you can simply extract an ensemble of conv or max-pool layers. The ensemble can additionally be concatenated with the activations of the penultimate layer and subsequently transformed into a lower-dimensional space with PCA to reduce noise and keep only those dimensions that account for most of the variance (see the sketch after this list).

  5. The script automatically extracts features for the specified model and layer and stores them together with the targets in out_path (see above).

  6. Since 4-way tensors cannot easily be saved to disk, they must be sliced into different parts to be efficiently stored as a matrix. The helper function tensor2slices will automatically slice any 4-way tensor (e.g., activations extracted from features.##) for you and save it as a matrix in a file called activations.txt. To merge the slices back into the original shape (i.e., a 4-way tensor), simply call slices2tensor, which takes out_path and file_name (see above) as input arguments (e.g., tensor = slices2tensor(PATH, file)).

  7. If you happen to extract hidden unit activations for many images, it is possible to run into MemoryErrors. To circumvent such problems, a helper function called split_activations will split the activation matrix into several batches and store them in separate files. For now, the split parameter is set to 10, so the function will split the activation matrix into 10 files. This parameter can, however, easily be modified in case you need more (or fewer) splits. To merge the separate activation batches back into a single activation matrix, just call merge_activations when loading the activations (e.g., activations = merge_activations(PATH)).
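
Regarding note 4, here is a hedged sketch (using NumPy and scikit-learn, not the thingsvision API) of how an ensemble of extracted feature maps could be concatenated and projected into a lower-dimensional space with PCA. The three feature matrices are hypothetical placeholders for activations you extracted earlier with extract_features and flatten_acts=True:

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical inputs: flattened activation matrices (n_images x n_units)
# extracted from two conv/max-pool layers and the penultimate layer.
ensemble = np.hstack([features_conv, features_pool, features_penultimate])

# Keep only the components that account for 90% of the variance to reduce noise.
pca = PCA(n_components=0.9, svd_solver='full')
ensemble_reduced = pca.fit_transform(ensemble)
print(ensemble_reduced.shape)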

OpenAI's CLIP models

CLIP

[Blog] [Paper] [Model Card] [Colab]

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.

Citation

If you use this GitHub repository (or any modules associated with it), we would greatly appreciate it if you cited our preprint as follows:

@article{Muttenthaler_2021,
	author = {Muttenthaler, Lukas and Hebart, Martin N.},
	title = {THINGSvision: a Python toolbox for streamlining the extraction of activations from deep neural networks},
	elocation-id = {2021.03.11.434979},
	year = {2021},
	doi = {10.1101/2021.03.11.434979},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2021/03/12/2021.03.11.434979},
	eprint = {https://www.biorxiv.org/content/early/2021/03/12/2021.03.11.434979.full.pdf},
	journal = {bioRxiv}
}

