thingsvision

Extracting image features from state-of-the-art neural networks for Computer Vision made easy

Model collection

Features can be extracted for all models in torchvision, Keras, timm, custom models (VGG-16, Resnet50, Inception_v3 and Alexnet) trained on Ecoset, each of the many CORnet versions and both CLIP variants (clip-ViT and clip-RN).

Note that you have to use the respective model name (str). For example, if you want to use VGG16 from torchvision, use vgg16 as the model name and if you want to use VGG16 from TensorFlow/Keras, use the model name VGG16. You can further specify the model source by setting the source parameter (e.g., timm, torchvision, keras).

For the correct abbreviations of torchvision models have a look here. For the correct abbreviations of CORnet models look here. To separate the string cornet from its variant (e.g., s, z) use a hyphen instead of an underscore (e.g., cornet-s, cornet-z).

PyTorch examples: alexnet, resnet18, resnet50, resnet101, vit_b_16, vit_b_32, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn, cornet-s, clip-ViT
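The hyphen rule for CORnet names can be sketched as a small normalization step. The helper below is hypothetical and not part of thingsvision; the name `normalize_cornet_name` is an assumption made for illustration only.

```python
def normalize_cornet_name(model_name: str) -> str:
    """Return a CORnet name with a hyphen between 'cornet' and its variant.

    Hypothetical helper for illustration only; it is not part of thingsvision.
    Names that do not start with 'cornet_' are returned unchanged.
    """
    prefix = "cornet_"
    if model_name.lower().startswith(prefix):
        variant = model_name[len(prefix):]
        return f"cornet-{variant}"
    return model_name


if __name__ == "__main__":
    print(normalize_cornet_name("cornet_s"))  # underscore form is rewritten
    print(normalize_cornet_name("vgg16_bn"))  # other names pass through
```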

Environment Setup

We recommend creating a new conda environment with Python version 3.7, 3.8, or 3.9 before using thingsvision. If you want to create the conda environment from a yml file, check out the environment.yml file in envs. Activate the environment and run the following pip command in your terminal.

$ pip install --upgrade thingsvision

If you want to extract network activations for THINGS, you need to download a few files from the parent folder of this repository. Download the shell script get_files.sh from this repo and execute it as follows (the script downloads the files and moves them into place for you):

$ wget https://raw.githubusercontent.com/ViCCo-Group/THINGSvision/master/get_files.sh  # Linux
$ curl -O https://raw.githubusercontent.com/ViCCo-Group/THINGSvision/master/get_files.sh  # macOS
$ bash get_files.sh

Google Colab

Alternatively, you can use Google Colab to play around with thingsvision by uploading your image data to Google Drive. You can find the Jupyter notebook using PyTorch here and the TensorFlow example here.

IMPORTANT NOTES:

There exist four different sources from which neural network models and their (pretrained) weights can be downloaded. You can define the source of a model using the source argument. Possible sources are torchvision, keras, timm, and custom (e.g., source = torchvision).
If you happen to use the THINGS image database, make sure to correctly unzip all zip files (sorted from A-Z), and have all object directories stored in the parent directory ./images/ (e.g., ./images/object_xy/) as well as the things_concepts.tsv file stored in the ./data/ folder. bash get_files.sh does the latter for you. Images, however, must be downloaded from the THINGS database Main subfolder. The download is around 5GB.

Go to https://osf.io/jum2f/files/
Select Main folder and click on "Download as zip" button (top right).
Unzip contained object_images_*.zip file using the password (check the description.txt file for details). For example:
```
for fn in object_images_*.zip; do unzip -P the_password "$fn"; done
```

Features can be extracted for every layer for all timm, torchvision, TensorFlow, CORnet and CLIP models.
The script automatically extracts features for the specified model and module.
If you extract hidden unit activations for many images, you may run into MemoryErrors. To circumvent such problems, a helper function called split_activations splits the activation matrix into several batches and stores them in separate files. For now, the split parameter is set to 10, so the function splits the activation matrix into 10 files. This parameter can easily be modified if you need more (or fewer) splits. To merge the separate activation batches back into a single activation matrix, call merge_activations when loading the activations (e.g., activations = merge_activations(PATH)).
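The split-and-merge idea from the last note can be sketched with plain NumPy. This is not the implementation of split_activations or merge_activations; the function names `split_matrix` and `merge_matrix` and the `activations_XXXX.npy` file naming are assumptions made for this illustration.

```python
import os
import tempfile

import numpy as np


def split_matrix(activations: np.ndarray, out_dir: str, n_splits: int = 10) -> None:
    """Save row-wise chunks of `activations` as separate .npy files."""
    os.makedirs(out_dir, exist_ok=True)
    chunks = np.array_split(activations, n_splits, axis=0)
    for i, chunk in enumerate(chunks):
        # Zero-padded index keeps the files in order when sorted by name.
        np.save(os.path.join(out_dir, f"activations_{i:04d}.npy"), chunk)


def merge_matrix(out_dir: str) -> np.ndarray:
    """Load all chunks in file-name order and stack them back together."""
    files = sorted(
        f for f in os.listdir(out_dir)
        if f.startswith("activations_") and f.endswith(".npy")
    )
    return np.vstack([np.load(os.path.join(out_dir, f)) for f in files])


if __name__ == "__main__":
    acts = np.random.rand(25, 8)  # 25 images, 8 features (toy sizes)
    with tempfile.TemporaryDirectory() as tmp:
        split_matrix(acts, tmp, n_splits=10)
        merged = merge_matrix(tmp)
    print(merged.shape)  # (25, 8)
```

Because `np.array_split` tolerates row counts that are not divisible by the number of splits, the chunks may differ in size by one row.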

Extract features for a specific layer of a state-of-the-art `torchvision`, `timm`, `TensorFlow`, `CORnet`, or `CLIP` model

The following examples demonstrate how to load a model with PyTorch or TensorFlow/Keras and how to subsequently extract features. Please keep in mind that the model names as well as the layer names depend on the backend you want to use. If you use PyTorch, you will need to use these model names. If you use TensorFlow, you will need to use these model names. You can find the layer names by using extractor.show_model().

Example call for AlexNet with PyTorch:

import torch
from thingsvision import Extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

root = 'path/to/root/img/directory'  # (e.g., './images/')
model_name = 'alexnet'
source = 'torchvision'
batch_size = 64
class_names = None  # optional list of class names for class dataset
file_names = None # optional list of file names according to which features should be sorted

device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source)
module_name = extractor.show_model()

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

#Enter part of the model for which you would like to extract features:

(e.g., "features.10")

dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(),
        class_names=class_names,
        file_names=file_names,
)
batches = DataLoader(dataset=dataset, batch_size=batch_size, backend=extractor.backend)
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=True,
    clip=False,
)
save_features(features, out_path='path/to/features', file_format='npy')
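For a convolutional module such as features.10, the activations of a batch are 4-dimensional, which is why the call above sets flatten_acts=True. The following NumPy sketch shows the shape change, assuming that flattening collapses all non-batch dimensions into one feature dimension. The helper name `flatten_activations` and the array shape are illustrative assumptions, not output produced by thingsvision.

```python
import numpy as np


def flatten_activations(acts: np.ndarray) -> np.ndarray:
    """Collapse all non-batch dimensions into a single feature dimension."""
    return acts.reshape(acts.shape[0], -1)


if __name__ == "__main__":
    # Illustrative batch: 4 images, 256 channels, 13 x 13 spatial map.
    conv_acts = np.zeros((4, 256, 13, 13), dtype=np.float32)
    flat = flatten_activations(conv_acts)
    print(flat.shape)  # (4, 43264)
```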

Example call for CLIP with PyTorch:

Note that the vision model has to be defined in the model_parameters dictionary with the variant key. You can use either ViT-B/32 or RN50.

import torch
from thingsvision import Extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader
from thingsvision.core.extraction import center_features

root = 'path/to/root/img/directory'  # (e.g., './images/')
model_name = 'clip'
module_name = 'visual'
source = 'custom'
batch_size = 64
class_names = None  # optional list of class names for class dataset
file_names = None # optional list of file names according to which features should be sorted

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# initialize extractor module
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source, model_parameters={'variant': 'ViT-B/32'})
dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(),
        class_names=class_names,
        file_names=file_names,
)
batches = DataLoader(dataset=dataset, batch_size=batch_size, backend=extractor.backend)
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=False,
    clip=True,
)
features = center_features(features)
save_features(features, out_path='path/to/features', file_format='npy')
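The centering step in the example above can be pictured with a short NumPy sketch. It assumes that centering means subtracting the mean of each feature, computed across images; the actual center_features function may differ in detail, and the name `center_columns` is hypothetical.

```python
import numpy as np


def center_columns(features: np.ndarray) -> np.ndarray:
    """Subtract the per-feature mean (computed across images) from each column."""
    return features - features.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    feats = np.array([[1.0, 10.0], [3.0, 30.0]])  # 2 images, 2 features
    print(center_columns(feats))
    # [[ -1. -10.]
    #  [  1.  10.]]
```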

Example call for Open CLIP with PyTorch:

Note that the vision model and the dataset that was used for training have to be defined in the model_parameters dictionary with the variant and dataset keys. Possible values can be found in the Open CLIP pretrained models list.

import torch
from thingsvision import Extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader
from thingsvision.core.extraction import center_features

root = 'path/to/root/img/directory'  # (e.g., './images/')
model_name = 'OpenCLIP'
module_name = 'visual'
source = 'custom'
batch_size = 64
class_names = None  # optional list of class names for class dataset
file_names = None # optional list of file names according to which features should be sorted

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# initialize extractor module
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source, model_parameters={'variant': 'ViT-H-14', 'dataset': 'laion2b_s32b_b79k'})
dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(),
        class_names=class_names,
        file_names=file_names,
)
batches = DataLoader(dataset=dataset, batch_size=batch_size, backend=extractor.backend)
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=False,
    clip=True,
)
features = center_features(features)
save_features(features, out_path='path/to/features', file_format='npy')

Example call for ViT with PyTorch:

import torch
from thingsvision import Extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

root = 'path/to/root/img/directory'  # (e.g., './images/')
model_name = 'vit_b_16'
source = 'torchvision'
batch_size = 64
class_names = None  # optional list of class names for class dataset
file_names = None # optional list of file names according to which features should be sorted

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# initialize extractor module
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source)
module_name = extractor.show_model()

VisionTransformer(
  (conv_proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
  (encoder): Encoder(
    (dropout): Dropout(p=0.0, inplace=False)
    (layers): Sequential(
      (encoder_layer_0): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_1): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_2): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_3): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_4): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_5): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_6): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_7): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_8): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_9): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_10): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_11): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (linear_1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU()
          (dropout_1): Dropout(p=0.0, inplace=False)
          (linear_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout_2): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
  )
  (heads): Sequential(
    (head): Linear(in_features=768, out_features=1000, bias=True)
  )
)

#Enter part of the model for which you would like to extract features:

(e.g., "encoder.layers.encoder_layer_11.mlp.linear_2")

dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(),
        class_names=class_names,
        file_names=file_names,
)
batches = DataLoader(dataset=dataset, batch_size=batch_size, backend=extractor.backend)
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=False,
    clip=False,
)
save_features(features, out_path='path/to/features', file_format='npy')

Example call for CORnet with PyTorch:

import torch
from thingsvision import Extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

root = 'path/to/root/img/directory'  # (e.g., './images/')
model_name = 'cornet-s'
source = 'custom'
batch_size = 64
class_names = None  # optional list of class names for class dataset
file_names = None # optional list of file names according to which features should be sorted

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# initialize extractor module
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source)
module_name = extractor.show_model()

Sequential(
  (V1): Sequential(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (nonlin1): ReLU(inplace=True)
    (pool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (norm2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (nonlin2): ReLU(inplace=True)
    (output): Identity()
  )
  (V2): CORblock_S(
    (conv_input): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (skip): Conv2d(128, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (norm_skip): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin1): ReLU(inplace=True)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (nonlin2): ReLU(inplace=True)
    (conv3): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin3): ReLU(inplace=True)
    (output): Identity()
    (norm1_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (V4): CORblock_S(
    (conv_input): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (skip): Conv2d(256, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (norm_skip): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin1): ReLU(inplace=True)
    (conv2): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (nonlin2): ReLU(inplace=True)
    (conv3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin3): ReLU(inplace=True)
    (output): Identity()
    (norm1_0): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_0): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (IT): CORblock_S(
    (conv_input): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (skip): Conv2d(512, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (norm_skip): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin1): ReLU(inplace=True)
    (conv2): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (nonlin2): ReLU(inplace=True)
    (conv3): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (nonlin3): ReLU(inplace=True)
    (output): Identity()
    (norm1_0): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_0): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm1_1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm2_1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (norm3_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): Sequential(
    (avgpool): AdaptiveAvgPool2d(output_size=1)
    (flatten): Flatten()
    (linear): Linear(in_features=512, out_features=1000, bias=True)
    (output): Identity()
  )
)

#Enter part of the model for which you would like to extract features:

(e.g., "decoder.flatten")

dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(),
        class_names=class_names,
        file_names=file_names,
)
batches = DataLoader(dataset=dataset, batch_size=batch_size, backend=extractor.backend)
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=False,
    clip=False,
)
save_features(features, out_path='path/to/features', file_format='npy')

Example call for VGG16 with TensorFlow:

import torch
from thingsvision import Extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

root = 'path/to/root/img/directory'  # (e.g., './images/')
model_name = 'VGG16'
module_name = 'block1_conv1'
source = 'keras' # TensorFlow backend
batch_size = 64
class_names = None  # optional list of class names for class dataset
file_names = None # optional list of file names according to which features should be sorted

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# initialize extractor module
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source)
dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(),
        class_names=class_names,
        file_names=file_names,
)
batches = DataLoader(dataset=dataset, batch_size=batch_size, backend=extractor.backend)
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=False,
    clip=False,
)
save_features(features, out_path='path/to/features', file_format='npy')

Optional Center Cropping

Center cropping is used by default but can be deactivated by turning off the apply_center_crop argument of the get_transformations method.

root = 'path/to/images'
apply_center_crop = False
dataset = ImageDataset(
        root=root,
        out_path='path/to/features',
        backend=extractor.backend,
        transforms=extractor.get_transformations(apply_center_crop=apply_center_crop),
        class_names=class_names,
        file_names=file_names,
)
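To make clear what the center crop removes, here is a minimal NumPy sketch of the operation on a single image array. It only illustrates the concept; the transformations returned by get_transformations may additionally resize the image or round offsets differently, and the helper name `center_crop` is an assumption.

```python
import numpy as np


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size window of an (H, W, C) image array."""
    height, width = image.shape[:2]
    if size > height or size > width:
        raise ValueError("crop size must not exceed the image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


if __name__ == "__main__":
    img = np.zeros((256, 300, 3), dtype=np.uint8)  # toy image, not real data
    print(center_crop(img, 224).shape)  # (224, 224, 3)
```

With the crop switched off, the border regions discarded here stay part of the input.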

Extract features from custom models

If you want to use a custom model from the custom_models directory, you need to use its class name (e.g., VGG16_ecoset) as the model name.

import torch
from thingsvision import Extractor

model_name = 'VGG16_ecoset'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = Extractor(model_name, pretrained=True, model_path=None, device=device, source=source)

Representational Similarity Analysis (RSA)

Compare representational (dis-)similarity matrices (RDMs) corresponding to model features and human representations (e.g., fMRI recordings).

from thingsvision.core.rsa import compute_rdm, correlate_rdms
rdm_dnn = compute_rdm(features, method='correlation')
corr_coeff = correlate_rdms(rdm_dnn, rdm_human, correlation='pearson')
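The two steps above can be illustrated with a self-contained NumPy sketch. It assumes that method='correlation' yields an RDM whose entries are 1 minus the Pearson correlation between the feature vectors of two images, and that RDMs are compared over their upper triangles. The function names are hypothetical and the data are random; this is not the code behind compute_rdm or correlate_rdms.

```python
import numpy as np


def correlation_rdm(features: np.ndarray) -> np.ndarray:
    """RDM with entries 1 - Pearson r between the feature vectors of two images."""
    return 1.0 - np.corrcoef(features)


def pearson_between_rdms(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Pearson correlation of the upper triangles (diagonal excluded)."""
    idx = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(rdm_a[idx], rdm_b[idx])[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(6, 20))  # 6 images, 20 features (toy data)
    feats_b = rng.normal(size=(6, 50))  # same 6 images, other representation
    rdm_a = correlation_rdm(feats_a)
    rdm_b = correlation_rdm(feats_b)
    print(rdm_a.shape, pearson_between_rdms(rdm_a, rdm_b))
```

Only the upper triangle enters the comparison because an RDM is symmetric and its diagonal is zero by construction.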

Centered Kernel Alignment (CKA)

Perform CKA to compare image features of two different model architectures for the same layer, or two different layers of the same architecture.

from thingsvision.core.cka import CKA
m = features_i.shape[0]  # number of images
kernel = 'linear'
cka = CKA(m=m, kernel=kernel)
rho = cka.compare(X=features_i, Y=features_j)
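For intuition, linear CKA can be written in a few lines of NumPy using the standard formula on column-centered feature matrices. This sketch is independent of the CKA class above, which may use a different estimator; the function name `linear_cka` is an assumption.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two (images x features) matrices with equal row counts."""
    # Center every feature (column) across images.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_i = rng.normal(size=(10, 5))   # 10 images, 5 features (toy data)
    feats_j = rng.normal(size=(10, 7))   # same images, different layer
    print(linear_cka(feats_i, feats_i))  # identical representations give 1 (up to rounding)
    print(linear_cka(feats_i, feats_j))
```

The two matrices may have different numbers of features, but their rows must refer to the same images in the same order.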

OpenAI's CLIP models

CLIP

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.

Adding custom models

If you want to use your own model and/or make it public, you just need to implement a class that inherits from the custom_models/custom.py:Custom class and implements the create_model method. There you can build/download the model and its weights. The constructor expects a device (str) and kwargs (dict) in which you can put model parameters. The backend attribute needs to be set to either pt (PyTorch) or tf (TensorFlow). The create_model method needs to return the model and an optional preprocessing method. If no preprocessing is set, the ImageNet default preprocessing is used. Afterwards, you can put the file in the custom_models directory and create a pull request to include the model in the official GitHub repository.

from thingsvision.custom_models.custom import Custom
import torchvision.models as torchvision_models
import torch

class VGG16_ecoset(Custom):
    def __init__(self, device, **kwargs) -> None:
        super().__init__(device)
        self.backend = 'pt'
        self.preprocess = None

    def create_model(self):
        model = torchvision_models.vgg16_bn(pretrained=False, num_classes=565)
        path_to_weights = 'https://osf.io/fe7s5/download'
        state_dict = torch.hub.load_state_dict_from_url(path_to_weights, map_location=self.device)
        model.load_state_dict(state_dict)
        return model, self.preprocess

Using HDF5 datasets (e.g. NSD stimuli)

You can also extract features for images stored in an HDF5 dataset. To do so, replace ImageDataset with HDF5Dataset, providing the path to the HDF5 file as hdf5_fp and the name of the dataset containing the images as img_ds_key.

Optionally, you can specify which images to extract features for by providing a list of indices as img_indices, otherwise features for all images will be extracted.

The following examples shows how to extract features for images of the NSD stimuli dataset shown to subject 1:

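import numpy as np
from thingsvision.utils.data import HDF5Dataset

# `experiment` is assumed to hold the loaded NSD experiment design and
# `extractor` an already initialized extractor
# get indices of all 10000 images shown to first subject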
img_indices = np.unique(
    experiment['subjectim'][:, experiment['masterordering'][0] - 1][0]
)

dataset = HDF5Dataset(
    hdf5_fp="<path_to_nsd>/nsddata_stimuli/stimuli/nsd_stimuli.hdf5",
    img_ds_key="imgBrick",
    transforms=extractor.get_transformations(),
    backend=extractor.backend,
    img_indices=img_indices
)
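The effect of img_indices can be illustrated with a toy dataset that restricts access to a subset of positions. This is a minimal sketch under the assumption that index selection behaves like a simple lookup table; `IndexedImages` is a hypothetical class and not the implementation of HDF5Dataset, and a plain list stands in for the image array stored in the HDF5 file.

```python
from typing import Callable, Optional, Sequence


class IndexedImages:
    """Toy dataset that optionally restricts access to a subset of indices."""

    def __init__(
        self,
        images: Sequence,
        img_indices: Optional[Sequence[int]] = None,
        transforms: Optional[Callable] = None,
    ) -> None:
        self.images = images
        # Without img_indices, every image is used.
        if img_indices is None:
            self.img_indices = list(range(len(images)))
        else:
            self.img_indices = list(img_indices)
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.img_indices)

    def __getitem__(self, position: int):
        # Map the position in the subset back to the position in the full array.
        image = self.images[self.img_indices[position]]
        return self.transforms(image) if self.transforms else image


images = ["img0", "img1", "img2", "img3", "img4"]  # stands in for the HDF5 image array
subset = IndexedImages(images, img_indices=[1, 3], transforms=str.upper)
full = IndexedImages(images)

print(len(subset), [subset[i] for i in range(len(subset))])
print(len(full))
```

With this layout, the order of the extracted features follows the order of img_indices, so it is worth keeping that list around to map features back to stimuli.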

Citation

If you use this GitHub repository (or any modules associated with it), we would greatly appreciate it if you cited our paper as follows:

@article{Muttenthaler_2021,
	author = {Muttenthaler, Lukas and Hebart, Martin N.},
	title = {THINGSvision: A Python Toolbox for Streamlining the Extraction of Activations From Deep Neural Networks},
	journal ={Frontiers in Neuroinformatics},
	volume = {15},
	pages = {45},
	year = {2021},
	url = {https://www.frontiersin.org/article/10.3389/fninf.2021.679838},
	doi = {10.3389/fninf.2021.679838},
	issn = {1662-5196},
}
