
A modular framework for research on multimodal representation learning.

Project description

mmlearn


This project aims to enable the evaluation of existing multimodal representation learning methods, as well as to facilitate experimentation and research on new techniques.

Quick Start

Installation

Prerequisites

The library requires Python 3.10 or later. We recommend using a virtual environment to manage dependencies. You can create a virtual environment using the following command:

python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate

Installing binaries

To install the pre-built binaries, run:

python3 -m pip install mmlearn
Installation Options

You can install optional dependencies to enable additional features. Use one or more of the pip extras listed below to install the desired dependencies.

pip extra   Dependencies                        Notes
vision      torchvision, opencv-python, timm    Enables image processing and vision tasks.
audio       torchaudio                          Enables audio processing and tasks.
peft        peft                                Uses the PEFT library to enable parameter-efficient fine-tuning.

For example, to install the library with the vision and audio extras, run:

python3 -m pip install mmlearn[vision,audio]

Building from source

To install the library from source, run:

git clone https://github.com/VectorInstitute/mmlearn.git
cd mmlearn
python3 -m pip install -e .

Running Experiments

We use Hydra and hydra-zen to manage configurations in the library.

For new experiments, it is recommended to create a new directory to store the configuration files. The directory should have an __init__.py file to make it a Python package and an experiment folder to store the experiment configuration files. This format allows the use of .yaml configuration files as well as Python modules (using structured configs or hydra-zen) to define the experiment configurations.
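For example, a minimal config directory (all names here are illustrative) might look like:

my_configs/
├── __init__.py             # makes the directory a Python package; configs can be registered here
└── experiment/
    └── my_experiment.yaml  # an experiment config that Hydra can compose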

To run an experiment, use the following command:

mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name

Hydra composes the experiment configuration from all the configurations in the specified directory as well as all the configurations in the mmlearn package. Note the dot-separated path to the directory containing the experiment configuration files. A path can be added to hydra.searchpath either as a package (pkg://path.to.config.directory) or as a file system path (file://path/to/config/directory). However, new configs for mmlearn are added to Hydra's external store inside path/to/config/directory/__init__.py, which is only interpreted when the config directory is added as a package. Hence, please refrain from using the file:// notation.
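As an illustration of registering configs in the package's __init__.py, the sketch below uses hydra-zen's store; the optimizer, group name, and config name are made-up examples, and mmlearn may provide its own external-store helper for this purpose.

# my_configs/__init__.py -- hypothetical example of registering a config at import
# time so that Hydra discovers it through pkg://my_configs. The group and name are
# illustrative; mmlearn may expose its own helper for adding to the external store.
from hydra_zen import builds, store
from torch.optim import AdamW

store(
    builds(AdamW, lr=1e-4, weight_decay=0.01, populate_full_signature=True, zen_partial=True),
    group="modules/optimizers",   # made-up config group
    name="AdamW_lr1e-4",          # made-up config name
)
store.add_to_hydra_store()        # push the entries into Hydra's ConfigStore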

Hydra also allows overriding configuration parameters from the command line. To see the available options and other information, run:

mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> --help

By default, the mmlearn_run command will run the experiment locally. To run the experiment on a SLURM cluster, we use the submitit launcher plugin built into Hydra. The following is an example of how to run an experiment on a SLURM cluster:

mmlearn_run --multirun \
  hydra.launcher.mem_gb=32 \
  hydra.launcher.qos=your_qos \
  hydra.launcher.partition=your_partition \
  hydra.launcher.gres=gpu:4 \
  hydra.launcher.cpus_per_task=8 \
  hydra.launcher.tasks_per_node=4 \
  hydra.launcher.nodes=1 \
  hydra.launcher.stderr_to_stdout=true \
  hydra.launcher.timeout_min=60 \
  '+hydra.launcher.additional_parameters={export: ALL}' \
  'hydra.searchpath=[pkg://path.to.config.directory]' \
  +experiment=<name_of_experiment_yaml_file> \
  experiment=your_experiment_name

This will submit a job to the SLURM cluster with the specified resources.

Note: After the job is submitted, it is okay to cancel the program with Ctrl+C. The job will continue running on the cluster. You can also add & at the end of the command to run it in the background.

Summary of Implemented Methods

Pretraining Methods

Contrastive Pretraining

Uses the contrastive loss to align the representations from N modalities. Supports sharing of encoders, projection heads or postprocessing modules (e.g. logit/temperature scaling) across modalities. Also supports multi-task learning with auxiliary unimodal tasks applied to specific modalities.
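As a library-agnostic illustration of the underlying objective (not mmlearn's actual implementation, which also handles N modalities, shared modules, and auxiliary tasks), a minimal two-modality contrastive loss can be sketched as:

# Minimal sketch of a symmetric (CLIP-style) contrastive loss for two modalities.
# This illustrates the objective only; it is not mmlearn's implementation.
import torch
import torch.nn.functional as F


def contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """emb_a[i] and emb_b[i] are embeddings of the same underlying example."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature                    # (batch, batch) similarity matrix
    targets = torch.arange(emb_a.size(0), device=emb_a.device)  # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))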
Evaluation Methods

Zero-shot Cross-modal Retrieval

Evaluates the quality of the learned representations by retrieving the k most similar examples from a different modality, using the recall@k metric. This is applicable to any number of pairs of modalities at once, depending on memory constraints.
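The metric itself is straightforward; a library-agnostic sketch of recall@k for paired embeddings (not mmlearn's own evaluation code) is:

# Sketch of recall@k for cross-modal retrieval: query_emb[i] and target_emb[i]
# are assumed to be a matching pair from two different modalities.
import torch
import torch.nn.functional as F


def recall_at_k(query_emb: torch.Tensor, target_emb: torch.Tensor, k: int = 5) -> float:
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(target_emb, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices                                    # (num_queries, k)
    correct = torch.arange(sims.size(0), device=sims.device).unsqueeze(1)  # true match index per row
    return topk.eq(correct).any(dim=-1).float().mean().item()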

Zero-shot Classification

Evaluates the ability of a pre-trained encoder-based multimodal model to predict classes that were not explicitly seen during training. The new classes are given as text prompts, and the query modality can be any of the supported modalities. Binary and multi-class classification tasks are supported.
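Conceptually, this amounts to comparing each query embedding against one prompt embedding per class; a library-agnostic sketch (not mmlearn's task implementation) is:

# Sketch of zero-shot classification: embed one text prompt per class, then
# assign each query embedding (e.g. an image) to the most similar prompt.
import torch
import torch.nn.functional as F


def zero_shot_predict(query_emb: torch.Tensor, class_prompt_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (num_queries, dim); class_prompt_emb: (num_classes, dim). Returns class indices."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(class_prompt_emb, dim=-1).t()
    return sims.argmax(dim=-1)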

Components

Datasets

Every dataset object must return an instance of Example with one or more keys/attributes corresponding to a modality name as specified in the Modalities registry. The Example object must also include an example_index attribute/key, which is used, in addition to the dataset index, to uniquely identify the example.
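A minimal sketch of such a dataset is shown below; the import path for Example and its dict-style constructor are assumptions, and the modality keys must match names registered in the Modalities registry.

# Toy map-style dataset yielding Example objects. The import path and the
# Example constructor taking a dict are assumptions; adjust to your mmlearn version.
import torch
from torch.utils.data import Dataset

from mmlearn.datasets.core import Example  # assumed import path


class ToyImageTextDataset(Dataset):
    """Pairs random image tensors with toy captions."""

    def __init__(self, size: int = 8) -> None:
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int) -> Example:
        return Example(
            {
                "rgb": torch.rand(3, 224, 224),          # key must match a name in the Modalities registry
                "text": f"a caption for example {idx}",  # a second modality
                "example_index": idx,                    # required to uniquely identify the example
            }
        )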

CombinedDataset

The CombinedDataset object is used to combine multiple datasets into one. It accepts an iterable of torch.utils.data.Dataset and/or torch.utils.data.IterableDataset objects and returns an Example object from one of the datasets, given an index. Conceptually, the CombinedDataset object is a concatenation of the datasets in the input iterable, so the given index can be mapped to a specific dataset based on the size of the datasets. As iterable-style datasets do not support random access, the examples from these datasets are returned in order as they are iterated over.

The CombinedDataset object also adds a dataset_index attribute to the Example object, corresponding to the index of the dataset in the input iterable. Every example returned by the CombinedDataset also has an example_ids attribute, which is an instance of Example containing the same keys/attributes as the original example (except for example_index and dataset_index), with each value being a tensor of the dataset_index and example_index.

Dataloading

When dealing with multiple datasets with different modalities, the default collate_fn of torch.utils.data.DataLoader may not work, as it assumes that all examples have the same keys/attributes. In that case, the collate_example_list function can be used as the collate_fn argument of torch.utils.data.DataLoader. This function takes a list of Example objects and returns a dictionary of tensors, with all the keys/attributes of the Example objects.
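A sketch of wiring this together is shown below; the import paths for CombinedDataset and collate_example_list are assumptions, and ToyImageTextDataset refers to the toy dataset sketched in the Datasets section above.

# Combine two datasets and batch them with collate_example_list. Import paths are
# assumptions; in practice the combined datasets may expose different modalities.
from torch.utils.data import DataLoader

from mmlearn.datasets.core import CombinedDataset, collate_example_list  # assumed paths

combined = CombinedDataset([ToyImageTextDataset(size=8), ToyImageTextDataset(size=4)])
loader = DataLoader(
    combined,
    batch_size=4,
    collate_fn=collate_example_list,  # handles examples whose keys/attributes differ
)

batch = next(iter(loader))
print(batch.keys())  # dictionary keyed by the union of the examples' keys/attributes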

Contributing

If you are interested in contributing to the library, please see CONTRIBUTING.MD. This file contains many details about contributing to the code base, including our development practices, code checks, tests, and more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mmlearn-0.1.0a0.dev8.tar.gz (83.1 kB)

Uploaded: Source

Built Distribution

mmlearn-0.1.0a0.dev8-py3-none-any.whl (102.9 kB)

Uploaded: Python 3

File details

Details for the file mmlearn-0.1.0a0.dev8.tar.gz.

File metadata

  • Download URL: mmlearn-0.1.0a0.dev8.tar.gz
  • Upload date:
  • Size: 83.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for mmlearn-0.1.0a0.dev8.tar.gz
Algorithm     Hash digest
SHA256        1934de75d7796c5903204bbe1cf7d95e6f1a9b59d9e3cbeac1663ac6e5dad896
MD5           bb5efcc11789a75b6c20664d8ba1b3a8
BLAKE2b-256   840e9f40f9d52366e3481b6136584f84a4523216321a293b5528f112a2883782


Provenance

The following attestation bundles were made for mmlearn-0.1.0a0.dev8.tar.gz:

Publisher: publish.yml on VectorInstitute/mmlearn


File details

Details for the file mmlearn-0.1.0a0.dev8-py3-none-any.whl.

File metadata

File hashes

Hashes for mmlearn-0.1.0a0.dev8-py3-none-any.whl
Algorithm     Hash digest
SHA256        0d8209ae0558cadba87abc29e18ebcf056d02e8ca0d95039c990508059eb49fb
MD5           90d959e0e3c2f3d3ad952f9055d79ec5
BLAKE2b-256   8e724a7f1250ef46f1113455671a120e347a2885442f9f2d42133d92ca642aa0


Provenance

The following attestation bundles were made for mmlearn-0.1.0a0.dev8-py3-none-any.whl:

Publisher: publish.yml on VectorInstitute/mmlearn

