
LinMulT


General-purpose Multimodal Transformer with Linear Complexity Attention Mechanism.

Setup

Install package from PyPI

pip install linmult

Install package from repository root

git clone https://github.com/fodorad/LinMulT
cd LinMulT
pip install -e .
pip install -U -r requirements.txt
python -m unittest
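
After installation, a quick import check can confirm the package is available (a minimal sketch using the two model classes featured in the Quick start examples below):

# verify that the package and its main model classes are importable
from linmult import LinT, LinMulT

print(LinT, LinMulT)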

Quick start

Example 1:

A simple transformer encoder with linear attention. The forward pass takes a single input sequence.

import torch
from linmult import LinT

# input shape: (batch_size, time_dimension, feature_dimension)
x = torch.rand((32, 15, 1024), device='cuda')
model = LinT(input_modality_channels=1024, output_dim=5).cuda()
y_pred_seq = model(x)

# output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])
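
If a single clip-level prediction is needed from LinT, the per-timestep output can be pooled over the time dimension with standard tensor operations (a sketch; mean pooling is one common choice, not a built-in LinT feature):

# collapse the time dimension by averaging the per-timestep predictions
y_pred_clip = y_pred_seq.mean(dim=1)
assert y_pred_clip.size() == torch.Size([32, 5])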

Example 2:

Multimodal Transformer with Linear Attention. The forward pass takes 2 input sequences that share the same time dimension.

import torch
from linmult import LinMulT

# input shape: (batch_size, time_dimension, feature_dimension)
x_1 = torch.rand((32, 15, 1024), device='cuda')
x_2 = torch.rand((32, 15, 160), device='cuda')
model = LinMulT(input_modality_channels=[1024, 160], output_dim=5).cuda()
y_pred_cls, y_pred_seq = model([x_1, x_2])

# 1. output shape: (batch_size, output_dimension)
assert y_pred_cls.size() == torch.Size([32, 5])

# 2. output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])

Example 3:

Multimodal Transformer with Linear Attention. The forward pass takes 3 input sequences with different time dimensions.

import torch
from linmult import LinMulT

# input shape: (batch_size, time_dimension, feature_dimension)
x_1 = torch.rand((16, 1500, 25), device='cuda')
x_2 = torch.rand((16, 450, 35), device='cuda')
x_3 = torch.rand((16, 120, 768), device='cuda')
model = LinMulT(input_modality_channels=[25, 35, 768],
                output_dim=5,
                add_time_collapse=True,
                add_self_attention_fusion=False).cuda()
y_pred_cls = model([x_1, x_2, x_3])

# output shape: (batch_size, output_dimension)
assert y_pred_cls.size() == torch.Size([16, 5])
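
For inference, the usual PyTorch pattern applies (a sketch assuming the models are standard torch.nn.Module subclasses, as the .cuda() calls above suggest):

# switch to evaluation mode and disable gradient tracking for inference
model.eval()
with torch.no_grad():
    y_pred_cls = model([x_1, x_2, x_3])

assert y_pred_cls.size() == torch.Size([16, 5])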

Similar projects using LinMulT

(2023) BlinkLinMulT

LinMulT is trained for blink presence detection and eye state recognition. Our results demonstrate performance comparable or superior to state-of-the-art models on both tasks, across 7 public benchmark databases.

(2022) PersonalityLinMulT

LinMulT is trained for Big Five personality trait estimation using the First Impressions V2 dataset and sentiment estimation using the MOSI and MOSEI datasets.

Citation - BibTeX

If you find our research helpful or influential, please consider citing it:

(2023) LinMulT for blink presence detection and eye state recognition:

@article{blinklinmult-fodor23,
  title = {BlinkLinMulT: Transformer-based Eye Blink Detection},
  author = {Fodor, {\'A}d{\'a}m and Fenech, Kristian and L{\H{o}}rincz, Andr{\'a}s},
  journal = {...},
  pages = {1--19},
  year = {2023}
}

(2022) LinMulT for personality trait and sentiment estimation:

@inproceedings{pmlr-v173-fodor22a,
  title = {Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures},
  author = {Fodor, {\'A}d{\'a}m and Saboundji, Rachid R. and Jacques Junior, Julio C. S. and Escalera, Sergio and Gallardo-Pujol, David and L{\H{o}}rincz, Andr{\'a}s},
  booktitle = {Understanding Social Behavior in Dyadic and Small Group Interactions},
  pages = {218--241},
  year = {2022},
  editor = {Palmero, Cristina and Jacques Junior, Julio C. S. and Clapés, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
  volume = {173},
  series = {Proceedings of Machine Learning Research},
  month = {16 Oct},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf},
  url = {https://proceedings.mlr.press/v173/fodor22a.html}
}

Acknowledgement

The code is inspired by the following two works:

Multimodal Transformer:

Linear Attention:

Contact
