A tool for extracting multimodal features from videos.

MMSA-Feature Extraction Toolkit

MMSA-Feature Extraction Toolkit extracts multimodal features for Multimodal Sentiment Analysis datasets. It integrates several commonly used tools for the visual, acoustic and text modalities. The extracted features are compatible with the MMSA Framework and can therefore be used directly. The tool can also extract features from single videos.

This work is included in the ACL-2022 DEMO paper: M-SENA: An Integrated Platform for Multimodal Sentiment Analysis. If you find our work useful, don't hesitate to cite our paper. Thank you!


Features

  • Extracts fully customized features for single videos or datasets.
  • Integrates several of the most commonly used tools, including Librosa, OpenFace, Transformers, etc.
  • Supports Active Speaker Detection in case multiple faces exist in a video.
  • Easy to use, with Python APIs and command-line tools.
  • Extracted features are compatible with MMSA, a unified training & testing framework for Multimodal Sentiment Analysis.

1. Installation

MMSA-Feature Extraction Toolkit is available from PyPI. Because of PyPI's package size limit, large model files cannot be shipped with the package, so users need to run a post-install command to download them. If you can't access Google Drive, please refer to this page for manual download.

# Install package from PyPI
$ pip install MMSA-FET
# Download models & libraries from Google Drive. Use --proxy if needed.
$ python -m MSA_FET install

Note: For the OpenFaceExtractor to work on Linux platforms, a few system-wide dependencies are needed. See Dependency Installation for more information.
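
The PyPI distribution name (MMSA-FET) differs from the import name (MSA_FET), so a quick importability check can catch a failed install early. The sketch below uses only the standard library. The helper name `is_installed` is hypothetical and not part of MMSA-FET. It only checks that the package can be found, not that the model files from the post-install step were downloaded.

```python
from importlib.util import find_spec


def is_installed(module_name: str = "MSA_FET") -> bool:
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when a top-level module is missing.
    return find_spec(module_name) is not None


if is_installed():
    print("MSA_FET is importable")
else:
    print("MSA_FET not found; try: pip install MMSA-FET")
```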

2. Quick Start

MMSA-FET is fairly easy to use. You can either call the Python API or use the command-line interface. Below is a basic example using the Python API.

Note: To extract features for datasets, the datasets need to be organized in a specific file structure, and a label.csv file is needed. See Dataset and Structure for details. Raw video files and label files for MOSI, MOSEI and CH-SIMS can be downloaded from BaiduYunDisk.

from MSA_FET import FeatureExtractionTool

# initialize with default librosa config which only extracts audio features
fet = FeatureExtractionTool("librosa")

# alternatively initialize with a custom config file
fet = FeatureExtractionTool("custom_config.json")

# extract features for single video
feature = fet.run_single("input.mp4")
print(feature)

# extract for dataset & save features to file
feature = fet.run_dataset(dataset_dir="~/MOSI", out_file="output/feature.pkl")

Here, custom_config.json is the path to a custom config file, whose format is described below.

For detailed usage, please read APIs and Command Line Arguments.
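
Once `run_dataset` has written its output, it can help to inspect the file before using it for training. The sketch below assumes `out_file` is a standard Python pickle (suggested by the `.pkl` extension, but not confirmed here) and makes no assumption about its internal layout: it only reports the type and, where available, the shape or length of each top-level entry. The helper name `summarize_pickle` and the stand-in data are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level entries."""
    with open(Path(path).expanduser(), "rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Array-like objects (e.g. NumPy arrays) expose .shape.
            desc = f"{type(value).__name__} shape={tuple(shape)}"
        elif hasattr(value, "__len__"):
            desc = f"{type(value).__name__} len={len(value)}"
        else:
            desc = type(value).__name__
        summary[str(key)] = desc
    return summary


# Stand-in file so the sketch runs without MMSA-FET; the keys are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "feature.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump({"audio": [[0.1, 0.2], [0.3, 0.4]], "id": "demo"}, f)
    demo_summary = summarize_pickle(demo_path)

for name, desc in demo_summary.items():
    print(f"{name}: {desc}")
```

As with any pickle, only load files that come from a source you trust.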

3. Config File

MMSA-FET comes with a few example configs, which can be used as shown below.

# Each supported tool has an example config
fet = FeatureExtractionTool(config="librosa")
fet = FeatureExtractionTool(config="opensmile")
fet = FeatureExtractionTool(config="wav2vec")
fet = FeatureExtractionTool(config="openface")
fet = FeatureExtractionTool(config="mediapipe")
fet = FeatureExtractionTool(config="bert")
fet = FeatureExtractionTool(config="roberta")

For customized features, you can:

  1. Edit the default configs and pass a dictionary to the config parameter, as in the example below:
from MSA_FET import FeatureExtractionTool, get_default_config

# here we only extract audio and video features
config_a = get_default_config('opensmile')
config_v = get_default_config('openface')

# modify default config
config_a['audio']['args']['feature_level'] = 'LowLevelDescriptors'

# combine audio and video configs
config = {**config_a, **config_v}

# initialize
fet = FeatureExtractionTool(config=config)
  2. Provide a JSON config file. The example below extracts features for all three modalities. To extract unimodal features, remove the unneeded sections from the file.
{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,
    "args": {
      "mfcc": {
        "n_mfcc": 20,
        "htk": true
      },
      "rms": {},
      "zero_crossing_rate": {},
      "spectral_rolloff": {},
      "spectral_centroid": {}
    }
  },
  "video": {
    "tool": "openface",
    "fps": 25,
    "average_over": 3,
    "args": {
      "hogalign": false,
      "simalign": false,
      "nobadaligned": false,
      "landmark_2D": true,
      "landmark_3D": false,
      "pdmparams": false,
      "head_pose": true,
      "action_units": true,
      "gaze": true,
      "tracked": false
    }
  },
  "text": {
    "model": "bert",
    "device": "cpu",
    "pretrained": "models/bert_base_uncased",
    "args": {}
  }
}
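
Both customization routes come down to plain dictionaries keyed by modality (`audio`, `video`, `text`). The sketch below uses only the standard library and illustrates two points under that assumption. First, `{**config_a, **config_v}` works because each config holds a different modality; on a key clash the right-hand dictionary would silently win. Second, a unimodal config file can be derived from a full one by dropping sections. The helper names `merge_configs` and `keep_modalities` and the abbreviated config values are hypothetical, not part of MMSA-FET.

```python
import json
import tempfile
from pathlib import Path


def merge_configs(*configs):
    """Shallow-merge configs, refusing to silently overwrite a modality."""
    merged = {}
    for cfg in configs:
        clash = merged.keys() & cfg.keys()
        if clash:
            raise ValueError(f"modality defined twice: {sorted(clash)}")
        merged.update(cfg)
    return merged


def keep_modalities(config, wanted):
    """Return a copy of config with only the wanted top-level sections."""
    return {name: section for name, section in config.items() if name in wanted}


# Abbreviated stand-ins for the per-tool configs shown above.
config_a = {"audio": {"tool": "librosa", "sample_rate": None, "args": {"rms": {}}}}
config_v = {"video": {"tool": "openface", "fps": 25, "args": {"gaze": True}}}

full = merge_configs(config_a, config_v)
audio_only = keep_modalities(full, {"audio"})

# Write the unimodal config to a JSON file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "custom_config.json"
    config_path.write_text(json.dumps(audio_only, indent=2))
    reloaded = json.loads(config_path.read_text())

print(sorted(full))
print(sorted(reloaded))
```

A file written this way could then be passed to `FeatureExtractionTool` by path, as in the Quick Start.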

4. Supported Tools & Features

4.1 Audio Tools

4.2 Video Tools

  • OpenFace (link)

    Supports all features in OpenFace's FeatureExtraction binary, including facial landmarks in 2D and 3D, head pose, gaze-related features, facial action units, and HOG binary files. Details of these features can be found in the OpenFace Wiki here and here. Detailed configurations can be found here.

  • MediaPipe (link)

    Supports face mesh and holistic (face, hand, pose) solutions. Detailed configurations can be found here.

  • TalkNet (link)

    TalkNet is used to support Active Speaker Detection in case there are multiple human faces in the video.

4.3 Text Tools

  • BERT (link)

    Integrated from huggingface transformers. Detailed configurations can be found here.

  • XLNet (link)

    Integrated from huggingface transformers. Detailed configurations can be found here.
