
Parameter-Efficient Fine-Tuning for Speech Emotion Recognition

Project description

PEFT-SER [Paper Link] was accepted to the 11th International Conference on Affective Computing and Intelligent Interaction (ACII), 2023.

This work includes the implementation of PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models. PEFT-SER is an open-source project for researchers exploring SER applications using parameter-efficient fine-tuning methods.

This Python package provides checkpoints trained on the combined data of IEMOCAP, MSP-Improv, MSP-Podcast, and CREMA-D, using the following models that show promising results:

  1. Whisper Tiny
  2. Whisper Base
  3. Whisper Small
  4. Massively Multilingual Speech (MMS)
  5. WavLM Base+
  6. WavLM Large

We provide checkpoints with a LoRA rank of 16, following our paper, as LoRA generally delivers the best performance among PEFT methods for SER. You can start using our models within 5 lines of code. The package is released under a license that does not permit commercial use.

1. Installation

pip install peft-ser

2. Model Loading

# Whisper-style loading
import torch
import peft_ser

model = peft_ser.load_model("whisper-base-lora-16-conv")

# 1 second of 16 kHz audio, shaped (batch_size, num_samples)
data = torch.zeros([1, 16000])
output = model(data)
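
In practice you would feed real audio rather than zeros. Below is a minimal sketch, assuming a mono WAV file and the torchaudio library (torchaudio is not a dependency of this package, and "example.wav" is a placeholder path):

import torch
import torchaudio

import peft_ser

model = peft_ser.load_model("whisper-base-lora-16-conv")

# Load a mono waveform and resample to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("example.wav")
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# The released checkpoints support audio up to 10 s (see Training details below).
waveform = waveform[:, : 16000 * 10]

output = model(waveform)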

a. Output mapping

The output emotion mapping is: {0: "Neutral", 1: "Angry", 2: "Sad", 3: "Happy"}. We will add a 6-emotion version later.
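
Assuming the model returns one logit per emotion class (the output shape below is an assumption, not a documented guarantee), the predicted label can be read off with an argmax:

import torch

emotion_map = {0: "Neutral", 1: "Angry", 2: "Sad", 3: "Happy"}

# Placeholder standing in for model(data); assumed shape (batch_size, 4).
output = torch.randn(1, 4)
pred_idx = torch.argmax(output, dim=-1)
print([emotion_map[int(i)] for i in pred_idx])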

b. Training details

All released models are trained and evaluated on the same data. Unlike the ACII paper, where audio was restricted to 6 s, these open-release models support audio durations of up to 10 s for broader use cases. We also combine the convolutional output with the transformer encodings for fine-tuning, as we find this further increases model performance. We used a fixed seed of 8, 30 training epochs, and a learning rate of 2.5e-4.
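
To illustrate what combining the convolutional output with the transformer encodings can look like, here is a minimal sketch of one such fusion head. It illustrates the idea only, not the package's internal code, and the feature dimensions are assumptions:

import torch
import torch.nn as nn

class FusedHead(nn.Module):
    """Classify from CNN front-end features concatenated with encoder features."""

    def __init__(self, conv_dim: int, enc_dim: int, num_classes: int = 4):
        super().__init__()
        self.classifier = nn.Linear(conv_dim + enc_dim, num_classes)

    def forward(self, conv_out: torch.Tensor, enc_out: torch.Tensor) -> torch.Tensor:
        # Mean-pool each (batch, time, dim) stream over time, then concatenate.
        pooled = torch.cat([conv_out.mean(dim=1), enc_out.mean(dim=1)], dim=-1)
        return self.classifier(pooled)

head = FusedHead(conv_dim=512, enc_dim=768)
logits = head(torch.randn(1, 50, 512), torch.randn(1, 50, 768))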

c. Training/validation/test splits for reproducing the results

The validation set: Session 4 of IEMOCAP, Session 4 of MSP-Improv, the validation set of MSP-Podcast, and Speakers 1059-1073 of CREMA-D.

The test set: Session 5 of IEMOCAP, Session 5 of MSP-Improv, the test set of MSP-Podcast, and Speakers 1074-1091 of CREMA-D.

All remaining data are used for training.
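
For reference, the split rules above can be encoded as follows; the dataset names and metadata fields are assumptions about how you store your copies of the corpora:

def assign_split(dataset: str, session: int = None, speaker_id: int = None) -> str:
    """Map a sample to train/validation/test per the rules above."""
    if dataset in ("iemocap", "msp-improv"):
        # Session 4 -> validation, Session 5 -> test, everything else -> train.
        return {4: "validation", 5: "test"}.get(session, "train")
    if dataset == "crema-d":
        if 1059 <= speaker_id <= 1073:
            return "validation"
        if 1074 <= speaker_id <= 1091:
            return "test"
        return "train"
    if dataset == "msp-podcast":
        raise ValueError("use the official MSP-Podcast validation/test partitions")
    raise ValueError(f"unknown dataset: {dataset}")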

d. Performance

| Pre-trained Model | Test Performance without PEFT | Test Performance with LoRA PEFT | Model Name |
| --- | --- | --- | --- |
| Whisper Tiny | 62.26 | 63.48 | whisper-tiny-lora-16-conv |
| Whisper Base | 64.39 | 64.92 | whisper-base-lora-16-conv |
| Whisper Small | 65.77 | 66.01 | whisper-small-lora-16-conv |
| MMS | | | mms-lora-16-conv |
| WavLM Base+ | 63.06 | 66.11 | wavlm-plus-lora-16-conv |
| WavLM Large | 68.54 | 68.66 | wavlm-large-lora-16-conv |

e. Further exploration

You are free to further fine-tune the released models on other SER datasets, on more challenging tasks such as 6-emotion or 8-emotion recognition, and on transfer learning for Arousal/Valence/Dominance; see the sketch below.
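
As a starting point, one simple and deliberately hypothetical recipe is to keep the released model frozen and train a small probe on top of its 4-class output; whether richer embeddings are exposed by load_model() is not something the package documents:

import torch
import torch.nn as nn

import peft_ser

model = peft_ser.load_model("wavlm-plus-lora-16-conv")
head = nn.Linear(4, 6)  # probe from the released 4-class output to 6 emotions

optimizer = torch.optim.AdamW(head.parameters(), lr=2.5e-4)
criterion = nn.CrossEntropyLoss()

for waveform, label in []:  # replace [] with your 6-emotion dataloader
    logits = head(model(waveform))
    loss = criterion(logits, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()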

Citation

Please cite the following papers, which provide the foundation of the code and methods used in the experiments.

PEFT-SER

@article{feng2023peft,
  title={PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models},
  author={Feng, Tiantian and Narayanan, Shrikanth},
  journal={arXiv preprint arXiv:2306.05350},
  year={2023}
}

Trust-SER

@article{feng2023trustser,
  title={TrustSER: On the trustworthiness of fine-tuning pre-trained speech embeddings for speech emotion recognition},
  author={Feng, Tiantian and Hebbar, Rajat and Narayanan, Shrikanth},
  journal={arXiv preprint arXiv:2305.11229},
  year={2023}
}

You should also cite all the datasets used, including IEMOCAP, MSP-Improv, MSP-Podcast, and CREMA-D.

Download files

Download the file for your platform.

Source Distribution

peft-ser-0.0.5.tar.gz (16.6 kB)

Uploaded Source

Built Distribution


peft_ser-0.0.5-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file peft-ser-0.0.5.tar.gz.

File metadata

  • Download URL: peft-ser-0.0.5.tar.gz
  • Upload date:
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.17

File hashes

Hashes for peft-ser-0.0.5.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 766ddf44438324cc56a1b0b1160ce0f7c9b1c4faa9ca29e8cc5dfb7baae6aea7 |
| MD5 | ef848b575765bfa420de939d0eb1883d |
| BLAKE2b-256 | 1798b7025d13c729e46d20bfa5670e4b2c2c392345e637e8ef1a4871703ce583 |


File details

Details for the file peft_ser-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: peft_ser-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.17

File hashes

Hashes for peft_ser-0.0.5-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ac286ea91a7f5df86fce0739601af2b46ef48f330c7880ad8f309ddcf0944ead |
| MD5 | 44a9ee5a1fc320fad09de099c12a4fd7 |
| BLAKE2b-256 | 3bf088f6990f38b1ae0d9ae1f26afa10ccb5faf787f37edebf79c9ee2e84ae5e |

