Models for the paper 'Analysis of XLS-R for Speech Quality Assessment'.

xls-r-analysis-sqa

1. Overview

This repository hosts the models for the paper "Analysis of XLS-R for Speech Quality Assessment".

1.1. Performance On Unseen Datasets

Comparison of model performance on each unseen corpus individually (NISQA, IUB) and on both combined (Unseen). The metric is RMSE; lower is better.
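For reference, the metric can be sketched as follows. This is a minimal illustration, not code from the repository: the function name `rmse` and all sample values are made up, and the "Unseen" column is assumed here to pool the utterances of both corpora before computing a single RMSE.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one sample is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical MOS values for two corpora (made up for illustration).
corpus_a = {"pred": [3.9, 2.1, 4.4], "true": [4.2, 2.5, 4.0]}
corpus_b = {"pred": [1.8, 3.3], "true": [1.5, 3.0]}

rmse_a = rmse(corpus_a["pred"], corpus_a["true"])
rmse_b = rmse(corpus_b["pred"], corpus_b["true"])

# Assumed pooling: concatenate both corpora, then compute one RMSE.
rmse_pooled = rmse(
    corpus_a["pred"] + corpus_b["pred"],
    corpus_a["true"] + corpus_b["true"],
)

print(f"A: {rmse_a:.4f}  B: {rmse_b:.4f}  pooled: {rmse_pooled:.4f}")
```

Under this pooling assumption the combined value always lies between the two per-corpus values, weighted by corpus size.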

V1 Results

Model	NISQA	IUB	Unseen
XLS-R 300M Layer24 Bi-LSTM [1]	0.5907	0.5067	0.5323
DNSMOS [2]	0.8718	0.5452	0.6565
MFCC Transformer	0.8280	0.7775	0.7924
XLS-R 300M Layer5 Transformer	0.6256	0.5049	0.5425
XLS-R 300M Layer21 Transformer	0.5694	0.5025	0.5227
XLS-R 300M Layer5+21 Transformer	0.5683	0.4886	0.5129
XLS-R 1B Layer10 Transformer	0.5456	0.5815	0.5713
XLS-R 1B Layer41 Transformer	0.5657	0.4656	0.4966
XLS-R 1B Layer10+41 Transformer	0.5748	0.5288	0.5425
XLS-R 2B Layer10 Transformer	0.6277	0.4899	0.5334
XLS-R 2B Layer41 Transformer	0.5724	0.4897	0.5150
XLS-R 2B Layer10+41 Transformer	0.6036	0.4743	0.5150
Human	0.6738	0.6573	0.6629

V2 Results

UPDATE: The code has been updated to use version 2 of the models. Version 1 used the final model checkpoint by mistake; version 2 uses the checkpoint with the minimum validation loss.
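The difference between the two versions can be illustrated with a small sketch. The training log, its field names, and the checkpoint file names below are hypothetical and only show the selection rule, not the repository's actual training code.

```python
def select_best_checkpoint(history):
    """Return the log entry with the lowest validation loss."""
    if not history:
        raise ValueError("history must not be empty")
    return min(history, key=lambda entry: entry["val_loss"])


# Hypothetical training log: one entry per saved checkpoint.
history = [
    {"epoch": 1, "val_loss": 0.42, "path": "ckpt_epoch1.pt"},
    {"epoch": 2, "val_loss": 0.31, "path": "ckpt_epoch2.pt"},
    {"epoch": 3, "val_loss": 0.36, "path": "ckpt_epoch3.pt"},
]

final = history[-1]                     # last checkpoint (version 1 behaviour)
best = select_best_checkpoint(history)  # minimum validation loss (version 2)

print("final:", final["path"], "best:", best["path"])
```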

Model	NISQA	IUB	Unseen
XLS-R 300M Layer24 Bi-LSTM [1]	0.5907	0.5067	0.5323
DNSMOS [2]	0.8718	0.5452	0.6565
MFCC Transformer	0.9291	0.7415	0.8003
XLS-R 300M Layer5 Transformer	0.6494	0.5117	0.5550
XLS-R 300M Layer21 Transformer	0.5852	0.4838	0.5152
XLS-R 300M Layer5+21 Transformer	0.5861	0.4768	0.5108
XLS-R 1B Layer10 Transformer	0.6217	0.4763	0.5225
XLS-R 1B Layer41 Transformer	0.5615	0.4646	0.4946
XLS-R 1B Layer10+41 Transformer	0.6024	0.4624	0.5068
XLS-R 2B Layer10 Transformer	0.5227	0.4447	0.4686
XLS-R 2B Layer41 Transformer	0.5295	0.4926	0.5035
XLS-R 2B Layer10+41 Transformer	0.5191	0.4573	0.4760
Human	0.6738	0.6573	0.6629

[1] Tamm, B., Balabin, H., Vandenberghe, R., Van hamme, H. (2022) Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications. Proc. Interspeech 2022, 4083-4087, doi: 10.21437/Interspeech.2022-10147

[2] C. K. A. Reddy, V. Gopal and R. Cutler, "DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 2021, pp. 6493-6497, doi: 10.1109/ICASSP39728.2021.9414878.

1.2. Visualization of MOS Predictions

MOS predictions on two unseen datasets: NISQA (top) and IU Bloomington (bottom). Our proposed model based on embeddings extracted from the 10th layer of the pre-trained XLS-R 2B outperforms DNSMOS and the MFCC baseline. The human absolute category ratings (ACRs) are also visualized for the IUB corpus.

1.3. Example Audio Segments

Excellent (MOS = 4.808)

Model	Prediction	Error
DNSMOS	3.699	-1.109
MFCC Transformer	3.497	-1.311
XLS-R 2B Layer10 Transformer	3.935	-0.873

Good (MOS = 4.104)

Model	Prediction	Error
DNSMOS	3.269	-0.835
MFCC Transformer	2.498	-1.606
XLS-R 2B Layer10 Transformer	3.793	-0.311

Fair (MOS = 3.168)

Model	Prediction	Error
DNSMOS	3.309	+0.141
MFCC Transformer	3.931	+0.763
XLS-R 2B Layer10 Transformer	3.080	-0.088

Poor (MOS = 2.240)

Model	Prediction	Error
DNSMOS	2.704	+0.464
MFCC Transformer	1.927	-0.313
XLS-R 2B Layer10 Transformer	2.284	+0.044

Bad (MOS = 1.416)

Model	Prediction	Error
DNSMOS	2.553	+1.137
MFCC Transformer	1.806	+0.390
XLS-R 2B Layer10 Transformer	2.312	+0.896
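The Error column in the tables above is consistent with the signed difference between prediction and MOS, so a negative value means the model underestimates quality. A short sketch that reproduces the "Excellent" rows (the helper name `signed_error` is illustrative):

```python
def signed_error(prediction, mos):
    """Prediction minus ground-truth MOS, rounded to three decimals."""
    return round(prediction - mos, 3)


# Values taken from the "Excellent" example (MOS = 4.808).
mos = 4.808
predictions = {
    "DNSMOS": 3.699,
    "MFCC Transformer": 3.497,
    "XLS-R 2B Layer10 Transformer": 3.935,
}

errors = {model: signed_error(pred, mos) for model, pred in predictions.items()}

for model, err in errors.items():
    print(f"{model}: {err:+.3f}")
```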

2. Installation

Option A: Install via `pip` (Recommended)

pip install xls-r-sqa
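
To confirm that the package was installed into the active environment, one option is to query its version with the standard library. This is a generic check, not part of the project; the helper name `installed_version` is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the pip install succeeded, otherwise None.
print(installed_version("xls-r-sqa"))
```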

Option B: Install From Source

First, clone the repository.

git clone https://github.com/lcn-kul/xls-r-analysis-sqa.git

Next, install the requirements to a virtual environment of your choice.

cd xls-r-analysis-sqa/
pip3 install -r requirements.txt

3. Truncated XLS-R Models

This code uses truncated XLS-R models. By default, the code will attempt to auto-download the required truncated XLS-R model from Hugging Face whenever you create an E2EModel that uses XLS-R. For example:

from xls_r_sqa.config import XLSR_2B_TRANSFORMER_32DEEP_CONFIG
from xls_r_sqa.e2e_model import E2EModel

model = E2EModel(
    config=XLSR_2B_TRANSFORMER_32DEEP_CONFIG,
    xlsr_layers=10,
    auto_download=True  # <-- default is True
)

If you do not wish to auto-download, or if you would like to choose your own save location, there are two manual approaches:

Download Truncated Models: Clone the truncated XLS-R repositories from Hugging Face (using Git LFS), following the instructions in xls_r_sqa/models/xls-r-trunc/README.md.
Truncate Full XLS-R Yourself: Download the full pre-trained XLS-R models (see the instructions in xls_r_sqa/models/xls-r/README.md), then run truncate_w2v2.py to create the truncated versions locally.

Warning: The combined size of all truncated XLS-R repos is approximately 15 GB (plus .git overhead, effectively doubling the storage needed). Make sure you have sufficient disk space before downloading or truncating them yourself.
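
A quick way to check the available space before cloning is sketched below. It uses only the standard library; the helper name `has_enough_space`, the target directory, and the 30 GB threshold (15 GB doubled for the .git overhead mentioned above) are assumptions for illustration.

```python
import shutil

BYTES_PER_GB = 10 ** 9


def has_enough_space(path, required_gb):
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


# Roughly 15 GB of models, doubled to account for .git overhead.
REQUIRED_GB = 30

# "." stands in for the directory where the repositories will be cloned.
if has_enough_space(".", REQUIRED_GB):
    print("Enough free space for all truncated XLS-R repositories.")
else:
    print("Not enough free space; consider downloading only the model you need.")
```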

4. Usage

A working example is provided in test_e2e_sqa.py.

5. Citation

@INPROCEEDINGS{10248049,
  author={Tamm, Bastiaan and Vandenberghe, Rik and Van Hamme, Hugo},
  booktitle={2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)}, 
  title={Analysis of XLS-R for Speech Quality Assessment}, 
  year={2023},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/WASPAA58266.2023.10248049}
}

Version 0.1.0 (Feb 10, 2025)

