
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment


NISQA: Speech Quality and Naturalness Assessment

+++ News: The NISQA model has recently been updated to NISQA v2.0. The new version offers multidimensional predictions with higher accuracy and allows for training and finetuning the model.

Speech Quality Prediction:
NISQA is a deep learning model/framework for speech quality prediction. The NISQA model weights can be used to predict the quality of a speech sample that has been sent through a communication system (e.g. a telephone or video call). Besides overall speech quality, NISQA also predicts the quality dimensions Noisiness, Coloration, Discontinuity, and Loudness, which give more insight into the cause of a quality degradation.

TTS Naturalness Prediction:
The NISQA-TTS model weights can be used to estimate the Naturalness of synthetic speech generated by a Voice Conversion or Text-To-Speech system (Siri, Alexa, etc.).

Training/Finetuning:
NISQA can be used to train new single-ended or double-ended speech quality prediction models with different deep learning architectures, such as CNN or DFF -> Self-Attention or LSTM -> Attention-Pooling or Max-Pooling. The provided model weights can also be used to finetune the trained model on new data or for transfer learning to a different regression task (e.g. quality estimation of enhanced speech, speaker similarity estimation, or emotion recognition).

Speech Quality Datasets:
We provide a large corpus of more than 14,000 speech samples with subjective speech quality and speech quality dimension labels.


For more information about the deep learning model structure, the training datasets used, and the training options, see the NISQA paper and the Wiki.

Installation

Using conda

To install the requirements, install Anaconda and then run:

conda env create -f env.yml

This creates a new environment named "nisqa". Activate it to continue:

conda activate nisqa

Using pip

Firstly, ensure that you have installed libsndfile through your Linux distribution's package manager.

Then, run

pip install -r requirements.txt

To use nisqa as a package, you may run

pip install nisqa

Note that this only installs the NISQA_lib and NISQA_model packages, not the model weights or configuration files.

Using NISQA

We provide examples for using NISQA to predict the quality of speech samples, to train a new speech quality model, and to evaluate the performance of a trained speech quality model.

Three different sets of model weights are available; load the appropriate weights for your domain:

| Model | Prediction Output | Domain | Filename |
|---|---|---|---|
| NISQA (v2.0) | Overall Quality, Noisiness, Coloration, Discontinuity, Loudness | Transmitted Speech | nisqa.tar |
| NISQA (v2.0) mos only | Overall Quality only (for finetuning/transfer learning) | Transmitted Speech | nisqa_mos_only.tar |
| NISQA-TTS (v1.0) | Naturalness | Synthesized Speech | nisqa_tts.tar |

Prediction

There are three modes available to predict the quality of speech via command line arguments:

  • Predict a single file
  • Predict all files in a folder
  • Predict all files in a CSV table

Important: Select "nisqa.tar" to predict the quality of a transmitted speech sample and "nisqa_tts.tar" to predict the Naturalness of a synthesized speech sample.

To predict the quality of a single .wav file use:

python run_predict.py --mode predict_file --pretrained_model weights/nisqa.tar --deg /path/to/wav/file.wav --output_dir /path/to/dir/with/results

To predict the quality of all .wav files in a folder use:

python run_predict.py --mode predict_dir --pretrained_model weights/nisqa.tar --data_dir /path/to/folder/with/wavs --num_workers 0 --bs 10 --output_dir /path/to/dir/with/results

To predict the quality of all .wav files listed in a csv table use:

python run_predict.py --mode predict_csv --pretrained_model weights/nisqa.tar --csv_file files.csv --csv_deg column_name_of_filepaths --num_workers 0 --bs 10 --output_dir /path/to/dir/with/results

The results are printed to the console and, if --output_dir is given, also saved to a CSV file in that folder. To speed up prediction, increase the number of workers and the batch size of the PyTorch DataLoader with the optional --num_workers and --bs arguments. For stereo files, --ms_channel selects the audio channel to use.
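When prediction is scripted from Python rather than typed at the shell, the three commands above can be assembled programmatically. The sketch below is a hypothetical helper (build_predict_cmd is not part of NISQA); it uses only the flags shown in this section and assumes that run_predict.py is in the current working directory and that python is on the PATH.

```python
import shlex


def build_predict_cmd(mode, pretrained_model, output_dir=None, *, deg=None,
                      data_dir=None, csv_file=None, csv_deg=None,
                      num_workers=None, bs=None, ms_channel=None):
    """Return a run_predict.py argument list built from the documented flags.

    Hypothetical helper for illustration; not part of the NISQA package.
    """
    cmd = ["python", "run_predict.py", "--mode", mode,
           "--pretrained_model", pretrained_model]

    # Each mode needs its own input arguments.
    if mode == "predict_file":
        if deg is None:
            raise ValueError("predict_file requires deg")
        cmd += ["--deg", deg]
    elif mode == "predict_dir":
        if data_dir is None:
            raise ValueError("predict_dir requires data_dir")
        cmd += ["--data_dir", data_dir]
    elif mode == "predict_csv":
        if csv_file is None or csv_deg is None:
            raise ValueError("predict_csv requires csv_file and csv_deg")
        cmd += ["--csv_file", csv_file, "--csv_deg", csv_deg]
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Optional flags are appended only when a value is given.
    optional = {"--num_workers": num_workers, "--bs": bs,
                "--ms_channel": ms_channel, "--output_dir": output_dir}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_predict_cmd("predict_dir", "weights/nisqa.tar",
                            "/path/to/dir/with/results",
                            data_dir="/path/to/folder/with/wavs",
                            num_workers=0, bs=10)
    print(shlex.join(cmd))  # shell-quoted command line for inspection
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the arguments as a list and passing it to subprocess.run avoids shell-quoting problems with paths that contain spaces.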

Training

Finetuning / Transfer Learning

To use the model weights to finetune the model on a new dataset, only a CSV file with the filenames and labels is needed. The training configuration is controlled from a YAML file and can be started as follows:

python run_train.py --yaml config/finetune_nisqa.yaml
  • If the NISQA Corpus is used, only two arguments need to be updated in the YAML file before you are ready to go: data_dir, pointing to the extracted NISQA_Corpus folder, and output_dir, where the results should be stored.

  • If you use your own dataset or want to load the NISQA-TTS model, some other updates are needed.

    Your CSV file needs to contain at least three columns with the following names:

    • db: the dataset name of each file
    • filepath_deg: the path to the degraded WAV file, either absolute or relative to data_dir (the CSV column name can be changed in the YAML file)
    • mos: the target labels (the CSV column name can be changed in the YAML file)

    Update finetune_nisqa.yaml as follows:

    • data_dir: path to the main folder that contains the CSV file and the datasets
    • output_dir: path to the output folder for the saved model weights and results
    • pretrained_model: filename of the pretrained model, either nisqa_mos_only.tar for natural speech or nisqa_tts.tar for synthesized speech
    • csv_file: name of the CSV file with the filepaths and target labels
    • csv_deg: name of the CSV column that contains the filepaths (e.g. filepath_deg)
    • csv_mos_train and csv_mos_val: names of the CSV columns with the target values (e.g. mos)
    • csv_db_train and csv_db_val: names of the datasets to use for training and validation. Dataset names must appear in the db column.
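Before starting a finetuning run on your own data, it can help to check the CSV file against the requirements listed above. The following is a minimal sketch using only the Python standard library. The helper names, the dataset names my_train and my_val, and the file paths are hypothetical; the column names are the defaults from the list above, so adjust deg_col and mos_col if you renamed the columns in the YAML file.

```python
import csv

# Default column names from the list above; the YAML file may rename
# the filepath and label columns (csv_deg, csv_mos_train, csv_mos_val).
REQUIRED_COLUMNS = ("db", "filepath_deg", "mos")


def write_finetune_csv(path, rows):
    """Write (db, filepath_deg, mos) tuples to a CSV with the expected header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(REQUIRED_COLUMNS)
        writer.writerows(rows)


def check_finetune_csv(path, db_train, db_val,
                       deg_col="filepath_deg", mos_col="mos"):
    """Validate columns, dataset names and labels; return the number of rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = set(reader.fieldnames or [])
        rows = list(reader)

    missing = {"db", deg_col, mos_col} - columns
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    # Every dataset used for training/validation must appear in the db column.
    present = {row["db"] for row in rows}
    unknown = (set(db_train) | set(db_val)) - present
    if unknown:
        raise ValueError(f"dataset names not in 'db' column: {sorted(unknown)}")

    for row in rows:
        float(row[mos_col])  # raises ValueError for non-numeric labels
    return len(rows)
```

The lists passed as db_train and db_val correspond to the csv_db_train and csv_db_val entries in the YAML file.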

See the comments in the YAML configuration file and the Wiki (not yet added) for more advanced training options. A good starting point is to train on the NISQA Corpus with the standard configuration.

Training a new model

NISQA can also be used as a framework to train new speech quality models with different deep learning architectures. The general model structure is as follows:

  1. Framewise model: CNN or Feedforward network
  2. Time-Dependency model: Self-Attention or LSTM
  3. Pooling: Average, Max, Attention or Last-Step-Pooling
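To illustrate what the pooling stage in step 3 does, here is a simplified, framework-free sketch of the four pooling options applied to a sequence of per-frame feature vectors (a list of T vectors of dimension D). It is not NISQA's implementation: in the real model these operations run on PyTorch tensors, padded frames have to be masked, and attention scores come from a trained network. Here a single hypothetical weight vector w stands in for that network.

```python
import math


def average_pool(frames):
    """Mean over time of each feature dimension."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def max_pool(frames):
    """Maximum over time of each feature dimension."""
    return [max(col) for col in zip(*frames)]


def last_step_pool(frames):
    """Use only the final frame, e.g. the last output of an LSTM."""
    return list(frames[-1])


def attention_pool(frames, w):
    """Weighted mean over time with softmax weights from a score per frame.

    The dot product with w is a stand-in for a learned scoring network.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]    # attention weights, sum to 1
    return [sum(a * frame[d] for a, frame in zip(alphas, frames))
            for d in range(len(frames[0]))]
```

With w set to zeros, every frame receives the same weight and attention_pool reduces to average_pool; a trained scorer instead lets the model weight some frames more heavily than others.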

The framewise and time-dependency models can be skipped, for example to train an LSTM model without a CNN that uses the last time step for prediction. A second time-dependency stage can also be added, for example to build an LSTM followed by Self-Attention. The model structure is controlled via the YAML configuration file. Training with the standard NISQA model configuration on the NISQA Corpus can be started as follows:

python run_train.py --yaml config/train_nisqa_cnn_sa_ap.yaml

If the NISQA Corpus is used, only data_dir (pointing to the unzipped NISQA_Corpus folder) and output_dir need to be updated in the YAML file. For a custom dataset, see the finetuning section above for the required YAML changes.

Other combinations of neural networks can be trained as well. For example, train_nisqa_cnn_lstm_avg.yaml is provided as an example configuration for a model that uses an LSTM instead of Self-Attention.

To train a double-ended model for full-reference speech quality prediction, the train_nisqa_double_ended.yaml configuration file can be used as an example. See the comments in the YAML files and the Wiki (not yet added) for more details on different possible model structures and advanced training options.

Evaluation

Trained models can be evaluated on a given dataset as follows (can also be used as a conformance test of the model installation):

python run_evaluate.py

Before running, update the options and paths inside the Python script run_evaluate.py. If the NISQA Corpus is used, only the data_dir and output_dir paths need to be adjusted. In addition to Pearson's correlation and RMSE, the script computes the RMSE after a first-order polynomial mapping. If a CSV file with per-condition labels is provided, it also outputs per-condition results and RMSE*. Correlation diagrams can optionally be plotted. When run on the NISQA Corpus, the script should reproduce the results reported in the NISQA paper.
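The three summary metrics mentioned above can be sketched in plain Python as follows. This is an illustration under the assumption that the first-order mapping is an ordinary least-squares fit of the subjective MOS on the predicted MOS, fitted on the same data it is evaluated on; run_evaluate.py may differ in details, and the per-condition RMSE* is not covered here.

```python
import math


def pearson_r(pred, target):
    """Pearson's correlation coefficient between predictions and labels."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    if sp == 0 or st == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sp * st)


def rmse(pred, target):
    """Root-mean-square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def first_order_mapping(pred, target):
    """Least-squares fit target ~ a + b * pred; return the mapped predictions."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    var = sum((p - mp) ** 2 for p in pred)
    if var == 0:
        raise ValueError("mapping is undefined for constant predictions")
    b = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / var
    a = mt - b * mp
    return [a + b * p for p in pred]


def rmse_mapped(pred, target):
    """RMSE after removing a linear offset/scale bias from the predictions."""
    return rmse(first_order_mapping(pred, target), target)
```

Because the least-squares fit can always fall back to the identity mapping (a = 0, b = 1), rmse_mapped is never larger than the raw RMSE; the difference shows how much of the error is a simple offset or scale bias.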

NISQA Corpus

The NISQA Corpus includes more than 14,000 speech samples with simulated (e.g. codecs, packet-loss, background noise) and live (e.g. mobile phone, Zoom, Skype, WhatsApp) conditions.

For the download link and more details on the datasets and used source speech samples see the NISQA Corpus Wiki.

Paper and License

The NISQA code is licensed under the MIT License.

The model weights (nisqa.tar, nisqa_mos_only.tar, nisqa_tts.tar) are provided under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

The NISQA Corpus is provided under the original terms of the used source speech and noise samples. More information can be found in the NISQA Corpus Wiki.

Copyright © 2021 Gabriel Mittag
www.qu.tu-berlin.de



Download files


Source Distribution

nisqa-2.0.post2.tar.gz (31.7 kB)

Uploaded Source

Built Distribution

nisqa-2.0.post2-py3-none-any.whl (27.9 kB)

Uploaded Python 3

File details

Details for the file nisqa-2.0.post2.tar.gz.

File metadata

  • Download URL: nisqa-2.0.post2.tar.gz
  • Upload date:
  • Size: 31.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for nisqa-2.0.post2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5be440be043aa69610b3126bf2f84be19c573b3c63b5a6078dce39773584e68b |
| MD5 | 8d981c95448b98d4e890d4e4c80d73b0 |
| BLAKE2b-256 | 665ed9556fa4d85a2ba0876c85aba5e3985b392fec92126ec1c62544a554fe01 |


File details

Details for the file nisqa-2.0.post2-py3-none-any.whl.

File metadata

  • Download URL: nisqa-2.0.post2-py3-none-any.whl
  • Upload date:
  • Size: 27.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for nisqa-2.0.post2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 386205dc930aa98114a689f301112852dfe661d81eb4e6e630675ae9acd2c593 |
| MD5 | 8f5f610893ce73ad6c08f93ab2eec91e |
| BLAKE2b-256 | 04ecf5386fbc01ecc3c66f4ce668bad97ea12781f7ba3f81b9bf100a902b823b |
