Psychometric testing on Automatic Speech Recognition systems
HumanlikeHearing
A Python package for applying a range of psychometric tests on automatic speech recognition (ASR) systems. For more information on the psychometric tests and the ASR systems this toolbox supports, see our accompanying paper:
The Psychometrics of Automatic Speech Recognition. Lotte Weerts, Stuart Rosen, Claudia Clopath, Dan F. M. Goodman. bioRxiv 2021.04.19.440438; doi: https://doi.org/10.1101/2021.04.19.440438
Installation
The easiest way to install the toolbox is by installing the latest stable release that lives on PyPI:
pip install humanlikehearing
To ensure all dependencies are correctly installed, we recommend using Anaconda to install numpy and scipy beforehand.
To build the toolbox from source use:
python setup.py build
python setup.py install
If your installation went well, you should now be able to execute the demo script examples/run.py:
python examples/run.py \
  --asr_system_name TestASR \
  --dataset TestDataSet \
  --data_path . \
  --results_folder ../results \
  --sentences_per_condition 1
IMPORTANT: installing the toolbox DOES NOT install any of the automatic speech recognition systems - the sample script will run a dummy ASR system that always prints 'hello world'.
Prepare ASR systems
By default, no ASR systems are included in the toolbox. However, the toolbox provides support for specific versions of three freely available ASR systems. If you just want to quickly test out the toolbox, we recommend installing Mozilla DeepSpeech v0.6.1, as it is the easiest to install.
After installation, you can start running experiments by setting the --asr_system_name and --model_path accordingly:
python examples/run.py \
  --asr_system_name <ASR CLASS NAME> \
  --model_path <PATH TO ASR MODEL FILE> \
  --dataset TestDataSet \
  --data_path . \
  --results_folder ../results \
  --sentences_per_condition 1
MozillaDeepSpeech (LSTM model)
Installation instructions can be found at https://deepspeech.readthedocs.io/en/v0.6.1/USING.html
This code assumes the model follows Mozilla DeepSpeech version 0.6.1 and may not work for later models! When defining model_path, refer to the unzipped directory (e.g. /path/to/downloads/deepspeech-0.6.1-models).
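As a quick sanity check that DeepSpeech itself works, independent of this toolbox, you can transcribe a WAV file directly. A minimal sketch, assuming the deepspeech pip package (v0.6.1) and a 16 kHz, 16-bit mono recording (test.wav is a placeholder name):

import wave
import numpy as np
from deepspeech import Model

# Load the acoustic model from the unzipped model directory (beam width 500).
ds = Model('/path/to/downloads/deepspeech-0.6.1-models/output_graph.pbmm', 500)
with wave.open('test.wav', 'rb') as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
print(ds.stt(audio))  # prints the transcription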
Vosk's Kaldi nnet3 model (DNN-HMM model)
Installation instructions can be found at https://alphacephei.com/vosk/install
The Vosk model used in the paper is vosk-model-en-us-daanzu-20200905 and can be downloaded here: https://alphacephei.com/vosk/models
Note that to be able to run this model, you also need to install Kaldi: http://www.kaldi-asr.org/doc/install.html
When defining model_path, refer to the unzipped directory (e.g. /path/to/downloads/vosk-model-en-us-daanzu-20200905).
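As with DeepSpeech, you can sanity-check the Vosk model outside the toolbox. A minimal sketch, assuming the vosk pip package and a 16 kHz, 16-bit mono recording (test.wav is again a placeholder name):

import json
import wave
from vosk import Model, KaldiRecognizer

model = Model('/path/to/downloads/vosk-model-en-us-daanzu-20200905')
with wave.open('test.wav', 'rb') as w:
    rec = KaldiRecognizer(model, w.getframerate())  # recogniser at the file's sample rate
    rec.AcceptWaveform(w.readframes(w.getnframes()))  # feed the whole file at once
print(json.loads(rec.FinalResult())['text'])  # final transcription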
Fairseq's Wav2vec 2.0 (CNN-Transformer model)
Installation instructions can be found at: https://github.com/pytorch/fairseq/tree/828960f5dace4787ad81aeadca60043c907adc67/examples/wav2vec
The Wav2Vec model used in the paper is the Wav2Vec 2.0 Large model trained on 960 hours of LibriSpeech audio.
When defining model_path, refer to the .pt file of the model (e.g. /path/to/downloads/wav2vec_big_960h.pt). Note that a dict.ltr.txt file is assumed to be present in the same folder. This file can be downloaded here: https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt
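Running wav2vec 2.0 end to end requires fairseq's own decoding pipeline, but a minimal sketch to verify the files are in place (assuming only torch is installed; the paths are placeholders):

import os
import torch

ckpt_path = '/path/to/downloads/wav2vec_big_960h.pt'
# The toolbox assumes dict.ltr.txt sits next to the checkpoint.
assert os.path.exists(os.path.join(os.path.dirname(ckpt_path), 'dict.ltr.txt'))
ckpt = torch.load(ckpt_path, map_location='cpu')  # load the checkpoint on CPU
print(sorted(ckpt.keys()))  # top-level checkpoint entries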
Prepare DataSets
The toolbox supports two freely available speech datasets: the ARU speech corpus (recordings of the IEEE sentences) and the LibriSpeech dataset (recordings of audiobooks). We generally recommend the ARU speech corpus for testing, as it is most similar to the material on which humans tend to be tested; note that not all experiments are currently compatible with the LibriSpeech dataset (but will be in the future).
ARU Speech Corpus
The ARU dataset can be downloaded here: http://datacat.liverpool.ac.uk/681/
To run an experiment on the ARU speech corpus:
python examples/run.py \
  --dataset ARUDataSet \
  --data_path /your/path/to/ARU_Speech_Corpus_v1_0 \
  --results_folder ../results \
  --sentences_per_condition 100
LibriSpeech Corpus
The LibriSpeech test data can be downloaded here: https://www.openslr.org/12
Note: we recommend using only the "test-clean.tar.gz" subset of the LibriSpeech data set. Many freely available ASR systems are trained on LibriSpeech, so testing on the training data would overestimate ASR performance.
To run an experiment on the LibriSpeech corpus:
python examples/run.py \
  --dataset LibriSpeechDataSet \
  --data_path /your/path/to/test-clean \
  --results_folder ../results \
  --sentences_per_condition 100
Run an experiment
To run an experiment, use run.py to load the ASR system and data set and to write the outputs to a results folder. By default, run.py runs all experiments described in the paper:
python examples/run.py \
  --asr_system_name <ASR SYSTEM CLASS> \
  --model_path <PATH TO ASR MODEL> \
  --dataset <DATA SET CLASS NAME> \
  --data_path <PATH TO DATA> \
  --results_folder <RESULTS FOLDER> \
  --sentences_per_condition 100
Here, --asr_system_name, --model_path, --dataset and --data_path can be defined as described above. The --results_folder flag indicates the folder in which experimental outcomes are stored as pandas tables, and --sentences_per_condition indicates how many sentences are used per condition. For most experiments, in particular the SRT experiments, this number should be at least 20; values closer to 100 give a better view of model performance.
If you only want to run a subset of the experiments or if you want to change any parameters, you can simply edit run.py as desired.
Analyse your experimental results
To view the outcomes of your experiments, locate your experiment folder in your results folder; results are organised as results/test_report_<ASRNAME>_<TIMESTAMP>/<EXPERIMENT CLASS>_<RESULTS TYPE>_<TIMESTAMP>. Here, <RESULTS TYPE> is usually 'standard', but in some cases it indicates a sub-experiment (e.g. a clipping experiment has 'peak' and 'center' results types).
To load your experimental outcomes, you can use pandas:
import pandas as pd
results = pd.read_pickle('path/to/experiment/results.pkl')
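The exact columns depend on the experiment class, so a quick first look (continuing the snippet above) is:

print(results.columns)  # columns written by this experiment
print(results.head())   # first few sentence-level records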
In most cases, it will be relatively straightforward to analyse the outcomes. However, in the case of speech reception threshold (SRT) experiments, one extra analysis step is required to obtain the SRTs from the results file. See examples/srt_analysis.ipynb for an example of how to obtain SRT results.
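For orientation, here is a generic sketch of that extra step, not the toolbox's own routine: estimate the SRT by fitting a logistic psychometric function to word scores as a function of SNR. The column names 'snr' and 'correct' are assumptions, so check them against your results table first.

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

results = pd.read_pickle('path/to/experiment/results.pkl')

def logistic(snr, srt, slope):
    # Psychometric function: intelligibility rises from 0 to 1 around the SRT.
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt)))

scores = results.groupby('snr')['correct'].mean()  # assumed column names
(srt, slope), _ = curve_fit(logistic, scores.index, scores.values, p0=[0.0, 0.5])
print(f'SRT (50% intelligibility): {srt:.1f} dB SNR')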
Citing
If you wish to cite HumanlikeHearing in a scholarly work, please cite the following:
The Psychometrics of Automatic Speech Recognition. Lotte Weerts, Stuart Rosen, Claudia Clopath, Dan F. M. Goodman. bioRxiv 2021.04.19.440438; doi: https://doi.org/10.1101/2021.04.19.440438