A pip installable version of the phonological function from jcvazquezc's DisVoice library
Phonological features
phonological.py
Compute phonological features from continuous speech files.
18 descriptors are computed, based on 18 different phonological classes from the phonet toolkit https://phonet.readthedocs.io/en/latest/?badge=latest
It computes the phonological log-likelihood ratio features from phonet
Static or dynamic matrices can be computed:
Static matrix is formed with 108 features: (18 descriptors) x (6 functionals: mean, std, skewness, kurtosis, max, min)
Dynamic matrix is formed with the 18 descriptors computed for frames of 25 ms with a time-shift of 10 ms.
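The relation between the two matrices can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's implementation: the helper names `n_frames` and `static_from_dynamic` are hypothetical, the frame count assumes no padding of the last frame, kurtosis is taken as excess kurtosis, and the order in which the functionals are concatenated is a guess.

```python
import numpy as np

N_DESCRIPTORS = 18            # phonological classes, as stated above
FRAME_MS, SHIFT_MS = 25, 10   # frame length and time-shift, as stated above


def n_frames(duration_ms: int) -> int:
    """Number of full frames in a signal (assumption: no padding)."""
    if duration_ms < FRAME_MS:
        return 0
    return 1 + (duration_ms - FRAME_MS) // SHIFT_MS


def static_from_dynamic(dynamic: np.ndarray) -> np.ndarray:
    """Collapse a (frames, 18) matrix into a single 108-dimensional vector."""
    mean = dynamic.mean(axis=0)
    std = dynamic.std(axis=0)
    centered = dynamic - mean
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    skewness = (centered ** 3).mean(axis=0) / safe_std ** 3
    # Assumption: excess kurtosis (0 for a Gaussian).
    kurtosis = (centered ** 4).mean(axis=0) / safe_std ** 4 - 3.0
    # Assumption: functionals are concatenated in the order listed above.
    return np.hstack([mean, std, skewness, kurtosis,
                      dynamic.max(axis=0), dynamic.min(axis=0)])


# Random stand-in for the descriptors of a 1-second recording.
rng = np.random.default_rng(0)
dynamic = rng.normal(size=(n_frames(1000), N_DESCRIPTORS))
static = static_from_dynamic(dynamic)
print(dynamic.shape, static.shape)  # (98, 18) (108,)
```

The dynamic matrix grows with the length of the recording, while the static vector always has 18 x 6 = 108 entries, which makes it the more convenient input for classifiers that expect fixed-size features.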
Running
The script is called as follows:
python phonological.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>
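When the script is driven from another Python program, the five positional arguments can be assembled programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of the package, and it assumes that `phonological.py` is reachable from the working directory and that the booleans are passed as the lowercase strings shown in the usage line.

```python
# Hypothetical helper: builds the argument list for the usage line above.
FORMATS = {"csv", "txt", "npy", "kaldi", "torch"}  # formats named in the usage line


def build_command(audio, output, static, plots, fmt, script="phonological.py"):
    """Return the command as a list suitable for subprocess.run()."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    as_flag = lambda value: "true" if value else "false"
    return ["python", script, audio, output, as_flag(static), as_flag(plots), fmt]


cmd = build_command("../audios/", "phonologicalfeaturesst.csv",
                    static=True, plots=False, fmt="csv")
print(" ".join(cmd))
# python phonological.py ../audios/ phonologicalfeaturesst.csv true false csv
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.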
Examples:
Extract features in the command line
python phonological.py "../audios/001_ddk1_PCGITA.wav" "phonologicalfeaturesAst.txt" "true" "true" "txt"
python phonological.py "../audios/001_ddk1_PCGITA.wav" "phonologicalfeaturesUst.csv" "true" "true" "csv"
python phonological.py "../audios/001_ddk1_PCGITA.wav" "phonologicalfeaturesUdyn.pt" "false" "true" "torch"
python phonological.py "../audios/" "phonologicalfeaturesst.txt" "true" "false" "txt"
python phonological.py "../audios/" "phonologicalfeaturesst.csv" "true" "false" "csv"
python phonological.py "../audios/" "phonologicalfeaturesdyn.pt" "false" "false" "torch"
python phonological.py "../audios/" "phonologicalfeaturesdyn.csv" "false" "false" "csv"
KALDI_ROOT=/home/camilo/Camilo/codes/kaldi-master2
export PATH=$PATH:$KALDI_ROOT/src/featbin/
python phonological.py "../audios/001_ddk1_PCGITA.wav" "phonologicalfeaturesddk1dyn" "false" "false" "kaldi"
python phonological.py "../audios/" "phonologicalfeaturesdyn" "false" "false" "kaldi"
Extract features directly in Python
phonological=Phonological()
file_audio="../audios/001_ddk1_PCGITA.wav"
features1=phonological.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
features2=phonological.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
features3=phonological.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
phonological.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
path_audio="../audios/"
features1=phonological.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
features2=phonological.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
features3=phonological.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
phonological.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
Results:
Phonological analysis [figures omitted]
References
[1] Vásquez-Correa, J. C., Klumpp, P., Orozco-Arroyave, J. R., & Nöth, E. (2019). Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. In INTERSPEECH (pp. 549-553).
[2] Diez, M., Varona, A., Penagarikano, M., Rodriguez-Fuentes, L. J., & Bordel, G. (2014). On the projection of PLLRs for unbounded feature distributions in spoken language recognition. IEEE Signal Processing Letters, 21(9), 1073-1077.
Hashes for disvoice-phonological-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f659d48a5211aa491b38ccb6816a841c985ada1b7dff220f4e9bb91bbbad8cbb
MD5 | 21622d219713a3fef759389c16457002
BLAKE2b-256 | 57c6421da1166c852cd62f5de3b623fa0b2508b2709a0e768d7010f038661a04
Hashes for disvoice_phonological-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f2e799bb6e7a99280ab5f3f908439df903e66e5a2ee40b429f7e4ac8d23d63f5
MD5 | 49865fc5882e729471e62c353302befc
BLAKE2b-256 | ee194058669c238bee123a36ab4ab11da3838391b64c7b0ad29fc0df110de01c