A pip-installable version of the prosody module (prosody.py) from jcvazquezc's DisVoice library.
Prosody features
prosody.py
Compute prosody features from continuous speech based on duration, fundamental frequency and energy.
Static or dynamic features can be computed:
The static feature matrix is formed with 103 features and includes:
Features based on F0

Num | Feature | Description
---|---|---
1-6 | F0 contour | Avg., Std., Max., Min., Skewness, Kurtosis
7-12 | Tilt of a linear estimation of F0 for each voiced segment | Avg., Std., Max., Min., Skewness, Kurtosis
13-18 | MSE of a linear estimation of F0 for each voiced segment | Avg., Std., Max., Min., Skewness, Kurtosis
19-24 | F0 on the first voiced segment | Avg., Std., Max., Min., Skewness, Kurtosis
25-30 | F0 on the last voiced segment | Avg., Std., Max., Min., Skewness, Kurtosis

Features based on energy

Num | Feature | Description
---|---|---
31-34 | Energy contour for voiced segments | Avg., Std., Skewness, Kurtosis
35-38 | Tilt of a linear estimation of the energy contour for voiced segments | Avg., Std., Skewness, Kurtosis
39-42 | MSE of a linear estimation of the energy contour for voiced segments | Avg., Std., Skewness, Kurtosis
43-48 | Energy on the first voiced segment | Avg., Std., Max., Min., Skewness, Kurtosis
49-54 | Energy on the last voiced segment | Avg., Std., Max., Min., Skewness, Kurtosis
55-58 | Energy contour for unvoiced segments | Avg., Std., Skewness, Kurtosis
59-62 | Tilt of a linear estimation of the energy contour for unvoiced segments | Avg., Std., Skewness, Kurtosis
63-66 | MSE of a linear estimation of the energy contour for unvoiced segments | Avg., Std., Skewness, Kurtosis
67-72 | Energy on the first unvoiced segment | Avg., Std., Max., Min., Skewness, Kurtosis
73-78 | Energy on the last unvoiced segment | Avg., Std., Max., Min., Skewness, Kurtosis

Features based on duration

Num | Feature | Description
---|---|---
79 | Voiced rate | Number of voiced segments per second
80-85 | Duration of voiced segments | Avg., Std., Max., Min., Skewness, Kurtosis
86-91 | Duration of unvoiced segments | Avg., Std., Max., Min., Skewness, Kurtosis
92-97 | Duration of pauses | Avg., Std., Max., Min., Skewness, Kurtosis
98-103 | Duration ratios | Pause/(Voiced+Unvoiced), Pause/Unvoiced, Unvoiced/(Voiced+Unvoiced), Voiced/(Voiced+Unvoiced), Voiced/Pause, Unvoiced/Pause
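Most rows of the static table are summary statistics over a contour or over per-segment values. The sketch below illustrates that idea in plain Python; it is not code from prosody.py. six_stats, linear_tilt_mse and tilt_mse_features are hypothetical helpers, the time axis of each linear fit is assumed to be the frame index, and skewness and excess (Fisher) kurtosis are assumed to use population moments (the scipy.stats defaults). The exact definitions used by prosody.py may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def six_stats(values: Sequence[float]) -> Dict[str, float]:
    """Avg., Std., Max., Min., Skewness and Kurtosis of a 1-D sequence.

    Assumption: population moments (divisor N), skewness = m3 / m2**1.5 and
    excess (Fisher) kurtosis = m4 / m2**2 - 3, matching scipy.stats defaults.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # A constant input has undefined skewness/kurtosis; this sketch reports 0.0.
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return {"avg": mean, "std": math.sqrt(m2), "max": max(values),
            "min": min(values), "skewness": skew, "kurtosis": kurt}


def linear_tilt_mse(contour: Sequence[float]) -> Tuple[float, float]:
    """Fit y = tilt * t + b by least squares over frame index t; return (tilt, MSE)."""
    n = len(contour)
    if n < 2:
        raise ValueError("need at least two frames to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(contour) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(contour))
    tilt = sxy / sxx
    intercept = y_mean - tilt * t_mean
    mse = sum((y - (tilt * t + intercept)) ** 2 for t, y in enumerate(contour)) / n
    return tilt, mse


def tilt_mse_features(segments: List[Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Fit one line per segment, then summarize the tilts and the MSEs."""
    if not segments:
        raise ValueError("need at least one segment")
    tilts, mses = zip(*(linear_tilt_mse(seg) for seg in segments))
    return {"tilt": six_stats(tilts), "mse": six_stats(mses)}
```

For the groups that list only Avg., Std., Skewness and Kurtosis (features 31-42 and 55-66), the max and min entries would be dropped; a row such as "F0 on the first voiced segment" corresponds to six_stats applied to the first segment's contour alone.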
The dynamic feature matrix is formed with 13 features computed for each voiced segment and contains:
- 1. Duration of the voiced segment
- 2-7. Coefficients of a degree-5 Lagrange polynomial modeling the F0 contour
- 8-13. Coefficients of a degree-5 Lagrange polynomial modeling the energy contour
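A degree-5 polynomial has 6 coefficients, which is why F0 and energy each take 6 slots and, with the duration, give 13 values per voiced segment. The sketch below assembles such a vector under stated assumptions: numpy.polyfit (an ordinary least-squares fit in the monomial basis) stands in for the polynomial model, time is normalized to [-1, 1] within each segment, and the 10 ms frame step used to convert frames to seconds is a hypothetical value. contour_coefficients and dynamic_vector are illustrative names, not functions of prosody.py.

```python
import numpy as np

DEGREE = 5  # degree-5 polynomial -> 6 coefficients per contour


def contour_coefficients(contour, degree=DEGREE):
    """Least-squares polynomial coefficients for one contour (highest power first)."""
    y = np.asarray(contour, dtype=float)
    if y.size < degree + 1:
        raise ValueError(f"need at least {degree + 1} frames, got {y.size}")
    # Assumption: time axis normalized to [-1, 1] within the segment.
    t = np.linspace(-1.0, 1.0, y.size)
    return np.polyfit(t, y, degree)


def dynamic_vector(f0_segment, energy_segment, frame_step_s=0.01):
    """13 values: [duration, 6 F0 coefficients, 6 energy coefficients].

    frame_step_s (seconds per frame) is a hypothetical default.
    """
    if len(f0_segment) != len(energy_segment):
        raise ValueError("F0 and energy contours must have the same number of frames")
    duration = len(f0_segment) * frame_step_s
    return np.concatenate(([duration],
                           contour_coefficients(f0_segment),
                           contour_coefficients(energy_segment)))
```

Stacking one such vector per voiced segment gives a matrix with 13 columns, one row per segment.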
Dynamic prosody features are based on N. Dehak et al., "Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification" (2007) [1].
Notes:
- The fundamental frequency is computed using the Praat algorithm. To use the RAPT method instead, change the "self.pitch_method" variable in the class constructor.
- When the output format is "kaldi", two files are generated: the ".ark" file with the data in binary format and the ".scp" Kaldi script file.
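The ".scp" file is a text index into the ".ark" archive: each line holds a key followed by an extended filename, which for archive output has the form path.ark:byte_offset. The hypothetical parse_scp helper below reads that index with the standard library only. It handles just the path.ark:offset form, and how prosody.py chooses its keys is not stated here.

```python
from typing import Dict, Tuple


def parse_scp(text: str) -> Dict[str, Tuple[str, int]]:
    """Map each key in Kaldi .scp text to (ark path, byte offset)."""
    entries: Dict[str, Tuple[str, int]] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # key, then the extended filename
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: missing location for key {parts[0]!r}")
        key, location = parts
        # The byte offset follows the last ':' (e.g. feats.ark:1234).
        path, sep, offset = location.rpartition(":")
        if not sep or not offset.isdigit():
            raise ValueError(f"line {line_no}: expected 'path.ark:offset', got {location!r}")
        entries[key] = (path, int(offset))
    return entries
```

For example, parse_scp("utt1 /tmp/feats.ark:9") returns {"utt1": ("/tmp/feats.ark", 9)}; the key and path here are made up.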
Running
The script is called as follows:

```bash
python prosody.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>
```
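The five positional arguments follow a fixed order. The sketch below shows one way to interpret them; parse_args and ProsodyArgs are hypothetical names for illustration, not part of prosody.py, and accepting only the exact strings "true" and "false" is an assumption about the script's behavior.

```python
from typing import List, NamedTuple

FORMATS = ("csv", "txt", "npy", "kaldi", "torch")


class ProsodyArgs(NamedTuple):
    audio: str      # a single audio file or a folder of audio files
    features: str   # output file (a base name without extension for Kaldi, as in the examples)
    static: bool    # True: static features, False: dynamic features
    plots: bool     # whether to produce plots
    fmt: str        # one of FORMATS


def _as_bool(value: str, name: str) -> bool:
    # Assumption: only the exact strings "true" and "false" are accepted.
    if value not in ("true", "false"):
        raise ValueError(f"{name} must be 'true' or 'false', got {value!r}")
    return value == "true"


def parse_args(argv: List[str]) -> ProsodyArgs:
    """Interpret the five positional arguments (argv without the script name)."""
    if len(argv) != 5:
        raise ValueError("usage: prosody.py <file_or_folder_audio> <file_features> "
                         "<static (true or false)> <plots (true or false)> <format>")
    audio, features, static, plots, fmt = argv
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    return ProsodyArgs(audio, features, _as_bool(static, "static"),
                       _as_bool(plots, "plots"), fmt)
```

Called as parse_args(sys.argv[1:]), the arguments "../audios/" "prosodyfeaturesdyn.pt" "false" "false" "torch" would yield static=False, plots=False and fmt="torch".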
Examples:

Extracting features from the command line:

```bash
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesAst.txt" "true" "true" "txt"
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUst.csv" "true" "true" "csv"
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn.pt" "false" "true" "torch"
python prosody.py "../audios/" "prosodyfeaturesst.txt" "true" "false" "txt"
python prosody.py "../audios/" "prosodyfeaturesst.csv" "true" "false" "csv"
python prosody.py "../audios/" "prosodyfeaturesdyn.pt" "false" "false" "torch"
python prosody.py "../audios/" "prosodyfeaturesdyn.csv" "false" "false" "csv"

# Kaldi output: add Kaldi's featbin tools to the PATH first (KALDI_ROOT is an example path)
KALDI_ROOT=/home/camilo/Camilo/codes/kaldi-master2
export PATH=$PATH:$KALDI_ROOT/src/featbin/
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn" "false" "false" "kaldi"
python prosody.py "../audios/" "prosodyfeaturesdyn" "false" "false" "kaldi"
```
Extracting features directly in Python:

```python
from prosody import Prosody

prosody = Prosody()
file_audio = "../audios/001_ddk1_PCGITA.wav"

# Static features (static=True); fmt selects the output type
features1 = prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
features2 = prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
# Dynamic features (static=False) in PyTorch format
features3 = prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
# Kaldi output is written to disk under the name given by kaldi_file
prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
```
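Assuming the static result for a single file (for example features1 with fmt="npy") is a 1-D vector of 103 values ordered as in the static feature table, the hypothetical lookup below converts the table's 1-based feature numbers into Python slices. STATIC_GROUPS, group_slice and the group names are made up for this sketch.

```python
# (first, last) 1-based feature numbers, copied from the static feature table
STATIC_GROUPS = {
    "F0 contour": (1, 6),
    "F0 tilt": (7, 12),
    "F0 MSE": (13, 18),
    "F0 first voiced segment": (19, 24),
    "F0 last voiced segment": (25, 30),
    "Energy contour (voiced)": (31, 34),
    "Energy tilt (voiced)": (35, 38),
    "Energy MSE (voiced)": (39, 42),
    "Energy first voiced segment": (43, 48),
    "Energy last voiced segment": (49, 54),
    "Energy contour (unvoiced)": (55, 58),
    "Energy tilt (unvoiced)": (59, 62),
    "Energy MSE (unvoiced)": (63, 66),
    "Energy first unvoiced segment": (67, 72),
    "Energy last unvoiced segment": (73, 78),
    "Voiced rate": (79, 79),
    "Voiced duration": (80, 85),
    "Unvoiced duration": (86, 91),
    "Pause duration": (92, 97),
    "Duration ratios": (98, 103),
}


def group_slice(name: str) -> slice:
    """Turn a 1-based inclusive feature range into a 0-based half-open slice."""
    first, last = STATIC_GROUPS[name]
    return slice(first - 1, last)
```

Under that assumption, features1[group_slice("Duration ratios")] would select the six duration ratios (features 98-103).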
Results:

[Figure: prosody analysis from continuous speech, static features]
References
[1] N. Dehak, P. Dumouchel, and P. Kenny. "Modeling prosodic features with joint factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007): 2095-2103.
[2] J. R. Orozco-Arroyave, J. C. Vásquez-Correa, et al. "NeuroSpeech: An open-source software for Parkinson's speech analysis." Digital Signal Processing (2017).
Download files
Source Distribution: disvoice-prosody-0.0.5.tar.gz
Built Distribution: disvoice_prosody-0.0.5-py3-none-any.whl
File details
Details for the file disvoice-prosody-0.0.5.tar.gz.
File metadata
- Download URL: disvoice-prosody-0.0.5.tar.gz
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.8.6
File hashes

Algorithm | Hash digest
---|---
SHA256 | ba94d03c7f7b66dc526f029339003d1d0e7d0dccaeb89994d0636fd2e98a64bf
MD5 | a018ae6330bcef5ca140ddf24e3bcd22
BLAKE2b-256 | 86b0d6659644387ea68c391792b3a58a32e1fa7681ba607a7d4e5ef352b868cf
Details for the file disvoice_prosody-0.0.5-py3-none-any.whl.
File metadata
- Download URL: disvoice_prosody-0.0.5-py3-none-any.whl
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.8.6
File hashes

Algorithm | Hash digest
---|---
SHA256 | 1a002b58fad5d0863c97777cf65c78c6c47d64a5cc6ad60fd32bb16d64dccfb1
MD5 | aa21446c6761638adc81b9a17e343372
BLAKE2b-256 | 83fb88a0fa72adf818862d0ac3296592ca8697777818593698576868071010d5
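The SHA256 digests listed for the two distribution files can be used to check a download before installing it. A minimal sketch with Python's hashlib follows; sha256_of, verify_sha256 and EXPECTED_SHA256 are illustrative names, and the expected values are the SHA256 digests from the file details.

```python
import hashlib

# SHA256 digests copied from the file details for the two distributions
EXPECTED_SHA256 = {
    "disvoice-prosody-0.0.5.tar.gz":
        "ba94d03c7f7b66dc526f029339003d1d0e7d0dccaeb89994d0636fd2e98a64bf",
    "disvoice_prosody-0.0.5-py3-none-any.whl":
        "1a002b58fad5d0863c97777cf65c78c6c47d64a5cc6ad60fd32bb16d64dccfb1",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True if the file's digest matches (case-insensitive hex comparison)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Run from the download directory, verify_sha256("disvoice-prosody-0.0.5.tar.gz", EXPECTED_SHA256["disvoice-prosody-0.0.5.tar.gz"]) should return True for an intact file. pip's hash-checking mode (--require-hashes with --hash=sha256:... entries in a requirements file) performs the same check at install time.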