Skip to main content

A pip installable version of the prosody function from jcvazquezc's DisVoice library

Project description

Prosody features

prosody.py

Compute prosody features from continuous speech based on duration, fundamental frequency and energy.

Static or dynamic features can be computed:

Static matrix is formed with 103 features and include

Num Feature Description

                            Features based on F0

1-6 F0-contour Avg., Std., Max., Min., Skewness, Kurtosis

7-12 Tilt of a linear estimation of F0 for each voiced segment Avg., Std., Max., Min., Skewness, Kurtosis

13-18 MSE of a linear estimation of F0 for each voiced segment Avg., Std., Max., Min., Skewness, Kurtosis

19-24 F0 on the first voiced segment Avg., Std., Max., Min., Skewness, Kurtosis

25-30 F0 on the last voiced segment Avg., Std., Max., Min., Skewness, Kurtosis


                            Features based on energy

31-34 energy-contour for voiced segments Avg., Std., Skewness, Kurtosis

35-38 Tilt of a linear estimation of energy contour for V segments Avg., Std., Skewness, Kurtosis

39-42 MSE of a linear estimation of energy contour for V segment Avg., Std., Skewness, Kurtosis

43-48 energy on the first voiced segment Avg., Std., Max., Min., Skewness, Kurtosis

49-54 energy on the last voiced segment Avg., Std., Max., Min., Skewness, Kurtosis

55-58 energy-contour for unvoiced segments Avg., Std., Skewness, Kurtosis

59-62 Tilt of a linear estimation of energy contour for U segments Avg., Std., Skewness, Kurtosis

63-66 MSE of a linear estimation of energy contour for U segments Avg., Std., Skewness, Kurtosis

67-72 energy on the first unvoiced segment Avg., Std., Max., Min., Skewness, Kurtosis

73-78 energy on the last unvoiced segment Avg., Std., Max., Min., Skewness, Kurtosis


                            Features based on duration

79 Voiced rate Number of voiced segments per second

80-85 Duration of Voiced Avg., Std., Max., Min., Skewness, Kurtosis

86-91 Duration of Unvoiced Avg., Std., Max., Min., Skewness, Kurtosis

92-97 Duration of Pauses Avg., Std., Max., Min., Skewness, Kurtosis

98-103 Duration ratios Pause/(Voiced+Unvoiced), Pause/Unvoiced, Unvoiced/(Voiced+Unvoiced), Voiced/(Voiced+Unvoiced), Voiced/Puase, Unvoiced/Pause


The dynamic feature matrix is formed with 13 features computed for each voiced segment and contains:

  • 1 Duration of the voiced segment
  • 2-7. Coefficients of 5-degree Lagrange polynomial to model F0 contour
  • 8-13. Coefficients of 5-degree Lagrange polynomial to model energy contour

Dynamic prosody features are based on Najim Dehak, "Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification", 2007

Notes:

  1. The fundamental frequency is computed the PRAAT algorithm. To use the RAPT method, change the "self.pitch method" variable in the class constructor.

  2. When Kaldi output is set to "true" two files will be generated, the ".ark" with the data in binary format and the ".scp" Kaldi script file

Runing

Script is called as follows

python prosody.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>

Examples:

Extract features in the command line

python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesAst.txt" "true" "true" "txt"
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUst.csv" "true" "true" "csv"
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn.pt" "false" "true" "torch"

python prosody.py "../audios/" "prosodyfeaturesst.txt" "true" "false" "txt"
python prosody.py "../audios/" "prosodyfeaturesst.csv" "true" "false" "csv"
python prosody.py "../audios/" "prosodyfeaturesdyn.pt" "false" "false" "torch"
python prosody.py "../audios/" "prosodyfeaturesdyn.csv" "false" "false" "csv"

KALDI_ROOT=/home/camilo/Camilo/codes/kaldi-master2
export PATH=$PATH:$KALDI_ROOT/src/featbin/
python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn" "false" "false" "kaldi"

python prosody.py "../audios/" "prosodyfeaturesdyn" "false" "false" "kaldi"

Extract features directly in Python

from prosody import Prosody
prosody=Prosody()
file_audio="../audios/001_ddk1_PCGITA.wav"
features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")

Jupyter notebook

Results:

Prosody analysis from continuous speech static Image

Image

References

[1]. N., Dehak, P. Dumouchel, and P. Kenny. "Modeling prosodic features with joint factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007): 2095-2103.

[2]. J. R. Orozco-Arroyave, J. C. Vásquez-Correa et al. "NeuroSpeech: An open-source software for Parkinson's speech analysis." Digital Signal Processing (2017).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

disvoice-prosody-0.0.1.tar.gz (15.1 kB view details)

Uploaded Source

Built Distribution

disvoice_prosody-0.0.1-py3-none-any.whl (15.0 kB view details)

Uploaded Python 3

File details

Details for the file disvoice-prosody-0.0.1.tar.gz.

File metadata

  • Download URL: disvoice-prosody-0.0.1.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.8.6

File hashes

Hashes for disvoice-prosody-0.0.1.tar.gz
Algorithm Hash digest
SHA256 4facb83860983bd6136856346e5b7c95220823c4de28ecb6611fae303cfed4fd
MD5 ce0f5c9d620db53857738063dc89fb7c
BLAKE2b-256 67577f41436a8e9fa528064d16574b094ab7680c73972580f532350087667d9e

See more details on using hashes here.

Provenance

File details

Details for the file disvoice_prosody-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: disvoice_prosody-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.8.6

File hashes

Hashes for disvoice_prosody-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6d3b5226f2245a7c1e0302c5c7737a4cf282cd8da562dd3f09c68442b4996b79
MD5 239b6ed4b04e0e36d543408f7f310759
BLAKE2b-256 b31738c856509a0b83b674c43127d2ff850ea8a4ef4d3dedd29b6b5fafe5ed2a

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page