Command-line tool for annotating prosodic features
AutoRPT: Automatic Rapid Prosody Transcription Tool
AutoRPT is a Python command-line tool designed to automatically annotate prosodic features following the Rapid Prosody Transcription (RPT) protocol. It is currently trained on Standard American English (SAE), with future updates planned to include other language varieties.
Installation Instructions
You can install the tool using pip:
pip install praat-autorpt
Alternatively, you can find the latest release in the releases tab. Download the wheel, navigate to its location in your terminal, and install it with pip install praat_autorpt-x.x.x-py3-none-any.whl.
Before running, install the spaCy model pipeline:
python -m spacy download en_core_web_sm
Afterwards, you will be able to run autorpt from the command line.
About the Project
This project is being developed by a team of undergraduate and graduate students, led by PI Associate Professor Jonathan Howell at Montclair State University. It is produced in conjunction with research funded by NSF grant 2316030, focusing on identifying the prosodic features of “Three Varieties of English in NJ”. The tool is designed to streamline the annotation of prosodic events using Rapid Prosody Transcription (RPT), as outlined by Cole et al. (2017).
LSTM
This module runs a Long Short-Term Memory (LSTM) framework and is focused on the bootstrapping of annotations so that they can be reviewed by human annotators. There is also an RNN version, not currently maintained, available at /RNN.
Why We Built This Tool
- Limited Corpora for Specific Varieties: Few corpora (CORAAL being a notable exception) include African American English (AAE) and Latine English (LE).
- Lack of Prosodic Annotations: Even fewer corpora provide prosodic annotations for these varieties of English.
- Incomplete Annotation Schemes: Current annotation schemes often do not account for the unique prosodic features of AAE and LE.
- Challenges in Crowdsourcing: Annotating prosody through crowdsourcing methods can be difficult and inconsistent.
Corpus and Training
AutoRPT was originally trained on the Boston University Radio News Corpus, which serves as the foundation for the tool’s prosodic annotations. As research progresses, the model will be adapted to annotate prosodic features in other varieties of English, including those spoken in New Jersey.
Prosodic Event Annotation and Detection in Three Varieties of English
AutoRPT is part of ongoing research into the detection of prosodic events across the following varieties (as spoken in New Jersey):
- Mainstream American English (MAE)
- African American English (AAE)
- Latine English (LE)
Build Instructions
To build AutoRPT, you'll need to install several Python libraries. Follow the steps below to set up the tool on your system.
Prerequisites
- Ensure that you have a Python version between 3.7 and 3.11, inclusive. You can download Python from python.org.
- Download and unzip a copy of the repo.
- It is recommended to create a virtual environment to manage the dependencies specific to AutoRPT.
Step 1: Create a Virtual Environment (Optional but Recommended)
Setting up a virtual environment ensures that package installations for AutoRPT do not interfere with other Python projects on your machine. Use the command line/terminal to run the following script:
For Windows:
python -m venv AutoRPT
AutoRPT\Scripts\activate
For macOS/Linux:
python3 -m venv AutoRPT
source AutoRPT/bin/activate
Step 2: Install Dependencies
Navigate to the directory containing the AutoRPT folder (this may be in your Downloads folder unless you have since moved it). Then navigate into the AutoRPT-main\AutoRPT-main folder (you should see requirements.txt when you open the folder in your file explorer or run DIR).
You can install the required dependencies by running:
pip install -e .
This command will install all the necessary Python packages listed in the pyproject.toml file.
Required Python Packages
The key dependencies for AutoRPT are:
- Praat-Parselmouth: A Python interface to Praat for conducting phonetic analyses.
- TextGridTools: A library used to handle Praat TextGrid objects for annotating speech.
- Scikit-learn: A widely-used library for machine learning tasks such as classification and regression.
- Pandas: A powerful data manipulation and analysis library.
- TensorFlow: An open-source deep learning framework used for building and training machine learning models.
Model Pipeline
Install the spaCy model pipeline:
python -m spacy download en_core_web_sm
Step 3: Install the trained models
We recommend that you use the most recent version of the trained models for LSTM, which can be found at the top level of this repo. As mentioned before, the model in this current (AutoRPT) repo is trained on the Boston University Radio News Corpus. The models in the linked repo are trained on the annotated corpus collected as part of this project. The new models are labeled "_updated" to differentiate them.
- Go to the AutoRPT-main/AutoRPT-main/AutoRPT_LSTM/Model_paths folder that is now on your computer.
- Delete the existing files in it.
- Download the models from the provided link.
- Place them in that folder.
- Rename the models to Intensity_LSTM_model.h5 and Pitch_LSTM_model.h5.
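The steps above can also be scripted. Here is a minimal sketch, where install_models is a hypothetical helper and the "_updated" source filenames are assumptions based on the naming mentioned earlier; only the target names Intensity_LSTM_model.h5 and Pitch_LSTM_model.h5 come from the instructions:

```python
import shutil
from pathlib import Path

def install_models(model_dir, downloads):
    """Clear out Model_paths and copy the downloaded models in under the
    filenames AutoRPT expects. `downloads` maps downloaded file paths to
    the target filenames."""
    model_dir = Path(model_dir)
    for old in model_dir.glob("*.h5"):   # delete the existing models
        old.unlink()
    for src, target in downloads.items():  # copy each model in and rename it
        shutil.copy(src, model_dir / target)

# Hypothetical usage; adjust the source paths to wherever you downloaded
# the updated models:
# install_models(
#     "AutoRPT-main/AutoRPT-main/AutoRPT_LSTM/Model_paths",
#     {"Intensity_LSTM_model_updated.h5": "Intensity_LSTM_model.h5",
#      "Pitch_LSTM_model_updated.h5": "Pitch_LSTM_model.h5"},
# )
```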
Step 4: Run AutoRPT
Navigate into the folder containing the main file, e.g.
cd AutoRPT
You can then run AutoRPT with the following commands:
python -m AutoRPT_LSTM.LSTM_RPT [wav] [textgrid]
If you do not manually provide paths to an audio file and a textgrid file, a file selection window will appear prompting you to select your TextGrid file. Select a file and press Open. Another selection window will appear prompting you to select a WAV file. Select a file and press Open.
Return to the command line or terminal and follow the instructions. AutoRPT will then start processing and annotating prosodic features based on the input data.
Script Breakdown
LSTM_RPT
Requires: os, tkinter, praatio, sys, Clean_I_Model, Clean_P_Model, Utilities, SpeakerFile, tgt, parselmouth, traceback
Description: "Main" file. Opens a file dialog or follows a path to select TextGrid and WAV files, creates tiers in the TextGrid in which it marks suspected boundary and prominence and labels them with confidence percentages.
Functions:
- select_tiers(list of strings all_tiers) - Allows user to enter tier names manually when automated methods fail. Returns tier names.
- select_files() - Opens a file dialog to select TextGrid and WAV files. Requests tier names from user. Returns file paths and tier names.
- pull_files_from_path() - Selects source files from filepath in text file. Returns: SpeakerFile object speaker_file, path-like string gen_save_path
- batch_process() - Runs main on whole folder of files using same logic as pull_files_from_path. No returns.
- main(SpeakerFile s, string save_path = None, bool split_utterances=False) - Creates tiers and places prosody annotations and confidence degrees from RPT functions in them. Calls all the other main functions: Pitch.run, Intensity.run, model_join.dict_merge, CTG.create_textgrid, Point_Tier.phone_data, CTG.create_point_tier. No returns.
If __main__: calls select_files to get user input, passes that input to main.
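The path-or-dialog dispatch described above can be sketched roughly as follows. resolve_inputs is a hypothetical helper, not the module's real function; the only assumption is that the WAV and TextGrid paths may be passed as the first two command-line arguments, matching the usage `python -m AutoRPT_LSTM.LSTM_RPT [wav] [textgrid]`:

```python
def resolve_inputs(argv):
    """Return (wav_path, textgrid_path) when both were passed on the
    command line, or None to signal that the tkinter file dialogs
    should be opened instead. Hypothetical helper for illustration."""
    if len(argv) >= 3:
        return argv[1], argv[2]
    return None  # fall back to select_files()-style dialogs

# Mirrors `python -m AutoRPT_LSTM.LSTM_RPT speaker.wav speaker.TextGrid`:
paths = resolve_inputs(["LSTM_RPT", "speaker.wav", "speaker.TextGrid"])
```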
Clean_I_Model
Requires: parselmouth, tgt, numpy, spacy, pandas, re, os, tensorflow.keras.models, sklearn.preprocessing, sys, csv, datetime
Description: Defines and runs a number of functions related to intensity measures.
Class IntensityExtraction functions:
- getIntensity(self, Wav_file: parselmouth.Sound object, start_time: float, end_time: float) - Returns intensity as a parselmouth.Intensity object.
- getMaxIntensity(self, intensity_full: parselmouth.Intensity) - Returns maximum intensity of a file as a float.
- getMinIntensity(self, intensity_full: parselmouth.Intensity) - Returns minimum intensity of a file as a float.
- getSTDIntensity(self, intensity_full: parselmouth.Intensity) - Returns standard deviation of intensity of a file as a float.
- getAverageIntensity(self, intensity_full: parselmouth.Intensity) - Returns arithmetic mean of intensity of a file as a float.
Class FileProcessorIntensity functions:
- __init__(self) - Runs model by itself calling IntensityExtraction().
- iterateTextGridforIntensity(self, s: SpeakerFile object, tier_type: string ['word' or 'phone']) - Creates array Interval_data, iterates through intervals of specified TextGrid tier, and runs calculations. Returns dict interval_data, int error_count, and array error_arr. Calls all IntensityExtraction functions.
Class SpeakerNormalization functions: N.B. For all of the below functions: interval_data is a dict mapping strings to arrays. arr is a string representing the dict key to select the array.
- fileMean(self, interval_data: dict, arr: str) - Takes arr and returns the average of the values
- fileStd(self, interval_data: dict, avg: float, arr: str) - Takes arr and average and returns Standard Deviation (Std) of the values
- fileMin(self, interval_data: dict, arr: str) - Takes arr and returns the minimum value
- fileMax(self, interval_data: dict, arr: str) - Takes arr and returns the maximum value
- zScoreAppend(self, interval_data: dict, avg: float, std: float, arr: str) - Takes arr, average, standard deviation and returns the dict with Z-score appended.
- getZScore(self, key: number, avg: float, std: float) - Takes a specific value and returns the Z-score.
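The normalization pipeline above amounts to computing per-file Z-scores. A minimal sketch, assuming interval_data maps measure names to arrays of numbers; the key names and the use of population (rather than sample) standard deviation are assumptions:

```python
from statistics import mean, pstdev

def z_score_append(interval_data, arr):
    """Append per-file Z-scores for the array stored under key `arr`
    (mirrors the fileMean/fileStd/zScoreAppend sequence; the dict
    layout here is assumed, not the tool's actual format)."""
    values = interval_data[arr]
    avg = mean(values)
    std = pstdev(values)  # population standard deviation
    interval_data[arr + "_z"] = [(v - avg) / std for v in values]
    return interval_data

data = {"max_intensity": [60.0, 70.0, 80.0]}
z_score_append(data, "max_intensity")
# The middle value sits exactly at the file mean, so its Z-score is 0.
```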
Class IntensityFormatToInterval functions:
- dictToArr(self, arr: dict) - Converts dictionary to array.
- outputArr(self, arr: array) - Prints array.
Class IntensityFormatting functions:
- to_csv(self, data: array, csv_file: str [path]) - Creates CSV file out of array and saves it. No returns.
Class Context functions:
- contextWindow(self, complete_data: dictionary) - Allows for only local context, as opposed to the total context that the speaker normalization class would gather.
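A sketch of the local-context idea, assuming a simple symmetric sliding window over per-interval values; the window radius and the list-based layout are assumptions (the real contextWindow operates on the complete_data dict):

```python
def context_window(values, radius=2):
    """For each position, average over a local window of +/- `radius`
    neighbors instead of the whole file (illustrative only; the radius
    is an assumed parameter)."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        window = values[lo:hi]  # clipped at the file edges
        out.append(sum(window) / len(window))
    return out
```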
Class POS functions:
- add_pos_column_with_pandas(self, input_csv: str [path], text_column_name: str="Text", new_column_name: str="POS ID's") - Generates POS tags from spaCy model and saves to provided CSV file.
- clean_column(self, input_csv: str [path]) - Keeps only the first number from part of speech IDs.
- extract_first_number(cell) - defined inside clean_column
Class Saved_Model functions:
- intensity_model(self, csv_file: str [path], pred_dict: dict) - Loads model, extracts and normalizes input data, makes predictions, and writes to dictionary. Returns dictionary pred_dict.
Class Intensity functions:
- run(s: SpeakerFile, csv_path: str[path]) - Creates Sound object, does calculations on data, and exports the resulting dict. Calls FileProcessorIntensity.iterateTextGridforIntensity, all SpeakerNormalization except getZScore, IntensityFormatToInterval.dictToArr, all IntensityFormatting, all Context, all POS, and all Saved_Model functions.
Clean_P_Model
Requires: parselmouth, tgt, numpy, csv, spacy, pandas, re, os, tensorflow.keras.models, sklearn.preprocessing, datetime, traceback
Description: Defines and runs a number of functions related to pitch measures.
Class PitchExtraction functions: N.B. an important intermediate data structure is the parselmouth.Pitch object.
- getMaxPitch(self, Wav_file: parselmouth.Sound object, start_time: float, end_time: float) - Returns maximum pitch of a file.
- getMinPitch(self, Wav_file: parselmouth.Sound object, start_time: float, end_time: float) - Returns minimum pitch of a file.
- getPitchStandardDeviation(self, Wav_file: parselmouth.Sound object, start_time: float, end_time: float) - Returns standard deviation of pitch of a file.
- getAveragePitch(self, Wav_file: parselmouth.Sound object, start_time: float, end_time: float) - Returns arithmetic mean of pitch of an interval.
Class SpeakerNormalization functions: N.B. For all of the below functions: interval_data is a dict mapping strings to arrays. arr is a string representing the dict key to select the array.
- fileMean(self, interval_data: dict, arr: str) - Takes arr and returns the average of the values
- fileStd(self, interval_data: dict, avg: float, arr: str) - Takes arr and average and returns Standard Deviation (Std) of the values
- fileMin(self, interval_data: dict, arr: str) - Takes arr and returns the minimum value
- fileMax(self, interval_data: dict, arr: str) - Takes arr and returns the maximum value
- zScoreAppend(self, interval_data: dict, avg: float, std: float, arr: str) - Takes arr, average, standard deviation and returns the dict with Z-score appended.
- getZScore(self, key: number, avg: float, std: float) - Takes a specific value and returns the Z-score.
Class FileProcessor functions:
- __init__(self) - Runs model by itself calling PitchExtraction()
- iterateTextGridforPitch(self, s: SpeakerFile object, tier_type: string ['word' or 'phone']) - Creates array Interval_data, iterates through intervals of specified TextGrid tier, and runs calculations. Returns array interval_data, int error_count, and array error_arr. Calls all PitchExtraction methods.
Class FormatToInterval functions:
- dictToArr(self, arr: dict) - Converts dictionary to array.
- outputArr(self, arr: array) - Prints array.
Class Formatting functions:
- to_csv(self, data: array, csv_file: str [path]) - Creates CSV file out of array and saves it. No returns.
Class Context functions:
- contextWindow(self, complete_data: dictionary) - Allows for only local context, as opposed to the total context that the speaker normalization class would gather.
Class POS functions:
- add_pos_column_with_pandas(self, input_csv: str [path], text_column_name: str="Text", new_column_name: str="POS ID's") - Generates POS tags from spaCy model and saves to provided CSV file.
- clean_column(self, input_csv: str [path]) - Keeps only the first number from part of speech IDs.
- extract_first_number(cell) - defined inside clean_column
Class Saved_Model functions:
- pitch_model(self, csv_file: str [path], pred_dict: dict) - Loads model, extracts and normalizes input data, makes predictions, and writes to dictionary. Returns dictionary pred_dict.
Class Pitch functions:
- run(s: SpeakerFile object, csv_path: str[path]) - Creates Sound object, does calculations on data, and exports the resulting dict. Calls FileProcessor.iterateTextGridforPitch, all SpeakerNormalization except getZScore, FormatToInterval.dictToArr, all Formatting, all Context, all POS, and all Saved_Model functions.
Utilities
Requires: praatio, re, tgt, os, traceback, csv
Description: Contains the functions doing the heavy lifting. Merges dictionaries, creates a TextGrid with tiers, and populates them.
- mto_csv(data: array, csv_file: str[path]) - Creates a CSV file out of an array and saves it. Working toward eliminating the other to_csv functions in favor of this one.
- mdictToArr(d: dictionary) - Converts a dictionary to an array. See mto_csv.
- moutputArr(arr: array) - Prints an array.
Class model_join functions:
- static dict_merge(p_dict: dict, i_dict: dict) - Merges pitch and intensity dictionaries.
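A minimal sketch of what such a merge might look like, assuming both dictionaries are keyed by interval and hold model confidences; the key and value layout here are assumptions, not the tool's actual format:

```python
def dict_merge(p_dict, i_dict):
    """Merge pitch and intensity predictions keyed by interval.
    Where only one model scored an interval, the other slot is None."""
    merged = {}
    for key in p_dict.keys() | i_dict.keys():  # union of all intervals
        merged[key] = {
            "pitch": p_dict.get(key),
            "intensity": i_dict.get(key),
        }
    return merged

# Illustrative data: (label, start, end) -> confidence.
p = {("word", 0.0, 0.4): 0.82}
i = {("word", 0.0, 0.4): 0.64, ("word", 0.4, 0.9): 0.31}
merged = dict_merge(p, i)
```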
Class CTG functions:
- create_textgrid(final_dict: dict, output_file: str [path], reference_textgrid: textgrid.Textgrid object) - Creates a TextGrid object with text, prominence, and boundary tiers. Populates with information from final_dict.
- create_point_tier(final_dict: dict, textgrid_path: str [path], phone_data: str) - Creates point tier in provided textgrid and adds prosody markings according to final_dict. Calls point_tier_setup.
Class Point_Tier functions:
- static phone_data(Textgrid_path: str[path], phone_tier: str) - Creates dictionary from textgrid interval data
- static point_tier_setup(start_time: float, end_time: float, phone_dict: dict, type: string literal ['Prominence', 'Boundary']) - Returns float point_time.
SpeakerFile
Requires: parselmouth, pandas, re, os, traceback, textgrid
Description: Object that contains all data relevant to a specific channel of a specific sound file. This can include the wav file, textgrid, acoustic data, annotations, and a variety of instance variables.
Class SpeakerFile functions:
- __init__(self, textgrid_file_path: string[path]=None, finaldict_file_path: string[path] = None, wav_file_path: string[path] = None, annot_filepath: string[path] = None, existing_file: string[path] = None) - Creates the object from provided arguments and derives all possible information. Calls unpack_tg_output, parse_tiers, read_regex.
- unpack_tg_output(self, point_tier: string[name], w_no: int[index of word_tier], ph_no: int[index of phone_tier], pt_no: int[index of point tier]) - Sets instance variables related to TextGrids. No returns.
- parse_tiers(self, tiers: array of strings[names]) - Scans list of available tier names given the most likely tier names. Returns the name of the last tier and the indices of the word, phone, and point tiers.
- read_regex(self, m: match object created from a regex evaluation) - Unpacks the file naming convention into information based on regex_definition.txt. You will need to customize this if you're not using a naming convention identical to ours. See the regex explanation below.
- __repr__(self) - returns representation
- contents(self) - prints repr
- __str__(self) - returns simple name
- has_annotation_log(self): checker for instance variables implying annotation log exists. Returns boolean.
- has_final_dict(self): checker for filepath of final_dict (acoustic measures). Returns boolean.
- has_wav(self): checker for wav file object. Returns boolean.
- has_textgrid(self): checker for textgrid object. Returns boolean.
- add_annotation_log(self, annot_filepath: string[path]) - Adds an annotation log to the object in the form of a pandas dataframe. No returns.
- add_final_dict(self, final_dict_filepath: string[path]) - Adds an acoustic dictionary as created by AutoRPT to the object as a pandas dataframe. No returns.
- add_textgrid(self, textgrid_file_path: string[path]) - Adds a TextGrid.textgrid object to the file. Unlike logs, textgrid is added by reference and path must remain valid through the lifetime. No returns.
- add_wav(self, wav_file_path: string[path]) - Adds a parselmouth Sound object to the file. Unlike logs, sound file is added by reference and path must remain valid through the lifetime. No returns.
- __getstate__(self) - Copy the object's state from self.__dict__. Returns dictionary containing picklable instance variables.
- __setstate__(self, state: dictionary) - Restores instance variables from pickled state.
- read_from_txt/read_from_txt2/write_to_txt - Work in progress: I'm trying to make a function that can instantiate a SpeakerFile object from a text file (and one that will save to it) instead of a pickle object, for backup and human readability, and it's not going well.
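One minimal way to round-trip instance variables through a text file might look like the sketch below. It assumes the variables being saved are flat strings; nested objects like the parselmouth Sound or the TextGrid would still need separate handling, which is presumably where the hard part lies:

```python
def write_to_txt(obj_vars, path):
    """Save flat string-valued instance variables as tab-separated
    key/value lines (a sketch, not the module's actual implementation)."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in obj_vars.items():
            f.write(f"{key}\t{value}\n")

def read_from_txt(path):
    """Rebuild the variable dict from the text file; all values come
    back as strings."""
    out = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            out[key] = value
    return out
```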
Regular expressions and file naming conventions
SpeakerFile operates on the assumption that the way you name your files a) is regular and b) tells you something about what's in them.
We have two naming conventions (two different lab groups looking at two different sets of priorities) for the base files as recorded, known as MMT1 and MMT2.
MMT1
Example: 1234p01mx01ab02cd.
Breakdown: 1234 p01 mx 01ab 02cd. Grant number (4 digits), pairing number (p followed by 2 digits), genders of participants in order from left to right/channel 1 to channel 2 (1 letter each), participant ID for left speaker/channel 1 (2 digits followed by 2 letters), participant ID for right speaker/channel 2 (2 digits followed by 2 letters)
MMT2
Example: 1234-p01-l-ff.
Breakdown: Grant number (4 digits), pairing number (p followed by 2 digits), language variety (1 letter), genders of participants in order from left to right/channel 1 to channel 2 (1 letter each)
Additional tags
Either of these can then be tagged with channel, annotator name, and/or file version (because we don't always ask people to annotate the entire file). SpeakerFile requires that we tag with channel; the rest are optional.
so yeah
This results in a regular expression looking for (in English): grant number, pairing number, language variety (optional), left gender, right gender, left speaker ID (optional), right speaker ID (optional), version (optional), channel, annotator (optional), file extension.
In regex, once capture groups have been added, this is 345 characters long, wholly not human-readable, and it is a huge ask to have someone modify it. So instead, I made a text file breaking down the regex roughly as I just broke it down for you, and wrote in code to read it and turn it into that 345-char-long string. That code is in __init__. The code that turns what the regular expression found into instance variables is in read_regex(self, m). Instead of having to figure out the entire regex, you can break it down into parts, and you only have to know the expression for each piece you need. The code takes care of attaching the capture group name and parentheses and marking whether it's optional.
The regex_definition text file looks like this:
grant_number required [0-9]{4}
pairing_number required -?p[0-9]{2}
race optional -[A-Za-z][A-Za-z]?
So what you're going to do is break every piece of the file naming convention apart and come up with a name (no spaces). That goes in column 1. Tab, then put required or optional. Tab again, the regular expression for just that piece. Then what the __init__ code does is go, "Oh, a four digit regex, cool, wrap that in parentheses and name the capture group grant_number. A p with two digits, optionally following a dash? Wrap in parentheses and name the capture group pairing_number. A dash followed by a letter and an optional second letter? Wrap in parentheses and name the capture group race. Oh, but it's optional? Tack a question mark on the end of it. Glue all these things together, that's the regular expression. Search for the pattern in the file name and send the resulting object to read_regex."
What this does is save you a whole bunch of counting parentheses and figuring out where exactly your question marks go in a way that's human-readable. It lets you change one piece of the file naming convention without having to edit the whole thing. And if you make a mistake, it doesn't propagate through the whole expression.
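The definition-file mechanism described above can be sketched in a few lines. The three sample rows are the ones shown earlier; the sample filename is illustrative, and build_regex is a hypothetical stand-in for the code in __init__:

```python
import re

def build_regex(definition_lines):
    """Turn regex_definition rows (name <tab> required|optional <tab>
    pattern) into a single pattern: wrap each piece in a named capture
    group (?P<name>...) and append ? when the piece is optional."""
    pieces = []
    for line in definition_lines:
        name, requirement, pattern = line.rstrip("\n").split("\t")
        group = f"(?P<{name}>{pattern})"
        if requirement == "optional":
            group += "?"
        pieces.append(group)
    return "".join(pieces)

rows = [
    "grant_number\trequired\t[0-9]{4}",
    "pairing_number\trequired\t-?p[0-9]{2}",
    "race\toptional\t-[A-Za-z][A-Za-z]?",
]
pattern = build_regex(rows)
m = re.search(pattern, "1234-p01-l")
# m.groupdict() maps each capture group name to its value; an optional
# piece that did not match is still a key, with value None.
```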
read_regex
You'll also need to edit read_regex to unpack everything. You don't necessarily want to store channel as _1, you might want to store it as 'left'. So read_regex cleans up the extra dashes and underscores, translates abbreviations into instance variable-worthy text, and sets the instance variables. There are a lot of comments in read_regex to guide you through the process of customizing it. The first thing the function does is call match.groupdict(), which lets you access every variable you put in regex_definition.txt from a simple dictionary mapping the capture group name to its value. You can also do this with integer indices calling match.groups(int), but then you need to know exactly how many variables you're going to have and in which order, which you won't if you have any optional ones. Using a dictionary allows you to check on the optional variables--they will still be keys in the dictionary, but their value will be None. You'll assign those instance variables by wrapping them in an if statement to check they exist first.
# some variables need cleaning up -- this one has a -p in front we don't need
self.pairing_number = vars['pairing_number'][-2:]
# optional variables need to be nested in an if statement
if vars['version']:  # if version is not None
    self.version = vars['version'][1:]  # then chop off the - and push to self.version
It's also just more human-readable by a long shot to access by name instead of index. The only time the function uses match.groups is match.groups(0), which gets the entire capture as a single string.
File details
Details for the file praat_autorpt-0.1.1.tar.gz.
File metadata
- Download URL: praat_autorpt-0.1.1.tar.gz
- Upload date:
- Size: 253.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d2d6546e4805490b6216fb2bfc439c89fc16ca026f093d3c756c2e43524eebc |
| MD5 | 7e927f09392800e89663151b07f81a45 |
| BLAKE2b-256 | 597ab242ff490653fed435be4837032eac93cdb81e9bb477115a1bf8641102b2 |
File details
Details for the file praat_autorpt-0.1.1-py3-none-any.whl.
File metadata
- Download URL: praat_autorpt-0.1.1-py3-none-any.whl
- Upload date:
- Size: 255.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e647feac44800063970b11055e99a10cd1fc1f5289e705c11bf433793a95d145 |
| MD5 | 139d068a145290e5be23786562b3682a |
| BLAKE2b-256 | 985a30e7abca647212978fc6fb8edd790f5e489e5479977a907d0fc1e2f73ffa |