The base python package for DynamicFluency: Monitor and understand the dynamicity of linguistic aspects in (L2) speech.
Project description
DynamicFluency-core
The base Python package for DynamicFluency: track an L2 speaker's ability through time.
This package is effectively a wrapper around a myriad of other projects to allow for smooth integration with Praat. Praat scripts cannot naively call any other programming language, making calling command-line executable within them the only option. This package exposes those needed scripts.
Theoretically speaking, this project can also be used as a library that exposes functions for the same functionality as the non-trivial scripts. For "documentation" of those, please just look at the scripts themselves and, if that is not specific enough, the tests.
Installing
This Python package can be installed with pip, requiring Python 3.9 or higher. To install, simply run:
pip install dynamicfluency
This does not download everything directly. Some scripts require language models to be installed, which are downloaded on demand. To download those immediately, check out the download_models
script below.
Exposed scripts
These are the scripts exposed by this project. All of them also have a -h
/ --help
argument that shows a similar explanation and all of them can take arguments both with a short -n
and a long --argument_name
name specifier.
Scripts meant for direct use
add_frequency_dictionary
python -m dynamicfluency.scripts.add_frequency_dictionary -t [table_name] -f [dictionary_file] -b [database_file] -s [seperator] -e [if_exists]
DynamicFluency makes use of a sqlite3 .db file to store corpus word frequencies. This script is to be called directly by the user if they want to add an extra corpus. (by default, DynamiFluency has subtlexus
and subtlexnl
) This can be to add support to another language, or because a different type of frequency data is required.
The arguments are the following
table_name
The name to store this dictionary under, this is the name you later need to use to retrieve it in the configurationdictionary_file
The .csv that you want to add to the database. This csv should have one column "WordForms," which contains the lemmas that are actually being looked up. All other columns should contain numerical data that can be put into the output of DynamicFluencydatabase_file
The .db file you want to add the dictionary to. If you're in the main DynamicFluency directory and haven't changed this, this will be./databases/main.db
.- Default:
./databases/main.db
- Default:
separator
The character that separates the columns in the .csv.- Default:
","
- Default:
if_exists
What to do if a table with the specified name already exists in the database. Either "fail" or "replace"- Default:
"fail"
- Default:
download_models
python -m dynamicfluency.scripts.download_models -l [language]
This script pre-downloads the language models needed for running the scripts (and with that DynamicFluency). Running this can be useful if you want to be able to use the system offline, or for debugging. (Something going wrong whilst downloading is one of the most common ways the system can fail.)
The argument is the following:
language
The language to download the models for.- Default:
"en"
(English)
- Default:
Scripts meant only for indirect use via DynamicFluency
convert_aeneas_to_textgrid
python -m dynamicfluency.scripts.convert_auneas_to_textgrid -d [directory]
This script looks for all *.tokens.json
and *.phrases.json
files in a directory and converts them to an alignment TextGrid with the same name and .alignment.TextGrid
extension
The argument is the following:
directory
The directory to look & save in.- Default:
./output
- Default:
get_database_columns
python -m dynamicfluency.scripts.get_databse_clumns -t [table_name] -b [database] -d [directory]
This script exposes getting all the column names from a database table. This is needed to allow users to specify which ones they want the information from. To allow the praat script to find these columns, they are saved to clumns_names.csv
in the specified directory.
The arguments are the following:
table_name
The name of the table to get the column names fromdatabase
The sqlite3 .db file to read from.- Default:
./databases/main.db
- Default:
directory
The directory to save the output column_names.csv to.- Default:
./output
- Default:
make_frequencytagged_grids_from_aligned_grids
python -m dynamicfluency.scripts.make_frequencytagged_grids_from_aligned_grids -t [table_name] -a [alignment] -b [database] -d [directory] -i [to_ignore] -c [columns]
This script uses the specified frequency dictionary table from the specified sqlite3 database file to create a new textgrid from the force-aligned specified one with a tier for each specified column of that database table that contains the information from that column for the word form at that time in the original textgrid.
It finds the alignment grids to use by globbing for for all the *.alignment.TxtGrid
files in that directory, and saves its output as .frequency.TextGrid
files with the same name.
The arguments are the following:
table_name
The name of the table to get the information fromalignment
The type of alignment file to generate from. Either"maus"
"aeneas"
or"whisper"
database
The sqlite3 database file to load- Default:
./databases/main.db
- Default:
directory
The directory to find the*.alligenment.TextGrid
files in.- Default:
./output
- Default:
to_ignore
The words to not assign any value, separated by only commas (e.g."uh,uhm"
)- Default:
None
- Default:
columns
The database columns to get the information from. If None/left empty, all columns of that table are selected- Default:
None
- Default:
make_postagged_grids_from_aligned_grids
python -m dynamicfluency.scripts.make_postagged_grids_from_aligned_grids -a [alignment] -d [directory] -l [language]
This script creates a new textgrid from the alignment textgrid that contains the words plus their Part Of Speech tag in this_JJ, format_NNS
It finds the alignment grids to use by globbing for all the *.alignment.TxtGrid
files in that directory and saves them as *.pos_tags.TextGrid
with the same name.
The arguments are the following:
alignment
The type of alignment file to generate from. Either"maus"
"aeneas"
or"whisper"
directory
The directory to find the*.alligenment.TextGrid
files in.- Default:
./output
- Default:
language
The language to load the model from.- Default:
"en"
(English)
- Default:
make_repetitionstagged_grids_from_postagged_grids
python -m dynamicfluency.scripts.make_repetitionstagged_grids_from_postagged_grids -d [directory] -m [max_read] -i [to_ignore]
This script creates a new textgrid from the pos textgrid that contains two tiers, one with the repetitions, and one with the frequency distribution.
The frequency distribution is simply nltk.FreqDist
, which each word getting assigned the value nr_occurences / nr_words_total
.
The repetition measure is custom, it is one divided by the number of words between this word and the previous occurrence of this word.
It finds the pos grids to use by globbing for all the *.pos_tags.TxtGrid
files in that directory and saves them as *.repititions.TextGrid
with the same name.
The arguments are the following:
directory
The directory to find the*.pos_tags.TextGrids
files in.- Default:
./output
- Default:
max_raed
The maximum amount of words to look back for the repetitions.- Default:
300
- Default:
to_ignore
The words to not assign any value, separated by only commas (e.g."uh,uhm"
)- Default:
None
- Default:
make_syntax_grids_from_postagged_grids
python -m dynamicfluency.scripts.make_syntax_grids_from_postagged_grids -d [directory] -l [language]
This script creates a new textgrid from the pos textgrid that contains two point tiers, one that identifies the clausal verbs, and one that identifies all the verb phrases.
This script is, in many ways, a stub. It is based on TAASSC, which works in a similar manner.
It finds the pos grids to use by globbing for all the *.pos_tags.TxtGrid
files in that directory and saves them as *.repititions.TextGrid
with the same name.
The arguments are the following:
directory
The directory to find the*.pos_tags.TextGrids
files in.- Default:
./output
- Default:
language
The language to load the model from.- Default:
"en"
(English)
- Default:
Code style
For code formatting, Black is used with default settings. Some of the naming from praatio (e.g. entryList
) is taken over, but, generally, the Python convention for naming is taken over.
Also, I, (JJWRoeloffs,) am dyslexic, so there might be some spelling errors in the variable names and docstrings. If you find some, pull requests are welcome!
Lastly, this was my first ever Python project, (which should be especially apparent from older commits,) written for a research internship, please expect to be baffled by weird choices, and do not see this as a portfolio project. I can do better than this.
Tests
Running the tests can be done with the following command:
# Install the current version of the package locally to be able to test it.
python3 -m pip install -e .
python3 -m pytest --cov=autobi tests/
These tests will download a lot of language models on the first go, as it checks all supported languages. They are going to take a few minutes the first time they're run
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file dynamicfluency-0.2.0.tar.gz
.
File metadata
- Download URL: dynamicfluency-0.2.0.tar.gz
- Upload date:
- Size: 34.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 289a156c1cac6b62ad649b25517f0f4e111b182557ffd00ec3d54f660ce21aec |
|
MD5 | 338b6642525bc59ab7e037e259eae2a6 |
|
BLAKE2b-256 | 62532736044451b45c94c3de10e90ba0a811767c6d1ce3083c267059eed0b193 |
File details
Details for the file dynamicfluency-0.2.0-py3-none-any.whl
.
File metadata
- Download URL: dynamicfluency-0.2.0-py3-none-any.whl
- Upload date:
- Size: 35.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 375d0fdf9acec08c25358333d9932eb75cd03fbf3889e539695e034f360c901d |
|
MD5 | 144926057a059355cc2a1920cecf4fbf |
|
BLAKE2b-256 | 2eeeeda2690b32d95d34de7da10746f2df5e784a3cc3d7f7612bf0ded307b0ae |