CLI and library to modify pronunciation dictionaries (any language).
pronunciation-dictionary-utils
Library and CLI to modify pronunciation dictionaries (any language).
Features
- export-vocabulary: export vocabulary from dictionaries
- export-phonemes: export phoneme set from dictionaries
- merge: merge dictionaries together
- extract: extract a subset of the dictionary vocabulary
- map-symbols-in-pronunciations: map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
- map-symbols-in-pronunciations-json: map phonemes/symbols in pronunciations to the phonemes/symbols specified in a JSON file
- remove-symbols-from-vocabulary: remove phonemes/symbols from the vocabulary
- remove-symbols-from-pronunciations: remove phonemes/symbols from pronunciations
- remove-symbols-from-words: remove characters/symbols from words
- change-formatting: change the formatting of dictionaries
- select-single-pronunciation: select a single pronunciation for each word
- change-word-casing: transform all words to upper- or lower-case
- sort-words: sort the dictionary by words
- sort-pronunciations: sort the dictionary pronunciations
- normalize-weights: normalize pronunciation weights for each word
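The list does not spell out what normalizing weights means. One plausible reading, assumed here and not confirmed by the project, is that the weights of each word's pronunciations are rescaled to sum to 1. The following sketch illustrates that reading with awk on a hypothetical plain-text layout "WORD WEIGHT PH1 PH2 ..."; the function name and the layout are assumptions, and this is not the tool's implementation or file format.

```shell
# Hypothetical sketch of per-word weight normalization (assumed semantics, assumed
# layout "WORD WEIGHT PH1 PH2 ...", one pronunciation per line). Not dict-cli's code.
normalize_weights() {
  # Pass 1 (NR == FNR): sum the weights of every word.
  # Pass 2: divide each weight by its word's total; zero totals are left unchanged.
  awk 'NR == FNR { total[$1] += $2; next }
       { if (total[$1] != 0) $2 = $2 / total[$1]; print }' "$1" "$1"
}
```

Usage: normalize_weights weighted.dict. Reading the file twice keeps memory use to one number per word instead of buffering every line.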
Roadmap
- Adding tests
- Implementation of printing of statistics
- Add changing the pronunciation of a word via the CLI
Installation
pip install pronunciation-dictionary-utils --user
Usage
usage: dict-cli [-h] [-v]
{export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
...
This program provides methods to modify pronunciation dictionaries.
positional arguments:
{export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
description
export-vocabulary export vocabulary from dictionaries
export-phonemes export phoneme set from dictionaries
merge merge dictionaries together
extract extract subset of dictionary vocabulary
map-symbols-in-pronunciations map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
map-symbols-in-pronunciations-json map phonemes/symbols in pronunciations to phoneme/symbol specified in file
remove-symbols-from-vocabulary remove phonemes/symbols from vocabulary
remove-symbols-from-pronunciations remove phonemes/symbols from pronunciations
remove-symbols-from-words remove characters/symbols from words
change-formatting change formatting of dictionaries
select-single-pronunciation select single pronunciation
change-word-casing transform all words to upper- or lower-case
sort-words sort dictionary after words
sort-pronunciations sort dictionary pronunciations
normalize-weights normalize pronunciation weights for each word
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Example
# Download CMU dictionary
wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict \
-O "/tmp/example.dict"
# Change formatting: remove numbers from words, remove pronunciation comments, and save as UTF-8
dict-cli change-formatting \
"/tmp/example.dict" \
--deserialization-encoding "ISO-8859-1" \
--consider-numbers \
--consider-pronunciation-comments \
--serialization-encoding "UTF-8"
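For readers who want to see roughly what this step does to raw CMU entries, the following is an approximation with standard Unix tools, not what dict-cli does internally. It assumes CMU-style lines such as "abbe(2) AE1 B IY0" and "abed AH0 B EH1 D # place"; the function name is hypothetical, and the exact behavior of --consider-numbers and --consider-pronunciation-comments is defined by the tool itself.

```shell
# Rough approximation with standard tools; the flags' exact behavior is dict-cli's own.
# Assumes CMU-style lines such as "abbe(2) AE1 B IY0" and "abed AH0 B EH1 D # place".
approximate_change_formatting() {
  # iconv: re-encode from ISO-8859-1 to UTF-8.
  # sed expression 1: drop a numbering suffix like "(2)" in the first column.
  # sed expression 2: drop a trailing comment; at least one whitespace character must
  #   precede "#", so entries whose word itself starts with "#" are kept.
  iconv -f ISO-8859-1 -t UTF-8 "$1" \
    | sed -e 's/^\([^[:space:]]*\)([0-9][0-9]*)/\1/' \
          -e 's/[[:space:]][[:space:]]*#.*$//'
}
```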
# Export phoneme set
dict-cli export-phonemes \
"/tmp/example.dict" \
"/tmp/example-phoneme-set.txt"
# Export vocabulary
dict-cli export-vocabulary \
"/tmp/example.dict" \
"/tmp/example-vocabulary.txt"
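Conceptually, the two export commands collect the distinct words and the distinct phonemes of a dictionary. The following hypothetical awk approximations illustrate that idea, assuming plain "WORD PH1 PH2 ..." lines with no numbering, comments, or weights left (as after the change-formatting step above). The function names are illustrative, and the real commands' output order and handling of other formats may differ.

```shell
# Hypothetical approximations of the two export commands, assuming plain
# "WORD PH1 PH2 ..." lines (no numbering, comments, or weights).
export_vocabulary() {
  # Column 1 is the word; sort -u keeps each word once (byte order via LC_ALL=C).
  awk '{ print $1 }' "$1" | LC_ALL=C sort -u
}

export_phonemes() {
  # Columns 2..NF are phonemes; emit one per line, then deduplicate.
  awk '{ for (i = 2; i <= NF; i++) print $i }' "$1" | LC_ALL=C sort -u
}
```

Sorting with LC_ALL=C is a choice made here so the output is identical across machines regardless of locale.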
# Keep first pronunciation for each word and discard the rest
dict-cli select-single-pronunciation \
"/tmp/example.dict" \
--mode "first"
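The idea behind keeping only the first pronunciation can be shown with a classic awk idiom. This is a hypothetical illustration, not dict-cli's implementation, and it assumes alternative pronunciations share the same word in the first column (i.e., numbering such as "(2)" has already been removed); the function name is made up for the example.

```shell
# Hypothetical equivalent of "--mode first", assuming alternative pronunciations
# share the same word in column 1 (numbering like "(2)" already removed).
keep_first_pronunciation() {
  # seen[$1]++ evaluates to 0 the first time a word is met, so only that line prints.
  awk '!seen[$1]++' "$1"
}
```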
# Replace all "ER0" phonemes with "ER"
dict-cli map-symbols-in-pronunciations \
"/tmp/example.dict" \
"ER0" "ER"
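Mapping works on whole pronunciation symbols rather than on raw substrings. The following hypothetical awk sketch illustrates that distinction on "WORD PH1 PH2 ..." lines; the function name and argument order are assumptions, and dict-cli's own matching rules may differ.

```shell
# Hypothetical token-level symbol mapping, assuming "WORD PH1 PH2 ..." lines.
map_symbol() {
  # $1 = symbol to replace, $2 = replacement, $3 = dictionary file.
  # Only whole tokens in columns 2..NF are compared, so the word column is untouched
  # and "ER0" does not match inside a longer symbol. Changed lines are re-joined
  # with single spaces.
  awk -v from="$1" -v to="$2" \
    '{ for (i = 2; i <= NF; i++) if ($i == from) $i = to; print }' "$3"
}
```

Usage: map_symbol ER0 ER /tmp/example.dict writes the mapped dictionary to standard output.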
Contributing
Development setup
# update package lists
sudo apt update
# install Python 3.8-3.12 so that the tests can run on every supported version
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv \
python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user
# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary-utils.git
cd pronunciation-dictionary-utils
# create virtual environment
python3.8 -m pipenv install --dev
Running the tests
# first, set up the project as described in "Development setup"
# then, navigate into the repository directory (if not already there)
cd pronunciation-dictionary-utils
# activate environment
python3.8 -m pipenv shell
# run tests
tox
Final lines of the test output:
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repository, you can use the following entry generated by GitHub (see About => Cite this repository); a BibTeX entry is available there as well.
Taubert, S., and Przybysz, N. (2024). pronunciation-dictionary-utils (Version 0.0.5) [Computer software]. https://doi.org/10.5281/zenodo.10560153
Project details
File details
Details for the file pronunciation-dictionary-utils-0.0.5.tar.gz.
File metadata
- Download URL: pronunciation-dictionary-utils-0.0.5.tar.gz
- Upload date:
- Size: 30.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2f3d2b51c7f4076241174bcd910f6fe61c2ed6e837ded8040614628790a3b42a
MD5 | bda7a6bd0a4a25cd89009153750800ef
BLAKE2b-256 | bd23f50d150e24b1e62fa4c83c1fae4469061a9b1dd02d451683814da38499b0
File details
Details for the file pronunciation_dictionary_utils-0.0.5-py3-none-any.whl.
File metadata
- Download URL: pronunciation_dictionary_utils-0.0.5-py3-none-any.whl
- Upload date:
- Size: 62.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | fbaec5b3cf78a138a43705f9e55b7701b254654aec84fa46e05903547d9080da
MD5 | 7d18c73ed60bb92cf1f2d9a640bb89f9
BLAKE2b-256 | d52b802314e91e86ce8cb844d50223d0c7c51cd69532334bec9ac4daebf970d6
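To check a downloaded distribution against the SHA256 digests listed above, a small helper such as the following can be used. The function name is hypothetical, and it assumes GNU coreutils' sha256sum (on macOS, shasum -a 256 prints the same "digest  filename" format).

```shell
# Compares a file's SHA256 digest with an expected value (assumes GNU coreutils' sha256sum).
verify_sha256() {
  # $1 = file to check, $2 = expected lowercase hex digest; exit status 0 means match.
  local actual
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}
```

Usage: verify_sha256 pronunciation-dictionary-utils-0.0.5.tar.gz 2f3d2b51c7f4076241174bcd910f6fe61c2ed6e837ded8040614628790a3b42a && echo "checksum OK"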