pronunciation-dictionary-utils
Library and CLI to modify pronunciation dictionaries (any language).
Features
- export-vocabulary: export vocabulary from dictionaries
- export-phonemes: export phoneme set from dictionaries
- merge: merge dictionaries together
- extract: extract subset of dictionary vocabulary
- map-symbols-in-pronunciations: map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
- map-symbols-in-pronunciations-json: map phonemes/symbols in pronunciations to phoneme/symbol specified in file
- remove-symbols-from-vocabulary: remove phonemes/symbols from vocabulary
- remove-symbols-from-pronunciations: remove phonemes/symbols from pronunciations
- remove-symbols-from-words: remove characters/symbols from words
- change-formatting: change formatting of dictionaries
- select-single-pronunciation: select single pronunciation
- change-word-casing: transform all words to upper- or lower-case
- sort-words: sort dictionary by words
- sort-pronunciations: sort dictionary pronunciations
- normalize-weights: normalize pronunciation weights for each word
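To make the weight-related features more concrete, here is a minimal Python sketch of what normalizing pronunciation weights per word could look like. The in-memory layout (a dict mapping each word to its pronunciations and weights) and the function name `normalize_weights` are assumptions made for illustration; they are not the package's actual API or implementation.

```python
from typing import Dict, Tuple

Pronunciation = Tuple[str, ...]
# Hypothetical in-memory layout: word -> {pronunciation: weight}
Dictionary = Dict[str, Dict[Pronunciation, float]]


def normalize_weights(dictionary: Dictionary) -> Dictionary:
    """Return a copy in which the weights of each word sum to 1.0."""
    result: Dictionary = {}
    for word, pronunciations in dictionary.items():
        total = sum(pronunciations.values())
        if total == 0:
            # Assumption: keep all-zero weights unchanged rather than divide by zero.
            result[word] = dict(pronunciations)
            continue
        result[word] = {p: w / total for p, w in pronunciations.items()}
    return result


example: Dictionary = {
    "read": {("R", "IY1", "D"): 3.0, ("R", "EH1", "D"): 1.0},
}
normalized = normalize_weights(example)
```

In this sketch, each word is normalized independently, so words with a single pronunciation end up with weight 1.0.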
Roadmap
- Add tests
- Implement printing of statistics
- Support changing the pronunciation of a word via the CLI
Installation
pip install pronunciation-dictionary-utils --user
Usage
usage: dict-cli [-h] [-v]
                {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                ...
This program provides methods to modify pronunciation dictionaries.
positional arguments:
  {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                                        description
    export-vocabulary                   export vocabulary from dictionaries
    export-phonemes                     export phoneme set from dictionaries
    merge                               merge dictionaries together
    extract                             extract subset of dictionary vocabulary
    map-symbols-in-pronunciations       map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
    map-symbols-in-pronunciations-json  map phonemes/symbols in pronunciations to phoneme/symbol specified in file
    remove-symbols-from-vocabulary      remove phonemes/symbols from vocabulary
    remove-symbols-from-pronunciations  remove phonemes/symbols from pronunciations
    remove-symbols-from-words           remove characters/symbols from words
    change-formatting                   change formatting of dictionaries
    select-single-pronunciation         select single pronunciation
    change-word-casing                  transform all words to upper- or lower-case
    sort-words                          sort dictionary after words
    sort-pronunciations                 sort dictionary pronunciations
    normalize-weights                   normalize pronunciation weights for each word
optional arguments:
  -h, --help                            show this help message and exit
  -v, --version                         show program's version number and exit
Example
# Download CMU dictionary
wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict \
  -O "/tmp/example.dict"
# Change formatting to remove numbers from words, remove comments, and save as UTF-8
dict-cli change-formatting \
  "/tmp/example.dict" \
  --deserialization-encoding "ISO-8859-1" \
  --consider-numbers \
  --consider-pronunciation-comments \
  --serialization-encoding "UTF-8"
# Export phoneme set
dict-cli export-phonemes \
  "/tmp/example.dict" \
  "/tmp/example-phoneme-set.txt"
# Export vocabulary
dict-cli export-vocabulary \
  "/tmp/example.dict" \
  "/tmp/example-vocabulary.txt"
# Keep first pronunciation for each word and discard the rest
dict-cli select-single-pronunciation \
  "/tmp/example.dict" \
  --mode "first"
# Replace all "ER0" phonemes with "ER"
dict-cli map-symbols-in-pronunciations \
  "/tmp/example.dict" \
  "ER0" "ER"
Contributing
Development setup
# update
sudo apt update
# install Python 3.8, 3.9, 3.10 & 3.11 so that the tests can be run against each version
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user
# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary-utils.git
cd pronunciation-dictionary-utils
# create virtual environment
python3.8 -m pipenv install --dev
Running the tests
# first install the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary-utils
# activate environment
python3.8 -m pipenv shell
# run tests
tox
Final lines of test result output:
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
congratulations :)
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).
Changelog
- v0.0.4 (2023-06-02)
  - Bugfixes:
    - Support for Python 3.11 was not correctly defined
    - Adjusted return value for map_symbols in pronunciations_map_symbols
    - Corrected log messages
  - Added:
    - Support for partial mapping in map-symbols-in-pronunciations-json
    - Added unit tests for map-symbols-in-pronunciations-json
- v0.0.3 (2023-01-12)
  - Added:
    - Added sort-words to support sorting of words
    - Added sort-pronunciations to support sorting of pronunciations
    - Added normalize-weights to support normalization of pronunciation weights
    - Added map-symbols-in-pronunciations-json to map phonemes/symbols in pronunciations to phonemes/symbols specified in json file
    - Added --mode to symbol removal in pronunciations
    - Added returning of an exit code
    - Included tests in package distribution
  - Changed:
    - Symbols are now positional in remove-symbols-from-words, remove-symbols-from-pronunciations and remove-symbols-from-vocabulary
    - Notify if something changed after merging dictionary
    - Update CLI
    - Update pronunciation-dictionary version to 0.0.5
  - Removed:
    - Removed parameter 'ratio' for merging pronunciation weights
  - Bugfixes
- v0.0.2 (2022-12-02)
  - Support sorting of words
  - Support sorting of pronunciations
  - Support normalization of pronunciation weights
  - Added mode to symbol removal in pronunciations
  - Symbols are now positional in symbol removal
  - Removed parameter ratio for merging pronunciation weights
  - Notify if something changed after merging dictionary
  - Update CLI
  - Update pronunciation-dictionary version to 0.0.5
  - Bugfixes
- v0.0.1 (2022-09-30)
  - Initial release