pronunciation-dictionary-utils

Library and CLI to modify pronunciation dictionaries (any language).

Features

  • export-vocabulary: export vocabulary from dictionaries
  • export-phonemes: export phoneme set from dictionaries
  • merge: merge dictionaries together
  • extract: extract subset of dictionary vocabulary
  • map-symbols-in-pronunciations: map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
  • map-symbols-in-pronunciations-json: map phonemes/symbols in pronunciations to phoneme/symbol specified in file
  • remove-symbols-from-vocabulary: remove phonemes/symbols from vocabulary
  • remove-symbols-from-pronunciations: remove phonemes/symbols from pronunciations
  • remove-symbols-from-words: remove characters/symbols from words
  • change-formatting: change formatting of dictionaries
  • select-single-pronunciation: select single pronunciation
  • change-word-casing: transform all words to upper- or lower-case
  • sort-words: sort dictionary by words
  • sort-pronunciations: sort dictionary pronunciations
  • normalize-weights: normalize pronunciation weights for each word
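
As a rough illustration of what normalizing pronunciation weights per word can mean, the sketch below rescales the weights of each word's pronunciations so that they sum to 1. It is plain Python on a hypothetical in-memory structure (word -> {pronunciation: weight}) and does not use the library's API; the data layout, the function name `normalize_weights`, and the interpretation of "normalize" as "divide by the sum" are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical data layout (an assumption, not the library's types):
# each word maps to its pronunciations, each with a numeric weight.
Pronunciation = Tuple[str, ...]
Dictionary = Dict[str, Dict[Pronunciation, float]]


def normalize_weights(dictionary: Dictionary) -> Dictionary:
    """Return a copy in which each word's pronunciation weights sum to 1."""
    result: Dictionary = {}
    for word, pronunciations in dictionary.items():
        total = sum(pronunciations.values())
        if total <= 0:
            raise ValueError(f"Weights for {word!r} must sum to a positive value.")
        result[word] = {p: w / total for p, w in pronunciations.items()}
    return result


# Made-up example entry with two weighted pronunciations.
example: Dictionary = {
    "read": {("R", "IY1", "D"): 3.0, ("R", "EH1", "D"): 1.0},
}
normalized = normalize_weights(example)
print(normalized["read"])
```

With the made-up weights 3.0 and 1.0, the two pronunciations end up with 0.75 and 0.25. The actual command may handle edge cases (such as missing or zero weights) differently.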

Roadmap

  • Add tests
  • Implement printing of statistics
  • Add changing the pronunciation of a word via the CLI

Installation

pip install pronunciation-dictionary-utils --user

Usage

usage: dict-cli [-h] [-v]
                {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                ...

This program provides methods to modify pronunciation dictionaries.

positional arguments:
  {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                                        description
    export-vocabulary                   export vocabulary from dictionaries
    export-phonemes                     export phoneme set from dictionaries
    merge                               merge dictionaries together
    extract                             extract subset of dictionary vocabulary
    map-symbols-in-pronunciations       map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
    map-symbols-in-pronunciations-json  map phonemes/symbols in pronunciations to phoneme/symbol specified in file
    remove-symbols-from-vocabulary      remove phonemes/symbols from vocabulary
    remove-symbols-from-pronunciations  remove phonemes/symbols from pronunciations
    remove-symbols-from-words           remove characters/symbols from words
    change-formatting                   change formatting of dictionaries
    select-single-pronunciation         select single pronunciation
    change-word-casing                  transform all words to upper- or lower-case
    sort-words                          sort dictionary after words
    sort-pronunciations                 sort dictionary pronunciations
    normalize-weights                   normalize pronunciation weights for each word

optional arguments:
  -h, --help                            show this help message and exit
  -v, --version                         show program's version number and exit

Example

# Download CMU dictionary
wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict \
  -O "/tmp/example.dict"

# Change formatting: remove numbers from words, remove comments, and save as UTF-8
dict-cli change-formatting \
  "/tmp/example.dict" \
  --deserialization-encoding "ISO-8859-1" \
  --consider-numbers \
  --consider-pronunciation-comments \
  --serialization-encoding "UTF-8"

# Export phoneme set
dict-cli export-phonemes \
  "/tmp/example.dict" \
  "/tmp/example-phoneme-set.txt"
  
# Export vocabulary
dict-cli export-vocabulary \
  "/tmp/example.dict" \
  "/tmp/example-vocabulary.txt"

# Keep first pronunciation for each word and discard the rest
dict-cli select-single-pronunciation \
  "/tmp/example.dict" \
  --mode "first"

# Replace all "ER0" phonemes with "ER"
dict-cli map-symbols-in-pronunciations \
  "/tmp/example.dict" \
  "ER0" "ER"
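
To make the effect of the last two steps concrete, they can be mirrored in plain Python. The sketch below does not use the library's API: it parses a few made-up lines in CMUdict style (a word followed by space-separated phonemes, with variants marked as `word(2)`), keeps the first pronunciation per word, and replaces `ER0` with `ER`. The helper names (`parse`, `select_first`, `map_symbol`) and the simplified parsing are assumptions for illustration only.

```python
import re
from typing import Dict, List, Tuple

Pronunciation = Tuple[str, ...]
Dictionary = Dict[str, List[Pronunciation]]

# Made-up sample lines in the style of the CMU dictionary.
LINES = [
    "water W AO1 T ER0",
    "water(2) W AA1 T ER0",
    "bird B ER1 D",
]

# Matches a variant marker such as "(2)" at the end of a word.
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse(lines: List[str]) -> Dictionary:
    """Group pronunciations by word, in the order they appear."""
    result: Dictionary = {}
    for line in lines:
        word, *phonemes = line.split()
        word = VARIANT_SUFFIX.sub("", word)
        result.setdefault(word, []).append(tuple(phonemes))
    return result


def select_first(dictionary: Dictionary) -> Dictionary:
    """Keep only the first pronunciation of each word."""
    return {word: prons[:1] for word, prons in dictionary.items()}


def map_symbol(dictionary: Dictionary, source: str, target: str) -> Dictionary:
    """Replace every occurrence of one symbol with another."""
    return {
        word: [tuple(target if s == source else s for s in pron) for pron in prons]
        for word, prons in dictionary.items()
    }


dictionary = map_symbol(select_first(parse(LINES)), "ER0", "ER")
# Collect the phoneme set, similar in spirit to exporting it.
phoneme_set = sorted({s for prons in dictionary.values() for p in prons for s in p})
print(dictionary)
print(phoneme_set)
```

In this sketch only the exact symbol `ER0` is replaced, so `ER1` in "bird" stays untouched; the second pronunciation of "water" is dropped before the mapping is applied.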

Contributing

Development setup

# update
sudo apt update
# install Python 3.8-3.12 so that the tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary-utils.git
cd pronunciation-dictionary-utils
# create virtual environment
python3.8 -m pipenv install --dev

Running the tests

# first install the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary-utils
# activate environment
python3.8 -m pipenv shell
# run tests
tox

Final lines of test result output:

py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

Citation

To cite this repository, use the BibTeX entry generated by GitHub (see About => Cite this repository) or the following reference:

Taubert, S., and Przybysz, N. (2024). pronunciation-dictionary-utils (Version 0.0.5) [Computer software]. https://doi.org/10.5281/zenodo.10560153
