pronunciation-dictionary-utils

Library and CLI to modify pronunciation dictionaries (any language).

Features

  • export-vocabulary: export vocabulary from dictionaries
  • export-phonemes: export phoneme set from dictionaries
  • merge: merge dictionaries together
  • extract: extract subset of dictionary vocabulary
  • map-symbols-in-pronunciations: map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
  • map-symbols-in-pronunciations-json: map phonemes/symbols in pronunciations to phoneme/symbol specified in file
  • remove-symbols-from-vocabulary: remove phonemes/symbols from vocabulary
  • remove-symbols-from-pronunciations: remove phonemes/symbols from pronunciations
  • remove-symbols-from-words: remove characters/symbols from words
  • change-formatting: change formatting of dictionaries
  • select-single-pronunciation: select single pronunciation
  • change-word-casing: transform all words to upper- or lower-case
  • sort-words: sort dictionary by words
  • sort-pronunciations: sort dictionary pronunciations
  • normalize-weights: normalize pronunciation weights for each word
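
As an illustration of what weight normalization could look like, here is a minimal Python sketch. It assumes that "normalize" means scaling the weights of each word so that they sum to 1, which the description above does not state explicitly. The `Dictionary` type and the `normalize_weights` function are hypothetical and are not the API of this library.

```python
from typing import Dict, Tuple

# Hypothetical in-memory model (not the library's API):
# word -> {pronunciation (tuple of symbols) -> weight}
Pronunciations = Dict[Tuple[str, ...], float]
Dictionary = Dict[str, Pronunciations]


def normalize_weights(dictionary: Dictionary) -> Dictionary:
    """Return a copy in which the weights of each word sum to 1."""
    result: Dictionary = {}
    for word, pronunciations in dictionary.items():
        total = sum(pronunciations.values())
        if total == 0:
            # Assumption: keep all-zero weights unchanged to avoid dividing by zero.
            result[word] = dict(pronunciations)
            continue
        result[word] = {p: w / total for p, w in pronunciations.items()}
    return result


example = {
    "read": {("R", "IY1", "D"): 3.0, ("R", "EH1", "D"): 1.0},
    "the": {("DH", "AH0"): 2.0},
}
normalized = normalize_weights(example)
print(normalized)
```

The sketch returns a new mapping instead of modifying the input, so the original weights stay available for comparison.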

Roadmap

  • Add tests
  • Implement printing of statistics
  • Add changing the pronunciation of a word via the CLI

Installation

pip install pronunciation-dictionary-utils --user
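
One way to check from Python whether the installation succeeded is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pronunciation-dictionary-utils") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```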

Usage

usage: dict-cli [-h] [-v]
                {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                ...

This program provides methods to modify pronunciation dictionaries.

positional arguments:
  {export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
                                        description
    export-vocabulary                   export vocabulary from dictionaries
    export-phonemes                     export phoneme set from dictionaries
    merge                               merge dictionaries together
    extract                             extract subset of dictionary vocabulary
    map-symbols-in-pronunciations       map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
    map-symbols-in-pronunciations-json  map phonemes/symbols in pronunciations to phoneme/symbol specified in file
    remove-symbols-from-vocabulary      remove phonemes/symbols from vocabulary
    remove-symbols-from-pronunciations  remove phonemes/symbols from pronunciations
    remove-symbols-from-words           remove characters/symbols from words
    change-formatting                   change formatting of dictionaries
    select-single-pronunciation         select single pronunciation
    change-word-casing                  transform all words to upper- or lower-case
    sort-words                          sort dictionary after words
    sort-pronunciations                 sort dictionary pronunciations
    normalize-weights                   normalize pronunciation weights for each word

optional arguments:
  -h, --help                            show this help message and exit
  -v, --version                         show program's version number and exit

Example

# Download CMU dictionary
wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict \
  -O "/tmp/example.dict"

# Change formatting: remove numbers from words, remove comments, and save as UTF-8
dict-cli change-formatting \
  "/tmp/example.dict" \
  --deserialization-encoding "ISO-8859-1" \
  --consider-numbers \
  --consider-pronunciation-comments \
  --serialization-encoding "UTF-8"

# Export phoneme set
dict-cli export-phonemes \
  "/tmp/example.dict" \
  "/tmp/example-phoneme-set.txt"
  
# Export vocabulary
dict-cli export-vocabulary \
  "/tmp/example.dict" \
  "/tmp/example-vocabulary.txt"

# Keep first pronunciation for each word and discard the rest
dict-cli select-single-pronunciation \
  "/tmp/example.dict" \
  --mode "first"

# Replace all "ER0" phonemes with "ER"
dict-cli map-symbols-in-pronunciations \
  "/tmp/example.dict" \
  "ER0" "ER"

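The effect of the last two commands can be sketched in plain Python on a small in-memory dictionary. This is only an illustration under assumptions: the `Dictionary` type and the functions `select_first` and `map_symbol` are hypothetical and do not reflect how the tool is implemented or what its library API looks like.

```python
from typing import Dict, List

# Hypothetical in-memory form: word -> list of pronunciations (lists of symbols).
Dictionary = Dict[str, List[List[str]]]


def select_first(dictionary: Dictionary) -> Dictionary:
    """Keep only the first pronunciation of each word."""
    return {word: prons[:1] for word, prons in dictionary.items()}


def map_symbol(dictionary: Dictionary, source: str, target: str) -> Dictionary:
    """Replace every symbol equal to `source` with `target`."""
    return {
        word: [[target if s == source else s for s in pron] for pron in prons]
        for word, prons in dictionary.items()
    }


entries = {
    "better": [["B", "EH1", "T", "ER0"]],
    "either": [["IY1", "DH", "ER0"], ["AY1", "DH", "ER0"]],
}
result = map_symbol(select_first(entries), "ER0", "ER")
print(result)
# {'better': [['B', 'EH1', 'T', 'ER']], 'either': [['IY1', 'DH', 'ER']]}
```

Only whole symbols are compared here, so a symbol such as "ER1" is left unchanged when "ER0" is mapped.
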
Contributing

Development setup

# update
sudo apt update
# install Python 3.8 to 3.12 so that the tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary-utils.git
cd pronunciation-dictionary-utils
# create virtual environment
python3.8 -m pipenv install --dev

Running the tests

# first set up the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary-utils
# activate environment
python3.8 -m pipenv shell
# run tests
tox

Final lines of test result output:

py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

Citation

If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository) or the following reference:

Taubert, S., and Przybysz, N. (2024). pronunciation-dictionary-utils (Version 0.0.5) [Computer software]. https://doi.org/10.5281/zenodo.10560153
