Command-line interface (CLI) to create a pronunciation dictionary from another pronunciation dictionary, with the option of ignoring punctuation and splitting on hyphens before lookup.
Project description
dict-from-dict
Features
- ignores the casing of words during lookup
- trims symbols at the start and end of a word before lookup
- splits words on hyphens before lookup
- if the dictionary contains hyphenated words, they are considered before splitting (see example below)
- supports words with multiple pronunciations
- multiplies the weights of the parts of split hyphenated words (see example below)
- outputs out-of-vocabulary (OOV) words, i.e. words that could not be found in the dictionary
- supports multiprocessing
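The lookup order implied by these features and by the example below (exact word first, then the trimmed and lower-cased word, which may be a hyphenated dictionary entry, then the single hyphen-separated parts) can be sketched in a few lines of shell. This is a hypothetical illustration, not the tool's implementation: the helper names `in_dict` and `lookup_keys` are made up, the trim set `?` `"` `,` `.` is taken from the example's `--trim` arguments, and dictionary keys are assumed to be lower-case. The sketch only prints the key(s) a word would be looked up under; the real tool additionally re-attaches the trimmed symbols to the pronunciation and combines the weights, as the example output shows.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the lookup order; NOT the dict-from-dict implementation.
# Assumes lower-case dictionary keys in the first column and the trim set ? " , .

# Succeeds if $1 is a key (first column) of the dictionary file $2.
in_dict() {
  awk -v w="$1" '$1 == w { found = 1; exit } END { exit !found }' "$2"
}

# Prints the dictionary key(s) used for word $1, one per line.
lookup_keys() {
  local word=$1 dict=$2 core
  # 1. the word exactly as it appears in the vocabulary
  if in_dict "$word" "$dict"; then
    printf '%s\n' "$word"
    return
  fi
  # 2. trim the symbols at both ends and ignore the casing
  core=$(printf '%s' "$word" | sed -E 's/^[?",.]+//; s/[?",.]+$//' | tr '[:upper:]' '[:lower:]')
  # 3. a hyphenated entry in the dictionary wins over splitting
  if in_dict "$core" "$dict"; then
    printf '%s\n' "$core"
    return
  fi
  # 4. otherwise use every hyphen-separated part on its own
  local IFS='-'
  local -a parts
  read -ra parts <<< "$core"
  printf '%s\n' "${parts[@]}"
}

# Usage with the dictionary created in the example below:
#   lookup_keys 'Test-def.' /tmp/dictionary.dict   # prints "test" and "def"
#   lookup_keys '"uv-w?' /tmp/dictionary.dict      # prints "uv-w"
```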
Installation
pip install dict-from-dict --user
Usage
dict-from-dict-cli
Example
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF
# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test 0.7 T E0 S T
test 0.3 T E1 S T
def 0.4 D E0 F
def 0.6 D E1 F
xyz 2.0 ?
"xyz? 1.0 ' X Y Z ??
uv 2.0 ?
w 2.0 ?
uv-w 1.0 U V - W
EOF
# Create a pronunciation dictionary for the vocabulary from the example dictionary
dict-from-dict-cli \
/tmp/vocabulary.txt \
/tmp/dictionary.dict --consider-weights \
/tmp/result.dict \
--ignore-case --split-on-hyphen \
--trim "?" "\"" "," "." \
--n-jobs 4 \
--oov-out /tmp/oov.txt
cat /tmp/result.dict
# -------
# Output:
# -------
Test? 0.7 T E0 S T ?
Test? 0.3 T E1 S T ?
"def 0.4 " D E0 F
"def 0.6 " D E1 F
Test-def. 0.27999999999999997 T E0 S T - D E0 F .
Test-def. 0.42 T E0 S T - D E1 F .
Test-def. 0.12 T E1 S T - D E0 F .
Test-def. 0.18 T E1 S T - D E1 F .
"xyz? 1.0 ' X Y Z ??
"uv-w? 1.0 " U V - W ?
# -------
cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
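The weights of the split word `Test-def.` are the products of the weights of its parts (`test`: 0.7 and 0.3, `def`: 0.4 and 0.6), which can be checked with a few lines of shell. `"uv-w?`, in contrast, keeps the weight 1.0 of the hyphenated dictionary entry `uv-w` instead of combining `uv` and `w`, because hyphenated entries are considered first. The helper name `combined_weights` is hypothetical; the last command shows that 0.27999999999999997 is simply the double-precision product 0.7 * 0.4 printed at full precision.

```shell
#!/usr/bin/env bash
# Hypothetical check of the combined weights for "Test-def." in /tmp/result.dict:
# every pronunciation of "test" is combined with every pronunciation of "def".
combined_weights() {
  local t d
  for t in 0.7 0.3; do      # weights of "test" in /tmp/dictionary.dict
    for d in 0.4 0.6; do    # weights of "def" in /tmp/dictionary.dict
      awk -v a="$t" -v b="$d" 'BEGIN { printf "%s * %s = %.2f\n", a, b, a * b }'
    done
  done
}

combined_weights
# 0.7 * 0.4 = 0.28
# 0.7 * 0.6 = 0.42
# 0.3 * 0.4 = 0.12
# 0.3 * 0.6 = 0.18

# The same first product printed with 17 significant digits:
awk 'BEGIN { printf "%.17g\n", 0.7 * 0.4 }'
# 0.27999999999999997
```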
Development setup
# update the package lists
sudo apt update
# install Python 3.8 to 3.12 so that the tests can be run on all supported versions
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv \
python3.12 python3.12-dev python3.12-venv
# install pipenv to create the virtual environment
python3.8 -m pip install pipenv --user
# clone the repository
git clone https://github.com/stefantaubert/pronunciation-dict-creation.git
cd pronunciation-dict-creation
# create the virtual environment and install the development dependencies
python3.8 -m pipenv install --dev
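Before running the tests, it can help to confirm that all five interpreters of the test matrix (py38 to py312) are on the `PATH`. A minimal sketch; `check_pythons` is a hypothetical helper, not part of the repository:

```shell
#!/usr/bin/env bash
# Report which of the interpreters used by the tox test matrix are available.
check_pythons() {
  local version
  for version in 3.8 3.9 3.10 3.11 3.12; do
    if command -v "python$version" >/dev/null 2>&1; then
      echo "python$version: found"
    else
      echo "python$version: missing"
    fi
  done
}

check_pythons
```

Any line reporting "missing" points to an interpreter that still has to be installed before `tox` can succeed for that environment.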
Running the tests
# first set up the development environment as described in "Development setup"
# then navigate into the repository directory (if not already there)
cd pronunciation-dict-creation
# activate the virtual environment
python3.8 -m pipenv shell
# run tests
tox
Final lines of the test output:
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
License
MIT License
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repository, you can use the following citation generated by GitHub (see About => Cite this repository):
Taubert, S. (2024). dict-from-dict (Version 0.0.4) [Computer software]. https://doi.org/10.5281/zenodo.10560441
Download files
Source Distribution
Built Distribution
Hashes for dict_from_dict-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6a3bfdf6820b85ea167ede1a6cfe10130e5fe0a168a7f3147e3828898ec77c4c
MD5 | 86dd2bc4bad7c528b2350a104b16e520
BLAKE2b-256 | 76a6094515c8d84e0ddbb56851cf70f8d5935d7520f74fd01e779ee50e453f1e
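To check a downloaded wheel against the SHA256 digest listed above, a small helper is enough. This sketch assumes GNU coreutils' `sha256sum` (on macOS, `shasum -a 256` is the usual substitute) and a wheel in the current directory, fetched for example with `pip download dict-from-dict==0.0.4 --no-deps --only-binary=:all: -d .`; the function name `verify_sha256` is hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds only if the SHA256 digest of file $1 equals the expected hex digest $2.
verify_sha256() {
  printf '%s  %s\n' "$2" "$1" | sha256sum --check --status
}

wheel=dict_from_dict-0.0.4-py3-none-any.whl
if verify_sha256 "$wheel" 6a3bfdf6820b85ea167ede1a6cfe10130e5fe0a168a7f3147e3828898ec77c4c; then
  echo "$wheel: OK"
else
  echo "$wheel: digest mismatch or file missing"
fi
```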