dict-from-dict
Command-line interface (CLI) to create a pronunciation dictionary from another pronunciation dictionary, with the option of ignoring punctuation and splitting on hyphens before lookup.
Features
- ignore the casing of words during lookup
- trim symbols at the start and end of a word before lookup
- split words on hyphens before lookup
- if the dictionary contains a hyphenated word as a whole, that entry is considered first (see example below)
- words with multiple pronunciations are supported
- weights are multiplied for hyphenated words (see example below)
- output of out-of-vocabulary (OOV) words
- multiprocessing
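The lookup order implied by these features can be sketched in Python. This is an illustrative sketch only, not the package's actual implementation: the names `TRIM_SYMBOLS`, `split_trim`, `lookup`, and `DICTIONARY` are hypothetical, and the set of trimmed symbols is an assumption. It tries the word as written, then the trimmed word, then the hyphen-separated parts, multiplying the weights of the parts.

```python
from itertools import product

# Assumption: the symbols trimmed by the real tool may differ from this set.
TRIM_SYMBOLS = "?,.\"!"

# Hypothetical in-memory dictionary: lower-cased word -> [(weight, phonemes)].
DICTIONARY = {
    "test": [(0.7, ("T", "E0", "S", "T")), (0.3, ("T", "E1", "S", "T"))],
    "def": [(0.4, ("D", "E0", "F")), (0.6, ("D", "E1", "F"))],
    "uv": [(2.0, ("?",))],
    "w": [(2.0, ("?",))],
    "uv-w": [(1.0, ("U", "V", "-", "W"))],
}


def split_trim(word, symbols=TRIM_SYMBOLS):
    """Split a word into (leading symbols, core, trailing symbols)."""
    start = 0
    while start < len(word) and word[start] in symbols:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in symbols:
        end -= 1
    return word[:start], word[start:end], word[end:]


def lookup(word, dictionary):
    """Return a list of (weight, phonemes); an empty list means OOV."""
    # 1. Try the word exactly as written, ignoring case.
    if word.lower() in dictionary:
        return list(dictionary[word.lower()])

    lead, core, trail = split_trim(word)

    def wrap(phonemes):
        # Re-attach the trimmed symbols around the pronunciation.
        return tuple(lead) + phonemes + tuple(trail)

    # 2. Try the trimmed word; a hyphenated entry wins over its parts.
    if core.lower() in dictionary:
        return [(w, wrap(p)) for w, p in dictionary[core.lower()]]

    # 3. Split on hyphens and combine the pronunciations of all parts.
    parts = core.lower().split("-")
    if len(parts) < 2 or any(part not in dictionary for part in parts):
        return []
    results = []
    for combination in product(*(dictionary[part] for part in parts)):
        weight = 1.0
        phonemes = ()
        for index, (part_weight, part_phonemes) in enumerate(combination):
            weight *= part_weight  # weights of the parts are multiplied
            if index > 0:
                phonemes += ("-",)
            phonemes += part_phonemes
        results.append((weight, wrap(phonemes)))
    return results
```

Under these assumptions, `lookup("Test-def.", DICTIONARY)` yields four weighted pronunciations, `lookup('"uv-w?', DICTIONARY)` uses the `uv-w` entry rather than the separate `uv` and `w` entries, and `lookup("abc,", DICTIONARY)` returns an empty list, marking the word as OOV.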
Installation
pip install dict-from-dict --user
Usage
dict-from-dict-cli
Example
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF
# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test 0.7 T E0 S T
test 0.3 T E1 S T
def 0.4 D E0 F
def 0.6 D E1 F
xyz 2.0 ?
"xyz? 1.0 " X Y Z ?
uv 2.0 ?
w 2.0 ?
uv-w 1.0 U V - W
EOF
# Create dictionary from vocabulary and example dictionary
dict-from-dict-cli \
/tmp/vocabulary.txt \
/tmp/dictionary.dict --consider-weights \
/tmp/result.dict \
--ignore-case --split-on-hyphen \
--n-jobs 4 \
--oov-out /tmp/oov.txt
cat /tmp/result.dict
# -------
# Output:
# -------
# Test? 0.7 T E0 S T ?
# Test? 0.3 T E1 S T ?
# "def 0.4 " D E0 F
# "def 0.6 " D E1 F
# Test-def. 0.27999999999999997 T E0 S T - D E0 F .
# Test-def. 0.42 T E0 S T - D E1 F .
# Test-def. 0.12 T E1 S T - D E0 F .
# Test-def. 0.18 T E1 S T - D E1 F .
# "xyz? 1.0 " X Y Z ?
# "uv-w? 1.0 " U V - W ?
# -------
cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
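The weights of `Test-def.` in the result above match the pairwise products of the weights of `test` and `def`. The short Python sketch below reproduces that arithmetic. It assumes the tool multiplies plain floating-point numbers, which would explain why the first weight is not printed as exactly 0.28; the variable names are illustrative.

```python
import math

# Weights taken from the example dictionary above.
test_weights = [0.7, 0.3]
def_weights = [0.4, 0.6]

# One combined weight per pair of pronunciations, in output order.
combined = [a * b for a in test_weights for b in def_weights]
print(combined)

# Because each input distribution sums to 1, the products do as well
# (up to floating-point rounding).
print(math.isclose(sum(combined), 1.0))
```

Since both input words have weights that sum to 1, the four combined weights again form a distribution over the pronunciations of the hyphenated word.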
License
MIT License
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
To cite this repository, you can use the following BibTeX entry:
@misc{tsdfd22,
author = {Taubert, Stefan},
title = {dict-from-dict},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/stefantaubert/pronunciation-dict-creation}}
}