CLI to create a pronunciation dictionary from another pronunciation dictionary, with the option to ignore punctuation and split on hyphens before lookup.
dict-from-dict
Features
- ignore the casing of words during lookup
- trim symbols at the start and end of a word before lookup
- split words on hyphens before lookup
- hyphenated words that are present in the dictionary take precedence over splitting (see example below)
- words with multiple pronunciations are supported
- weights are multiplied for hyphenated words (see example below)
- output out-of-vocabulary (OOV) words
- multiprocessing
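The case-insensitive lookup and symbol trimming listed above can be sketched roughly as follows. This is only an illustration, not the package's actual code: the function names `trim` and `lookup`, the use of `string.punctuation` as the set of trimmed symbols, and the in-memory dictionary layout (lowercase word mapped to a list of `(weight, phonemes)` pairs) are all assumptions.

```python
import string


def trim(word, symbols=string.punctuation):
    """Split a word into (leading symbols, core, trailing symbols).

    Assumption: the trimmed symbols are ASCII punctuation.
    """
    start = 0
    while start < len(word) and word[start] in symbols:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in symbols:
        end -= 1
    return word[:start], word[start:end], word[end:]


def lookup(word, dictionary, ignore_case=True):
    """Return a list of (weight, phonemes) pairs, or None if the word is OOV.

    Assumption: `dictionary` maps lowercase words to lists of
    (weight, phonemes) pairs, where phonemes is a tuple of strings.
    """
    key = word.lower() if ignore_case else word
    # The untrimmed word is tried first, so explicit entries win.
    if key in dictionary:
        return dictionary[key]
    leading, core, trailing = trim(key)
    if core not in dictionary:
        return None
    # Re-attach the trimmed symbols around each pronunciation.
    return [
        (weight, (*leading, *phonemes, *trailing))
        for weight, phonemes in dictionary[core]
    ]
```

With a dictionary holding the two `test` pronunciations from the example below, `lookup("Test?", dictionary)` would return both pronunciations with `?` appended, while `lookup("abc,", dictionary)` would return `None` and the word would be reported as OOV.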
Installation
pip install dict-from-dict --user
Usage
dict-from-dict-cli
Example
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF
# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test 0.7 T E0 S T
test 0.3 T E1 S T
def 0.4 D E0 F
def 0.6 D E1 F
xyz 2.0 ?
"xyz? 1.0 " X Y Z ?
uv 2.0 ?
w 2.0 ?
uv-w 1.0 U V - W
EOF
# Create dictionary from vocabulary and example dictionary
dict-from-dict-cli \
/tmp/vocabulary.txt \
/tmp/dictionary.dict --consider-weights \
/tmp/result.dict \
--ignore-case --split-on-hyphen \
--n-jobs 4 \
--oov-out /tmp/oov.txt
cat /tmp/result.dict
# -------
# Output:
# -------
# Test? 0.7 T E0 S T ?
# Test? 0.3 T E1 S T ?
# "def 0.4 " D E0 F
# "def 0.6 " D E1 F
# Test-def. 0.27999999999999997 T E0 S T - D E0 F .
# Test-def. 0.42 T E0 S T - D E1 F .
# Test-def. 0.12 T E1 S T - D E0 F .
# Test-def. 0.18 T E1 S T - D E1 F .
# "xyz? 1.0 " X Y Z ?
# "uv-w? 1.0 " U V - W ?
# -------
cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
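The four `Test-def.` entries in the result come from combining every pronunciation of `test` with every pronunciation of `def` and multiplying their weights. A minimal sketch of that combination step, assuming pronunciations are held as `(weight, phonemes)` pairs and using the hypothetical helper name `combine_hyphenated` (not part of the CLI), could look like this:

```python
from itertools import product


def combine_hyphenated(parts):
    """Combine the pronunciations of hyphen-separated parts.

    `parts` holds one list of (weight, phonemes) pairs per part.
    Every combination is produced; its weight is the product of the
    part weights and its phonemes are joined with a "-" symbol.
    """
    combined = []
    for choice in product(*parts):
        weight = 1.0
        phonemes = []
        for index, (part_weight, part_phonemes) in enumerate(choice):
            weight *= part_weight
            if index > 0:
                phonemes.append("-")
            phonemes.extend(part_phonemes)
        combined.append((weight, tuple(phonemes)))
    return combined


test = [(0.7, ("T", "E0", "S", "T")), (0.3, ("T", "E1", "S", "T"))]
definition = [(0.4, ("D", "E0", "F")), (0.6, ("D", "E1", "F"))]

for weight, phonemes in combine_hyphenated([test, definition]):
    print(weight, " ".join(phonemes))
```

The weights are ordinary floating-point products, which is why the first entry in the result above appears as `0.27999999999999997` rather than `0.28`.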