Command-line interface (CLI) to create a pronunciation dictionary from another pronunciation dictionary, with the option of ignoring punctuation and splitting on hyphens before lookup.
dict-from-dict
Features
- ignore the casing of words during lookup
- trim symbols at the start and end of a word before lookup
- split words on hyphens before lookup
- if the dictionary contains hyphenated words, they are considered first (see example below)
- words with multiple pronunciations are supported
- weights are multiplied for hyphenated words (see example below)
- output out-of-vocabulary (OOV) words
- multiprocessing
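The trimming and case-insensitive lookup listed above can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual implementation: the names `split_trim` and `lookup` are hypothetical, and the lookup order (exact entry first, then the trimmed word, with trimmed symbols re-attached as separate pronunciation symbols) is an assumption inferred from the example output further down.

```python
from typing import Dict, List, Optional, Tuple

Pronunciation = Tuple[str, ...]


def split_trim(word: str, symbols: str) -> Tuple[str, str, str]:
    """Split a word into (leading symbols, core, trailing symbols)."""
    start = 0
    while start < len(word) and word[start] in symbols:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in symbols:
        end -= 1
    return word[:start], word[start:end], word[end:]


def lookup(
    word: str,
    dictionary: Dict[str, List[Pronunciation]],
    symbols: str,
    ignore_case: bool = True,
) -> Optional[List[Pronunciation]]:
    """Hypothetical lookup: exact entry first, then the trimmed word."""

    def find(key: str) -> Optional[List[Pronunciation]]:
        if key in dictionary:
            return dictionary[key]
        if ignore_case:
            # Assumption: casing is ignored by comparing lowercased forms.
            for entry, prons in dictionary.items():
                if entry.lower() == key.lower():
                    return prons
        return None

    exact = find(word)
    if exact is not None:
        return list(exact)

    prefix, core, suffix = split_trim(word, symbols)
    prons = find(core)
    if prons is None:
        return None  # the word would be reported as OOV
    # Re-attach each trimmed symbol as its own pronunciation symbol.
    return [tuple(prefix) + p + tuple(suffix) for p in prons]
```

Under these assumptions, a word such as `Test?` resolves through the entry `test`, while a word with no matching entry after trimming returns `None` and would end up in the OOV output.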
Installation
pip install dict-from-dict --user
Usage
dict-from-dict-cli
Example
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF
# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test 0.7 T E0 S T
test 0.3 T E1 S T
def 0.4 D E0 F
def 0.6 D E1 F
xyz 2.0 ?
"xyz? 1.0 ' X Y Z ??
uv 2.0 ?
w 2.0 ?
uv-w 1.0 U V - W
EOF
# Create dictionary from vocabulary and example dictionary
dict-from-dict-cli \
/tmp/vocabulary.txt \
/tmp/dictionary.dict --consider-weights \
/tmp/result.dict \
--ignore-case --split-on-hyphen \
--trim "?" "\"" "," "." \
--n-jobs 4 \
--oov-out /tmp/oov.txt
cat /tmp/result.dict
# -------
# Output:
# -------
# Test? 0.7 T E0 S T ?
# Test? 0.3 T E1 S T ?
# "def 0.4 " D E0 F
# "def 0.6 " D E1 F
# Test-def. 0.27999999999999997 T E0 S T - D E0 F .
# Test-def. 0.42 T E0 S T - D E1 F .
# Test-def. 0.12 T E1 S T - D E0 F .
# Test-def. 0.18 T E1 S T - D E1 F .
# "xyz? 1.0 ' X Y Z ??
# "uv-w? 1.0 " U V - W ?
# -------
cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
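The weights of the `Test-def.` entries in the example above are the products of the weights of the parts (for instance 0.7 × 0.4 for the first combination). The following Python sketch shows one way such a combination could be computed. It is an illustration under assumptions, not the tool's code: the function name `combine_hyphenated` and the data layout (word mapped to a list of `(weight, symbols)` pairs) are hypothetical, and trimming and casing are left out.

```python
from itertools import product
from math import prod
from typing import Dict, List, Optional, Tuple

Entry = Tuple[float, Tuple[str, ...]]  # (weight, pronunciation symbols)


def combine_hyphenated(
    word: str, dictionary: Dict[str, List[Entry]]
) -> Optional[List[Entry]]:
    """Hypothetical sketch: resolve a word, splitting on hyphens if needed."""
    # An entry for the hyphenated word itself takes precedence.
    if word in dictionary:
        return list(dictionary[word])

    parts = word.split("-")
    if any(part not in dictionary for part in parts):
        return None  # at least one part is missing, so the word is OOV

    result = []
    # Every combination of part pronunciations yields one pronunciation.
    for combo in product(*(dictionary[part] for part in parts)):
        weight = prod(w for w, _ in combo)
        symbols: List[str] = []
        for index, (_, part_symbols) in enumerate(combo):
            if index > 0:
                symbols.append("-")  # keep the hyphen as its own symbol
            symbols.extend(part_symbols)
        result.append((weight, tuple(symbols)))
    return result


example = {
    "test": [(0.7, ("T", "E0", "S", "T")), (0.3, ("T", "E1", "S", "T"))],
    "def": [(0.4, ("D", "E0", "F")), (0.6, ("D", "E1", "F"))],
    "uv": [(2.0, ("?",))],
    "w": [(2.0, ("?",))],
    "uv-w": [(1.0, ("U", "V", "-", "W"))],
}
for weight, symbols in combine_hyphenated("test-def", example):
    print(weight, " ".join(symbols))
```

With the example data, `test-def` yields four pronunciations (two for `test` times two for `def`), while `uv-w` is taken directly from its own entry instead of being assembled from `uv` and `w`.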
Contributing
If you notice an error, please don't hesitate to open an issue.
License
MIT License
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).