pronunciation-dictionary
Library to load and save pronunciation dictionaries (any language).
Features
- Load dictionary from file or URL
  - Parsing of
    - line comments
    - pronunciation comments
    - numbers indicating alternative pronunciations for words
    - weights
  - Multiprocessing for faster deserialization
- Save dictionary to file
  - including numbers for alternative pronunciations
  - including weights
  - setting the word/weight/pronunciation separator
- Select pronunciation via
  - weight
    - highest/lowest weight
  - first/last
  - random
- Get phoneme set
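
The selection strategies and the phoneme set listed above can be illustrated with a small standalone sketch. It does not use this library's API: the data layout (word mapped to pronunciations mapped to weights), the function names `select_pronunciation` and `get_phoneme_set`, the mode strings, and the example weights are all assumptions made for illustration.

```python
import random
from typing import Dict, Optional, Set, Tuple

# Hypothetical in-memory layout (an assumption, not the library's types):
# word -> {pronunciation as a tuple of symbols -> weight}
Pronunciation = Tuple[str, ...]
PronDict = Dict[str, Dict[Pronunciation, float]]


def select_pronunciation(entries: Dict[Pronunciation, float], mode: str,
                         rng: Optional[random.Random] = None) -> Pronunciation:
    """Pick one pronunciation from the alternatives of a single word."""
    prons = list(entries)  # dicts keep insertion order
    if mode == "first":
        return prons[0]
    if mode == "last":
        return prons[-1]
    if mode == "highest-weight":
        return max(prons, key=entries.__getitem__)
    if mode == "lowest-weight":
        return min(prons, key=entries.__getitem__)
    if mode == "random":
        return (rng or random).choice(prons)  # uniform choice
    raise ValueError(f"unknown mode: {mode}")


def get_phoneme_set(dictionary: PronDict) -> Set[str]:
    """Collect every symbol that occurs in any pronunciation."""
    return {
        symbol
        for entries in dictionary.values()
        for pronunciation in entries
        for symbol in pronunciation
    }


# The pronunciations come from the CMU excerpt; the weights are made up.
example: PronDict = {
    "aalborg": {
        ("AO1", "L", "B", "AO0", "R", "G"): 2.0,
        ("AA1", "L", "B", "AO0", "R", "G"): 1.0,
    },
}

best = select_pronunciation(example["aalborg"], "highest-weight")
phonemes = get_phoneme_set(example)
```

Passing a seeded `random.Random` instance as `rng` makes the random mode reproducible.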
Roadmap
- Adding tests
Example dictionaries and deserialization arguments
- Montreal Forced Aligner dictionaries
  - encoding: "UTF-8"
- CMU
  - encoding: "ISO-8859-1"
  - consider_numbers: True
  - consider_pronunciation_comments: True
- LibriSpeech
  - encoding: "UTF-8"
- Prosodylab
- Old: CMU 0.7b
  - encoding: "ISO-8859-1"
  - consider_comments: True
  - consider_numbers: True
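
To show why the encoding argument matters, the sketch below collects the arguments listed above in a plain mapping and decodes one dictionary line with them. The mapping `DESERIALIZATION_PRESETS` is a hypothetical convenience structure, not part of the library, and the sample line is invented for illustration.

```python
# Arguments copied from the list above; the dict layout itself is hypothetical.
# Prosodylab has no arguments listed, so its entry is left empty.
DESERIALIZATION_PRESETS = {
    "Montreal Forced Aligner": {"encoding": "UTF-8"},
    "CMU": {
        "encoding": "ISO-8859-1",
        "consider_numbers": True,
        "consider_pronunciation_comments": True,
    },
    "LibriSpeech": {"encoding": "UTF-8"},
    "Prosodylab": {},
    "Old: CMU 0.7b": {
        "encoding": "ISO-8859-1",
        "consider_comments": True,
        "consider_numbers": True,
    },
}

# An invented line stored as ISO-8859-1 bytes (0xE9 is "é", 0xE0 is "à").
raw = b"d\xe9j\xe0 D EY2 ZH AA1\n"

# Decoding with the listed encoding recovers the accented word.
text = raw.decode(DESERIALIZATION_PRESETS["CMU"]["encoding"])

# The same bytes are not valid UTF-8, so the wrong encoding fails.
try:
    raw.decode("UTF-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```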
Excerpt from CMU (as example)
a.d. EY2 D IY1
a.m. EY2 EH1 M
a.s EY1 Z
aaa T R IH2 P AH0 L EY1
aaberg AA1 B ER0 G
aachen AA1 K AH0 N
aachener AA1 K AH0 N ER0
aaker AA1 K ER0
aalborg AO1 L B AO0 R G # place, danish
aalborg(2) AA1 L B AO0 R G
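
The excerpt shows two conventions: a number in parentheses marks an alternative pronunciation, and text after `#` is a pronunciation comment. A minimal standalone parser for lines in this shape might look like the sketch below. It is not the library's implementation; `parse_line` and `parse_lines` are hypothetical names, and skipping lines that start with `;;;` assumes the line-comment style of older CMU releases.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# "(2)" directly after the word marks an alternative pronunciation.
_NUMBER_SUFFIX = re.compile(r"^(?P<word>.+)\((?P<number>\d+)\)$")


def parse_line(line: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Parse one line into (word, pronunciation); None if nothing to parse."""
    if line.startswith(";;;"):  # line comment (assumed older CMU style)
        return None
    # Drop a trailing pronunciation comment such as "# place, danish".
    content = line.split(" #", 1)[0].strip()
    if not content:
        return None
    word, *symbols = content.split()
    match = _NUMBER_SUFFIX.match(word)
    if match:
        word = match.group("word")  # "aalborg(2)" becomes "aalborg"
    return word, tuple(symbols)


def parse_lines(lines: Iterable[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """Group all pronunciations under their word, in file order."""
    result: Dict[str, List[Tuple[str, ...]]] = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        word, pronunciation = parsed
        result.setdefault(word, []).append(pronunciation)
    return result


excerpt = [
    "a.d. EY2 D IY1",
    "aalborg AO1 L B AO0 R G # place, danish",
    "aalborg(2) AA1 L B AO0 R G",
]
parsed_excerpt = parse_lines(excerpt)
```

Here both `aalborg` lines end up under the same key, with the comment removed from the first pronunciation. The sketch ignores weights and treats any ` #` as the start of a comment, which a real parser would need to make configurable.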
Installation
pip install pronunciation-dictionary --user
Usage
from pronunciation_dictionary import load_dict, save_dict, MultiprocessingOptions, DeserializationOptions, SerializationOptions
Citation
To cite this repository, you can use the following BibTeX entry:
@misc{tspd22,
author = {Taubert, Stefan},
title = {pronunciation-dictionary},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/stefantaubert/pronunciation-dictionary}}
}