dict-from-g2pE
CLI to create a pronunciation dictionary by predicting English ARPAbet phonemes using the seq2seq model from g2pE, with the option to ignore punctuation and to split on hyphens before prediction.
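To make the idea of ignoring punctuation and splitting on hyphens more concrete, here is a minimal Python sketch. It is not the tool's actual implementation: the names `trim_punctuation`, `pronounce`, and `toy_predict` are hypothetical, the punctuation set is assumed to be ASCII punctuation, and `toy_predict` is only a stand-in for the g2pE model so that the example runs without extra dependencies.

```python
import string
from typing import Callable, List, Tuple

# Assumption: ASCII punctuation is trimmed; the real tool may use another set.
PUNCTUATION = set(string.punctuation)


def trim_punctuation(word: str) -> Tuple[str, str, str]:
    """Split a word into (leading punctuation, core, trailing punctuation)."""
    start = 0
    while start < len(word) and word[start] in PUNCTUATION:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in PUNCTUATION:
        end -= 1
    return word[:start], word[start:end], word[end:]


def pronounce(
    word: str,
    predict: Callable[[str], List[str]],
    split_on_hyphen: bool = True,
) -> List[str]:
    """Predict only the core parts and keep punctuation as separate symbols."""
    leading, core, trailing = trim_punctuation(word)
    parts = core.split("-") if split_on_hyphen else [core]
    symbols: List[str] = list(leading)
    for index, part in enumerate(parts):
        if index > 0:
            symbols.append("-")  # keep the hyphen between predicted parts
        if part:
            symbols.extend(predict(part))
    symbols.extend(trailing)
    return symbols


def toy_predict(part: str) -> List[str]:
    # Hypothetical stand-in for the g2pE model: one uppercase symbol per letter.
    return [character.upper() for character in part]


print(" ".join(pronounce('"uv-w?', toy_predict)))  # " U V - W ?
```

With a real predictor in place of `toy_predict`, each hyphen-separated part would be passed to the model on its own, and the surrounding punctuation would be carried over unchanged, which mirrors the shape of the example output further down.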
Installation
pip install dict-from-g2pE --user
Usage
dict-from-g2pE-cli
Example
# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF
# Create dictionary from vocabulary
dict-from-g2pE-cli \
/tmp/vocabulary.txt \
/tmp/result.dict \
--split-on-hyphen \
--n-jobs 4
cat /tmp/result.dict
Output:
Test? T EH1 S T ?
abc, AE1 B K ,
"def " D EH1 F
Test-def. T EH1 S T - D EH1 F .
"xyz? " Z IH1 JH IH0 Z ?
"uv-w? " AH1 V - V IY1 ?
Development setup
# update
sudo apt update
# install Python 3.8-3.12 so that the tests can be run
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv \
python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user
# check out repo
git clone https://github.com/stefantaubert/dict-from-g2p.git
cd dict-from-g2p
# create virtual environment
python3.8 -m pipenv install --dev
Running the tests
# first install the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd dict-from-g2p
# activate environment
python3.8 -m pipenv shell
# run tests
tox
Final lines of test result output:
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
License
MIT License
Acknowledgments
g2pE: A Simple Python Module for English Grapheme To Phoneme Conversion
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository) or the following reference:
Taubert, S. (2024). dict-from-g2pE (Version 0.0.2) [Computer software]. https://doi.org/10.5281/zenodo.10561178