Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional Orthography.

Aquila Resolve - Grapheme-to-Phoneme Converter

Augmented Recurrent Neural G2P with Inflectional Orthography

Aquila Resolve presents a new approach for accurate and efficient English-to-ARPAbet G2P resolution. The pipeline employs a context layer, multiple transformer and n-gram morpho-orthographical search layers, and an autoregressive recurrent neural transformer base. The current implementation offers state-of-the-art accuracy for out-of-vocabulary (OOV) words, as well as contextual analysis for correct inference of English heteronyms.

The package ships pre-trained and is ready for use as a dependency or in notebook environments. No additional resources are needed other than the model checkpoint, which is downloaded automatically on first use. See Installation for more information.

1. Dynamic Word Mappings based on context:

g2p.convert('I read the book, did you read it?')
# >> '{AY1} {R EH1 D} {DH AH0} {B UH1 K}, {D IH1 D} {Y UW1} {R IY1 D} {IH1 T}?'

g2p.convert('The researcher was to subject the subject to a test.')
# >> '{DH AH0} {R IY1 S ER0 CH ER0} {W AA1 Z} {T UW1} {S AH0 B JH EH1 K T} {DH AH0} {S AH1 B JH IH0 K T} {T UW1} {AH0} {T EH1 S T}.'

	'The subject was told to read. Eight records were read in total.'
Ground Truth	The `S AH1 B JH IH0 K T` was told to `R IY1 D`. Eight `R EH1 K ER0 D Z` were `R EH1 D` in total.
Aquila Resolve	The `S AH1 B JH IH0 K T` was told to `R IY1 D`. Eight `R EH1 K ER0 D Z` were `R EH1 D` in total.
Deep Phonemizer (en_us_cmudict_forward.pt)	The S AH B JH EH K T was told to R EH D. Eight R AH K AO R D Z were `R EH D` in total.
CMUSphinx Seq2Seq (checkpoint)	The `S AH1 B JH IH0 K T` was told to `R IY1 D`. Eight R IH0 K AO1 R D Z were R IY1 D in total.
ESpeakNG (with phonecodes)	The S AH1 B JH EH K T was told to `R IY1 D`. Eight `R EH1 K ER0 D Z` were R IY1 D in total.

2. Leading Accuracy for unseen words:

g2p.convert('Did you kalpe the Hevinet?')

	"tensorflow"	"agglomerative"	"necrophages"
Aquila Resolve	`T EH1 N S ER0 F L OW2`	`AH0 G L AA1 M ER0 EY2 T IH0 V`	`N EH1 K R OW0 F EY2 JH IH0 Z`
Deep Phonemizer (en_us_cmudict_forward.pt)	`T EH N S ER F L OW`	AH G L AA M ER AH T IH V	`N EH K R OW F EY JH IH Z`
CMUSphinx Seq2Seq (checkpoint)	T EH1 N S ER0 L OW0 F	AH0 G L AA1 M ER0 T IH0 V	N AE1 K R AH0 F IH0 JH IH0 Z
ESpeakNG (with phonecodes)	T EH1 N S OW0 R F L OW2	AA G L AA1 M ER0 R AH0 T IH2 V	N EH1 K R AH0 F IH JH EH0 Z

Installation

pip install aquila-resolve

A pre-trained model checkpoint (~106 MB) is downloaded automatically the first time you use a public method that requires inference, for example when instantiating G2p. You can also start this download manually by calling Aquila_Resolve.download().

If you are in an environment where remote file downloads are not possible, you can also transfer the checkpoint manually, placing model.pt within the Aquila_Resolve.data module folder.
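The manual transfer described above could be scripted along these lines. This is a minimal sketch, not part of the package: `install_checkpoint` is a hypothetical helper, and it assumes the installed package directory contains a `data` folder in which the file must be named `model.pt`, as stated above.

```python
import shutil
from pathlib import Path


def install_checkpoint(checkpoint: Path, package_dir: Path) -> Path:
    """Copy a manually transferred checkpoint to <package_dir>/data/model.pt.

    `package_dir` is the folder of the installed Aquila_Resolve package.
    This helper is hypothetical and only automates the copy step.
    """
    data_dir = package_dir / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected a data folder at {data_dir}")
    target = data_dir / "model.pt"
    shutil.copyfile(checkpoint, target)
    return target


# Example call (both paths are placeholders):
# from importlib.util import find_spec
# package_dir = Path(find_spec("Aquila_Resolve").submodule_search_locations[0])
# install_checkpoint(Path("/path/to/model.pt"), package_dir)
```

Locating the package through `importlib.util.find_spec` avoids importing it, so the lookup itself should not start a download.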

Usage

from Aquila_Resolve import G2p

g2p = G2p()

g2p.convert('The book costs $5, will you read it?')
# >> '{DH AH0} {B UH1 K} {K AA1 S T S} {F AY1 V} {D AA1 L ER0 Z}, {W IH1 L} {Y UW1} {R IY1 D} {IH1 T}?'
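The returned string wraps each word's phonemes in braces and leaves punctuation outside them. If per-word phoneme lists are more convenient downstream, a small helper like the one below could split the string. `split_phonemes` is a hypothetical name, not a package function, and the sketch assumes the brace format shown in the examples above.

```python
import re
from typing import List

# Matches the contents of one {...} group, i.e. the phonemes of one word.
_WORD = re.compile(r"\{([^{}]*)\}")


def split_phonemes(converted: str) -> List[List[str]]:
    """Return one list of ARPAbet symbols per word, dropping punctuation."""
    return [group.split() for group in _WORD.findall(converted)]


words = split_phonemes("{DH AH0} {B UH1 K}, {W IH1 L} {Y UW1} {R IY1 D} {IH1 T}?")
print(words[1])  # ['B', 'UH1', 'K']
```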

Additional optional parameters are available when defining a G2p instance:

Parameter	Default	Description
`device`	`'cpu'`	Device for the PyTorch inference model. GPU is supported using `'cuda'`
`process_numbers`	`True`	Toggles conversion of some numbers and symbols to their spoken pronunciation forms. See numbers.py for details on what is covered.
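As a sketch of how the two parameters might be set together, the helper below builds the keyword arguments from a flag that says whether CUDA is available. `g2p_options` is a hypothetical name; only `device` and `process_numbers` come from the table above.

```python
from typing import Any, Dict


def g2p_options(cuda_available: bool, process_numbers: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for G2p from the two documented parameters."""
    return {
        "device": "cuda" if cuda_available else "cpu",
        "process_numbers": process_numbers,
    }


# Intended use, assuming torch and Aquila_Resolve are installed:
# import torch
# from Aquila_Resolve import G2p
# g2p = G2p(**g2p_options(torch.cuda.is_available()))
```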

Model Architecture

In evaluation[^1], neural G2P models have traditionally been extremely sensitive to orthographical variations in graphemes. Attention-based mapping of context has also been poor for languages such as English, where the correspondence between graphemes and phonemes is weak[^2]. Furthermore, both static methods (e.g. CMU Dictionary) and dynamic methods (e.g. G2p-seq2seq, Phonetisaurus, DeepPhonemizer) lose sentence context during tokenization for training and inference, which makes it impossible to accurately resolve words whose pronunciation depends on grammatical context (heteronyms).

This model attempts to address these issues to optimize inference accuracy and run-time speed. The current architecture employs additional natural language analysis steps, including part-of-speech (POS) tagging, n-gram segmentation, lemmatization searches, and word stem analysis. Some layers, such as POS tagging, are applied to all text, while others are activated only when required for the requested word. Layer information is retained with the token in vectorized and tensor operations. This allows morphological variations of seen words, such as plurals, possessives, compounds, inflectional stem affixes, and lemma variations, to be resolved with near ground-truth accuracy. It also improves out-of-vocabulary (OOV) inference accuracy by truncating individual tensor size and characteristics to be closer to seen data.
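To illustrate the idea of resolving an inflected form through its stem, here is a toy sketch. It is not the package's implementation: the three-word `LEXICON`, the suffix list, and the function names are assumptions for illustration, and it uses only the standard English rule for plural and possessive endings.

```python
from typing import Dict, List, Optional

# Toy lexicon standing in for "seen" words (hypothetical, three entries only).
LEXICON: Dict[str, List[str]] = {
    "book": ["B", "UH1", "K"],
    "record": ["R", "EH1", "K", "ER0", "D"],  # noun reading only
    "bus": ["B", "AH1", "S"],
}

SIBILANTS = {"S", "Z", "SH", "ZH", "CH", "JH"}
VOICELESS = {"P", "T", "K", "F", "TH"}


def plural_suffix(last_phoneme: str) -> List[str]:
    """Phonemes of the English plural/possessive ending after `last_phoneme`."""
    if last_phoneme in SIBILANTS:
        return ["IH0", "Z"]
    if last_phoneme in VOICELESS:
        return ["S"]
    return ["Z"]


def resolve(word: str) -> Optional[List[str]]:
    """Look the word up directly, else try its stem plus an inflectional ending."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    for ending in ("'s", "es", "s"):
        stem = word[: -len(ending)]
        if word.endswith(ending) and stem in LEXICON:
            base = LEXICON[stem]
            return base + plural_suffix(base[-1])
    return None  # a real pipeline would fall back to the neural model here


print(resolve("records"))  # ['R', 'EH1', 'K', 'ER0', 'D', 'Z']
```

Because the stem is a seen word, only the short suffix has to be inferred, which is the intuition behind handling morphological variants before neural inference.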

The inferencing layer is built as an autoregressive implementation of the forward DeepPhonemizer model, as a 4-layer transformer with 256 hidden units. The pre-trained checkpoint for Aquila Resolve is trained using the CMU Dict v0.7b corpus, with 126,456 unique words. The validation dataset was split as a uniform 5% sample of unique words, sorted by grapheme length. The learning rate was linearly increased during the warmup steps, and step-decreased during fine-tuning.
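The learning-rate schedule described above (linear warmup followed by step decay) can be sketched as a plain function. The source does not state the base rate, warmup length, decay interval, or decay factor, so every value below is a hypothetical placeholder.

```python
def learning_rate(step: int, base_lr: float, warmup_steps: int,
                  decay_every: int, gamma: float) -> float:
    """Linear warmup to `base_lr`, then multiply by `gamma` every `decay_every` steps."""
    if step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    decays = (step - warmup_steps) // decay_every
    return base_lr * gamma ** decays


# Hypothetical settings, for illustration only.
for step in (0, 9, 10, 110, 210):
    print(step, learning_rate(step, base_lr=1e-3, warmup_steps=10,
                              decay_every=100, gamma=0.5))
```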

Symbol Set

The two-letter ARPAbet symbol set is used, with numbered stress markers on vowels.

Vowels

Phoneme	Example	Phoneme	Example	Phoneme	Example	Phoneme
AA0	Balm	AW0	Ourself	EY0	Mayday	OY0
AA1	Bot	AW1	Shout	EY1	Mayday	OY1
AA2	Cot	AW2	Outdo	EY2	Airfreight	OY2
AE0	Bat	AY0	Ally	IH0	Cooking	UH0
AE1	Fast	AY1	Bias	IH1	Exist	UH1
AE2	Midland	AY2	Alibi	IH2	Outfit	UH2
AH0	Central	EH0	Enroll	IY0	Lady	UW0
AH1	Chunk	EH1	Bless	IY1	Beak	UW1
AH2	Outcome	EH2	Telex	IY2	Turnkey	UW2
AO0	Story	ER0	Chapter	OW0	Reo
AO1	Adore	ER1	Verb	OW1	So
AO2	Blog	ER2	Catcher	OW2	Cargo
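Each vowel symbol in the table is a base phoneme followed by a stress digit. In the CMU Dictionary convention, 0 marks no stress, 1 primary stress, and 2 secondary stress. A small helper (hypothetical, not part of the package) can separate the two parts:

```python
from typing import Optional, Tuple


def split_stress(symbol: str) -> Tuple[str, Optional[int]]:
    """Split an ARPAbet symbol into (base phoneme, stress digit or None)."""
    if symbol and symbol[-1] in "012":
        return symbol[:-1], int(symbol[-1])
    return symbol, None  # consonants carry no stress marker


print(split_stress("AH0"))  # ('AH', 0)
print(split_stress("K"))    # ('K', None)
```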

License

The code in this project is released under Apache License 2.0.

References

[^1]: r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation

[^2]: OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network
