Module for determining word stress and syllables in Lithuanian sentences.

Project description

About

At the core of this library is the text normalization and word stressing processor from the LIEPA speech synthesizer. The native code for text processing was extracted from the synthesizer library and wrapped in Python.

License

BSD license

Intro

The library takes text in Lithuanian and does the following:

Normalizes it, converting numbers to their word representations (e.g. "1" > "vienas").
Splits the text into phrases/sentences.
Splits phrases into words.
Identifies word syllables.
Identifies the possible grammatical forms of each word, and the stressed letter and stress type for each form.
Chooses one of the candidate rules.
Returns either structured or collapsed results.

The library supports the following environments:

Python: 2.7, 3.*
OS: Linux, Windows
Architecture: 32bit, 64bit
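
The environment list above can be checked from a running interpreter. The following is a minimal sketch using only the standard library; `is_supported` and `current_environment` are hypothetical helper names and are not part of phonology_engine.

```python
import platform
import struct
import sys

# Operating systems named in the list above.
SUPPORTED_SYSTEMS = ("Linux", "Windows")


def is_supported(version, system, bits):
    """Check (major, minor), OS name and pointer width against the list above."""
    major, minor = version[0], version[1]
    python_ok = (major == 2 and minor == 7) or major == 3
    return python_ok and system in SUPPORTED_SYSTEMS and bits in (32, 64)


def current_environment():
    """Describe the running interpreter as (version, system, bits)."""
    bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
    return sys.version_info[:2], platform.system(), bits


print(is_supported(*current_environment()))
```

Running this before `pip install` gives a quick indication of whether the prebuilt native code is likely to load on the current machine.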

Installing

pip install phonology_engine

Using

Normalize text

Conversion from numbers to word representation.

from phonology_engine import PhonologyEngine
pe = PhonologyEngine()
res = pe.normalize_and_collapse('31 kačiukas perbėgo kelią.')
print(res)

Would result in

TRISDEŠIMT VIENAS KAČIUKAS PERBĖGO KELIĄ.

Process

Determining word stresses.

from phonology_engine import PhonologyEngine
pe = PhonologyEngine()
res = pe.process_and_collapse('31 kačiukas perbėgo kelią.', 'utf8_stressed_word')
print(res)

Would result in

TRÌSDEŠIMT VÍENAS KAČIÙKAS PÉRBĖGO KẼLIĄ.
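
The stressed output differs from the normalized text only by the accent placed on the stressed letter. As an illustration, the sketch below removes those accents again with the standard `unicodedata` module. It assumes that stress is always written with a grave, acute or tilde accent, as in the output above; `strip_stress_marks` is a hypothetical helper, not a library function.

```python
import unicodedata

# Combining grave, acute and tilde: the three stress marks seen in the output above.
# Regular Lithuanian letters use other diacritics (caron, ogonek, dot above, macron),
# so removing only these three leaves the spelling intact.
STRESS_MARKS = {"\u0300", "\u0301", "\u0303"}


def strip_stress_marks(text):
    """Return `text` without stress accents, keeping all other diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    return unicodedata.normalize("NFC", kept)


stressed = "TRÌSDEŠIMT VÍENAS KAČIÙKAS PÉRBĖGO KẼLIĄ."
print(strip_stress_marks(stressed))
```

For this sentence the result equals the normalized text shown in the previous section.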

Determining stresses, syllables and grammatical forms for each word.

from phonology_engine import PhonologyEngine
from pprint import pprint
pe = PhonologyEngine()
res = pe.process(u'31 kačiukas perbėgo kelią.')
# Each result item describes one phrase; word_details holds one dict per word.
for word_details, phrase, normalized_phrase, letter_map in res:
    for word_detail in word_details:
        pprint(word_detail)

Would result in

... 
{'ascii_stressed_word': 'TRI`SDEŠIMT VI^ENAS',
 'normalized': True,
 'number_stressed_word': 'TRI0SDEŠIMT VI1ENAS',
 'span_normalized': (0, 17),
 'span_source': (0, 2),
 'stress_options': {'decoded_options': [{'rule': 'Nekaitomas žodis'}],
                    'options': [(2, 0, 1, 1688)],
                    'selected_index': 0},
 'syllables': [0, 3, 6],
 'utf8_stressed_word': 'TRÌSDEŠIMT VÍENAS',
 'word': 'TRISDEŠIMT VIENAS',
 'word_with_all_numeric_stresses': 'TRI0SDEŠIMT VI1ENAS',
 'word_with_only_multiple_numeric_stresses': 'TRISDEŠIMT VIENAS',
 'word_with_syllables': 'TRI-SDE-ŠIMT VIE-NAS'}
{'ascii_stressed_word': 'kačiu`kas',
 'normalized': True,
 'number_stressed_word': 'kačiu0kas',
 'span_normalized': (18, 26),
 'span_source': (3, 11),
 'stress_options': {'decoded_options': [{'grammatical_case': 'Vardininkas',
                                         'number': 'vienaskaita',
                                         'rule': 'Linksnis ir kamieno tipas',
                                         'stem_type': 0,
                                         'stress_type': 0,
                                         'stressed_letter_index': 4}],
                    'options': [(4, 0, 2, 0)],
                    'selected_index': 0},
 'syllables': [0, 2, 5],
 'utf8_stressed_word': 'kačiùkas',
 'word': 'kačiukas',
 'word_with_all_numeric_stresses': 'kačiu0kas',
 'word_with_only_multiple_numeric_stresses': 'kačiukas',
 'word_with_syllables': 'ka-čiu-kas'}
{'ascii_stressed_word': 'pe^rbėgo',
 'normalized': False,
 'number_stressed_word': 'pe1rbėgo',
 'span_normalized': (27, 34),
 'span_source': (12, 19),
 'stress_options': {'decoded_options': [{'rule': 'Veiksmazodžių kamienas ir '
                                                 'galune (taisytina)'}],
                    'options': [(1, 1, 0, 465)],
                    'selected_index': 0},
 'syllables': [0, 3, 5],
 'utf8_stressed_word': 'pérbėgo',
 'word': 'perbėgo',
 'word_with_all_numeric_stresses': 'pe1rbėgo',
 'word_with_only_multiple_numeric_stresses': 'perbėgo',
 'word_with_syllables': 'per-bė-go'}
{'ascii_stressed_word': 'ke~lią',
 'normalized': False,
 'number_stressed_word': 'ke2lią',
 'span_normalized': (35, 40),
 'span_source': (20, 25),
 'stress_options': {'decoded_options': [{'grammatical_case': 'Galininkas',
                                         'number': 'vienaskaita',
                                         'rule': 'Linksnis ir kamieno tipas',
                                         'stem_type': 2,
                                         'stress_type': 2,
                                         'stressed_letter_index': 1}],
                    'options': [(1, 2, 2, 515)],
                    'selected_index': 0},
 'syllables': [0, 2],
 'utf8_stressed_word': 'kẽlią',
 'word': 'kelią',
 'word_with_all_numeric_stresses': 'ke2lią',
 'word_with_only_multiple_numeric_stresses': 'kelią',
 'word_with_syllables': 'ke-lią'}
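
The fields in these records can be combined with plain string operations. The sketch below works on a trimmed copy of the second record and makes three assumptions inferred from the sample output rather than from the library documentation: `syllables` holds the start offset of each syllable, `span_source` is a half-open character range into the input text, and the digits 0, 1 and 2 in `number_stressed_word` stand for the grave, acute and tilde accents. All three function names are hypothetical helpers.

```python
import unicodedata

# Assumed mapping: the digit after the stressed letter selects a combining accent.
STRESS_MARKS = {"0": "\u0300", "1": "\u0301", "2": "\u0303"}


def split_syllables(word, starts):
    """Cut `word` at the syllable start offsets and join the parts with '-'."""
    bounds = list(starts) + [len(word)]
    return "-".join(word[a:b] for a, b in zip(bounds, bounds[1:]))


def source_text(source, detail):
    """Return the slice of the original input that produced this word."""
    start, end = detail["span_source"]
    return source[start:end]


def numeric_to_accented(word):
    """Replace each stress digit that follows a letter with its accent."""
    out = []
    for ch in word:
        if ch in STRESS_MARKS and out and out[-1].isalpha():
            out.append(STRESS_MARKS[ch])
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


source = "31 kačiukas perbėgo kelią."
# Trimmed copy of the second record from the output above.
detail = {
    "word": "kačiukas",
    "syllables": [0, 2, 5],
    "span_source": (3, 11),
    "number_stressed_word": "kačiu0kas",
}

print(split_syllables(detail["word"], detail["syllables"]))
print(source_text(source, detail))
print(numeric_to_accented(detail["number_stressed_word"]))
```

For this record the three printed values correspond to the `word_with_syllables`, `word` and `utf8_stressed_word` fields. The first record (`TRISDEŠIMT VIENAS`) spans two words while its `syllables` list covers only the first, so multi-word records would need extra handling.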

References

Kirčiavimas internetu - an online dictionary with word stresses and grammar annotations. It has a GitHub repo and is likely based on the VDU dictionary.

Release history

0.2.8 (Feb 9, 2020), this version
0.2.7 (Feb 2, 2020)
0.2.6 (Feb 1, 2020)
0.2.5 (Jan 30, 2020)
0.2.4 (Jan 29, 2020)
0.2.3 (Jan 26, 2020)
0.2.2 (Jan 21, 2020)
0.2.1 (Jan 12, 2020)
0.2.0 (Jan 12, 2020)
0.1.18 (Jan 11, 2020)
0.1.17 (Jan 11, 2020)
0.1.16 (Dec 25, 2019)
0.1.15 (Dec 24, 2019)
0.1.14 (Dec 29, 2018)
0.1.13 (Oct 20, 2018)
0.1.12 (Oct 19, 2018)
0.1.10 (Oct 17, 2018)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

phonology_engine-0.2.8-py2.py3-none-any.whl (2.8 MB)

File metadata

Upload date: Feb 9, 2020
Size: 2.8 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/2.7.12

File hashes

Hashes for phonology_engine-0.2.8-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`679e44efaf3b6a45266c0c8c346668985a329df317f31d8cf03055e9b323ca3d`
MD5	`04e65023137e7f6dd25c6033eecbeb73`
BLAKE2b-256	`3c434821dabf4971f0e80271f6a901abe4ed4f76c91f7a5d886526dcdecdd497`
