A module that determines stresses and syllables for the words of a given Lithuanian sentence.

Project description

About

At the core of this library is the text normalization and word stressing processor from the LIEPA speech synthesizer. The native text-processing code was extracted from the synthesizer library and wrapped in Python.

License

Intro

The library takes Lithuanian text and does the following:

  • Normalizes it, converting numbers to their word representations (e.g. "1" → "vienas").
  • Splits the text into phrases/sentences.
  • Splits phrases into words.
  • Identifies word syllables.
  • Identifies the possible grammatical forms of each word and, for each form, the stressed letter and stress type.
  • Chooses one of these stress options.
  • Returns either structured results or a collapsed string.
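To show what the difference between structured and collapsed results amounts to, here is a minimal sketch of collapsing per-word results back into a single string. It assumes per-word dictionaries with a `span_normalized` pair and a text field such as `utf8_stressed_word`, as in the `process()` output further down this page. The `field` argument reflects how the second argument of `process_and_collapse` appears to name a per-word field. `collapse` is a hypothetical helper, not part of phonology_engine. The library's own collapsing may work differently; judging by the examples, it also upper-cases the text.

```python
def collapse(normalized_phrase, word_details, field):
    """Splice each word's `field` value into the phrase at its span_normalized."""
    pieces = []
    pos = 0
    for detail in sorted(word_details, key=lambda d: d["span_normalized"][0]):
        start, end = detail["span_normalized"]
        pieces.append(normalized_phrase[pos:start])  # text between words (spaces)
        pieces.append(detail[field])                 # the selected form of the word
        pos = end
    pieces.append(normalized_phrase[pos:])           # anything after the last word
    return "".join(pieces)


# Sample data: spans and stressed forms copied from the process() output on
# this page; the lower-case phrase is an illustrative stand-in.
words = [
    {"span_normalized": (0, 17), "utf8_stressed_word": "TRÌSDEŠIMT VÍENAS"},
    {"span_normalized": (18, 26), "utf8_stressed_word": "kačiùkas"},
    {"span_normalized": (27, 34), "utf8_stressed_word": "pérbėgo"},
    {"span_normalized": (35, 40), "utf8_stressed_word": "kẽlią"},
]
phrase = "trisdešimt vienas kačiukas perbėgo kelią."
stressed = collapse(phrase, words, "utf8_stressed_word")
print(stressed)
```

Because the spans index into the normalized phrase, words that normalization expanded (such as "31") are replaced as a unit.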

The library supports the following environments:

  • Python: 2.7, 3.*
  • OS: Linux, Windows
  • Architecture: 32-bit, 64-bit

Installing

pip install phonology_engine

Using

Normalize text

Converting numbers to their word representations.

from phonology_engine import PhonologyEngine
pe = PhonologyEngine()
res = pe.normalize_and_collapse(u'31 kačiukas perbėgo kelią.')
print(res)

Would result in

TRISDEŠIMT VIENAS KAČIUKAS PERBĖGO KELIĄ.
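The idea behind number normalization can be illustrated with a toy version that spells out numbers from 0 to 99 as Lithuanian masculine nominative cardinals. This is a sketch for illustration only, not the library's normalizer. It ignores gender and grammatical case agreement (for example, feminine nouns), which real Lithuanian text requires. `number_to_words` and `toy_normalize` are hypothetical names.

```python
import re

# Lithuanian cardinal numbers, masculine nominative (illustration only).
ONES = ["nulis", "vienas", "du", "trys", "keturi", "penki",
        "šeši", "septyni", "aštuoni", "devyni"]
TEENS = ["dešimt", "vienuolika", "dvylika", "trylika", "keturiolika",
         "penkiolika", "šešiolika", "septyniolika", "aštuoniolika", "devyniolika"]
TENS = ["", "", "dvidešimt", "trisdešimt", "keturiasdešimt", "penkiasdešimt",
        "šešiasdešimt", "septyniasdešimt", "aštuoniasdešimt", "devyniasdešimt"]


def number_to_words(n):
    """Spell out 0..99; larger numbers are out of scope for this sketch."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def toy_normalize(text):
    """Replace standalone 1-2 digit numbers with words, then upper-case
    the result to match the collapsed output shown above."""
    spelled = re.sub(r"\b\d{1,2}\b",
                     lambda m: number_to_words(int(m.group())), text)
    return spelled.upper()


print(toy_normalize("31 kačiukas perbėgo kelią."))
```

Numbers with three or more digits are left untouched by the regular expression rather than being split into pieces.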

Process

Determining word stresses.

from phonology_engine import PhonologyEngine
pe = PhonologyEngine()
res = pe.process_and_collapse(u'31 kačiukas perbėgo kelią.', 'utf8_stressed_word')
print(res)

Would result in

TRÌSDEŠIMT VÍENAS KAČIÙKAS PÉRBĖGO KẼLIĄ.
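Compare this with the normalization result above. The stressed text differs only by grave, acute, and tilde marks on the stressed letters. The sketch below strips those marks to recover the plain normalized text. It assumes these three combining marks are the only ones the stressing step adds. Lithuanian letters such as Ą, Ė, Š, and Ų use other combining marks (ogonek, dot above, caron), so they survive. `strip_stress` is a hypothetical helper, not part of phonology_engine.

```python
import unicodedata

# Combining marks that, in the output shown above, encode stress:
# grave (U+0300), acute (U+0301) and tilde (U+0303). Assumed to be the only ones.
STRESS_COMBINING = {"\u0300", "\u0301", "\u0303"}


def strip_stress(text):
    """Remove stress marks while keeping letters such as Ą, Ė, Š, Ų intact."""
    decomposed = unicodedata.normalize("NFD", text)   # split letters and marks
    kept = "".join(ch for ch in decomposed if ch not in STRESS_COMBINING)
    return unicodedata.normalize("NFC", kept)         # recombine what is left


stressed = "TRÌSDEŠIMT VÍENAS KAČIÙKAS PÉRBĖGO KẼLIĄ."
print(strip_stress(stressed))
```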

Determining the stresses, syllables and grammatical forms of each word.

from phonology_engine import PhonologyEngine
from pprint import pprint
pe = PhonologyEngine()
res = pe.process(u'31 kačiukas perbėgo kelią.')
for word_details, phrase, normalized_phrase, letter_map in res:
    for word_detail in word_details:
        pprint(word_detail)

Would result in

... 
{'ascii_stressed_word': 'TRI`SDEŠIMT VI^ENAS',
 'normalized': True,
 'number_stressed_word': 'TRI0SDEŠIMT VI1ENAS',
 'span_normalized': (0, 17),
 'span_source': (0, 2),
 'stress_options': {'decoded_options': [{'rule': 'Nekaitomas žodis'}],
                    'options': [(2, 0, 1, 1688)],
                    'selected_index': 0},
 'syllables': [0, 3, 6],
 'utf8_stressed_word': 'TRÌSDEŠIMT VÍENAS',
 'word': 'TRISDEŠIMT VIENAS',
 'word_with_all_numeric_stresses': 'TRI0SDEŠIMT VI1ENAS',
 'word_with_only_multiple_numeric_stresses': 'TRISDEŠIMT VIENAS',
 'word_with_syllables': 'TRI-SDE-ŠIMT VIE-NAS'}
{'ascii_stressed_word': 'kačiu`kas',
 'normalized': True,
 'number_stressed_word': 'kačiu0kas',
 'span_normalized': (18, 26),
 'span_source': (3, 11),
 'stress_options': {'decoded_options': [{'grammatical_case': 'Vardininkas',
                                         'number': 'vienaskaita',
                                         'rule': 'Linksnis ir kamieno tipas',
                                         'stem_type': 0,
                                         'stress_type': 0,
                                         'stressed_letter_index': 4}],
                    'options': [(4, 0, 2, 0)],
                    'selected_index': 0},
 'syllables': [0, 2, 5],
 'utf8_stressed_word': 'kačiùkas',
 'word': 'kačiukas',
 'word_with_all_numeric_stresses': 'kačiu0kas',
 'word_with_only_multiple_numeric_stresses': 'kačiukas',
 'word_with_syllables': 'ka-čiu-kas'}
{'ascii_stressed_word': 'pe^rbėgo',
 'normalized': False,
 'number_stressed_word': 'pe1rbėgo',
 'span_normalized': (27, 34),
 'span_source': (12, 19),
 'stress_options': {'decoded_options': [{'rule': 'Veiksmazodžių kamienas ir '
                                                 'galune (taisytina)'}],
                    'options': [(1, 1, 0, 465)],
                    'selected_index': 0},
 'syllables': [0, 3, 5],
 'utf8_stressed_word': 'pérbėgo',
 'word': 'perbėgo',
 'word_with_all_numeric_stresses': 'pe1rbėgo',
 'word_with_only_multiple_numeric_stresses': 'perbėgo',
 'word_with_syllables': 'per-bė-go'}
{'ascii_stressed_word': 'ke~lią',
 'normalized': False,
 'number_stressed_word': 'ke2lią',
 'span_normalized': (35, 40),
 'span_source': (20, 25),
 'stress_options': {'decoded_options': [{'grammatical_case': 'Galininkas',
                                         'number': 'vienaskaita',
                                         'rule': 'Linksnis ir kamieno tipas',
                                         'stem_type': 2,
                                         'stress_type': 2,
                                         'stressed_letter_index': 1}],
                    'options': [(1, 2, 2, 515)],
                    'selected_index': 0},
 'syllables': [0, 2],
 'utf8_stressed_word': 'kẽlią',
 'word': 'kelią',
 'word_with_all_numeric_stresses': 'ke2lią',
 'word_with_only_multiple_numeric_stresses': 'kelią',
 'word_with_syllables': 'ke-lią'}
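The fields above are closely related. In `number_stressed_word`, a digit follows the stressed letter: 0 pairs with a backtick and a grave accent (TRI0SDEŠIMT, TRI`SDEŠIMT, TRÌSDEŠIMT), 1 with a caret and an acute accent, and 2 with a tilde. The `syllables` list gives the start index of each syllable. The helpers below rebuild the other representations from these two fields. They are hypothetical, not part of phonology_engine. The digit mapping is inferred from the sample output on this page, not from documented API.

```python
import re
import unicodedata

# Stress digit -> (ASCII mark, Unicode combining mark), read off the sample output.
DIGIT_MARKS = {
    "0": ("`", "\u0300"),  # grave: kačiu0kas -> kačiu`kas -> kačiùkas
    "1": ("^", "\u0301"),  # acute: pe1rbėgo -> pe^rbėgo -> pérbėgo
    "2": ("~", "\u0303"),  # tilde: ke2lią -> ke~lią -> kẽlią
}


def numeric_to_ascii(word):
    """Replace each stress digit with its ASCII mark."""
    return re.sub(r"[012]", lambda m: DIGIT_MARKS[m.group()][0], word)


def numeric_to_utf8(word):
    """Turn each stress digit into a combining mark on the preceding letter."""
    marked = re.sub(r"[012]", lambda m: DIGIT_MARKS[m.group()][1], word)
    # NFC merges letter + mark into a precomposed character where one exists.
    return unicodedata.normalize("NFC", marked)


def split_syllables(word, syllables):
    """Join the chunks starting at each syllable index with hyphens."""
    bounds = list(syllables) + [len(word)]
    return "-".join(word[start:end] for start, end in zip(bounds, bounds[1:]))


for numeric in ("TRI0SDEŠIMT VI1ENAS", "kačiu0kas", "pe1rbėgo", "ke2lią"):
    print(numeric_to_ascii(numeric), numeric_to_utf8(numeric))
print(split_syllables("kačiukas", [0, 2, 5]))
```

For single words, `split_syllables` reproduces `word_with_syllables` (ka-čiu-kas, per-bė-go, ke-lią). For the multi-word entry 'TRISDEŠIMT VIENAS' with syllables [0, 3, 6], it yields 'TRI-SDE-ŠIMT VIENAS', whereas the library reports 'TRI-SDE-ŠIMT VIE-NAS'. The `syllables` list therefore does not appear to cover every word of such entries, so `word_with_syllables` is the safer field there.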

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

phonology_engine-0.2.8-py2.py3-none-any.whl (2.8 MB)

Uploaded for Python 2 and Python 3

File details

Details for the file phonology_engine-0.2.8-py2.py3-none-any.whl.

File metadata

  • Download URL: phonology_engine-0.2.8-py2.py3-none-any.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/2.7.12

File hashes

Hashes for phonology_engine-0.2.8-py2.py3-none-any.whl
  • SHA256: 679e44efaf3b6a45266c0c8c346668985a329df317f31d8cf03055e9b323ca3d
  • MD5: 04e65023137e7f6dd25c6033eecbeb73
  • BLAKE2b-256: 3c434821dabf4971f0e80271f6a901abe4ed4f76c91f7a5d886526dcdecdd497
