
A library to put stress marks in Russian text


Automatic accentuation for texts in Russian

Accentuation is a common task in speech-related fields such as text-to-speech, speech recognition, and language learning. This library marks stressed vowels in a given text using data from Wiktionary and syntactic analysis by spaCy.

Installation

Python 3.10 and above is supported.

pip install tsnorm

General usage

from tsnorm import Normalizer


normalizer = Normalizer(stress_mark=chr(0x301), stress_mark_pos="after")
normalizer("Словно куклой в час ночной теперь он может управлять тобой")

# Output: Сло́вно ку́клой в час ночно́й тепе́рь он мо́жет управля́ть тобо́й
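
The mark used in the example above, chr(0x301), is the Unicode combining acute accent, which renders over the letter it follows. As a minimal sketch that does not call tsnorm itself, the stressed output can be turned back into plain text by deleting that character. The helper name strip_stress is hypothetical and not part of the library.

```python
import unicodedata

STRESS_MARK = chr(0x301)  # the same character passed as stress_mark above


def strip_stress(text: str, mark: str = STRESS_MARK) -> str:
    """Remove every occurrence of the stress mark from text."""
    return text.replace(mark, "")


# Fragment of the documented output, with the marks written as explicit escapes
stressed = "Сло\u0301вно ку\u0301клой в час ночно\u0301й"

print(unicodedata.name(STRESS_MARK))
print(strip_stress(stressed))
```

This only works cleanly when the mark cannot occur in the source text for other reasons, which is usually true for U+0301 in Russian but not for a mark such as "+".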

Change stress mark and its position

normalizer = Normalizer(stress_mark="+", stress_mark_pos="before")
normalizer("Трупы оживали, землю разрывали")

# Output: Тр+упы ожив+али, з+емлю разрыв+али
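
If text has already been marked in the "+"-before style shown above, it can be converted to the combining-accent style without running the normalizer again. The following is a sketch in plain Python under the assumption that "+" appears only as a stress mark; the function plus_to_acute is hypothetical and not part of tsnorm.

```python
def plus_to_acute(text: str, mark: str = "+") -> str:
    """Turn a mark placed before a letter into U+0301 placed after it."""
    out = []
    pending = False  # True when a mark was seen and waits for its letter
    for ch in text:
        if ch == mark:
            pending = True
            continue  # drop the mark itself
        out.append(ch)
        if pending:
            out.append("\u0301")  # combining acute accent after the letter
            pending = False
    # A trailing mark with no letter after it is silently dropped.
    return "".join(out)


print(plus_to_acute("Тр+упы ожив+али, з+емлю разрыв+али"))
```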

Stress yo (Ё)

normalizer = Normalizer(stress_mark="+", stress_mark_pos="before", stress_yo=True)
normalizer("Погаснет день, луна проснётся, и снова зверь во мне очнётся")

# Output: Пог+аснет день, лун+а просн+ётся, и сн+ова зверь во мне очн+ётся

Stress monosyllabic words

normalizer = Normalizer(stress_mark="+", stress_mark_pos="before", stress_monosyllabic=True)
normalizer("Панки грязи не боятся, кто устал — ушёл сдаваться!")

# Output: П+анки гр+язи н+е бо+ятся, кт+о уст+ал — ушёл сдав+аться!

Change minimum length of words to be stressed

# Without min_word_len, even one-letter words ("я", "и") are stressed
normalizer = Normalizer(stress_mark="+", stress_mark_pos="before", stress_monosyllabic=True)
normalizer("Разбежавшись, прыгну со скалы, вот я был и вот меня не стало")

# Output: Разбеж+авшись, пр+ыгну с+о скал+ы, в+от +я б+ыл +и в+от мен+я н+е ст+ало


normalizer = Normalizer(stress_mark="+", stress_mark_pos="before", stress_monosyllabic=True, min_word_len=2)
normalizer("Разбежавшись, прыгну со скалы, вот я был и вот меня не стало")

# Output: Разбеж+авшись, пр+ыгну с+о скал+ы, в+от я б+ыл и в+от мен+я н+е ст+ало


normalizer = Normalizer(stress_mark="+", stress_mark_pos="before", stress_monosyllabic=True, min_word_len=3)
normalizer("Разбежавшись, прыгну со скалы, вот я был и вот меня не стало")

# Output: Разбеж+авшись, пр+ыгну со скал+ы, в+от я б+ыл и в+от мен+я не ст+ало
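
Judging from the three outputs above, words shorter than min_word_len are left unmarked. The sketch below reproduces only that length filter in plain Python, to show which words of the sample sentence fall under each threshold. It is inferred from the examples and is not a description of how tsnorm tokenizes or filters internally; skipped_words is a hypothetical helper.

```python
import re


def skipped_words(text: str, min_word_len: int) -> list[str]:
    """Return the Cyrillic words that are shorter than min_word_len."""
    words = re.findall(r"[А-Яа-яЁё]+", text)
    return [w for w in words if len(w) < min_word_len]


sentence = "Разбежавшись, прыгну со скалы, вот я был и вот меня не стало"

print(skipped_words(sentence, 2))
print(skipped_words(sentence, 3))
```

The words listed for each threshold are the ones that carry no "+" in the corresponding output above.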

Expand normalizer dictionary

from tsnorm import Normalizer, CustomDictionary, WordForm, Lemma, WordFormTags, LemmaPOS


normalizer = Normalizer("+", "before")

normalizer("Охотник Себастьян, что спал на чердаке")
# Output: Ох+отник Себастьян, что спал на чердак+е

dictionary = CustomDictionary(
    word_forms=[
        WordForm("Себастьян", 7, WordFormTags(singular=True, nominative=True), "Себастьян")
    ],
    lemmas=[
        Lemma("Себастьян", LemmaPOS(PNOUN=True))
    ]
)

normalizer.update_dictionary(dictionary)

normalizer("Охотник Себастьян, что спал на чердаке")
# Output: Ох+отник Себасть+ян, что спал на чердак+е

A CustomDictionary can also be passed when the normalizer is initialized:

normalizer = Normalizer("+", "before", custom_dictionary=dictionary)

To add custom words to the normalizer dictionary, pass two lists to CustomDictionary:

  1. a list of WordForm objects, one per form of each word, carrying case, tense, and lemma information together with the positions of stressed letters (in the example above, 7 is the zero-based index of "я" in "Себастьян"),
  2. a list of Lemma objects, which record each lemma together with its parts of speech.

Parts of speech for lemmas are configured using the LemmaPOS class, which stores universal POS tags.
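
Counting letters by hand to find the stress position is error-prone. As a small sketch, the position can be derived from a word written in the "+"-before style. It assumes, as the Себастьян example suggests, that WordForm expects the zero-based index of the stressed letter, and it handles a single mark only. The helper split_marked is hypothetical and not part of tsnorm.

```python
def split_marked(marked: str, mark: str = "+") -> tuple[str, int]:
    """Return the word without the mark and the index of the stressed letter."""
    pos = marked.find(mark)
    if pos == -1:
        raise ValueError(f"no {mark!r} found in {marked!r}")
    # The mark sits directly before the stressed letter, so after it is
    # removed the stressed letter occupies the mark's old index.
    clean = marked[:pos] + marked[pos + len(mark):]
    return clean, pos


word, stress_pos = split_marked("Себасть+ян")
print(word, stress_pos)
```

The two values could then be used as the first two arguments of WordForm, as in the example above.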

Acknowledgement

This library is based on code by @einhornus from his article on Habr.
