Uzbek Lemmatizer for Python

Project description

UzLemma

An Uzbek language lemmatizer for Python

Studies on stemming Uzbek share a common conclusion: stemming Uzbek is hard. Uzbek is an agglutinative language with a very rich morphological structure. Uzbek words are composed of a stem and one or more affixes. There are two kinds of affixes in Uzbek: prefixes and suffixes. Affixes attach to the stem according to definite grammatical rules. For example, maktablarimizning ("of our schools") consists of the stem maktab followed by the suffixes -lar (plural), -imiz (first-person plural possessive), and -ning (genitive). In addition, both the stem and the affixes may change form according to the harmony rules. These rules and their exceptions make stemming harder for Uzbek texts. For more about stemming Uzbek, please see the article titled "UZBEK AFFIX FINITE STATE MACHINE FOR STEMMING."

Nearly every text analysis study needs a stemmer at some point. This Python code attempts to stem Uzbek words with a simple approach. It first splits the given word into syllables and then tries to identify the stem by comparing those syllables against a list of affixes and their allomorphs. If an affix is identified, it is removed and the remaining word is looked up in a list of Uzbek words. If the word list contains a match, that word is returned as the stem. Otherwise, the function repeats the process on the shortened word. If no stem can be found, it returns the given word unchanged.
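The strip-and-look-up loop described above can be sketched roughly as follows. This is a simplified illustration, not UzLemma's actual code: the names stem_sketch, strip_one_suffix, SUFFIXES, and LEXICON are hypothetical, the suffix list and word list are tiny toy samples, and the sketch replaces syllable extraction and allomorph handling with plain longest-suffix matching.

```python
# Toy data: both collections are illustrative assumptions,
# not the affix list or word list shipped with UzLemma.
SUFFIXES = ("ning", "imiz", "lar", "dan", "ni", "ga", "da")
LEXICON = {"maktab", "kitob", "uy"}


def strip_one_suffix(word):
    """Return word without its longest matching suffix, or None if none matches."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require at least two remaining characters so a word is never stripped away entirely.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return None


def stem_sketch(word):
    """Remove suffixes one at a time until the remainder is found in LEXICON."""
    current = word.lower()
    while True:
        shorter = strip_one_suffix(current)
        if shorter is None:
            # No removable affix is left: give up and return the input unchanged.
            return word
        if shorter in LEXICON:
            return shorter
        # Not a known word yet: repeat with the shortened form.
        current = shorter


print(stem_sketch("maktablarimizning"))
```

With this toy data the loop removes -ning, then -imiz, then -lar, and stops once maktab is found in the word list. A word with no matching suffix is returned as given.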

Once the functions are loaded into your Python environment, you can stem a word by calling the stem function:

stem("maktablarimizning")

Project details


This version

1.0

Built Distribution

UzLemma-1.0-py3-none-any.whl (2.7 kB)

Uploaded Python 3
