Uzbek Lemmatizer for Python

Project description

UzLemma

An Uzbek language lemmatizer for Python

Studies of Uzbek stemming agree on one point: stemming Uzbek is hard. Uzbek is an agglutinative language with a rich morphological structure. Uzbek words consist of a stem and one or more affixes. Uzbek has two kinds of affixes: prefixes and suffixes. Affixes attach to the stem according to definite grammatical rules. In addition, both the stem and the affixes may change according to the harmony rules. These rules and their exceptions make stemming Uzbek text harder. For more on stemming Uzbek, see the article titled "UZBEK AFFIX FINITE STATE MACHINE FOR STEMMING."

Most text analysis studies need a stemmer at some point. This Python code attempts to stem Uzbek words with a simple approach. It first splits the given word into syllables, then tries to identify the stem by comparing the syllables against a list of affixes and their allomorphs. If an affix is identified, it is removed and the remaining word is looked up in a list of Uzbek words. If the word list contains a match, that word is returned as the stem. Otherwise the function repeats the process on the shortened word. If it cannot find a stem, it returns the given word unchanged.
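The strip-and-look-up loop described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the names `sketch_stem`, `SUFFIXES`, and `WORDS` are hypothetical, the affix and word lists are tiny toy samples, and the sketch skips the syllable-extraction step by matching suffixes directly against the end of the word.

```python
# Toy data for illustration only; these are NOT the package's real lists.
SUFFIXES = ("ning", "imiz", "lar", "ni", "da", "ga")
WORDS = {"maktab", "kitob", "bola"}


def sketch_stem(word, suffixes=SUFFIXES, words=WORDS):
    """Strip one known suffix at a time until the remainder is a known word.

    Returns the given word unchanged if no stem is found.
    """
    current = word.lower()
    # Try longer suffixes first so "ning" is preferred over "ni".
    ordered = sorted(suffixes, key=len, reverse=True)
    while True:
        match = next(
            (s for s in ordered
             if current.endswith(s) and len(current) > len(s)),
            None,
        )
        if match is None:
            # No affix left to remove: give back the original input.
            return word
        current = current[:-len(match)]
        if current in words:
            # The remainder is in the word list, so treat it as the stem.
            return current


print(sketch_stem("maktablarimizning"))
```

With these toy lists, the call removes "ning", then "imiz", then "lar", and stops once "maktab" is found in the word list. A real affix list would also have to cover allomorphs and prefixes, which this sketch leaves out.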

Once the functions are loaded into your Python environment, you can stem a word by calling the stem function:

stem("maktablarimizning")

Project details


This version

1.0

Source Distributions

No source distribution files available for this release.

Built Distribution

UzLemma-1.0-py3-none-any.whl (2.7 kB view details)

Uploaded Python 3

File details

Details for the file UzLemma-1.0-py3-none-any.whl.

File metadata

  • Download URL: UzLemma-1.0-py3-none-any.whl
  • Upload date:
  • Size: 2.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.7 tqdm/4.56.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.1

File hashes

Hashes for UzLemma-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ba67c9ba3c47643cee20c07c590b18d1c2dad187e0d31f843c95369699cdec4b
MD5 bfe9ae6147983b46c3bff624d033a346
BLAKE2b-256 9f7d24f645fefe3e3505590cb1056e491d6aaa55040403600bdf3187723902bb