
Finds the lemma of Uzbek words

Project description

Authors

Author1: Maksudbek

Author2: Dasturbek

Lemma & Lemmatization

The package finds the lemmas of Uzbek words using a dictionary. A lemma is the base (dictionary) form of a word; for example, the lemma of kelganlar is kelmoq.

The process of finding a lemma is called lemmatization.

There are four main approaches to lemmatization: rule-based, dictionary-based, model-based, and hybrid.

UzbekLemma is a dictionary-based lemmatization package.
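A minimal sketch of how dictionary-based lemmatization can work, assuming a suffix-stripping loop that stops as soon as the remaining stem is found in the dictionary. The `DICTIONARY` and `SUFFIXES` tables are hypothetical toy data, not the package's actual dictionary, and this is not necessarily the algorithm UzbekLemma itself uses.

```python
# Hypothetical mini-dictionary: stem -> word class (NOT the package's real data).
DICTIONARY = {
    "kel": "verb",
    "kitob": "noun",
    "yaxshi": "adjective",
}

# Hypothetical suffix inventory: a tiny subset of Uzbek affixes, for illustration only.
SUFFIXES = ["lar", "gan", "ning", "da", "ni", "im"]


def lemmatize(word: str) -> str:
    """Strip suffixes from the right until the remainder is a dictionary stem."""
    word = word.lower()
    candidate = word
    while candidate not in DICTIONARY:
        for suffix in SUFFIXES:
            # Only strip if something is left over after removing the suffix.
            if candidate.endswith(suffix) and len(candidate) > len(suffix):
                candidate = candidate[: -len(suffix)]
                break
        else:
            return word  # no suffix applies: give up and return the input unchanged
    # Uzbek verbs are cited in the infinitive, which ends in -moq.
    if DICTIONARY[candidate] == "verb":
        return candidate + "moq"
    return candidate


print(lemmatize("kelganlar"))     # kelmoq
print(lemmatize("kitoblarimda"))  # kitob
```

Because this sketch takes the first matching suffix, it can strip the wrong affix when several apply; a fuller implementation would need affix-ordering rules or backtracking.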

Install & Clone

pip install UzbekLemma
git clone https://github.com/ddasturbek/UzbekLemma.git

Usage

import UzbekLemma as UL

print(UL.lemmatize("kelganlar"))  # kelmoq

The algorithm flowchart

[Figure: flowchart of the lemmatization algorithm]

The dictionary structure

[Figure: dictionary structure (soz_turkumlari, "word classes")]
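The page shows the dictionary structure only as an image labelled soz_turkumlari (Uzbek for "word classes"). Assuming the dictionary groups its entries by word class, the layout and a reverse index for fast lookups might be sketched as follows. The class keys (ot, fe'l, sifat) are standard Uzbek grammatical terms, but the word lists and the `build_index` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: one word list per word class (so'z turkumi).
WORD_CLASSES: Dict[str, List[str]] = {
    "ot": ["kitob", "maktab"],     # nouns
    "fe'l": ["kel", "bor"],        # verbs, stored as stems
    "sifat": ["yaxshi", "katta"],  # adjectives
}


def build_index(word_classes: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert class -> words into word -> class for constant-time lookups."""
    index: Dict[str, str] = {}
    for word_class, words in word_classes.items():
        for word in words:
            # Keep the first class seen if a word is listed under several classes.
            index.setdefault(word, word_class)
    return index


INDEX = build_index(WORD_CLASSES)
print(INDEX["kel"])        # fe'l
print(INDEX.get("salom"))  # None
```

Keeping per-class lists makes the dictionary easy to edit by hand, while the inverted index avoids scanning every list on each lookup.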

Scientific field

Certificate

Patent


Some results of the program


Corpus & Results

We collected an equal number of texts from each of 23 different fields and stored them as a corpus.

We ran the program on every file in the corpus and obtained the following results.
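The page does not state which metric the results report. Assuming per-field accuracy (the share of words whose predicted lemma matches a gold-standard lemma), an evaluation over a corpus split by field could be sketched as below. The functions `accuracy` and `per_field_accuracy`, the toy corpus, and the toy lemmatizer are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pairs = List[Tuple[str, str]]  # (word form, gold-standard lemma)


def accuracy(pairs: Pairs, lemmatize: Callable[[str], str]) -> float:
    """Fraction of pairs where lemmatize(word) equals the gold lemma."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, lemma in pairs if lemmatize(word) == lemma)
    return correct / len(pairs)


def per_field_accuracy(
    corpus: Dict[str, Pairs], lemmatize: Callable[[str], str]
) -> Dict[str, float]:
    """Accuracy computed separately for each field of the corpus."""
    return {field: accuracy(pairs, lemmatize) for field, pairs in corpus.items()}


# Toy data for illustration only.
corpus = {
    "sport": [("kelganlar", "kelmoq"), ("kitoblar", "kitob")],
    "news": [("yaxshi", "yaxshi")],
}
known = {"kelganlar": "kelmoq"}  # toy lemmatizer that misses "kitoblar"
results = per_field_accuracy(corpus, lambda w: known.get(w, w))
print(results)  # {'sport': 0.5, 'news': 1.0}
```

Reporting accuracy per field, rather than one pooled score, shows whether some domains are harder for the dictionary than others.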



Download files


Source Distribution

uzbeklemma-1.0.2.tar.gz (4.2 kB)

Built Distribution

UzbekLemma-1.0.2-py3-none-any.whl (4.5 kB, Python 3)
