The Uzbek Natural Language Toolkit (NLTK) is a Python package for natural language processing.

Project description

uznltk

https://pypi.org/project/uznltk
https://github.com/UlugbekSalaev/uznltk

uznltk is the Uzbek Natural Language Toolkit.
It is written as a Python library and published on PyPI. You can use it directly in a Python project, or from projects in other programming languages via the API.

About project

The Uzbek Natural Language Toolkit (uznltk) is a Python package for natural language processing.

Quick links

Demo

You can use the web interface.

Features

  • Corpus
  • Morphological annotated dataset
  • Help function

uznltk

Natural Language Toolkit for Uzbek (an NLP library for the Uzbek language)

Function

  • Tokenization into words
  • Sentence segmentation
  • Stop-word identification
  • Normalization of apostrophes in text
  • Extraction of words with punctuation marks
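
To illustrate what apostrophe normalization and word tokenization involve for Uzbek text, here is a minimal sketch using only the Python standard library. It is not uznltk's implementation: the function names, the set of apostrophe variants, and the choice of the ASCII apostrophe as the canonical form are all assumptions made for illustration.

```python
import re

# Assumed set of characters often typed in place of the apostrophe in Uzbek
# words such as "o'zbek" or "go'zal": backtick, curly quotes, modifier
# letters, and the acute accent.
_APOSTROPHE_VARIANTS = "`\u2018\u2019\u02bb\u02bc\u00b4"
_APOSTROPHE_MAP = str.maketrans({ch: "'" for ch in _APOSTROPHE_VARIANTS})

# A word is a run of word characters, optionally joined by apostrophes;
# any other non-space character is treated as a separate token.
_TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def normalize_apostrophes(text: str) -> str:
    """Replace each apostrophe variant with the ASCII apostrophe (hypothetical helper)."""
    return text.translate(_APOSTROPHE_MAP)


def tokenize(text: str) -> list[str]:
    """Normalize apostrophes, then split the text into words and punctuation marks."""
    return _TOKEN_RE.findall(normalize_apostrophes(text))
```

Normalizing before tokenizing keeps words such as `O'zbek` in one piece, because the tokenizer then has to recognize only a single apostrophe character.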

Installation

pip install uznltk

## Usage

Three options to run uznltk:

- pip
- API 
- Web interface

### pip installation

To install uznltk, simply run:

```code
pip install uznltk
```

After installation, use it in Python as follows:

```code
# import the library
from uznltk import Tagger
# create a tagger object
tagger = Tagger()
# call the tagging method
tagger.pos_tag('Bizlar bugun maktabga bormoqchimiz.')
# output
# [('Bizlar', 'NOUN'), ('bugun', 'NOUN'), ('maktabga', 'NOUN'), ('bormoqchimiz', 'VERB'), ('.', 'PUNC')]
```

API

API configurations:

  • Method: GET
  • Response type: string
  • URL: https://nlp.urdu.uz:8080/uznltk/pos_tag
    • Parameters: text:string
  • Sample Request: https://nlp.urdu.uz:8080/uznltk/pos_tag?text=Ular%20maktabga%20borayaptilar.
  • Sample output: [("Ular","NOUN"),("maktabga",""),("borayaptilar",""),(".","PUNC")]
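
The configuration above could be exercised from Python roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the response is assumed to be a Python-style list literal like the sample output, and the server is assumed to be reachable. Only the URL and the `text` parameter come from the configuration.

```python
import ast
import urllib.parse
import urllib.request

# Endpoint taken from the API configuration above.
BASE_URL = "https://nlp.urdu.uz:8080/uznltk/pos_tag"


def build_request_url(text: str) -> str:
    """Percent-encode the input and attach it as the `text` query parameter."""
    return f"{BASE_URL}?text={urllib.parse.quote(text, safe='')}"


def parse_response(body: str) -> list[tuple[str, str]]:
    """Parse the string response, assuming it looks like the sample output."""
    return [tuple(pair) for pair in ast.literal_eval(body)]


def pos_tag_via_api(text: str, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Send the GET request and return (token, tag) pairs (hypothetical helper)."""
    with urllib.request.urlopen(build_request_url(text), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

Calling `pos_tag_via_api('Ular maktabga borayaptilar.')` would issue the sample request shown above. As the sample output shows, some tokens may come back with an empty tag, so callers should be prepared for that.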

Web-UI

A web interface is provided to make the library easy to use.

POS tag list

The tagger uses the following POS tags:
NOUN Noun {Ot}
VERB Verb {Fe'l}
ADJ Adjective {Sifat}
NUM Numeric {Son}
ADV Adverb {Ravish}
PRN Pronoun {Olmosh}
CNJ Conjunction {Bog'lovchi}
ADP Adposition {Ko'makchi}
PRT Particle {Yuklama}
INTJ Interjection {Undov}
MOD Modal {Modal}
IMIT Imitation {Taqlid}
AUX Auxiliary verb {Yordamchi fe'l}
PPN Proper noun {Atoqli ot}
PUNC Punctuation {Tinish belgi}
SYM Symbol {Belgi}
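
For use in your own code, the tag list above can be kept as a lookup table. The sketch below is illustrative only: `POS_TAGS` and `describe_tag` are hypothetical names and are not part of uznltk.

```python
# Tag -> (English name, Uzbek name), copied from the POS tag list above.
POS_TAGS = {
    "NOUN": ("Noun", "Ot"),
    "VERB": ("Verb", "Fe'l"),
    "ADJ": ("Adjective", "Sifat"),
    "NUM": ("Numeric", "Son"),
    "ADV": ("Adverb", "Ravish"),
    "PRN": ("Pronoun", "Olmosh"),
    "CNJ": ("Conjunction", "Bog'lovchi"),
    "ADP": ("Adposition", "Ko'makchi"),
    "PRT": ("Particle", "Yuklama"),
    "INTJ": ("Interjection", "Undov"),
    "MOD": ("Modal", "Modal"),
    "IMIT": ("Imitation", "Taqlid"),
    "AUX": ("Auxiliary verb", "Yordamchi fe'l"),
    "PPN": ("Proper noun", "Atoqli ot"),
    "PUNC": ("Punctuation", "Tinish belgi"),
    "SYM": ("Symbol", "Belgi"),
}


def describe_tag(tag: str) -> str:
    """Return a readable label for a tag, or "Unknown" for anything not listed."""
    if tag not in POS_TAGS:
        return "Unknown"
    english, uzbek = POS_TAGS[tag]
    return f"{english} ({uzbek})"
```

Returning `"Unknown"` instead of raising an error lets the helper handle unlisted values, such as an empty tag string.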

Result explanation

The pos_tag method returns a list containing one tuple per token of the input text, in the format (token, pos). For the possible tag values, see the POS tag list section above.

Result from tagger method

[('Bizlar','NOUN'),('bugun', 'NOUN'), ('maktabga', 'NOUN'), ('bormoqchimiz', 'VERB'), ('.', 'PUNC')]
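
Because the result is an ordinary list of (token, pos) tuples, it can be processed with standard Python tools. The sketch below hard-codes the sample result shown above rather than calling the tagger, and the variable names are illustrative.

```python
from collections import Counter

# Sample result copied from above; in practice this would come from pos_tag.
result = [('Bizlar', 'NOUN'), ('bugun', 'NOUN'), ('maktabga', 'NOUN'),
          ('bormoqchimiz', 'VERB'), ('.', 'PUNC')]

# Count how often each tag occurs.
tag_counts = Counter(pos for _, pos in result)

# Keep only the word tokens by dropping punctuation.
words = [token for token, pos in result if pos != 'PUNC']

# Group tokens by their tag.
tokens_by_tag = {}
for token, pos in result:
    tokens_by_tag.setdefault(pos, []).append(token)
```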

Documentation

See here.

Citation

@article{10.1063/5.0241461,
    author = {Salaev, Ulugbek},
    title = {UzMorphAnalyser: A morphological analysis model for the Uzbek language using inflectional endings},
    journal = {AIP Conference Proceedings},
    volume = {3244},
    number = {1},
    pages = {030058},
    year = {2024},
    month = {11},
    abstract = {As Uzbek language is agglutinative, has many morphological features which words formed by combining root and affixes. Affixes play an important role in the morphological analysis of words, by adding additional meanings and grammatical functions to words. Inflectional endings are utilized to express various morphological features within the language. This feature introduces numerous possibilities for word endings, thereby significantly expanding the word vocabulary and exacerbating issues related to data sparsity in statistical models. This paper present modeling of the morphological analysis of Uzbek words, including stemming, lemmatizing, and the extraction of morphological information while considering morpho-phonetic exceptions. Main steps of the model involve developing a complete set of word-ending with assigned morphological information, and additional datasets for morphological analysis. The proposed model was evaluated using a curated test set comprising 5.3K words. Through manual verification of stemming, lemmatizing, and morphological feature corrections carried out by linguistic specialists, it obtained a word-level accuracy of over 91\%. The developed tool based on the proposed model is available as a web-based application and an open-source Python library.},
    issn = {0094-243X},
    doi = {10.1063/5.0241461},
    url = {https://doi.org/10.1063/5.0241461},
    eprint = {https://pubs.aip.org/aip/acp/article-pdf/doi/10.1063/5.0241461/20272108/030058\_1\_5.0241461.pdf},
}

Contact

For help and feedback, please feel free to contact the author.

Download files

Source Distribution

uznltk-0.0.5.tar.gz (10.9 kB)

Uploaded Source

Built Distribution

uznltk-0.0.5-py3-none-any.whl (9.5 kB)

Uploaded Python 3

File details

Details for the file uznltk-0.0.5.tar.gz.

File metadata

  • Download URL: uznltk-0.0.5.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for uznltk-0.0.5.tar.gz
Algorithm Hash digest
SHA256 194375bc50a0bb7346f13e952e38aae9d34ebee00e31888cbd39710a4ec27bc2
MD5 8baf31ba1f0148ea4cdd8ce761bb4f98
BLAKE2b-256 0d190f8c06496347f3863fb4a19a878a23c745f1b89311d62a390ec39a22d3ef

File details

Details for the file uznltk-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: uznltk-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for uznltk-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 3ce5b0119297ef22b70c7d806b1d1f4c57409da97a67be7a9cd0fc8adcfaa31a
MD5 01a1716fa9d03e5672c594a0ad85c94a
BLAKE2b-256 f92fa2ed9caa1e26dc480c95a1feafec4eec79d92729da1e24e56b007ff2bc8c
