The Uzbek Natural Language Toolkit (uznltk) is a Python package for natural language processing.

uznltk

uznltk is a lightweight, convenient NLP (natural language processing) library for the Uzbek language. It includes text cleaning, morphological analysis, number-to-text and text-to-number conversion, syllable splitting, and other functions.

🔧 Install

pip install uznltk

🚀 Usage

from uznltk import *

📚 Functions

clean_text(text)

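Normalizes the apostrophe-like characters of Uzbek Latin script: a straight apostrophe in o' or g' becomes ‘ (o‘, g‘), and one marking the tutuq belgisi (glottal stop) becomes ’.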

clean_text("O'zbekistonda ta'lim kuchli rivojlanmoqda")
# Result: "O‘zbekistonda ta’lim kuchli rivojlanmoqda"
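
The normalization described above can be approximated with one regular expression. This is a minimal sketch, not uznltk's implementation: the helper name `normalize_apostrophes` and the rule "use ‘ after o or g, otherwise ’" are assumptions for illustration. The rule misreads words where a tutuq belgisi follows o (for example mo’tabar), which a real implementation would need to handle with a word list.

```python
import re

OKINA = "\u2018"   # ‘ as used in the letters o‘ and g‘
TUTUQ = "\u2019"   # ’ tutuq belgisi (glottal stop mark)
# Apostrophe-like characters commonly typed in place of either mark.
APOSTROPHES = "'`\u02bb\u02bc\u2018\u2019"

_PATTERN = re.compile(rf"([A-Za-z])[{APOSTROPHES}]")


def _replace(match: re.Match) -> str:
    prev = match.group(1)
    if prev in "oOgG":
        # Assumption: a mark right after o/g belongs to the letter o‘/g‘.
        return prev + OKINA
    # Otherwise treat it as a tutuq belgisi, but only inside a word.
    nxt = match.string[match.end():match.end() + 1]
    return prev + TUTUQ if nxt.isalpha() else match.group(0)


def normalize_apostrophes(text: str) -> str:
    """Hypothetical helper: replace typed apostrophes with ‘ or ’."""
    return _PATTERN.sub(_replace, text)


print(normalize_apostrophes("O'zbekistonda ta'lim kuchli rivojlanmoqda"))
```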

solid_sign(text)

Returns a list of the words that contain the tutuq belgisi ( ’ ).

solid_sign("ta'lim bo'lishi oldindan ma'lum edi")
# Result: ['ta’lim', 'ma’lum']

lemmatize(text) and stem_word(text)

Extracts the stem of a word.

lemmatize("mexanizatorlashtirilganlardan")
# Result: "mexanizatorlashtirilgan"

number_to_text(number)

Converts a number to Uzbek text.

number_to_text(54)
# Result: "ellik to‘rt"
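
To show how such a conversion can work, here is a minimal sketch limited to 0 through 99. The function name `two_digit_to_uzbek` is hypothetical and the code is not taken from uznltk; it only combines a tens word with a ones word.

```python
OKINA = "\u2018"  # ‘ as used in o‘ and g‘

ONES = ["", "bir", "ikki", "uch", f"to{OKINA}rt", "besh",
        "olti", "yetti", "sakkiz", f"to{OKINA}qqiz"]
TENS = ["", f"o{OKINA}n", "yigirma", f"o{OKINA}ttiz", "qirq", "ellik",
        "oltmish", "yetmish", "sakson", f"to{OKINA}qson"]


def two_digit_to_uzbek(n: int) -> str:
    """Hypothetical helper: spell out an integer from 0 to 99 in Uzbek."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n == 0:
        return "nol"
    tens, ones = divmod(n, 10)
    # Join the non-empty parts: tens word first, then ones word.
    return " ".join(part for part in (TENS[tens], ONES[ones]) if part)


print(two_digit_to_uzbek(54))  # ellik to‘rt
```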

text_to_number(text)

Converts a number written out in Uzbek words to its numeric form.

text_to_number("yetmish olti")
# Result: 76
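
The reverse direction can be sketched as a dictionary lookup plus a sum. This is an illustration under stated assumptions, not the library's code: `uzbek_words_to_int` is a hypothetical name, and only values below 100 are handled.

```python
OKINA = "\u2018"  # ‘ as used in o‘ and g‘

WORD_VALUES = {
    "nol": 0, "bir": 1, "ikki": 2, "uch": 3, f"to{OKINA}rt": 4, "besh": 5,
    "olti": 6, "yetti": 7, "sakkiz": 8, f"to{OKINA}qqiz": 9,
    f"o{OKINA}n": 10, "yigirma": 20, f"o{OKINA}ttiz": 30, "qirq": 40,
    "ellik": 50, "oltmish": 60, "yetmish": 70, "sakson": 80,
    f"to{OKINA}qson": 90,
}


def uzbek_words_to_int(text: str) -> int:
    """Hypothetical helper: parse a spelled-out number below 100."""
    total = 0
    for word in text.lower().split():
        # Accept common typed variants of the ‘ mark.
        for mark in "'\u2019\u02bb":
            word = word.replace(mark, OKINA)
        if word not in WORD_VALUES:
            raise ValueError(f"unknown number word: {word!r}")
        total += WORD_VALUES[word]
    return total


print(uzbek_words_to_int("yetmish olti"))  # 76
```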

download(name)

Downloads various resources (e.g. books, news).

download("book")

clean_stopword(text)

Removes stop words from the text.

clean_stopword("salom dunyo, biz sen va u bilan bugun maktabga bordik")
# Result: "salom dunyo, bugun maktabga bordik"
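
Stop-word filtering of this kind can be sketched with a set lookup. The stop-word set below is a tiny illustrative sample chosen to match the example, and `remove_stop_words` is a hypothetical name; the list that uznltk actually ships is not shown in this description.

```python
# Illustrative sample only, not the library's stop-word list.
STOP_WORDS = {"biz", "sen", "va", "u", "bilan"}


def remove_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Hypothetical helper: drop whitespace-separated stop words."""
    kept = [
        token for token in text.split()
        # Compare without surrounding punctuation, case-insensitively.
        if token.strip(".,!?;:").lower() not in stop_words
    ]
    return " ".join(kept)


print(remove_stop_words("salom dunyo, biz sen va u bilan bugun maktabga bordik"))
# salom dunyo, bugun maktabga bordik
```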

syllables(text)

Divides words into syllables.

syllables("Bizga ma’lum ishlar yuz bermoqda!")
# Result: ['Biz-ga', 'ma’-lum', 'ish-lar', 'yuz', 'ber-moq-da!']

hyphenation(text)

Returns a list of variants of the text, each with a hyphen inserted at one possible syllable break.

hyphenation("salom dunyo")
# Result: ['sa-lom dunyo', 'salom dun-yo']

count_syllable(text)

Counts the number of syllables in the text.

count_syllable("Salom Dunyo")
# Result: 4
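
A common heuristic for counting syllables in Uzbek Latin text is that each syllable has exactly one vowel letter, so counting vowels gives the syllable count. The sketch below relies on that assumption; `count_vowel_syllables` is a hypothetical name and this is not necessarily how uznltk computes the result.

```python
VOWELS = set("aeiou")  # o‘ starts with "o", so it is counted once


def count_vowel_syllables(text: str) -> int:
    """Hypothetical helper: count syllables as the number of vowel letters."""
    return sum(1 for ch in text.lower() if ch in VOWELS)


print(count_vowel_syllables("Salom Dunyo"))  # 4
```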

count_text(text)

Counts the number of words in the text.

count_text("Salom Dunyo")
# Result: 2

split_sentences(text)

Splits the text into a list of sentences.

split_sentences("Salom Dunyo. Bugun ob-havo qisman bulutli")
# Result: ['Salom Dunyo', 'Bugun ob-havo qisman bulutli']
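
A naive version of sentence splitting can be written with `re.split`. This sketch assumes a sentence ends at `.`, `!`, or `?` followed by whitespace or the end of the text; `naive_split_sentences` is a hypothetical name, and abbreviations would be split incorrectly.

```python
import re

# Terminator run followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+(?:\s+|$)")


def naive_split_sentences(text: str) -> list[str]:
    """Hypothetical helper: split on sentence-final punctuation."""
    parts = _SENTENCE_END.split(text)
    return [part.strip() for part in parts if part.strip()]


print(naive_split_sentences("Salom Dunyo. Bugun ob-havo qisman bulutli"))
# ['Salom Dunyo', 'Bugun ob-havo qisman bulutli']
```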

split_words(text)

Returns a list of only the words in the text, skipping IP addresses, email addresses, emoji, and URLs.

split_words("sen 192.168.1.18 va helloworld@example.com elektron manzilidasan. Manba https://pypi.org")
# Result: ['sen', 'va', 'elektron', 'manzilidasan', 'Manba']
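
One way to get this behavior is to blank out URLs, email addresses, and IPv4 addresses first, then collect runs of letters. The patterns and the name `extract_words` below are assumptions for illustration, not uznltk's own rules; emoji and digits are dropped simply because they are not letters.

```python
import re

# URLs, email addresses, and dotted IPv4 addresses to discard.
_NOISE = re.compile(
    r"https?://\S+|www\.\S+|\S+@\S+\.\S+|\b\d{1,3}(?:\.\d{1,3}){3}\b"
)
# Latin letters, optionally joined by ‘, ’, ' or a hyphen (o‘zbek, ob-havo).
_WORD = re.compile(r"[A-Za-z]+(?:[‘’'-][A-Za-z]+)*")


def extract_words(text: str) -> list[str]:
    """Hypothetical helper: keep only word tokens."""
    cleaned = _NOISE.sub(" ", text)
    return _WORD.findall(cleaned)


sample = ("sen 192.168.1.18 va helloworld@example.com elektron "
          "manzilidasan. Manba https://pypi.org")
print(extract_words(sample))
# ['sen', 'va', 'elektron', 'manzilidasan', 'Manba']
```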

💡 Information

  • The library is designed specifically for the Uzbek language.
  • It includes basic NLP components such as number processing, lemmatization, and syntacticization.
