dhivehi_nlp

Natural language processing tools for the Dhivehi language.

Demo website to test features: https://dhivehi-nlp.herokuapp.com/

Read the docs: https://dhivehi-nlp.herokuapp.com/docs/index.html

Installation

pip install dhivehi_nlp

Modules

Tokenizer - Tokenize text into separate sentences or words (tokens).

Stopwords - Remove stopwords from text and return the resulting tokens.

Stemmer - Remove suffixes from words to return their root form.

Language Models - Build n-gram language models and use them to predict the next word. A model assigns probabilities based on the selected n-gram size, where an n-gram is a contiguous sequence of n tokens from the given input text. Previously built models can be reused for prediction.
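To make the n-gram idea concrete, here is a minimal standalone sketch in plain Python. It does not use dhivehi_nlp and is not the library's implementation; the function names `build_ngram_model` and `next_token_probabilities` are hypothetical, and the single-letter tokens are placeholders for real Dhivehi tokens.

```python
from collections import Counter, defaultdict


def build_ngram_model(tokens, n=2):
    """Count which token follows each (n-1)-token context.

    Hypothetical helper for illustration; not part of dhivehi_nlp.
    """
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_token = tokens[i + n - 1]
        model[context][next_token] += 1
    return model


def next_token_probabilities(model, context):
    """Return {token: probability} for tokens seen after the context."""
    counts = model.get(tuple(context), Counter())
    total = sum(counts.values())
    # An unseen context has no counts, so the result is an empty dict.
    return {token: count / total for token, count in counts.items()}


# Placeholder tokens; in practice these would come from a tokenizer.
tokens = "a b a c a b".split()
bigram_model = build_ngram_model(tokens, n=2)
probs = next_token_probabilities(bigram_model, ["a"])
```

In this toy input, "a" is followed by "b" twice and by "c" once, so "b" receives two thirds of the probability mass and "c" one third.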

Dictionary - Look up definitions of Dhivehi words and retrieve the word list. Definitions are sourced from radheef.mv.

Corpus - Collections of various Dhivehi texts.

Trigram Similarity - Splits words or phrases into sequences of three consecutive characters and collects them in a set, so order does not matter and duplicates are removed. The similarity value between two such sets is used to find string matches even when some characters differ or are out of order.
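The trigram approach can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: the function names are hypothetical, and scoring by Jaccard similarity (shared trigrams divided by all distinct trigrams) is an assumption, since the description above does not say how the similarity value is computed.

```python
def trigrams(text):
    """Return the set of all three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0.

    Assumption: Jaccard is used here for illustration; the library
    may score similarity differently.
    """
    set_a, set_b = trigrams(a), trigrams(b)
    union = set_a | set_b
    if not union:
        # Both strings are shorter than three characters.
        return 0.0
    return len(set_a & set_b) / len(union)


score = trigram_similarity("hello", "help")
```

Python strings are sequences of Unicode code points, so the same functions accept Thaana text, with each code point counted as one character.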

Tagger - Tag words in text according to specified rules or patterns. For example, tagging each word with the part of speech it belongs to.
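As a rough illustration of rule-based tagging, the sketch below assigns each token the tag of the first pattern it matches. It is standalone Python, not the dhivehi_nlp tagger; the `RULES` list, the tag names, and the `tag` function are all hypothetical, and the English-style patterns are placeholders for real Dhivehi rules.

```python
import re

# Hypothetical rules as (pattern, tag) pairs; the first matching rule wins.
RULES = [
    (re.compile(r"\d+"), "NUM"),
    (re.compile(r".*ing"), "VERB"),
    (re.compile(r".*ly"), "ADV"),
]


def tag(tokens, rules=RULES, default="UNK"):
    """Pair each token with the tag of the first rule that fully matches it."""
    tagged = []
    for token in tokens:
        for pattern, label in rules:
            if pattern.fullmatch(token):
                tagged.append((token, label))
                break
        else:
            # No rule matched, so fall back to the default tag.
            tagged.append((token, default))
    return tagged


tagged = tag(["42", "running", "quickly", "cat"])
```

Rule order matters in this design: a token that matches several patterns receives the tag of the earliest one, so more specific rules belong first.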

Contribution

There is plenty of room to improve this library, whether by optimizing current modules, creating new modules, or fixing bugs. For instance, the stemmer and stopwords modules rely on predefined rule sets that are still incomplete and can be expanded.

To propose any changes, simply open a pull request.

To report bugs, make suggestions, or ask questions, create a new issue.

Code formatting is done using black to ensure consistency.

