Natural language processing tools for the Dhivehi language.
Project description
dhivehi_nlp
Installation
pip install dhivehi_nlp
Modules
Tokenizer - Tokenize text into separate sentences or words (tokens).
Stopwords - Remove stopwords from text and return the resulting tokens.
Stemmer - Remove suffixes from words to return their root form.
Language Models - Create language models to predict the next tokens in a text. A language model assigns probabilities based on the selected n-gram size, where an n-gram is a contiguous sequence of n tokens from the input text.
Dictionary - Get definitions of Dhivehi words and the word list. Definitions are obtained from radheef.mv.
Corpus - Collections of various Dhivehi texts.
Trigram Similarity - Split words or phrases into sequences of three consecutive characters and collect them into a set, so order is ignored and duplicates are removed. The similarity value between two such sets can be used to find string matches even when some characters differ or are out of order.
Tagger - Tag words in text according to specified rules or patterns. For example, tag each word with the part of speech it belongs to.
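The n-gram idea behind the Language Models module can be illustrated with a minimal, self-contained sketch. This is not the dhivehi_nlp API: the function names `ngrams`, `train_ngram_model` and `next_token_probability` are hypothetical, and the probabilities are plain relative frequencies with no smoothing, which is an assumption made only for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_ngram_model(tokens, n=2):
    """Count how often each token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        context, word = gram[:-1], gram[-1]
        counts[context][word] += 1
    return counts


def next_token_probability(model, context, word):
    """Relative frequency of `word` after `context` (0.0 if context is unseen)."""
    following = model.get(tuple(context))
    if not following:
        return 0.0
    return following[word] / sum(following.values())


# Toy token list; real input would come from a tokenizer.
tokens = "a b a c a b".split()
model = train_ngram_model(tokens, n=2)
probability = next_token_probability(model, ["a"], "b")
```

With n=2 each context is a single token, so the model estimates how likely a token is given the one before it. Larger values of n use longer contexts but need more text to produce useful counts.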
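The trigram similarity description can likewise be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: the names `trigrams` and `trigram_similarity` are hypothetical, no padding is added around the string, and the similarity value is taken to be the Jaccard ratio (shared trigrams divided by all distinct trigrams), which the description above does not specify.

```python
def trigrams(text):
    """Return the set of all three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity between the trigram sets of a and b (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        # Strings shorter than three characters have no trigrams;
        # fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return len(ta & tb) / len(ta | tb)


# Works on any Unicode string. In Thaana script the vowel signs are
# separate code points, so each trigram spans three code points rather
# than three visible letters.
score = trigram_similarity("ދިވެހި", "ދިވެހިން")
```

Because the trigrams are stored in a set, two strings that share most of their three-character windows score close to 1.0 even when a few characters differ.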
Download files
Source Distribution
File details
Details for the file dhivehi_nlp-1.0.7.tar.gz.
File metadata
- Download URL: dhivehi_nlp-1.0.7.tar.gz
- Upload date:
- Size: 5.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 137a5afa6db0e1fbbc5d5a5b888f5009852df0fce17558b6ec72504c3201505d
MD5 | 2780767704e05cd5de84c3130a99a5f5
BLAKE2b-256 | 87a20d1ab67ab4595e1eaba80bd26445542f612140a7c48479d4a9773cdc9a2c