A simple Nepali stemmer
Nepali Stemmer
This is a simple Nepali stemmer. It iteratively separates suffixes (postpositions) from each word until no further separation is possible. The algorithm is based on hindi-stemmer.
Features:
- Iterative separation
- Careful handling of postpositions that are followed by punctuation
  - Example: नेपाललाई, -> नेपाल लाई,
- Basic text cleaning
- Cross-verification against a Nepali dictionary
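The features above can be illustrated with a minimal sketch. This is not the package's actual implementation: the suffix list and dictionary below are tiny hypothetical stand-ins, the helper names `split_suffixes` and `stem_text` are invented for illustration, and the way the dictionary check is applied (accept a split only if the remaining root is a known word) is an assumption.

```python
# Hypothetical stand-ins for the real suffix list and Nepali dictionary.
SUFFIXES = sorted(["हरु", "लाई", "को", "का"], key=len, reverse=True)
DICTIONARY = {"नेपाल", "पार्टी", "मण्डले"}
PUNCTUATION = ",.!?;:।"


def split_suffixes(word):
    """Peel suffixes off the end of `word` until the root is a known word
    or no suffix matches. Returns [root, suffix, ...] or [word] unchanged."""
    root, peeled = word, []
    while root not in DICTIONARY:
        for suffix in SUFFIXES:
            # Never strip a suffix that would leave an empty root.
            if root.endswith(suffix) and len(root) > len(suffix):
                root = root[: -len(suffix)]
                peeled.insert(0, suffix)
                break
        else:
            break  # no suffix matched, stop iterating
    # Assumed cross-check: reject the split if the root is not a known word.
    if peeled and root not in DICTIONARY:
        return [word]
    return [root] + peeled


def stem_text(text):
    """Split on whitespace, set trailing punctuation aside, separate the
    suffixes, then re-attach the punctuation to the last piece."""
    pieces = []
    for token in text.split():
        core = token.rstrip(PUNCTUATION)
        tail = token[len(core):]
        pieces.append(" ".join(split_suffixes(core)) + tail)
    return " ".join(pieces)


print(stem_text("नेपाललाई, मण्डलेहरु अमेरिका"))
```

In this sketch, अमेरिका ends in the suffix का, but the split is rejected because the remaining root is not in the dictionary, which is one way a dictionary check could prevent over-stemming.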
How to run
>>> from nepali_stemmer.stemmer import NepStemmer
>>> nepstem = NepStemmer()
>>> nepstem.stem("नेपालको एमाले पार्टीका झोले, मण्डलेहरु अमेरिका आउने रे !")
'नेपाल को एमाले पार्टी का झोले, मण्डले हरु अमेरिका आउने रे !'
To-do:
- Word transformation as part of the stemming process
- IR evaluation
- Code-mixed data
References:
- Suffix list: https://github.com/birat-bade/NepaliStemmer
- Nepali dictionary: https://github.com/PraveshKoirala/stemmer
- Algorithm: https://github.com/sainimohit23/hindi-stemmer
Contact
Email: oyashi
Note: This project was created during the COVID-19 quarantine, out of boredom and necessity.