A simple Nepali stemmer
Nepali Stemmer
This is a simple Nepali stemmer. It iteratively splits suffixes (postpositions) off a word until no further split is possible. The algorithm is based on hindi-stemmer.
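The iterative separation described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `SUFFIXES` and `split_suffixes` are hypothetical names, the suffix list is a tiny sample (the real list comes from the NepaliStemmer repository), and trying longer suffixes first is an assumption.

```python
from typing import List

# Hypothetical sample of Nepali postpositions/suffixes, for illustration only;
# the package's real list comes from the NepaliStemmer repository.
SUFFIXES = ["हरु", "लाई", "को", "का", "की", "मा"]


def split_suffixes(word: str, suffixes: List[str] = SUFFIXES) -> List[str]:
    """Peel suffixes off the end of `word` until none matches.

    Returns the stem followed by the removed suffixes in their original order.
    """
    # Assumption: try longer suffixes first so a longer match wins over a shorter one.
    ordered = sorted(suffixes, key=len, reverse=True)
    stem = word
    removed: List[str] = []
    while True:
        for suf in ordered:
            # Only strip if a non-empty stem would remain.
            if stem.endswith(suf) and len(stem) > len(suf):
                stem = stem[: -len(suf)]
                removed.insert(0, suf)
                break
        else:
            # No suffix matched on this pass: separation is finished.
            return [stem] + removed


print(split_suffixes("नेपालको"))     # ['नेपाल', 'को']
print(split_suffixes("साथीहरुलाई"))  # ['साथी', 'हरु', 'लाई']
```

A purely suffix-based rule like this can over-split words whose ending merely looks like a postposition; the dictionary cross-verification listed under Features is meant to guard against that.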
Features:
- Iterative suffix separation
- Careful handling of postpositions attached to punctuation
- Example: नेपाललाई, -> नेपाल लाई,
- Basic text cleaning
- Cross-verification with a Nepali dictionary
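The punctuation handling and dictionary cross-check listed above might look something like the sketch below. It is hedged and is not the package's code: `SUFFIXES`, `DICTIONARY`, `PUNCT`, and `split_token` are hypothetical names, the three-word dictionary stands in for the one from PraveshKoirala/stemmer, and the rule "keep stripping only until the stem is a dictionary word, otherwise leave the word whole" is an assumption about how cross-verification works.

```python
import string
from typing import List, Set

# Hypothetical inputs, for illustration only.
SUFFIXES: List[str] = ["हरु", "लाई", "को", "का", "ले"]
DICTIONARY: Set[str] = {"नेपाल", "पार्टी", "मण्डले"}
# Trailing punctuation to detach: ASCII punctuation plus the Devanagari danda marks.
PUNCT: Set[str] = set(string.punctuation) | {"।", "॥"}


def split_token(token: str, suffixes: List[str] = SUFFIXES,
                dictionary: Set[str] = DICTIONARY) -> str:
    # 1. Detach trailing punctuation so "नेपाललाई," is stemmed as "नेपाललाई".
    end = len(token)
    while end > 0 and token[end - 1] in PUNCT:
        end -= 1
    word, trailing = token[:end], token[end:]

    # 2. Strip suffixes, longest first, while the stem is not yet a known word.
    ordered = sorted(suffixes, key=len, reverse=True)
    stem, parts = word, []
    while stem not in dictionary:
        for suf in ordered:
            if stem.endswith(suf) and len(stem) > len(suf):
                stem = stem[: -len(suf)]
                parts.insert(0, suf)
                break
        else:
            break  # no suffix matched; stop stripping

    # 3. Cross-check: if no dictionary stem was reached, keep the word whole.
    if stem not in dictionary:
        stem, parts = word, []

    # 4. Reattach the punctuation after the last separated piece.
    return " ".join([stem] + parts) + trailing


print(split_token("नेपाललाई,"))   # नेपाल लाई,
print(split_token("मण्डलेहरु"))   # मण्डले हरु
print(split_token("एमाले"))       # एमाले
```

The dictionary check is what keeps "मण्डले" intact (stripping stops once a known word is reached) and stops "एमाले" from being wrongly split into "एमा" + "ले".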
How to run
>>> from nepali_stemmer.stemmer import NepStemmer
>>> nepstem = NepStemmer()
>>> nepstem.stem("नेपालको एमाले पार्टीका झोले, मण्डलेहरु अमेरिका आउने रे !")
'नेपाल को एमाले पार्टी का झोले, मण्डले हरु अमेरिका आउने रे !'
To-do:
- Word transformation with stemming process
- Information retrieval (IR) evaluation
- Support for code-mixed data
References:
- Suffix list: https://github.com/birat-bade/NepaliStemmer
- Nepali dictionary: https://github.com/PraveshKoirala/stemmer
- Algorithm: https://github.com/sainimohit23/hindi-stemmer
Contact
Email: oyashi
Note: This project was created during the COVID-19 quarantine, out of boredom and necessity.
Download files
Source Distribution
nepali-stemmer-0.0.2.tar.gz (144.8 kB)
Built Distribution
nepali_stemmer-0.0.2-py3-none-any.whl (149.0 kB)
Hashes for nepali_stemmer-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c2a446f13ccbd6f81f7cc130a5d293f67ef91392ed785ed65954c7fea011478a
MD5 | b14265913fdec8d7bc8bd2783a9394da
BLAKE2b-256 | a7c153db9fef18d13b1ee6b02044e56fffd04e697225ac03e113c9d8d7db3bdd
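To check a downloaded wheel against the SHA256 digest listed above, a small standard-library helper like the following should work. It is a sketch: `sha256_of` and `verify` are hypothetical helper names, and it assumes the wheel file has already been downloaded to a local path.

```python
import hashlib

# SHA256 digest listed above for nepali_stemmer-0.0.2-py3-none-any.whl.
EXPECTED_SHA256 = "c2a446f13ccbd6f81f7cc130a5d293f67ef91392ed785ed65954c7fea011478a"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("nepali_stemmer-0.0.2-py3-none-any.whl")` should return `True` for an untampered download.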