A simple Nepali stemmer
Nepali Stemmer
This is a simple Nepali stemmer. It iteratively splits suffixes (postpositions) off each word until no further split is possible. The algorithm is based on hindi-stemmer.
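The iterative separation described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function name `split_suffixes`, the four-entry `SUFFIXES` list, and the `min_stem_len` guard are all assumptions made for the example. The real stemmer uses the much larger suffix list credited in the references.

```python
# Hypothetical, tiny suffix list for illustration only.
SUFFIXES = ["हरु", "लाई", "को", "का"]


def split_suffixes(word, suffixes=SUFFIXES, min_stem_len=2):
    """Peel known suffixes off the end of `word` until none matches.

    Returns the remaining stem followed by the suffixes in their
    original left-to-right order.
    """
    peeled = []
    stem = word
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so a short suffix cannot shadow a long one.
        for suffix in sorted(suffixes, key=len, reverse=True):
            remaining = len(stem) - len(suffix)
            if stem.endswith(suffix) and remaining >= min_stem_len:
                stem = stem[:-len(suffix)]
                peeled.insert(0, suffix)
                changed = True
                break
    return [stem] + peeled


print(" ".join(split_suffixes("केटाहरुलाई")))  # two suffixes are peeled in turn
print(" ".join(split_suffixes("नेपालको")))
```

The loop restarts after every successful split, which is what lets a word carrying two stacked postpositions be separated in two passes.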
Features:
- Iterative separation
- Careful handling of postpositions that are followed by punctuation
- Example: नेपाललाई, -> नेपाल लाई,
- Basic text cleaning
- Cross-verification against a Nepali dictionary
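As a rough sketch of how the punctuation handling and the dictionary check might fit together, the example below sets trailing punctuation aside, strips suffixes only while the word is not a known dictionary entry, and then re-attaches the punctuation. Everything named here (`stem_token`, `SUFFIXES`, `DICTIONARY`, `PUNCTUATION`) is hypothetical, and the stopping rule is an assumption; the package may verify against its dictionary differently.

```python
import string

# Hypothetical data for illustration; the real package ships its own lists.
SUFFIXES = ["हरु", "लाई", "को", "का", "ले"]
DICTIONARY = {"नेपाल", "पार्टी", "मण्डले", "झोले"}
PUNCTUATION = string.punctuation + "।"  # ASCII punctuation plus the danda


def stem_token(token):
    """Split suffixes off one token, keeping trailing punctuation attached."""
    # Set trailing punctuation aside so it does not hide the suffix.
    core = token.rstrip(PUNCTUATION)
    trailing = token[len(core):]

    peeled = []
    # Assumed rule: stop as soon as the remaining word is a dictionary entry.
    while core not in DICTIONARY:
        for suffix in SUFFIXES:
            if core.endswith(suffix) and len(core) > len(suffix):
                core = core[:-len(suffix)]
                peeled.insert(0, suffix)
                break
        else:
            break  # no suffix matched, nothing more to split

    return " ".join([core] + peeled) + trailing


print(stem_token("नेपाललाई,"))
print(stem_token("झोले,"))
```

In this sketch the dictionary lookup is what keeps a word such as झोले intact even though it ends in a string that is also a suffix.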
How to run
>>> from nepali_stemmer.stemmer import NepStemmer
>>> nepstem = NepStemmer()
>>> nepstem.stem("नेपालको एमाले पार्टीका झोले, मण्डलेहरु अमेरिका आउने रे !")
'नेपाल को एमाले पार्टी का झोले, मण्डले हरु अमेरिका आउने रे !'
To-do:
- Word transformation during the stemming process
- Information retrieval (IR) evaluation
- Support for code-mixed data
References:
- Suffix list: https://github.com/birat-bade/NepaliStemmer
- Nepali dictionary: https://github.com/PraveshKoirala/stemmer
- Algorithm: https://github.com/sainimohit23/hindi-stemmer
Contact
Email: oyashi
Note: This project was created during COVID-19 quarantine, out of boredom and necessity.