
A simple Nepali stemmer


Nepali Stemmer

This is a simple Nepali stemmer. It iteratively separates suffixes (postpositions) from each word until no further separation is possible. The algorithm is based on hindi-stemmer.
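The iterative separation described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `SUFFIXES` list and the `split_suffixes` function are hypothetical, and the real suffix set and matching rules may differ.

```python
# Hypothetical subset of Nepali suffixes/postpositions; the package's real list may differ.
SUFFIXES = ["हरु", "लाई", "को", "का", "की", "ले", "मा", "बाट"]


def split_suffixes(word: str) -> list[str]:
    """Repeatedly strip known suffixes from the end of `word`.

    Returns [stem, suffix1, suffix2, ...] in reading order.
    """
    stripped: list[str] = []
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so a longer match wins over a shorter one.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            # Never strip the whole word: keep a non-empty stem.
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
                stripped.insert(0, suffix)
                changed = True
                break
    return [word] + stripped


print(split_suffixes("केटाहरुलाई"))  # ['केटा', 'हरु', 'लाई']
```

Stripping blindly like this can over-stem (for example, the final "ले" of "मण्डले" would also be removed), which is one reason a dictionary check is useful.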

Features:

  • Iterative separation
  • Careful handling of postpositions that are immediately followed by punctuation
    • Example: नेपाललाई, -> नेपाल लाई,
  • Basic text cleaning
  • Cross-verification with a Nepali dictionary
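The punctuation handling and dictionary cross-verification listed above can be illustrated with the sketch below. It is an assumption-based example, not the package's code: `SUFFIXES`, `DICTIONARY`, `stem_token` and `stem_text` are hypothetical, and here the dictionary is used only as a stopping condition (stop stripping once the remaining word is a known entry).

```python
import re

# Hypothetical suffix list and toy dictionary, for illustration only.
SUFFIXES = ["हरु", "लाई", "को", "का", "ले"]
DICTIONARY = {"नेपाल", "पार्टी", "मण्डले"}

# Split a token into (word, trailing punctuation), e.g. "नेपाललाई," -> ("नेपाललाई", ",").
TOKEN_RE = re.compile(r"^(.*?)([,.!?।]*)$")


def stem_token(token: str) -> str:
    word, punct = TOKEN_RE.match(token).groups()
    suffixes: list[str] = []
    # Keep stripping only while the current word is not a dictionary entry.
    while word not in DICTIONARY:
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
                suffixes.insert(0, suffix)
                break
        else:
            break  # no suffix matched: leave the word as it is
    # Re-attach the punctuation after the last separated piece.
    return " ".join([word, *suffixes]) + punct


def stem_text(text: str) -> str:
    return " ".join(stem_token(tok) for tok in text.split())


print(stem_text("नेपाललाई, मण्डलेहरु !"))  # नेपाल लाई, मण्डले हरु !
```

Because "मण्डले" is in the toy dictionary, stripping stops after "हरु" is removed, avoiding the over-stemming that plain iterative stripping would cause.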

How to run

>>> from nepali_stemmer.stemmer import NepStemmer
>>> nepstem = NepStemmer()
>>> nepstem.stem("नेपालको एमाले पार्टीका झोले, मण्डलेहरु अमेरिका आउने रे !")
'नेपाल को एमाले पार्टी का झोले, मण्डले हरु अमेरिका आउने रे !'

To-do:

  • Word transformation as part of the stemming process
  • Information retrieval (IR) evaluation
  • Support for code-mixed data


Contact

Email: oyashi

Note: This project was created during the COVID-19 quarantine, out of boredom and necessity.

