
Natural language processing library for the Nepali language

Project description

This project aims to build a library covering the common NLP tasks for the Nepali language.

Getting the module

!pip install git+https://github.com/sushil79g/Nepali_nlp.git  # Jupyter/Colab syntax; omit the leading ! in a terminal

Loading embeddings

from Nepali_nlp import Embeddings
word_vec = Embeddings().load_large_vector()  # large embedding
# word_vec = Embeddings().load_vector()  # smaller embedding
# Alternatively, load the fastText embedding:
# from fasttext_embedding import Fasttext
# word_vec = Fasttext().load()
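The README does not say which file format load_large_vector() reads or what type word_vec is (it may be, for example, a gensim KeyedVectors object, but that is an assumption). Word vectors are often distributed in the word2vec text format, so as an illustration only, here is a plain-Python parser for that format. The function name load_word2vec_text and the toy data are hypothetical, not part of Nepali_nlp.

```python
import io
from typing import Dict, List, TextIO


def load_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Read embeddings stored in the word2vec text format (hypothetical helper).

    Line 1 is "<vocab_size> <dimension>"; every following line is a word
    followed by <dimension> space-separated floats.
    """
    _vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Toy three-word "file"; a real Nepali embedding file has a large vocabulary
# and a few hundred dimensions.
sample = io.StringIO("3 2\nमाया 0.1 0.9\nप्रेम 0.2 0.8\nघर 0.9 0.1\n")
word_vec_demo = load_word2vec_text(sample)
print(word_vec_demo["प्रेम"])  # [0.2, 0.8]
```

In practice a library such as gensim (KeyedVectors.load_word2vec_format) handles this, including the binary variant of the format.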

Nepali synonyms

from Nepali_nlp import Synonym
# Method 1: raw synonyms
Synonym().raw_synonym(word='माया', word_vec=word_vec)
# output -> 'स्नेह', 'प्रेम', 'आदर', 'मायाँ', 'दया', 'मायालु', 'श्रद्धा', 'आत्मियता', 'स्पर्श', 'तिमी'
# Method 2: filtered synonyms
Synonym().filter_synonym(word='साथी', word_vec=word_vec)
# output -> 'भाइहरू', 'सहपाठी', 'प्रेमी', 'दाइ', 'प्रेमि', 'बहिनी'
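Embedding-based synonym candidates are usually found by nearest-neighbour search: every other word is ranked by the cosine similarity of its vector to the query word's vector. The README does not describe how raw_synonym or filter_synonym work internally, so the following is only a plain-Python sketch of that general technique; nearest_neighbours, cosine and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(
    word: str, vectors: Dict[str, List[float]], topn: int = 10
) -> List[Tuple[str, float]]:
    """Return the topn words whose vectors are most similar to `word`'s."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 2-dimensional vectors, made up for illustration.
toy_vectors = {
    "माया": [0.1, 0.9],
    "प्रेम": [0.2, 0.8],
    "स्नेह": [0.15, 0.85],
    "घर": [0.9, 0.1],
}
print([w for w, _ in nearest_neighbours("माया", toy_vectors, topn=2)])
# ['स्नेह', 'प्रेम']
```

Nearest neighbours in embedding space are related words, not necessarily synonyms (compare 'तिमी' in the raw output above), which is presumably why a filtered variant is offered.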

Spelling corrector

from Nepali_nlp import Corrector
Corrector().corrector(word='सुशल')  # single word; still at a very early stage
# output -> ['सुशील', 'सुशील']
Corrector().spell_correct("कस्त भको हेरौ है")  # whole sentence
# output -> "कस्तो भयो हेर है"
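The README does not describe how Corrector chooses its suggestions. A common baseline for single-word correction is to return the dictionary word at the smallest edit distance; the sketch below does exactly that with a hypothetical vocabulary list, and is not the library's method. It measures distance in Unicode code points, so a missing vowel sign such as ी counts as a single edit.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings.

    Works on Unicode code points, so a Devanagari vowel sign such as 'ी'
    counts as one character.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def correct_word(word: str, vocabulary: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or `word` itself if nothing is close."""
    if word in vocabulary:
        return word
    best = min(vocabulary, key=lambda candidate: levenshtein(word, candidate))
    return best if levenshtein(word, best) <= max_distance else word


vocabulary = ["सुशील", "सुरज", "कस्तो", "भयो"]  # hypothetical word list
print(correct_word("सुशल", vocabulary))  # सुशील
```

Applying correct_word to each token gives a crude sentence-level corrector; a real one would also use context, since the right fix for a word often depends on its neighbours.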

Nepali text summarizer

from Nepali_nlp import Summerize
# text: the Nepali document to summarize; word_vec: the embedding loaded above
Summerize().show_summary(word_vec, text, length_sentence_predict=5)
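The README does not say how show_summary selects sentences or exactly what length_sentence_predict controls. Because it takes word_vec, one plausible embedding-based approach is centroid scoring: average the word vectors of the whole document, score each sentence by how close its own average vector is to that centroid, and keep the best-scoring sentences in their original order. This is a generic sketch of that technique, not the library's algorithm; summarize, num_sentences and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_vector(tokens: Sequence[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the known tokens' vectors (all zeros if none are known)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def summarize(text: str, vectors: Dict[str, Vector], dim: int, num_sentences: int = 5) -> List[str]:
    """Centroid-based extractive summary.

    Splits on the danda '।', scores each sentence by the cosine similarity of
    its average word vector to the whole document's average vector, and
    returns the top `num_sentences` sentences in their original order.
    """
    sentences = [s.strip() for s in text.split("।") if s.strip()]
    doc_vector = average_vector(text.replace("।", " ").split(), vectors, dim)
    scores = [cosine(average_vector(s.split(), vectors, dim), doc_vector) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]


toy_vectors = {"सरकार": [1.0, 0.0], "बजेट": [0.9, 0.1], "मौसम": [0.0, 1.0]}
toy_text = "सरकार बजेट। सरकार। मौसम।"
print(summarize(toy_text, toy_vectors, dim=2, num_sentences=2))
# ['सरकार बजेट', 'सरकार']
```

The off-topic sentence ('मौसम') scores lowest because its vector points away from the document's dominant theme.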

Romanized Nepali to Devanagari (Unicode)

from Nepali_nlp import Unicode
text = 'ma ghara jaanchhu'
Unicode().unicode_word(text) #output-> 'म घर जान्छु'
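The README does not document the romanization scheme that Unicode().unicode_word expects. As a rough illustration of how such converters generally work, here is a greedy longest-match transliterator over tiny, hypothetical mapping tables: a consonant followed by "a" keeps its inherent vowel, other vowels after a consonant become vowel signs, and two consonants in a row are joined with the virama (्). It reproduces the example above, but it is not the library's scheme, and a real one covers far more (retroflex consonants, nasalization, spelling exceptions).

```python
# Hypothetical, deliberately small mapping tables (not the library's).
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "ch": "च", "chh": "छ",
    "j": "ज", "t": "त", "th": "थ", "d": "द", "n": "न", "p": "प",
    "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल", "w": "व",
    "s": "स", "h": "ह",
}
# Independent vowel letters (used when no consonant precedes) ...
VOWELS = {"a": "अ", "aa": "आ", "i": "इ", "ee": "ई", "u": "उ", "oo": "ऊ",
          "e": "ए", "ai": "ऐ", "o": "ओ", "au": "औ"}
# ... and vowel signs used after a consonant ('a' is inherent, so empty).
VOWEL_SIGNS = {"a": "", "aa": "ा", "i": "ि", "ee": "ी", "u": "ु", "oo": "ू",
               "e": "े", "ai": "ै", "o": "ो", "au": "ौ"}
VIRAMA = "्"  # joins a vowel-less consonant to the next one


def transliterate_word(word: str) -> str:
    out = []
    after_consonant = False  # last output was a consonant with no vowel yet
    i = 0
    while i < len(word):
        for size in (3, 2, 1):  # greedy: try the longest chunk first
            chunk = word[i:i + size]
            if chunk in CONSONANTS:
                if after_consonant:
                    out.append(VIRAMA)
                out.append(CONSONANTS[chunk])
                after_consonant = True
                break
            if chunk in VOWELS:
                out.append(VOWEL_SIGNS[chunk] if after_consonant else VOWELS[chunk])
                after_consonant = False
                break
        else:
            # Unknown symbol: copy it through unchanged.
            chunk = word[i]
            out.append(chunk)
            after_consonant = False
        i += len(chunk)
    if after_consonant:
        out.append(VIRAMA)  # word ends in a bare consonant, e.g. "ghar" -> "घर्"
    return "".join(out)


def transliterate(text: str) -> str:
    return " ".join(transliterate_word(w) for w in text.split())


print(transliterate("ma ghara jaanchhu"))  # म घर जान्छु
```

Greedy longest match is what makes "chh" map to छ instead of च followed by ह.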

Preeti font text to Devanagari (Unicode)

from Nepali_nlp import preeti
preeti_text = 'g]kfnL'  # text typed in the legacy Preeti font
print(preeti(preeti_text))  # output -> नेपाली
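Preeti is a legacy font that draws Devanagari glyphs over ordinary ASCII codes, so text typed in it is stored as Latin letters and punctuation. At its core, conversion is a character-by-character lookup. The sketch below uses only the six mappings implied by the example above (g→न, ]→े, k→प, f→ा, n→ल, L→ी) and is not the library's table; a complete converter needs many more entries and some reordering, for instance for the short-i sign, which visual-order fonts like Preeti typically store before its consonant while Unicode stores it after.

```python
# Hypothetical partial table: only the Preeti keys needed for the example above.
PREETI_TO_UNICODE = str.maketrans({
    "g": "न",
    "]": "े",  # vowel sign e
    "k": "प",
    "f": "ा",  # vowel sign aa
    "n": "ल",
    "L": "ी",  # vowel sign ii
})


def preeti_to_unicode(text: str) -> str:
    """Map each Preeti-encoded ASCII character to its Devanagari code point."""
    return text.translate(PREETI_TO_UNICODE)


print(preeti_to_unicode("g]kfnL"))  # नेपाली
```

Characters missing from the table pass through unchanged, which makes gaps in the mapping easy to spot.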

OCR (optical character recognition)

from Nepali_nlp import OCR
# image_location: path to an image containing Nepali text
text = OCR(image_location)

Nepali Tokenizer

from Nepali_nlp import Tokenizer
Tokenizer().sentence_tokenize(text)  # split text into sentences
Tokenizer().word_tokenize(text)  # split text into words
Tokenizer().character_tokenize(text)  # split text into characters
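The README does not define the exact splitting rules. A simple regex-based version of the three tokenizers might look like the sketch below, which assumes sentences end with the danda (।), "?" or "!", treats punctuation marks as separate word tokens, and splits "characters" at the Unicode code-point level. All names here are hypothetical, not the library's Tokenizer.

```python
import re
from typing import List

# Sentence boundary: whitespace that follows a danda (।), '?' or '!'.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[।?!])\s+")
# A word is a run of characters that are neither whitespace nor punctuation;
# each punctuation mark becomes a token of its own.
_WORD = re.compile(r"[^\s।?!,]+|[।?!,]")


def sentence_tokenize(text: str) -> List[str]:
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def word_tokenize(text: str) -> List[str]:
    return _WORD.findall(text)


def character_tokenize(text: str) -> List[str]:
    # Splits into Unicode code points, so 'ने' becomes 'न' + 'े'.
    return [ch for ch in text if not ch.isspace()]


sample = "म घर जान्छु। तिमी कहाँ जान्छौ?"
print(sentence_tokenize(sample))  # ['म घर जान्छु।', 'तिमी कहाँ जान्छौ?']
print(word_tokenize("म घर जान्छु।"))  # ['म', 'घर', 'जान्छु', '।']
print(character_tokenize("नेपाल"))  # ['न', 'े', 'प', 'ा', 'ल']
```

Splitting by code point separates vowel signs from their consonants; if whole written syllables are wanted instead, a grapheme-cluster segmenter is the better choice.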

Nepali sentence similarity

from Nepali_nlp import Avg_vector_similar
sentences = ["कुपोषणकै कारण शारीरिक र मानसिक रुपमा कमजोर मात्र होइन, अकालमै ज्यान पनि गुमाउनुको परेको समाचार बग्रेल्ती सुन्न सकिन्छ","कर्णाली प्रदेश सामाजिक विकास मन्त्रालयले उपलब्ध गराएको तथ्यांकले कर्णालीमा प्रत्येक वर्ष जन्मिएका ५ वर्षमुनीका बालबालिका १ हजार जनामध्ये ५८ जनाले ज्यान गुमाउँदै आएको देखाएको छ"]
Avg_vector_similar().pair_similarity(word_vec, sentences) #output-> 0.6817289590835571
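The class name suggests that the score is the cosine similarity between the two sentences' average word vectors. The README does not confirm the details (tokenization, handling of out-of-vocabulary words), so this is a sketch of that technique with hypothetical names and toy vectors.

```python
import math
from typing import Dict, List, Optional, Sequence


def average_vector(sentence: str, vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Mean vector of the sentence's in-vocabulary words (None if there are none)."""
    known = [vectors[w] for w in sentence.split() if w in vectors]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def avg_vector_similarity(vectors: Dict[str, List[float]], sentences: Sequence[str]) -> float:
    """Cosine similarity of the average word vectors of exactly two sentences."""
    first, second = (average_vector(s, vectors) for s in sentences)
    if first is None or second is None:
        return 0.0  # at least one sentence has no known words
    return cosine(first, second)


toy_vectors = {"बालबालिका": [1.0, 0.0], "ज्यान": [0.0, 1.0], "कुपोषण": [1.0, 1.0]}
print(round(avg_vector_similarity(toy_vectors, ["बालबालिका ज्यान", "कुपोषण"]), 4))  # 1.0
print(avg_vector_similarity(toy_vectors, ["बालबालिका", "ज्यान"]))  # 0.0
```

Averaging ignores word order, so two sentences containing the same words in a different order receive the same score.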

Nepali news-portal scraper (onlinekhabar and ekantipur for now)

from Nepali_nlp import extract_news
news_link = 'https://www.onlinekhabar.com/2019/12/821094'
title, news = extract_news(news_link)  # only onlinekhabar and ekantipur are supported at the moment
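The README does not show how extract_news locates the title and body. Scrapers like this typically download the article page and pull text out of specific HTML elements. The sketch below uses only the standard library's html.parser and assumes, purely for illustration, that the title is the first <h1> and the body is the page's <p> elements; the real markup of onlinekhabar and ekantipur may differ, and ArticleParser and extract_title_and_body are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ArticleParser(HTMLParser):
    """Collects the first <h1> as the title and every <p> as body text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._collecting: Optional[str] = None  # 'h1' or 'p' while inside one
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._collecting = tag
            self._buffer = []

    def handle_data(self, data):
        if self._collecting is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._collecting:
            return  # e.g. </b> inside a paragraph, or an unrelated tag
        text = "".join(self._buffer).strip()
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._collecting = None


def extract_title_and_body(html: str) -> Tuple[str, str]:
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.title, "\n".join(parser.paragraphs)


page = ("<html><body><h1>कर्णालीमा कुपोषण</h1>"
        "<p>पहिलो <b>अनुच्छेद</b>।</p><p>दोस्रो अनुच्छेद।</p></body></html>")
title, body = extract_title_and_body(page)
print(title)  # कर्णालीमा कुपोषण
```

For a live page you would first download the HTML, for example with urllib.request.urlopen(news_link).read().decode("utf-8"), and then adapt the tag choices to the portal's actual markup.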

Summaries of the latest news

from Nepali_nlp import UpdateNews
title, links, summerized_news = UpdateNews().show_latest(word_vec=word_vec, portal='onlinekhabar', number_of_news=5)  # portal='ekantipur' is also supported

TODOs:

  • Nepali Embeddings
  • Tokenizers (sentence, word, character)
  • Stop Words
  • Nepali Words Collection
  • Nepali Word synonym
  • Roman Nepali to Nepali
  • Nepali OCR
  • Summarization
  • POS tagging
  • Sentence similarity score
  • Spell correction
  • Named Entity Recognition (in progress)
  • Translation (Nepali <-> English)



Download files


Source Distribution

Nepali nlp-0.0.0.tar.gz (14.8 kB)


File details

Details for the file Nepali nlp-0.0.0.tar.gz.

File metadata

  • Download URL: Nepali nlp-0.0.0.tar.gz
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.2

File hashes

Hashes for Nepali nlp-0.0.0.tar.gz
Algorithm     Hash digest
SHA256        8908a06ba488a49b49d370fb8e67f150d79295374fb3f6f0272ccc2d91cb324b
MD5           d308a3ac0dee3c7cdf60be112dc1b338
BLAKE2b-256   5406cbe1c44596b2ec24942379237f44ed31f503c3f9a09a3406ee925d3da05c

