Bangla Natural Language Processing Toolkit

Bangla NLTK

banglanltk is a Python package for Bengali natural language processing. It includes modules for text cleaning, word tokenization, sentence tokenization, stemming, synonym lookup, and part-of-speech tagging.

Installation

pip install banglanltk

Usage

Cleaning Text

import banglanltk as bn
s = 'আজ আকাশ পরিষ্কার!!! মনে হয় আজ আর বৃষ্টি হবে না .........!'

print(bn.clean_text(s))

Word Tokenization

import banglanltk as bn

s = 'প্রাচীন কালে মানুষ একসময় সংখ্যা বুঝানোর জন্য ঝিনুক, নুড়ি, দড়ির গিট ইত্যাদি ব্যবহার করত।'
print(bn.word_tokenize(s))

Sentence Tokenization

import banglanltk as bn

s = ''' কম্পিউটার শব্দটি গ্রিক "কম্পিউট" শব্দ থেকে এসেছে। Compute শব্দের অর্থ গণনা করা। আর কম্পিউটার শব্দের অর্থ গণনাকারী যন্ত্র। '''
print(bn.sent_tokenize(s))

Stemming

import banglanltk as bn

# For a single word
print(bn.stemmer('শান্তিনিকেতনে'))

# For multiple words: tokenize first, then stem each word
text = 'আজ বৃষ্টি হবে।'
words = bn.word_tokenize(text)
for w in words:
    print(bn.stemmer(w))
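
When stemming a whole text, it can be convenient to wrap the loop above in a helper that returns the stems as a list. The sketch below is one possible way to do that. `stem_words` is a hypothetical helper name, not part of banglanltk, and it takes the tokenizer and stemmer as arguments so that it runs even without the package installed. The demo functions are stand-ins for illustration only; in real use you would presumably pass `bn.word_tokenize` and `bn.stemmer`.

```python
from typing import Callable, Iterable, List


def stem_words(
    text: str,
    tokenize: Callable[[str], Iterable[str]],
    stem: Callable[[str], str],
) -> List[str]:
    """Tokenize `text` and return the stem of each token, in order."""
    return [stem(word) for word in tokenize(text)]


# Stand-in functions for illustration only; they do NOT mimic banglanltk.
def demo_tokenize(text: str) -> List[str]:
    return text.split()


def demo_stem(word: str) -> str:
    return word.lower()


print(stem_words("Rain Today", demo_tokenize, demo_stem))  # ['rain', 'today']

# With banglanltk installed (assumed usage, based on the calls shown above):
#   import banglanltk as bn
#   stems = stem_words('আজ বৃষ্টি হবে।', bn.word_tokenize, bn.stemmer)
```

Passing the functions in keeps the helper independent of the package, which also makes it easy to swap in a different tokenizer or stemmer later.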

Synonym

import banglanltk as bn

print(bn.synonym('হাত'))

POS Tagging

import banglanltk as bn

# For a single word
print(bn.pos_tag('কম্পিউটার'))

# For multiple words: tokenize first, then tag each word
text = 'আজ বৃষ্টি হবে।'
words = bn.word_tokenize(text)
for w in words:
    print(bn.pos_tag(w))

List of POS tags

POS Meaning
CC Conjunction
CD Cardinal number
DM Demonstrative
DT Determiner
EX Existential there
FW Foreign word
IN Preposition
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
MD Modal
NN Noun, singular or mass
NNP Proper noun, singular
NNS Noun, plural
NNV Verbal Noun
PR Pronoun
PRP Personal pronoun
PRP$ Possessive pronoun
PSP Postposition
RB Adverb
RBR Adverb, comparative
RP Particles
SYM Symbol
TO to
UH Interjection
UNK Unknown tag
VB Verb, base form
VBD Verb, past tense
VBG Verb, present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
WDT Wh-determiner
WH Wh words
WP Wh-pronoun
WRB Wh-adverb
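
To make tagger output easier to read, the table above can be kept as a plain dictionary and looked up per tag. This is only a sketch: `POS_TAG_MEANINGS` and `describe_tag` are hypothetical names, not part of banglanltk, and the code assumes a tag is available as a string such as 'NN' (the exact return format of `bn.pos_tag` is not documented here). Falling back to the 'UNK' meaning for unlisted tags is a design choice of this example.

```python
# Tag-to-meaning lookup copied from the list of POS tags above.
POS_TAG_MEANINGS = {
    "CC": "Conjunction",
    "CD": "Cardinal number",
    "DM": "Demonstrative",
    "DT": "Determiner",
    "EX": "Existential there",
    "FW": "Foreign word",
    "IN": "Preposition",
    "JJ": "Adjective",
    "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative",
    "MD": "Modal",
    "NN": "Noun, singular or mass",
    "NNP": "Proper noun, singular",
    "NNS": "Noun, plural",
    "NNV": "Verbal Noun",
    "PR": "Pronoun",
    "PRP": "Personal pronoun",
    "PRP$": "Possessive pronoun",
    "PSP": "Postposition",
    "RB": "Adverb",
    "RBR": "Adverb, comparative",
    "RP": "Particles",
    "SYM": "Symbol",
    "TO": "to",
    "UH": "Interjection",
    "UNK": "Unknown tag",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "VBG": "Verb, present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd person singular present",
    "WDT": "Wh-determiner",
    "WH": "Wh words",
    "WP": "Wh-pronoun",
    "WRB": "Wh-adverb",
}


def describe_tag(tag: str) -> str:
    """Return the meaning of a POS tag, or the 'UNK' meaning if it is not listed."""
    return POS_TAG_MEANINGS.get(tag, POS_TAG_MEANINGS["UNK"])


print(describe_tag("NNP"))  # Proper noun, singular
print(describe_tag("XYZ"))  # Unknown tag
```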


Built Distribution

banglanltk-0.0.4-py3-none-any.whl (462.3 kB)

Uploaded Python 3
