BNLTK (Bangla Natural Language Processing Toolkit)

License: MIT

BNLTK (Bangla Natural Language Processing Toolkit) is an open-source Python package for natural language processing in Bangla. It provides basic NLP functionality: tokenization, stemming, and part-of-speech tagging. BNLTK requires Python 3.6, 3.7, 3.8, 3.9, or 3.10.

Web documentation: https://ashwoolford.github.io/bnltk/

Installation

pip install bnltk 
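Since the description lists Python 3.6 through 3.10 as the supported versions, a script can check the interpreter before importing the package. The following is a minimal sketch; the names `is_supported_python`, `MIN_SUPPORTED`, and `MAX_SUPPORTED` are illustrative and not part of BNLTK.

```python
import sys

# Supported range taken from the project description: Python 3.6 through 3.10.
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 10)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies within the listed range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported_python():
    print("Warning: this Python version is outside the range BNLTK lists as supported.")
```

The check only compares version numbers; it does not say whether the package happens to work on other interpreters.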

Note: If you are using version 0.7.6, see the Version 0.7.6 section below.

Version 0.7.8 (latest)

Tokenizer

from bnltk.tokenize import Tokenizers
t = Tokenizers()
print(t.bn_word_tokenizer('আজ আবহাওয়া খুব ভালো।'))
# ["আজ", "আবহাওয়া", "খুব", "ভালো", "।"]
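To illustrate the kind of splitting shown in the output above, where the danda (।) becomes its own token, here is a minimal regex-based sketch. It is not BNLTK's implementation; `simple_word_tokenize`, `TOKEN_PATTERN`, and the choice of punctuation marks are assumptions made for this example.

```python
import re

# A token is either a run of characters that are neither whitespace nor
# punctuation, or a single punctuation mark. The punctuation set (the Bangla
# danda, U+0964, plus a few ASCII marks) is an assumption for this sketch.
TOKEN_PATTERN = re.compile(r"[^\s\u0964,!?;:]+|[\u0964,!?;:]")


def simple_word_tokenize(text):
    """Split text into word tokens and separate punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


sample = 'আজ আবহাওয়া খুব ভালো।'
print(simple_word_tokenize(sample))
```

A real tokenizer has to handle many more cases (numbers, abbreviations, quotation marks), so this sketch is only meant to show the shape of the output.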

Stemmer

from bnltk.stemmer import BanglaStemmer
bn_stemmer = BanglaStemmer()
print(bn_stemmer.stem('হেসেছিলেন'))
# হাসা
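One common use of a stemmer is grouping inflected forms under a shared stem. The sketch below takes any stemming function as an argument; `group_by_stem` and `toy_stem` are hypothetical helpers, and `toy_stem` is a stand-in lookup so the example runs without BNLTK installed.

```python
from collections import defaultdict


def group_by_stem(words, stem_fn):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stem_fn(word)].append(word)
    return dict(groups)


# Stand-in for a real stemmer (an assumption for this sketch). The single
# entry reuses the example above; all other words pass through unchanged.
TOY_STEMS = {'হেসেছিলেন': 'হাসা'}


def toy_stem(word):
    return TOY_STEMS.get(word, word)


print(group_by_stem(['হেসেছিলেন', 'হাসা', 'আজ'], toy_stem))
```

With BNLTK installed, passing `BanglaStemmer().stem` in place of `toy_stem` should give the same grouping behaviour with real stems.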

Part-of-speech tagger

To use the part-of-speech tagger, first download the pretrained model's weights. Our trained model achieves an accuracy of 96%.

from bnltk.bnltk_downloads import DataFiles
DataFiles.download()	

After successfully downloading the files, you can use this module as follows:

from bnltk.pos_tagger import PosTagger

p_tagger = PosTagger()
print(p_tagger.tagger('দুশ্চিন্তার কোন কারণই নাই'))  
# [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]
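The tagger output shown above is a list of (word, tag) tuples, which makes it easy to post-process with standard Python tools. The following sketch counts tags and filters words by tag; it works on the sample output copied from the example, and `tag_counts` and `words_with_tag` are illustrative helper names, not BNLTK functions.

```python
from collections import Counter

# Sample output copied from the tagger example above.
tagged = [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]


def tag_counts(tagged_tokens):
    """Count how often each tag occurs."""
    return Counter(tag for _word, tag in tagged_tokens)


def words_with_tag(tagged_tokens, tag):
    """Return the words carrying the given tag, in their original order."""
    return [word for word, t in tagged_tokens if t == tag]


print(tag_counts(tagged))
print(words_with_tag(tagged, 'NC'))
```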

Version 0.7.6

Tokenizer

from bnltk.tokenize import Tokenizers
t = Tokenizers()
print(t.bn_word_tokenizer('আজ আবহাওয়া খুব ভালো।'))
# ["আজ", "আবহাওয়া", "খুব", "ভালো"]

Stemmer

from bnltk.stemmer import BanglaStemmer
bn_stemmer = BanglaStemmer()
print(bn_stemmer.stem('হেসেছিলেন'))
# হাসা

Part-of-speech tagger

To use the part-of-speech tagger, first download the pretrained model's weights. Our trained model achieves an accuracy of 96%.

from bnltk.bnltk_downloads import DataFiles
DataFiles().download()	

After successfully downloading the files, you can use this module as follows:

from bnltk.pos_tagger import PosTagger

p_tagger = PosTagger()
p_tagger.loader()
print(p_tagger.tagger('দুশ্চিন্তার কোন কারণই নাই'))  
# [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]
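The 0.7.6 example calls `p_tagger.loader()` before tagging, while the 0.7.8 example does not. For code that may run against either version, a small helper can call `loader()` only when the method exists. This is a sketch under assumptions: it presumes that calling `loader()` is harmless wherever it is defined, which is not confirmed here. `prepare_tagger` and the two stand-in classes are hypothetical and exist only so the example runs without BNLTK installed.

```python
def prepare_tagger(tagger):
    """Call tagger.loader() when the method exists, then return the tagger."""
    loader = getattr(tagger, "loader", None)
    if callable(loader):
        loader()
    return tagger


# Hypothetical stand-ins for the two styles of tagger object.
class OldStyleTagger:
    """Mimics the 0.7.6 usage, where loader() is called explicitly."""

    def __init__(self):
        self.loaded = False

    def loader(self):
        self.loaded = True


class NewStyleTagger:
    """Mimics the 0.7.8 usage, where no loader() call appears."""


print(prepare_tagger(OldStyleTagger()).loaded)
```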

Description of the POS tag set

Categories              Types
Noun (N)                Common (NC)
                        Proper (NP)
                        Verbal (NV)
                        Spatio-temporal (NST)
Pronoun (P)             Pronominal (PPR)
                        Reflexive (PRF)
                        Reciprocal (PRC)
                        Relative (PRL)
                        Wh (PWH)
Nominal Modifier (J)    Adjectives (JJ)
                        Quantifiers (JQ)
Demonstratives (D)      Absolutive (DAB)
                        Relative (DRL)
                        Wh (DWH)
Adverb (A)              Manner (AMN)
                        Location (ALC)
Participle (L)          Relative (LRL)
                        Verbal (LV)
Postposition (PP)
Particles (C)           Coordinating (CCD)
                        Subordinating (CSB)
                        Classifier (CCL)
                        Interjection (CIN)
                        Others (CX)
Punctuations (PU)
Residual (RD)           Foreign Word (RDF)
                        Symbol (RDS)
                        Other (RDX)
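For readable output, the tag codes in the table above can be kept in a lookup dictionary. The sketch below transcribes the table; `POS_TAGS` and `describe_tag` are illustrative names, not part of BNLTK. Tags that the table does not list, such as the VM tag that appears in the example output, fall back to a placeholder.

```python
# Tag descriptions transcribed from the table above, as "Category: Type".
POS_TAGS = {
    "NC": "Noun: Common",
    "NP": "Noun: Proper",
    "NV": "Noun: Verbal",
    "NST": "Noun: Spatio-temporal",
    "PPR": "Pronoun: Pronominal",
    "PRF": "Pronoun: Reflexive",
    "PRC": "Pronoun: Reciprocal",
    "PRL": "Pronoun: Relative",
    "PWH": "Pronoun: Wh",
    "JJ": "Nominal Modifier: Adjectives",
    "JQ": "Nominal Modifier: Quantifiers",
    "DAB": "Demonstratives: Absolutive",
    "DRL": "Demonstratives: Relative",
    "DWH": "Demonstratives: Wh",
    "AMN": "Adverb: Manner",
    "ALC": "Adverb: Location",
    "LRL": "Participle: Relative",
    "LV": "Participle: Verbal",
    "PP": "Postposition",
    "CCD": "Particles: Coordinating",
    "CSB": "Particles: Subordinating",
    "CCL": "Particles: Classifier",
    "CIN": "Particles: Interjection",
    "CX": "Particles: Others",
    "PU": "Punctuations",
    "RDF": "Residual: Foreign Word",
    "RDS": "Residual: Symbol",
    "RDX": "Residual: Other",
}


def describe_tag(tag):
    """Return the description for a tag, or a placeholder if it is not listed."""
    return POS_TAGS.get(tag, "Unknown tag: " + tag)


for tag in ("NC", "JQ", "VM"):
    print(tag, "->", describe_tag(tag))
```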
