Skip to main content

Bengali Language Model

Project description

Bengal Language Model

Bengali language model is build with fastai's ULMFit and ready for prediction and classfication task.

NB:

  • This tool mostly followed inltk
  • We separated Bengali part with better evaluation results

Installation

pip install bnlm

Evaluation Result

Language Model

  • Accuracy 48.26% on validation dataset
  • Perplexity: ~22.79

Training

To train with your own corpus follow this repository

Features and API

Download pretrained Model

To start, first download pretrained Language Model and Sentencepiece model

from bnlm.bnlm import download_models

download_models()

Predict N Words

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import predict_n_words
model_path = 'model'
input_sen = "আমি বাজারে"
output = predict_n_words(input_sen, 3, model_path)
print("Word Prediction: ", output)

Get Sentence Encoding

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_encoding
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
encoding = get_sentence_encoding(input_sentence, model_path, sp_model)
print("sentence encoding is: ", encoding)

Get Embedding Vectors

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
embed = get_embedding_vectors(input_sentence, model_path, sp_model)
print("sentence embedding is : ", embed)

Sentence Similarity

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_encoding
from bnlm.bnlm import get_similar_sentences
model_path = 'model'
sp_model = "model/bn_spm.model"
sentence_1 = "আমি ভাত খাই।"
sentence_2 = "আমি ভাত খাই।"
sim = get_sentence_similarity(sentence_1, sentence_2, model_path, sp_model)
print("similarity is: ", sim)

Get Simillar Sentences

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
from bnlm.bnlm import get_similar_sentences

model_path = 'model'
sp_model = "model/bn_spm.model"

input_sentence = "আমি ভাত খাই।"
sen_pred = get_similar_sentences(input_sentence, 3, model_path, sp_model)
print(sen_pred)

Classification

upcomming

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bnlm-1.0.0.tar.gz (4.5 kB view details)

Uploaded Source

Built Distribution

bnlm-1.0.0-py3-none-any.whl (6.5 kB view details)

Uploaded Python 3

File details

Details for the file bnlm-1.0.0.tar.gz.

File metadata

  • Download URL: bnlm-1.0.0.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191203 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.9

File hashes

Hashes for bnlm-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d5eb8d80c80fc25beee0a16035974f60ed58c2d71532eab5bec97f4f8979d649
MD5 a3edaa9076ff313573733f3d9355fb02
BLAKE2b-256 cc75369394d455c7b5068091cca7609bacb50ad120c56cb1f13d33014ad38758

See more details on using hashes here.

File details

Details for the file bnlm-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: bnlm-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191203 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.9

File hashes

Hashes for bnlm-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 142135b438c5b361d455aa9ef876c2bd239a1b2ab2fc6a3d8123633cbb60eb80
MD5 c415dd79555b2682432f608f89ed7f53
BLAKE2b-256 91ccde65b81d2b4c013bd5d829e83b919ae1b7f62691f4cced31a6aab6a75fe5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page