Bengali Language Model

The Bengali language model is built with fastai's ULMFiT and is ready for prediction and classification tasks.

NB:

  • This tool mostly follows iNLTK.
  • We separated out the Bengali part and obtained better evaluation results.

Installation

pip install bnlm

Evaluation Result

Language Model

  • Accuracy: 48.26% on the validation dataset
  • Perplexity: ~22.79
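
For readers who want to relate the perplexity figure to a training loss, here is a minimal sketch. It assumes the reported perplexity is the exponential of the mean cross-entropy loss measured in nats, which is the usual convention but is not confirmed by this project. The function names are illustrative and are not part of bnlm.

```python
import math


def perplexity_from_loss(cross_entropy_loss: float) -> float:
    """Perplexity as the exponential of the mean cross-entropy loss (nats)."""
    return math.exp(cross_entropy_loss)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse of perplexity_from_loss; perplexity must be positive."""
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(perplexity)


# Value reported in the evaluation results above.
reported_perplexity = 22.79

# Assumption: the figure was computed as exp(loss), so log() recovers the loss.
implied_loss = loss_from_perplexity(reported_perplexity)
print(f"implied validation loss: {implied_loss:.3f} nats")
```

Under that assumption, a perplexity of about 22.79 corresponds to a validation loss of roughly 3.13 nats per token.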

Training

To train with your own corpus, follow this repository.

Features and API

Download pretrained Model

To start, first download the pretrained language model and the SentencePiece model:

from bnlm.bnlm import download_models

download_models()
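
The examples in this README read from a `model` directory and a `model/bn_spm.model` file, so it can help to confirm the files exist before calling the API. The sketch below assumes `download_models()` writes into `./model` in the current working directory, which the later examples imply but which is not stated explicitly. `missing_model_files` is a hypothetical helper, not a bnlm function.

```python
from pathlib import Path


def missing_model_files(model_dir="model", required=("bn_spm.model",)):
    """Return the required file names that are absent from model_dir.

    Hypothetical helper. Only the SentencePiece file name is taken from
    this README; the language model's file name is not documented here,
    so it is not checked.
    """
    base = Path(model_dir)
    return [name for name in required if not (base / name).is_file()]


missing = missing_model_files()
if missing:
    print("Missing model files:", missing, "- try running download_models() first")
else:
    print("All expected model files are present")
```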

Predict N Words

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import predict_n_words
model_path = 'model'
input_sen = "আমি বাজারে"
output = predict_n_words(input_sen, 3, model_path)
print("Word Prediction: ", output)

Get Sentence Encoding

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_encoding
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
encoding = get_sentence_encoding(input_sentence, model_path, sp_model)
print("sentence encoding is: ", encoding)

Get Embedding Vectors

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
embed = get_embedding_vectors(input_sentence, model_path, sp_model)
print("sentence embedding is : ", embed)
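
If `get_embedding_vectors` returns one vector per token (an assumption; the return shape is not documented here), a single fixed-size sentence vector can be obtained by averaging them. The sketch below shows plain mean pooling on toy data. `mean_pool` is an illustrative helper and is not part of bnlm.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length vectors into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Toy stand-in for per-token embeddings; real vectors would be much longer.
toy_token_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vector = mean_pool(toy_token_vectors)
print(sentence_vector)
```

Mean pooling is only one way to combine token vectors; `get_sentence_encoding` above may use a different method internally.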

Sentence Similarity

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_similarity
model_path = 'model'
sp_model = "model/bn_spm.model"
sentence_1 = "আমি ভাত খাই।"
sentence_2 = "আমি ভাত খাই।"
sim = get_sentence_similarity(sentence_1, sentence_2, model_path, sp_model)
print("similarity is: ", sim)
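
A common way to compare two sentence encodings is cosine similarity. The sketch below illustrates that measure on toy vectors. Whether `get_sentence_similarity` uses exactly this computation is an assumption, and `cosine_similarity` here is an illustrative helper, not a bnlm function.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy encodings; real sentence encodings would have many more dimensions.
toy_encoding_1 = [0.2, 0.7, 0.1]
toy_encoding_2 = [0.2, 0.7, 0.1]  # identical, like the two sentences above
toy_encoding_3 = [0.9, 0.0, 0.4]

print("identical:", cosine_similarity(toy_encoding_1, toy_encoding_2))
print("different:", cosine_similarity(toy_encoding_1, toy_encoding_3))
```

Identical inputs, as in the example above where both sentences are the same, give a score very close to 1.0 under this measure.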

Get Similar Sentences

from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
from bnlm.bnlm import get_similar_sentences

model_path = 'model'
sp_model = "model/bn_spm.model"

input_sentence = "আমি ভাত খাই।"
sen_pred = get_similar_sentences(input_sentence, 3, model_path, sp_model)
print(sen_pred)

Classification

Upcoming.
