Bengali Language Model
The Bengali language model is built with fastai's ULMFiT and is ready for prediction and classification tasks.
NB:
- This tool mostly follows inltk
- We separated out the Bengali part, with better evaluation results
Installation
pip install bnlm
Evaluation Result
Language Model
- Accuracy: 48.26% on the validation dataset
- Perplexity: ~22.79
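For readers comparing these numbers with a training log, perplexity is commonly defined as the exponential of the mean cross-entropy loss per token. The source does not state how its perplexity was computed, so the following is a minimal sketch under that assumption. With that convention, the reported value corresponds to a validation loss of roughly 3.13 nats. The function names are illustrative and not part of bnlm.

```python
import math

def perplexity_from_loss(loss: float) -> float:
    """Perplexity under the common definition exp(mean cross-entropy in nats)."""
    return math.exp(loss)

def loss_from_perplexity(perplexity: float) -> float:
    """Inverse of the above: mean cross-entropy (nats) implied by a perplexity."""
    return math.log(perplexity)

reported_perplexity = 22.79  # value reported in the evaluation results
implied_loss = loss_from_perplexity(reported_perplexity)
print(f"implied validation loss: {implied_loss:.2f} nats")
```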
Training
To train with your own corpus, follow this repository
Features and API
Download pretrained Model
To start, first download the pretrained language model and the SentencePiece model:
from bnlm.bnlm import download_models
download_models()
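The later examples pass `model_path = 'model'` and `sp_model = "model/bn_spm.model"`, which suggests the download lands in a local `model` directory. Under that assumption, a quick sanity check before calling the API might look like the sketch below. The helper `missing_model_files` is hypothetical and not part of bnlm; only the SentencePiece file name comes from the examples in this document.

```python
from pathlib import Path

def missing_model_files(model_dir="model", expected=("bn_spm.model",)):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper, not part of bnlm. The default file name is taken
    from the sp_model path used in the examples in this document.
    """
    base = Path(model_dir)
    return [name for name in expected if not (base / name).is_file()]

missing = missing_model_files()
if missing:
    print("Missing files, run download_models() first:", missing)
else:
    print("All expected model files are present.")
```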
Predict N Words
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import predict_n_words
model_path = 'model'
input_sen = "আমি বাজারে"
output = predict_n_words(input_sen, 3, model_path)
print("Word Prediction: ", output)
Get Sentence Encoding
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_encoding
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
encoding = get_sentence_encoding(input_sentence, model_path, sp_model)
print("sentence encoding is: ", encoding)
Get Embedding Vectors
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
embed = get_embedding_vectors(input_sentence, model_path, sp_model)
print("sentence embedding is : ", embed)
Sentence Similarity
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_sentence_similarity
model_path = 'model'
sp_model = "model/bn_spm.model"
sentence_1 = "আমি ভাত খাই।"
sentence_2 = "আমি ভাত খাই।"
sim = get_sentence_similarity(sentence_1, sentence_2, model_path, sp_model)
print("similarity is: ", sim)
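The source does not state which metric `get_sentence_similarity` uses. A common choice for comparing sentence encodings is cosine similarity, so the following is a sketch under that assumption rather than a description of bnlm's implementation. The function `cosine_similarity` and the toy vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two sentence encodings (not real model output).
enc_1 = [0.2, 0.7, 0.1]
enc_2 = [0.2, 0.7, 0.1]
print(cosine_similarity(enc_1, enc_2))
```

Under this metric, identical inputs such as the two sentences in the example score 1.0 up to floating-point rounding, and orthogonal encodings score 0.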
Get Similar Sentences
from bnlm.bnlm import BengaliTokenizer
from bnlm.bnlm import get_embedding_vectors
from bnlm.bnlm import get_similar_sentences
model_path = 'model'
sp_model = "model/bn_spm.model"
input_sentence = "আমি ভাত খাই।"
sen_pred = get_similar_sentences(input_sentence, 3, model_path, sp_model)
print(sen_pred)
Classification
Upcoming