License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Bangla FastText Model & Toolkit

We constructed BanglaLM: Bangla Corpus For Language Model Research, a dataset of around 14 GB of Bangla text for training unsupervised ML models and one of the largest corpora for Bengali language modeling. The Bangla FastText models were developed on this dataset and trained on Google Cloud. We developed two models, one with the skipgram training method and one with the cbow training method. This open-source Python module makes both models easy to use. We also developed a sentence embedding system for use with sklearn classifiers. The models showed better performance than Facebook's pretrained fastText model on the Bangla Wikidataset.

Dataset (Bengali)

Links for the BanglaLM dataset:

-> GitHub: BanglaLM: Bangla Corpus For Language Model Research

-> Kaggle: BanglaLM: Bangla Corpus For Language Model Research

Model link:

Bangla FastText

Installation

To install the latest release, run:

$ pip install BanglaFastText

Or, to get the latest development version of BanglaFastText, install from our GitHub repository:

$ git clone https://github.com/Kowsher/Bangla-Fasttext.git
$ cd Bangla-Fasttext
$ sudo pip install .
$ # or :
$ sudo python setup.py install

For further information and introduction see README.md

Getting started

In order to learn word vectors, as described here, BanglaFastText is used like this:

>>> import BanglaFastText

# There are two training methods, cbow and skipgram. The default is cbow, and the default model path is the current working directory.
>>> Bn = BanglaFastText.BanglaFasttext()

# To save the model to a custom path, use the 'save_path' parameter.

# Skipgram model:
>>> Bn = BanglaFastText.BanglaFasttext(method='skipgram', save_path='./content/model')
# 'save_path' is the directory in which the downloaded model is saved
>>> model = Bn.model_load()

# or, cbow model:
>>> Bn = BanglaFastText.BanglaFasttext(method='cbow', save_path='./content/model')
>>> model = Bn.model_load()

Here, the method parameter selects the training method and save_path is the directory in which the model is saved.
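To make the difference between the two training methods concrete, the following is a minimal sketch of how cbow and skipgram frame their training examples from a sentence. It is an illustration only, not the code used to train the Bangla FastText models; the function name `training_pairs` and the `window` parameter are hypothetical and do not belong to the BanglaFastText API.

```python
def training_pairs(tokens, window=2, method="cbow"):
    """Return the (input, output) training examples for one tokenized sentence.

    Illustrative only: `training_pairs` and `window` are hypothetical names,
    not part of BanglaFastText.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Words around position i, excluding the centre word itself.
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if not context:
            continue
        if method == "cbow":
            # cbow: the whole context window is used to predict the centre word.
            pairs.append((tuple(context), target))
        elif method == "skipgram":
            # skipgram: the centre word is used to predict each context word.
            pairs.extend((target, c) for c in context)
        else:
            raise ValueError("method must be 'cbow' or 'skipgram'")
    return pairs


tokens = "আমি দেশকে ভালোবাসি".split()
cbow_pairs = training_pairs(tokens, window=1, method="cbow")
skipgram_pairs = training_pairs(tokens, window=1, method="skipgram")
```

With a window of 1, cbow produces one example per word (context to centre word), while skipgram produces one example per (centre word, neighbour) combination, so it yields more examples from the same sentence.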

Loading a model object

If we already have a model, we can simply read and load it:

# To read a model
>>> Bn = BanglaFastText.BanglaFasttext(model_path = './model_name')

# To load the model as an object
>>> model = Bn.model_load()

Playing with the parameters

# to get vector of a word
>>> model['দেশ']

# to get most similar words
>>> model.most_similar("দেশ")

# to find word similarity
>>> Bn.word_similarity('কিতাব', 'বই')

# to find sentence similarity
>>> Bn.sent_similarity('আমি দেশকে ভালোবাসি', 'অনেক সুন্দর আমাদের দেশ')

#  for sentence embedding 
>>> corpus = ['আমি দেশকে ভালোবাসি', 'অনেক সুন্দর আমাদের দেশ']
>>> X = Bn.sent_embd(corpus)
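For readers who want an intuition for what similarity scores and sentence embeddings mean, here is a minimal sketch using toy vectors. It assumes cosine similarity and mean pooling of word vectors, which are common choices; the source does not state how `word_similarity`, `sent_similarity` or `sent_embd` are computed internally, so the helper names and the two-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_sentence_vector(sentence, word_vectors, dim):
    """Average the vectors of the known words in a whitespace-tokenized sentence."""
    vectors = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-dimensional vectors, invented for illustration (assumption, not model output).
toy_vectors = {
    "দেশ": [1.0, 0.0],
    "সুন্দর": [0.0, 1.0],
}

embedding = mean_sentence_vector("সুন্দর দেশ", toy_vectors, dim=2)
score = cosine_similarity(embedding, toy_vectors["দেশ"])
```

A matrix built from such sentence vectors, one row per sentence, has the shape that sklearn classifiers expect as their feature input.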

Fine Tuning

If we want to fine-tune the model, that is, update its weights on our own dataset:

>>> corpus = ['আমি দেশকে ভালোবাসি', 'অনেক সুন্দর আমাদের দেশ']
>>> Bn.fine_tuning(corpus, epochs=5)
>>> model = Bn.model_load()

......
>>> tuned_model = Bn.fine_tuning(corpus, epochs=5) # returns the raw fine-tuned model, if we want to use it further

Release history

- 2.3 (this version): Jul 12, 2021
- 2.2: Jul 12, 2021
- 2.1: Jun 28, 2021
- 1.1: Jun 28, 2021
- 0.6: Jun 28, 2021
- 0.5: Jun 28, 2021
- 0.4: Jun 24, 2021
- 0.3: Jun 23, 2021
- 0.2: Jun 22, 2021
- 0.1: Jun 22, 2021

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

BanglaFastText-2.3-py3-none-any.whl (4.0 kB), uploaded Jul 12, 2021, Python 3

File details

Details for the file BanglaFastText-2.3-py3-none-any.whl.

File metadata

Download URL: BanglaFastText-2.3-py3-none-any.whl
Upload date: Jul 12, 2021
Size: 4.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for BanglaFastText-2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f74d79ebfc3b5a8bae6c7e84252bed4be73a0c89d8121bc7305cd377511dee10`
MD5	`379d00600da836516af242dfe3bfb0ca`
BLAKE2b-256	`6b4f8e9d796f11714e366ceb8efdbdf1a89c7244c7e9ba4cce6f395bb254f28c`
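
A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `wheel_matches` are hypothetical, and the local file path is whatever location the wheel was saved to.

```python
import hashlib

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "f74d79ebfc3b5a8bae6c7e84252bed4be73a0c89d8121bc7305cd377511dee10"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_matches(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `wheel_matches("BanglaFastText-2.3-py3-none-any.whl")` should return True only when the local file is byte-for-byte identical to the published one.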
