Bangla Feature Extractor (BFE)
BFE is a feature extractor for Bangla Natural Language Processing.
Current Features
- CountVectorizer
- TfIdf
- Word Embedding (Word2Vec, FastText)
Installation
pip install bfe
Example
1. CountVectorizer
- Fit n Transform
- Transform
- Get Wordset
Fit n Transform
from bfe import CountVectorizer
ct = CountVectorizer()
X = ["রাহাত", "শুভ"]  # example list of word features
X = ct.fit_transform(X)
#Output: the count-vectorized matrix form of the given features
Transform
from bfe import CountVectorizer
ct = CountVectorizer()
get_mat = ct.transform("রাহাত")
#Output: the count-vectorized matrix form of the given word
Get Wordset
from bfe import CountVectorizer
ct = CountVectorizer()
ct.get_wordSet()
#Output: the raw word set learned when fitting the vectorizer
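The three calls above can also be combined in one flow. A minimal sketch, assuming transform and get_wordSet reuse the vocabulary learned by fit_transform:
from bfe import CountVectorizer
ct = CountVectorizer()
docs = ["রাহাত", "শুভ"]           # example word features
matrix = ct.fit_transform(docs)   # learn the vocabulary and vectorize the features
vec = ct.transform("রাহাত")       # vectorize a single word with the fitted vocabulary
words = ct.get_wordSet()          # inspect the vocabulary that was learned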
2. TfIdf
- Fit n Transform
- Transform
- Coefficients
Fit n Transform
from bfe import TfIdfVectorizer
k = TfIdfVectorizer()
doc = ["কাওছার আহমেদ", "শুভ হাইদার"]
matrix1 = k.fit_transform(doc)
print(matrix1)
'''
Output:
[[0.150515 0.150515 0. 0. ]
[0. 0. 0.150515 0.150515]]
'''
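The numbers above are consistent with a plain tf × idf score using a base-10 logarithm (an inference from this output and the Coefficients section below, not a documented formula): each word occurs once in a two-word document and appears in one of the two documents.
import math
tf = 1 / 2                 # one occurrence in a two-word document
idf = math.log10(2 / 1)    # the word appears in one of the two documents
print(tf * idf)            # ≈ 0.150515, matching the matrix entries above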
Transform
from bfe import TfIdfVectorizer
k = TfIdfVectorizer()
doc = ["আহমেদ সুমন", "কাওছার করিম"]
matrix2 = k.transform(doc)
print(matrix2)
'''
Output:
[[0.150515 0. 0. 0. ]
[0. 0.150515 0. 0. ]]
'''
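Fit and transform can also be chained on one vectorizer. A minimal sketch, assuming transform reuses the vocabulary and idf weights learned by fit_transform (which is what the zero columns in the output above suggest):
from bfe import TfIdfVectorizer
k = TfIdfVectorizer()
train_docs = ["কাওছার আহমেদ", "শুভ হাইদার"]
k.fit_transform(train_docs)        # learn the vocabulary and idf weights
new_docs = ["আহমেদ সুমন", "কাওছার করিম"]
print(k.transform(new_docs))       # score new documents against the fitted weights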
Coefficients
from bfe import TfIdfVectorizer
k = TfIdfVectorizer()
doc = ["কাওছার আহমেদ", "শুভ হাইদার"]
k.fit_transform(doc)
wordset, idf = k.coefficients()
print(wordset)
#Output: ['আহমেদ', 'কাওছার', 'হাইদার', 'শুভ']
print(idf)
'''
Output:
{'আহমেদ': 0.3010299956639812, 'কাওছার': 0.3010299956639812, 'হাইদার': 0.3010299956639812, 'শুভ': 0.3010299956639812}
'''
3. Word Embedding
- Word2Vec
- Training
- Get Word Vector
- Get Similarity
- Get n Similar Words
- Get Middle Word
- Get Odd Words
- Get Similarity Plot
Training
from bfe import BN_Word2Vec
#Training Against Sentences
w2v = BN_Word2Vec(sentences=[['আমার', 'প্রিয়', 'জন্মভূমি'], ['বাংলা', 'আমার', 'মাতৃভাষা']])
w2v.train_Word2Vec()
#Training Against one Dataset
w2v = BN_Word2Vec(corpus_file="path to data or txt file")
w2v.train_Word2Vec()
#Training Against Multiple Datasets
'''
path
->data
->1.txt
->2.txt
->3.txt
'''
w2v = BN_Word2Vec(corpus_path="path/data")
w2v.train_Word2Vec(epochs=25)
After training is done, the model "w2v_model", along with its supporting vector files, will be saved to the current directory.
If you use a pretrained model, specify it while initializing BN_Word2Vec(); otherwise no model_name is needed.
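For example, a minimal sketch of loading the model saved by the training step above (assuming the default "w2v_model" name mentioned above):
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='w2v_model')   # 'w2v_model' is the file produced by train_Word2Vec()
w2v.get_similarity('ঢাকা', 'রাজধানী')        # query the loaded model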
Get Word Vector
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_wordVector('আমার')
Get Similarity
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_similarity('ঢাকা', 'রাজধানী')
#Output: 67.457879
Get n Similar Words
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_n_similarWord(['পদ্মা'], n=10)
#Output:
'''
[('সেতুর', 0.5857524275779724),
('মুলফৎগঞ্জ', 0.5773632526397705),
('মহানন্দা', 0.5634652376174927),
("'পদ্মা", 0.5617109537124634),
('গোমতী', 0.5605217218399048),
('পদ্মার', 0.5547558069229126),
('তুলসীগঙ্গা', 0.5274507999420166),
('নদীর', 0.5232067704200745),
('সেতু', 0.5225246548652649),
('সেতুতে', 0.5192927718162537)]
'''
Get Middle Word
Get the probability distribution of the center word given a list of context words.
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_outputWord(['ঢাকায়', 'মৃত্যু'], n=2)
#Output: [("হয়েছে।',", 0.05880642), ('শ্রমিকের', 0.05639163)]
Get Odd Words
Get the word that does not match the others in the given word list.
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])
#Output: 'আকাশ'
Get Similarity Plot
Creates a bar plot of similar words with their probabilities.
from bfe import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])
- FastText
- Training
- Get Word Vector
- Get Similarity
- Get n Similar Words
- Get Middle Word
- Get Odd Words
- Get Similarity Plot
Training
from bfe import BN_FastText
#Training Against Sentences
ft = BN_FastText(sentences=[['আমার', 'প্রিয়', 'জন্মভূমি'], ['বাংলা', 'আমার', 'মাতৃভাষা']])
ft.train_fasttext()
#Training Against one Dataset
ft = BN_FastText(corpus_file="path to data or txt file")
ft.train_fasttext()
#Training Against Multiple Datasets
'''
path
->data
->1.txt
->2.txt
->3.txt
'''
ft = BN_FastText(corpus_path="path/data")
ft.train_fasttext(epochs=25)
After training is done, the model "ft_model", along with its supporting vector files, will be saved to the current directory.
If you use a pretrained model, specify it while initializing BN_FastText(); otherwise no model_name is needed.
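Likewise, a minimal sketch of loading the FastText model saved by the training step above (assuming the default "ft_model" name mentioned above):
from bfe import BN_FastText
ft = BN_FastText(model_name='ft_model')     # 'ft_model' is the file produced by train_fasttext()
ft.get_similarity('ঢাকা', 'রাজধানী')         # query the loaded model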
Get Word Vector
from bfe import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_wordVector('আমার')
Get Similarity
from bfe import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_similarity('ঢাকা', 'রাজধানী')
#Output: 70.56821120
Get n Similar Words
from bfe import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_n_similarWord(['পদ্মা'], n=10)
#Output:
'''
[('পদ্মায়', 0.8103810548782349),
('পদ্মার', 0.794012725353241),
('পদ্মানদীর', 0.7747839689254761),
('পদ্মা-মেঘনার', 0.7573559284210205),
('পদ্মা.', 0.7470568418502808),
('‘পদ্মা', 0.7413997650146484),
('পদ্মাসেতুর', 0.716225266456604),
('পদ্ম', 0.7154797315597534),
('পদ্মহেম', 0.6881639361381531),
('পদ্মাবত', 0.6682782173156738)]
'''
Get Odd Words
Get the most unmatched word out from given words list
from bfe import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])
#Output: 'আকাশ'
Get Similarity Plot
Creates a bar plot of similar words with their probabilities.
from bfe import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])