Project description
nepali_embedding
Generate text embeddings for Nepali text.
Objective
Any downstream NLP task performed on Nepali text, such as text classification or Named Entity Recognition, requires word embeddings as its feature embedding matrix. For English text, several open source NLP libraries can generate embeddings with methods such as Word2Vec, GloVe, and BERT. However, there seems to be no open source library that does the same for Nepali text. We aim to provide accurate word embeddings for Nepali text through several NLP architectures so that they can be used in further downstream NLP tasks in the Nepali language.
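To make the term "feature embedding matrix" concrete, here is a minimal sketch that stacks one vector per token into a matrix a downstream model could consume. The toy vectors, the `build_embedding_matrix` helper, and the choice of a zero vector for unknown tokens are assumptions made for illustration only; none of them come from nepali_embedding.

```python
from typing import Dict, List

# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
TOY_VECTORS: Dict[str, List[float]] = {
    "नेपाल": [0.2, 0.1, 0.7],
    "राम्रो": [0.9, 0.3, 0.1],
    "देश": [0.3, 0.2, 0.6],
}


def build_embedding_matrix(tokens: List[str],
                           vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Return one row per token; unknown tokens map to a zero vector."""
    zero = [0.0] * dim
    return [list(vectors.get(token, zero)) for token in tokens]


sentence = ["नेपाल", "राम्रो", "देश", "हो"]  # "हो" is not in TOY_VECTORS
matrix = build_embedding_matrix(sentence, TOY_VECTORS, dim=3)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

In practice the rows would come from one of the models listed below rather than from a hand-written dictionary.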
Models used overview
We have developed three models for generating embeddings from Nepali text:
- Sabda2Vec
- Bakya2Vec
- NepBERT
Usage
Word embedding
Use this script to get the embedding of a word (sabda).
from nepali_embedding.sabda2vec.inference import Sabda2Vec
sabda2vec_obj = Sabda2Vec(model_name="sabda2vec_sm")
# Get embedding of the token
embedding = sabda2vec_obj.get_embedding("हार")
# Get top similar tokens
top_similar = sabda2vec_obj.get_most_similar("हार", 5)
# Get similarity between two tokens
similarity_score = sabda2vec_obj.get_similarity_between_tokens("हार", "पराजय")
sabda_to_vec_model: https://www.dropbox.com/s/xkd29spkozoavhk/sabda_to_vec_model?dl=0
sabda_to_vec_model_md: https://www.dropbox.com/s/55m5q4h5ys1l4np/sabda_to_vec_model_md?dl=0
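Word2Vec-style libraries commonly score similarity between tokens as the cosine of the angle between their vectors. Whether Sabda2Vec does exactly this is an assumption; the sketch below only illustrates the idea behind the similarity and top-similar calls shown above, using made-up 2-dimensional vectors and hypothetical helper names (`cosine_similarity`, `most_similar`).

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query: str,
                 vectors: Dict[str, List[float]],
                 top_n: int) -> List[Tuple[str, float]]:
    """Rank every other token by cosine similarity to the query token."""
    scored = [
        (token, cosine_similarity(vectors[query], vec))
        for token, vec in vectors.items()
        if token != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up vectors, not taken from any trained model.
toy_vectors = {
    "हार": [1.0, 0.0],
    "पराजय": [0.9, 0.1],
    "जित": [0.0, 1.0],
}
print(cosine_similarity(toy_vectors["हार"], toy_vectors["पराजय"]))
print(most_similar("हार", toy_vectors, 1))
```

A brute-force scan like this is fine for a toy vocabulary; real libraries typically use vectorized matrix operations instead.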
Loading embeddings from the NepBERT model
from nepali_embedding.nepBERT.embedding_generator import NepBERT
nepbert = NepBERT()
test_sentence = 'आकाश भाई ज्ञानी मन्छे हो'
# Get the BERT embedding of the sentence
embedding = nepbert.get_bert_embedding_sentence(test_sentence)
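The shape of the value returned by `get_bert_embedding_sentence` is not documented here. If it turns out to be one vector per token, a common way to reduce it to a single sentence vector is mean pooling. The sketch below assumes that per-token layout and uses a hypothetical `mean_pool` helper with made-up numbers; it is not NepBERT's own code.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors element-wise into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Three made-up 2-dimensional token vectors.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vector = mean_pool(token_vectors)
print(sentence_vector)
```

If the method already returns a single pooled vector, no further reduction is needed.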