nepali_embedding

Generate text embeddings for Nepali text.

Objective

Downstream NLP tasks on Nepali text, such as text classification and Named Entity Recognition, require word embeddings as the feature embedding matrix. For English text, embeddings can be generated in several ways, such as Word2Vec, GloVe, and BERT, using a range of open source NLP libraries. However, there seems to be no open source library for such tasks in Nepali text. We aim to provide accurate word embeddings for Nepali text through several NLP-based architectures so that they can be used in further downstream NLP tasks in the Nepali language.
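To make the idea of a "feature embedding matrix" concrete, here is a minimal sketch of how per-word vectors stack into one row per token. The vectors below are toy values invented for illustration, and `build_feature_matrix` is a hypothetical helper, not part of this package; real vectors would come from a trained model.

```python
# Toy 3-dimensional vectors, invented for illustration only.
# Real vectors would come from a trained embedding model.
TOY_VECTORS = {
    "नेपाल": [0.2, 0.7, 0.1],
    "सुन्दर": [0.9, 0.1, 0.3],
    "देश": [0.3, 0.6, 0.2],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback row for out-of-vocabulary tokens


def build_feature_matrix(tokens, vectors, unknown=UNKNOWN):
    """Return one embedding row per token, in sentence order."""
    return [list(vectors.get(token, unknown)) for token in tokens]


tokens = "नेपाल सुन्दर देश हो".split()
feature_matrix = build_feature_matrix(tokens, TOY_VECTORS)
print(len(feature_matrix), len(feature_matrix[0]))  # 4 3
```

A downstream classifier or tagger would consume this matrix (4 tokens by 3 dimensions here) as its input features. The last token is not in the toy vocabulary, so it falls back to the zero vector.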

Models used overview

We have developed the following models for generating embeddings for Nepali text:

  • Sabda2Vec
  • Bakya2Vec
  • NepBERT

Usage

Word embedding

Use this script to get the embedding of a word (sabda).

from nepali_embedding.sabda2vec.inference import Sabda2Vec

sabda2vec_obj = Sabda2Vec(model_name="sabda2vec_sm")
# Get the embedding of a token
embedding = sabda2vec_obj.get_embedding("हार")
# Get the top similar tokens
top_similar = sabda2vec_obj.get_most_similar("हार", 5)
# Get the similarity between two tokens
similarity_score = sabda2vec_obj.get_similarity_between_tokens("हार", "पराजय")

Model files:

    sabda_to_vec_model: https://www.dropbox.com/s/xkd29spkozoavhk/sabda_to_vec_model?dl=0
    sabda_to_vec_model_md: https://www.dropbox.com/s/55m5q4h5ys1l4np/sabda_to_vec_model_md?dl=0
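The README does not state which metric the similarity calls use. Word2Vec-style models usually rank tokens by cosine similarity, so the sketch below assumes that metric. The vectors are toy values invented for illustration, and `cosine_similarity` and `most_similar` are hypothetical helpers, not the package's own functions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, top_n=5):
    """Rank every other token by cosine similarity to the query token."""
    scores = [
        (token, cosine_similarity(vectors[query], vec))
        for token, vec in vectors.items()
        if token != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy 2-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "हार": [1.0, 0.0],
    "पराजय": [0.9, 0.1],
    "जित": [-1.0, 0.0],
}
ranked = most_similar("हार", TOY_VECTORS, top_n=2)
for token, score in ranked:
    print(token, round(score, 3))
```

With these toy values the near-synonym ranks first and the opposite-pointing vector ranks last, which is the behaviour one would hope to see from a trained model.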

Loading embeddings from the NepBERT model

from nepali_embedding.nepBERT.embedding_generator import NepBERT

nepbert = NepBERT()
test_sentence = 'आकाश भाई ज्ञानी मन्छे हो'
# Get the embedding of the sentence
embedding = nepbert.get_bert_embedding_sentence(test_sentence)
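The README does not say whether `get_bert_embedding_sentence` returns a single vector for the sentence or one vector per token. If it is one vector per token (an assumption), a common way to collapse them into a fixed-size sentence vector is mean pooling, sketched here on toy values with a hypothetical `mean_pool` helper.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("token_vectors must not be empty")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Toy per-token vectors, invented for illustration only.
toy_token_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vector = mean_pool(toy_token_vectors)
print(sentence_vector)  # [3.0, 4.0]
```

If the method already returns one vector per sentence, no pooling step is needed.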


Version: 0.1

