Sememe Analysis by SIST NLP Lab XiaoranLi

Project description

meme_analysis

By SIST NLP Lab XiaoranLi

Installation

pip install meme_analysis

Parameters

  • --pertrain_data_dir (str, default "./data") : directory where the pre-trained data is cached (cache_dir = "./data")
  • --dimension (int, default 50) : word embedding dimension of the pre-trained data (50, 100, 200 or 300)
  • --corpus_data_dir (str, required) : corpus for training morphemes (corpus_data = path of wikipedia_english)
  • --word (str, default "apple") : target word to analyze
  • --num_clusters (int, default 2) : number of sememes
  • --dimensionality_reduction (bool, default False) : whether to perform dimensionality reduction analysis
  • cache_dir : directory where the pre-trained data is saved (cache_dir = "./data")
  • dimension : word embedding dimension of the pre-trained data (dimension = 50 or 100 or 200 or 300)
  • corpus_data_dir : corpus for training morphemes (corpus_data = path of wikipedia_english)
  • word_to_index : word to index (word_to_index = glove.stoi)
  • index_to_vec : index to vector (index_to_vec = glove.vectors)
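  • sentence_embedding_matrix : third value returned by text_preprocessing() (_, _, sentence_embedding_matrix = text_preprocessing())
  • sentence_matrix : second value returned by text_preprocessing() (_, sentence_matrix, _ = text_preprocessing())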
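
The flags listed above read like arguments to Python's argparse. Below is a minimal sketch of such a parser, assuming that is how the package consumes them. The names build_parser and parse_bool are hypothetical and not part of meme_analysis, and "./wikipedia_english" is a placeholder path. One caveat: with type=bool, argparse turns any non-empty string (including "False") into True, so the sketch swaps in an explicit converter.

```python
import argparse


def parse_bool(text):
    """Hypothetical helper: convert 'true'/'false' style strings to a bool."""
    value = text.strip().lower()
    if value in ("1", "true", "yes", "y"):
        return True
    if value in ("0", "false", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Hypothetical parser mirroring the flags in the list above."""
    parser = argparse.ArgumentParser(description="sememe analysis (sketch)")
    parser.add_argument("--pertrain_data_dir", type=str, default="./data",
                        help="directory where the pre-trained data is cached")
    parser.add_argument("--dimension", type=int, default=50,
                        choices=[50, 100, 200, 300],
                        help="word embedding dimension of the pre-trained data")
    parser.add_argument("--corpus_data_dir", type=str, required=True,
                        help="path of the training corpus")
    parser.add_argument("--word", type=str, default="apple",
                        help="target word to analyze")
    parser.add_argument("--num_clusters", type=int, default=2,
                        help="number of sememes")
    # Assumption: an explicit converter replaces type=bool (see lead-in).
    parser.add_argument("--dimensionality_reduction", type=parse_bool,
                        default=False,
                        help="whether to run dimensionality reduction analysis")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["--corpus_data_dir", "./wikipedia_english", "--word", "matrix"]
)
print(args.word, args.num_clusters, args.dimension)
```

Restricting --dimension with choices makes an unsupported value fail at parse time rather than later when the embeddings are loaded.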

Output

---sememe analysis start---
number of vocabularies :  400000
corpus data preprocessing...
making sentence matrix...
saved : model,sentence matrix and sentence embedding matrix
------------------------
Calculating morpheme matrix...
------------------------
The morpheme matrix is completed!
------------------------
Trying to cluster the morpheme matrix...
------------------------
Text classification on morphemes <Top10>...
Label: 1  |  the structure of the additive model allows solution for the additive coefficients by simple algebra rather than by matrix calculations
Label: 1  |  connective tissues are fibrous and made up of cells scattered among inorganic material called the extracellular matrix
Label: 1  |  the extracellular matrix contains proteins
Label: 0  |  the matrix can be modified to form a skeleton to support or protect the body
Label: 1  |  the lower layer is the reticular lamina lying next to the connective tissue in the extracellular matrix secreted by the epithelial cells
Label: 1  |  the epithelial cells on the external surface of the body typically secrete an extracellular matrix in the form of a cuticle
Label: 1  |  the outer surface of the epidermis is normally formed of epithelial cells and secretes an extracellular matrix which provides support to the organism
Label: 1  |  in 1925 werner heisenberg published the first consistent mathematical formulation of quantum mechanics matrix mechanics
Label: 0  |  undergoes a change in the arrangement of the atoms of its crystal matrix at a certain temperature usually between and
Label: 0  |  the smaller atoms become trapped in the spaces between the atoms of the crystal matrix
------------------------
The cluster distribution scatter plot is being produced...
------------------------
The classifier is being used to evaluate the clustering results...
Train score: 1.0
Test score: 0.9733333333333334
------------------------
Find the word that is closest to the sum of phonemes...
['example', 'same', 'this', 'is', 'particular', 'form', 'instance', 'which', 'similar', 'of']
------------------------
The closest word of the phoneme of the matrix :
 ['same', 'the', 'which', 'this', 'of', 'it', 'one', 'is', 'as', 'example']
------------------------
The closest word of the phoneme of the matrix :
 ['function', 'i.e.', 'defined', 'hence', 'element', 'example', 'integral', 'corresponding', 'linear', 'formula_1']
------------------------End------------------------
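
The log above suggests a pipeline: gather one vector per sentence containing the target word, cluster those vectors into num_clusters groups, then look up the vocabulary words closest to each cluster center. The following is a minimal NumPy sketch of the last two steps. It is not the package's implementation: the k-means routine, the cosine-similarity lookup, the toy 2-D vectors, the toy vocabulary, and the names kmeans and nearest_words are all assumptions made for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; centers start at the first k points (deterministic)."""
    centers = points[:k].astype(float).copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers


def nearest_words(query, vocab, vectors, top_n=3):
    """Return the top_n vocabulary words ranked by cosine similarity to query."""
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    order = np.argsort(-(unit_vectors @ unit_query))
    return [vocab[i] for i in order[:top_n]]


# Toy stand-ins for the sentence vectors of one ambiguous word (made-up values).
sentence_vectors = np.array([
    [1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9], [1.1, 0.0], [0.0, 1.1],
])
labels, centers = kmeans(sentence_vectors, k=2)

# Toy vocabulary with made-up 2-D embeddings.
vocab = ["algebra", "tissue", "linear", "cell"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])

for j, center in enumerate(centers):
    print(j, nearest_words(center, vocab, vectors, top_n=2))
```

With real data, sentence_vectors would hold one high-dimensional row per sentence and vectors would be the full pre-trained embedding table; the logic stays the same.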

Download files

Source Distribution

meme_analysis-1.0.0.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

meme_analysis-1.0.0-py3-none-any.whl (6.4 kB)

Uploaded Python 3
