Sememe Analysis by SIST (Shizuoka Institute of Science and Technology) NLP Lab XiaoranLi
Project description
meme_analysis
- Blog URL : www.sauron.online
Install
pip install meme_analysis
from meme_analysis import meme_analysis
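# pertrain_data_dir: cache directory for the pre-trained data; corpus_data_dir: path of the text corpus
meme = meme_analysis(pertrain_data_dir="./data", corpus_data_dir="path/to/wikipedia_english")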
_, apple_sememe = meme.word2sememe("apple")
Parameters
- --pertrain_data_dir (str, default "./data"): cache directory where the pre-trained data is saved (cache_dir = "./data")
- --dimension (int, default 50): dimension of the pre-trained word embeddings (50, 100, 200 or 300)
- --corpus_data_dir (str, required): corpus used to learn the sememes (e.g. the path of an English Wikipedia corpus)
- --word (str, default "apple"): target word to analyse
- --num_clusters (int, default 2): number of sememes (clusters)
- --dimensionality_reduction (bool, default False): whether to perform the dimensionality reduction analysis
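The flags above read like argparse definitions. A minimal sketch of an equivalent parser follows; it is a hypothetical reconstruction from this list, not the package's own code. The str2bool helper is an addition: with type=bool, argparse calls bool() on the raw string, so passing "False" on the command line would still give True.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common spellings of true/false (type=bool would turn "False" into True)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the package's source)."""
    parser = argparse.ArgumentParser(description="Sememe analysis (sketch)")
    parser.add_argument("--pertrain_data_dir", type=str, default="./data",
                        help="cache directory for the pre-trained data")
    # choices reflects the documented embedding sizes (50, 100, 200 or 300)
    parser.add_argument("--dimension", type=int, default=50, choices=[50, 100, 200, 300],
                        help="dimension of the pre-trained word embeddings")
    parser.add_argument("--corpus_data_dir", type=str, required=True,
                        help="path of the corpus, e.g. an English Wikipedia dump")
    parser.add_argument("--word", type=str, default="apple",
                        help="target word to analyse")
    parser.add_argument("--num_clusters", type=int, default=2,
                        help="number of sememes (clusters)")
    parser.add_argument("--dimensionality_reduction", type=str2bool, default=False,
                        help="whether to perform the dimensionality reduction analysis")
    return parser
```

For example, build_parser().parse_args(["--corpus_data_dir", "corpus", "--num_clusters", "3"]) yields num_clusters=3 with every other option at its default.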
- cache_dir: directory where the pre-trained data is saved (cache_dir = "./data")
- dimension: dimension of the pre-trained word embeddings (dimension = 50, 100, 200 or 300)
- corpus_data_dir: corpus used to learn the sememes (corpus_data_dir = path of wikipedia_english)
- word_to_index: mapping from word to index (word_to_index = glove.stoi)
- index_to_vec: mapping from index to vector (index_to_vec = glove.vectors)
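- sentence_embedding_matrix: third return value of text_preprocessing() (_, _, sentence_embedding_matrix = text_preprocessing())
- sentence_matrix: second return value of text_preprocessing() (_, sentence_matrix, _ = text_preprocessing())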
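word_to_index and index_to_vec correspond to torchtext's GloVe.stoi and GloVe.vectors. To show what these structures hold without downloading GloVe, the sketch below builds the same two mappings from lines in the GloVe text format ("word v1 v2 ... vd"). How the package builds sentence_embedding_matrix is not documented here; averaging the vectors of in-vocabulary tokens is an assumption, as is using cosine similarity for the nearest-word lookup that the "closest word" lines in the output report. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def load_glove_lines(lines: Iterable[str]) -> Tuple[Dict[str, int], List[List[float]]]:
    """Build word_to_index (like glove.stoi) and index_to_vec (like glove.vectors)."""
    word_to_index: Dict[str, int] = {}
    index_to_vec: List[List[float]] = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word_to_index[parts[0]] = len(index_to_vec)
        index_to_vec.append([float(x) for x in parts[1:]])
    return word_to_index, index_to_vec


def sentence_embedding(sentence: str, word_to_index: Dict[str, int],
                       index_to_vec: List[List[float]]) -> Optional[List[float]]:
    """Assumed pooling: mean of in-vocabulary token vectors; None if no token is known."""
    vectors = [index_to_vec[word_to_index[tok]]
               for tok in sentence.lower().split() if tok in word_to_index]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_words(query: Sequence[float], word_to_index: Dict[str, int],
                  index_to_vec: List[List[float]], k: int = 10,
                  exclude: Iterable[str] = ()) -> List[str]:
    """Return the k words whose vectors have the highest cosine similarity to query."""
    skip = set(exclude)
    scored = [(cosine(query, index_to_vec[i]), word)
              for word, i in word_to_index.items() if word not in skip]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:k]]
```

With a toy vocabulary such as ["apple 1.0 0.0", "fruit 0.9 0.1", "company 0.0 1.0"], closest_words([1.0, 0.0], ...) ranks "apple" first and "fruit" second.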
Output
Trying to cluster the morpheme matrix...
------------------------
Trying to search the morphemes of word 2
The classifier is being used to evaluate the clustering results...
Train score: 0.998780487804878
Test score: 0.9725776965265083
Trying to search the morphemes of word 3
The classifier is being used to evaluate the clustering results...
Train score: 1.0
Test score: 0.9524680073126143
Trying to search the morphemes of word 4
The classifier is being used to evaluate the clustering results...
Train score: 1.0
Test score: 0.943327239488117
Trying to search the morphemes of word 5
The classifier is being used to evaluate the clustering results...
Train score: 1.0
Test score: 0.9396709323583181
Trying to search the morphemes of word 6
The classifier is being used to evaluate the clustering results...
Train score: 1.0
Test score: 0.9177330895795247
Text classification on morphemes <Top40>...
Label: 5 | when we look at an apple
Label: 5 | we see an apple
Label: 5 | and we can also analyse a form of an apple
Label: 5 | there is a particular apple and a universal form of an apple
Label: 5 | we can place an apple next to a book
Label: 5 | so that we can speak of both the book and apple as being next to each other
Label: 5 | the form of apple exists within each apple
Label: 5 | some example of screen readers are apple voiceover
Label: 4 | this software is provided free of charge on all apple devices
Label: 5 | apple voiceover includes the option to magnify the screen
Label: 4 | the is a plug and play adapter for ios devices which uses the built in apple voiceover feature in combination with a basic switch
Label: 1 | apple inc
Label: 1 | apple inc
Label: 5 | the apple watch
Label: 4 | the apple tv digital media player
Label: 0 | apple music
Label: 4 | apple tv
Label: 2 | other services include apple store
Label: 2 | apple pay
Label: 2 | apple pay cash
Label: 4 | and apple card
Label: 0 | apple was founded by steve jobs
Label: 2 | and ronald wayne in april 1976 to develop and sell apple i personal computer
Label: 4 | it was incorporated as apple computer
Label: 0 | including the apple ii
Label: 2 | apple went public in 1980 to instant financial success
Label: 4 | apple shipped new computers featuring innovative graphical user interfaces
Label: 2 | wozniak departed apple amicably and remained an honorary employee
Label: 4 | apple lost market share to the lower-priced duopoly of microsoft windows on intel pc clones
Label: 2 | he led apple to buy next
Label: 5 | apple swiftly returned to profitability under the revitalizing think different campaign
Label: 2 | opening the retail chain of apple stores in 2001
Label: 2 | jobs renamed the company apple inc
Label: 5 | left the company to start his own firm but stated he would work with apple as its primary client
Label: 2 | apple is well known for its size and revenues
Label: 2 | apple is the worlds largest technology company by revenue and one of the worlds most valuable companies
Label: 0 | apple became the first public u
Label: 2 | 3 billion apple products are actively in use worldwide
Label: 2 | apple receives significant criticism regarding the labor practices of its contractors
Label: 2 | apple computer company was founded on april 1
------------------------
Find the word that is closest to the sum of phonemes...
['which', 'as', '.', 'new', 'company', 'also', 'for', 'same', 'its', 'based']
------------------------
The closest word of the phoneme of the apple :
['the', 'later', 'first', 'in', 'which', '.', 'as', 'was', 'same', 'on']
The closest word of the phoneme of the apple :
['inc', 'inc.', 'corporation', 'corp', 'subsidiary', 'ltd', 'llc', 'corp.', 'ltd.', 'microsystems']
The closest word of the phoneme of the apple :
['for', 'which', 'now', 'already', 'recently', 'that', 'to', 'new', 'has', 'company']
The closest word of the phoneme of the apple :
['fruit', 'honey', 'cream', 'candy', 'chocolate', 'milk', 'juice', 'cake', 'soft', 'sugar']
The closest word of the phoneme of the apple :
['computer', 'software', 'computers', 'introduced', 'using', 'available', 'used', 'uses', 'use', 'pc']
The closest word of the phoneme of the apple :
['that', 'but', 'same', 'once', 'as', 'one', 'though', '.', 'it', 'this']
------------------------End------------------------
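The log suggests a loop that clusters the sentence embeddings for several cluster counts (2 to 6), trains a classifier on the resulting cluster labels, and reports train and test accuracy. The sketch below reproduces that idea in plain Python under stated assumptions: Lloyd's k-means for clustering and a nearest-centroid classifier for evaluation. The package's actual clustering algorithm, classifier and train/test split are not documented here, and the function names are hypothetical. A high score only shows that the clusters are separable in embedding space.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def assign(points: List[Vector], centroids: List[Vector]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Assumed clustering step: Lloyd's k-means, returning labels and centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])  # keep empty clusters in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def evaluate_clustering(points: List[Vector], labels: List[int], test_ratio: float = 0.3,
                        seed: int = 0) -> Tuple[float, float]:
    """Assumed evaluation: nearest-centroid classifier fitted on a random train split.
    Returns (train accuracy, test accuracy); needs at least two points."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    n_test = max(1, int(len(order) * test_ratio))
    test_ids, train_ids = order[:n_test], order[n_test:]
    classes = sorted({labels[i] for i in train_ids})
    centroids = [mean([points[i] for i in train_ids if labels[i] == c]) for c in classes]

    def accuracy(ids: List[int]) -> float:
        predicted = assign([points[i] for i in ids], centroids)
        return sum(classes[pred] == labels[i] for pred, i in zip(predicted, ids)) / len(ids)

    return accuracy(train_ids), accuracy(test_ids)


def sweep(points: List[Vector], k_values=range(2, 7)) -> Dict[int, Tuple[float, float]]:
    """Cluster and evaluate once per cluster count, mirroring the 2 to 6 runs in the log."""
    results = {}
    for k in k_values:
        labels, _ = kmeans(points, k)
        results[k] = evaluate_clustering(points, labels)
    return results
```

In this sketch, the points would be the rows of sentence_embedding_matrix, and each final centroid could be passed to a nearest-word lookup to list its closest vocabulary words.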
Download files
Source Distribution
meme_analysis-1.0.4.tar.gz (6.7 kB)
Built Distribution
meme_analysis-1.0.4-py3-none-any.whl
Hashes for meme_analysis-1.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6b3c0dc6cecfb9a720c77416aa7b579e3f1992f165634880a336e9057085570b
MD5 | 4e9b0a4087249d8bbe35f0ad94d400fb
BLAKE2b-256 | 17dcdee01fdd2c8f73257f615127ac98d96f43ae8e6b7f7e3b433de43c0c11dd
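To check a downloaded wheel against the SHA256 digest listed above, a short hashlib sketch is enough; the function names sha256_of and verify are illustrative, not part of the package.

```python
import hashlib

# SHA256 digest listed above for meme_analysis-1.0.4-py3-none-any.whl
EXPECTED_SHA256 = "6b3c0dc6cecfb9a720c77416aa7b579e3f1992f165634880a336e9057085570b"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True only if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

verify("meme_analysis-1.0.4-py3-none-any.whl") returns True only for an unmodified download. pip can enforce the same check at install time through hash-checking mode, e.g. a requirements line of the form meme_analysis==1.0.4 --hash=sha256:<digest>.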