
jointtsmodel


This is a consolidated library for joint topic-sentiment models.

Description

Joint topic-sentiment models extract both topical and sentiment information from each text. This library contains four joint topic-sentiment models: JST, RJST (reverse JST), TSM, and sLDA.

References

[1] https://www.researchgate.net/figure/JST-and-Reverse-JST-sentiment-classification-results-with-multiple-topics_fig1_47454505

[2] https://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/viewFile/1913/2215

[3] https://hal.archives-ouvertes.fr/hal-02052354/document

[4] https://github.com/ayushjain91/Sentiment-LDA

[5] https://gist.github.com/mblondel/542786

Installation

git clone https://github.com/victor7246/jointtsmodel.git
cd jointtsmodel
python setup.py install

Or from pip:

pip install jointtsmodel
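
To confirm the installation, importing the package should succeed. The following check assumes nothing beyond the import itself (the __version__ attribute may or may not be exposed, so it is read defensively):

import jointtsmodel
# Falls back to a plain message if the package does not define __version__.
print(getattr(jointtsmodel, '__version__', 'jointtsmodel imported successfully'))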

Usage

The joint topic-sentiment models run on vectorized (bag-of-words) texts together with a word-level sentiment lexicon.

from jointtsmodel.RJST import RJST
from jointtsmodel.JST import JST
from jointtsmodel.sLDA import sLDA
from jointtsmodel.TSM import TSM

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import fetch_20newsgroups
from jointtsmodel.utils import *

# Load the raw 20 newsgroups texts (headers, footers and quotes removed);
# they are vectorized into token counts below.
data, _ = fetch_20newsgroups(shuffle=True, random_state=1,
                             remove=('headers', 'footers', 'quotes'),
                             return_X_y=True)
data = data[:1000]
vectorizer = CountVectorizer(max_df=0.7, min_df=10,
                             max_features=5000,
                             stop_words='english')
X = vectorizer.fit_transform(data)
# On scikit-learn >= 1.2, use get_feature_names_out() instead.
vocabulary = vectorizer.get_feature_names()
inv_vocabulary = dict(zip(vocabulary, np.arange(len(vocabulary))))

# Word-level sentiment lexicon used as the sentiment prior.
lexicon_data = pd.read_excel('lexicon/prior_sentiment.xlsx')
lexicon_data = lexicon_data.dropna()
lexicon_dict = dict(zip(lexicon_data['Word'], lexicon_data['Sentiment']))
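
Only lexicon words that also appear in the CountVectorizer vocabulary can inform the sentiment prior, so it can be worth checking the overlap before fitting. A small sanity-check sketch, reusing the variables defined above:

from collections import Counter

# How much of the sentiment lexicon is covered by the corpus vocabulary?
covered = set(lexicon_dict) & set(vocabulary)
print("lexicon words in vocabulary: %d / %d" % (len(covered), len(lexicon_dict)))

# Distribution of sentiment labels among the covered words.
print(Counter(lexicon_dict[w] for w in covered))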

For the JST model, use:

model = JST(n_topic_components=5, n_sentiment_components=5,
            random_state=123, evaluate_every=2)
# Fit on the dense document-term matrix, with the word-sentiment lexicon as prior.
model.fit(X.toarray(), lexicon_dict)

# Inspect the transformed output for the first two documents.
model.transform()[:2]

# Top words per topic/sentiment and evaluation scores.
top_words = list(model.getTopKWords(vocabulary).values())
top_words_by_topic = list(model.getTopKWordsByTopic(vocabulary).values())
coherence_score_uci(X.toarray(), inv_vocabulary, top_words_by_topic)
Hscore(model.transform())
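
To eyeball the extracted topics, the dict returned by getTopKWords can simply be printed. The loop below only assumes, as the variable name top_words suggests, that its values are lists of words; it makes no assumption about the key structure:

# Print each topic/sentiment entry with its top words.
for key, words in model.getTopKWords(vocabulary).items():
    print(key, ':', ', '.join(map(str, words)))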

For the RJST model, use:

model = RJST(n_topic_components=5,n_sentiment_components=5,random_state=123,evaluate_every=2)
model.fit(X.toarray(), lexicon_dict)

model.transform()[:2]

top_words = list(model.getTopKWords(vocabulary).values())
top_words_by_topic = list(model.getTopKWordsByTopic(vocabulary).values())
coherence_score_uci(X.toarray(),inv_vocabulary,top_words_by_topic)
Hscore(model.transform())

For the TSM model, use:

model = TSM(n_topic_components=5,n_sentiment_components=5,random_state=123,evaluate_every=2)
model.fit(X.toarray(), lexicon_dict)

model.transform()[:2]

top_words = list(model.getTopKWords(vocabulary).values())
top_words_by_topic = list(model.getTopKWordsByTopic(vocabulary).values())
coherence_score_uci(X.toarray(),inv_vocabulary,top_words_by_topic)
Hscore(model.transform())

For the sLDA model, use (note that its fit method takes the vocabulary rather than the lexicon):

model = sLDA(n_topic_components=5,n_sentiment_components=5,random_state=123,evaluate_every=2)
model.fit(X.toarray(), vocabulary)

model.transform()[:2]

top_words = list(model.getTopKWords(vocabulary).values())
top_words_by_topic = list(model.getTopKWordsByTopic(vocabulary).values())
coherence_score_uci(X.toarray(),inv_vocabulary,top_words_by_topic)
Hscore(model.transform())
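
Since all four models share the same constructor parameters and evaluation helpers, a quick side-by-side comparison can be scripted. This is a sketch based on the calls above; as in the examples, sLDA is fit with the vocabulary and the other models with the lexicon:

# Compare UCI coherence and H-score across the four models.
X_dense = X.toarray()
results = {}
for name, cls, prior in [('JST', JST, lexicon_dict),
                         ('RJST', RJST, lexicon_dict),
                         ('TSM', TSM, lexicon_dict),
                         ('sLDA', sLDA, vocabulary)]:
    m = cls(n_topic_components=5, n_sentiment_components=5,
            random_state=123, evaluate_every=2)
    m.fit(X_dense, prior)
    top_by_topic = list(m.getTopKWordsByTopic(vocabulary).values())
    results[name] = (coherence_score_uci(X_dense, inv_vocabulary, top_by_topic),
                     Hscore(m.transform()))

for name, (coherence, hscore) in results.items():
    print(name, 'coherence:', coherence, 'H-score:', hscore)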

To do

  • Add parallelization for faster execution
  • Handle sparse input matrices
  • Add online JST models
