Short-Text Analyzer helps analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization.
Short-text-analyzer
ShortTextAnalyzer was created to help analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling is done using pre-trained language representations, namely BERT, combined with a clustering algorithm.
Documentation Page: https://thisisphume.github.io/short-text-analyzer/
Install
pip install short-text-analyzer
Install all the required packages from the requirements.txt file.
pip install -r requirements.txt
from shorttextanalyzer.core import *
How to use
analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()
Embedding Method for Visualization is 2AE with MSE of 0.6560611658549391
Embedding Method for Clustering is 2AE with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is: 5.0
Number of clusters via KMeans is: 4
Here we specify that we want 4 clusters (topics) from this data by passing 4 as the second argument.
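To make the role of the cluster count concrete, here is a minimal, self-contained sketch of k-means on toy 2-D points. It is not the package's implementation, which clusters embeddings of the comments. The function name `kmeans`, the toy points, and the deterministic initialization are assumptions made purely for illustration.

```python
import math


def kmeans(points, k, n_iter=20):
    """Toy k-means: return a cluster label (0..k-1) for each point.

    Illustration only; not the shorttextanalyzer implementation.
    """
    # Deterministic initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up "embeddings": two well-separated groups of 2-D points.
toy_points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
toy_labels = kmeans(toy_points, k=2)
print(toy_labels)  # [0, 1, 0, 1, 0, 1]
```

Unlike k-means, HDBSCAN infers the number of clusters from the density of the data, which is why the two counts in the log above can differ.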
Output: result
- sentimentScore: Polarity score in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.
- subjectiveScore: Subjectivity score in the range [0, 1], where 1 refers to personal opinion, emotion, or judgment and 0 means factual information.
- clusterByKMeans: cluster number assigned to each comment using KMeans.
- clusterByHDBSCAN: cluster number assigned to each comment using HDBSCAN.
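As a rough illustration of how the two score ranges can be read, the sketch below maps a polarity and a subjectivity value to coarse labels. The helper `describe_scores`, the 0.1 neutrality threshold, and the 0.5 subjectivity cut-off are hypothetical choices, not part of the package; the two example inputs are the scores from the sample output in this document.

```python
def describe_scores(polarity, subjectivity, threshold=0.1):
    """Turn raw scores into coarse labels (cut-off values are arbitrary)."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must lie in [0, 1]")
    # Polarity close to zero is treated as neutral.
    if polarity > threshold:
        tone = "positive"
    elif polarity < -threshold:
        tone = "negative"
    else:
        tone = "neutral"
    # Higher subjectivity means more personal opinion, lower means more factual.
    kind = "opinion" if subjectivity >= 0.5 else "factual"
    return tone, kind


print(describe_scores(1.00, 1.0))       # ('positive', 'opinion')
print(describe_scores(0.19, 0.415833))  # ('positive', 'factual')
```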
output_result.sample(2)
(index) | comments | comment_lang | comments_clean | sentimentScore | subjectiveScore | clusterByKMeans | clusterByHDBSCAN |
---|---|---|---|---|---|---|---|
50 | sondage parfait | fr | perfect poll | 1.00 | 1.000000 | 2 | 1 |
875 | it wasn't very clear what the purpose of the f... | en | it wasn't very clear what the purpose of the f... | 0.19 | 0.415833 | 1 | 1 |
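One common next step is to summarize the scores per cluster. The sketch below does this in plain Python on the two sample rows above; the helper `mean_sentiment_by_cluster` and the list-of-dicts layout are assumptions for illustration, not something the package provides.

```python
from collections import defaultdict
from statistics import mean

# Two rows copied from the sample output above; a real result has one row per comment.
rows = [
    {"comments_clean": "perfect poll", "sentimentScore": 1.00, "clusterByKMeans": 2},
    {
        "comments_clean": "it wasn't very clear what the purpose of the f...",
        "sentimentScore": 0.19,
        "clusterByKMeans": 1,
    },
]


def mean_sentiment_by_cluster(rows, cluster_col="clusterByKMeans"):
    """Average sentimentScore for each cluster label (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[cluster_col]].append(row["sentimentScore"])
    return {cluster: mean(scores) for cluster, scores in sorted(grouped.items())}


summary = mean_sentiment_by_cluster(rows)
print(summary)  # {1: 0.19, 2: 1.0}
```

If `output_result` is a pandas DataFrame, as the `.sample(2)` call suggests, the same summary would likely be `output_result.groupby("clusterByKMeans")["sentimentScore"].mean()`.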
Visualization: how good are our clusters? HDBSCAN and KMeans
analyzer.plot_output()
Reference
- tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection
- Using UMAP for clustering: https://umap-learn.readthedocs.io/en/latest/clustering.html#traditional-clustering
- https://github.com/dmmiller612/bert-extractive-summarizer
- https://github.com/MilaNLProc/contextualized-topic-models
- https://github.com/MaartenGr/BERTopic
- Natural Language Processing for Beginners: Using TextBlob