Short-text-analyzer
ShortTextAnalyzer was created to help analyze open-ended survey responses, which are usually shorter than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling uses pre-trained language representations, namely BERT, combined with a clustering algorithm.
Documentation Page: https://thisisphume.github.io/short-text-analyzer/
Install
pip install short-text-analyzer
Install all the required packages from the requirements.txt file.
pip install -r requirements.txt
from shorttextanalyzer.core import *
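A quick way to confirm that the installation worked is to check whether the package can be found before importing it. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `shorttextanalyzer` is the import name used in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "shorttextanalyzer" is the import name used in this README.
if is_installed("shorttextanalyzer"):
    print("shorttextanalyzer is available")
else:
    print("shorttextanalyzer not found; try: pip install short-text-analyzer")
```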
How to use
analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()
Embedding Method for Visualization is 2AE with MSE of 0.6560611658549391
Embedding Method for Clustering is 2AE with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is: 5.0
Number of clusters via KMeans is: 4
Here, the second argument (4) specifies that we want 4 clusters (topics) from this data.
Output: result
- sentimentScore: polarity score in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.
- subjectiveScore: subjectivity score in the range [0, 1], where 1 refers to personal opinion, emotion, or judgment and 0 means factual information.
- clusterByKMeans: cluster number assigned to each comment using KMeans.
- clusterByHDBSCAN: cluster number assigned to each comment using HDBSCAN.
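To make the score ranges more concrete, the sketch below maps a polarity and a subjectivity score to rough labels. It is only an illustration: `describe_scores`, the `neutral_band` of 0.1, and the 0.5 subjectivity cut-off are assumptions made for this example and are not part of the package.

```python
def describe_scores(polarity, subjectivity, neutral_band=0.1):
    """Turn the two scores into rough, human-readable labels.

    neutral_band and the 0.5 subjectivity cut-off are illustrative
    assumptions, not values used by the package.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be in [-1, 1]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be in [0, 1]")

    if polarity > neutral_band:
        tone = "positive"
    elif polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"

    kind = "opinion" if subjectivity >= 0.5 else "factual"
    return tone, kind


# Scores taken from the two sample rows shown in this README.
print(describe_scores(1.00, 1.000000))   # ('positive', 'opinion')
print(describe_scores(0.19, 0.415833))   # ('positive', 'factual')
```

Where to draw the line between "neutral" and "positive" depends on the survey, so the thresholds are best treated as tunable.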
output_result.sample(2)
| | comments | comment_lang | comments_clean | sentimentScore | subjectiveScore | clusterByKMeans | clusterByHDBSCAN |
|---|---|---|---|---|---|---|---|
| 50 | sondage parfait | fr | perfect poll | 1.00 | 1.000000 | 2 | 1 |
| 875 | it wasn't very clear what the purpose of the f... | en | it wasn't very clear what the purpose of the f... | 0.19 | 0.415833 | 1 | 1 |
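One common follow-up is to summarize sentiment per cluster. The sketch below does this with the standard library on plain dictionaries that mirror the two sample rows above. `summarize_by_cluster` is a hypothetical helper; if `output_result` is a pandas DataFrame (as the `.sample(2)` call suggests), records of this shape could be obtained with `output_result.to_dict("records")`.

```python
from collections import defaultdict
from statistics import mean

# Two rows copied from the sample output above (comment text left truncated).
rows = [
    {"comments_clean": "perfect poll",
     "sentimentScore": 1.00, "clusterByKMeans": 2},
    {"comments_clean": "it wasn't very clear what the purpose of the f...",
     "sentimentScore": 0.19, "clusterByKMeans": 1},
]


def summarize_by_cluster(records, cluster_col="clusterByKMeans"):
    """Return {cluster: (number of comments, mean sentimentScore)}."""
    scores = defaultdict(list)
    for record in records:
        scores[record[cluster_col]].append(record["sentimentScore"])
    return {cluster: (len(vals), mean(vals))
            for cluster, vals in sorted(scores.items())}


summary = summarize_by_cluster(rows)
for cluster, (count, avg) in summary.items():
    print(f"cluster {cluster}: {count} comment(s), mean sentiment {avg:.2f}")
```

Passing `cluster_col="clusterByHDBSCAN"` (and including that key in each record) would give the same summary for the HDBSCAN assignment.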
Visualization: how good are our clusters? HDBSCAN and KMeans
analyzer.plot_output()
Reference
- tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection
- Using UMAP for clustering: https://umap-learn.readthedocs.io/en/latest/clustering.html#traditional-clustering
- https://github.com/dmmiller612/bert-extractive-summarizer
- https://github.com/MilaNLProc/contextualized-topic-models
- https://github.com/MaartenGr/BERTopic
- Natural Language Processing for Beginners: Using TextBlob