Short-Text Analyzer helps analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization.

Project description

Short-text-analyzer

ShortTextAnalyzer was created to help analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling uses pre-trained language representations, namely BERT, combined with a clustering algorithm.

Documentation Page: https://thisisphume.github.io/short-text-analyzer/

Install

pip install short-text-analyzer

Install all the required packages from the requirements.txt file.

pip install -r requirements.txt

from shorttextanalyzer.core import *

How to use

analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()
Embedding Method for Visualization is  2AE  with MSE of 0.6560611658549391
Embedding Method for Clustering is  2AE  with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is:  5.0
Number of clusters via KMeans is:   4

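Here, the second argument (4) requests four clusters (topics) from the data. As the printed output shows, this value sets the KMeans cluster count, while HDBSCAN determines its own number of clusters.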
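Survey exports often contain blank or missing responses, so it can help to tidy the input before analysis. The following is a minimal sketch, not part of the package: `prepare_comments` and `raw_responses` are hypothetical names introduced here for illustration. That `shortTextAnalyzer` expects a pandas Series is an assumption inferred from the variable name `comments_series`, so the cleaned list would presumably need to be wrapped in one before being passed in.

```python
from typing import Iterable, List, Optional


def prepare_comments(raw: Iterable[Optional[str]]) -> List[str]:
    """Hypothetical helper: strip whitespace and drop missing or empty responses."""
    cleaned = []
    for text in raw:
        if text is None:
            continue  # skip missing responses
        text = " ".join(text.split())  # collapse runs of whitespace and newlines
        if text:
            cleaned.append(text)
    return cleaned


# Illustrative raw survey responses (not real data from the package).
raw_responses = ["  sondage parfait ", "", None, "it wasn't very\nclear"]
comments = prepare_comments(raw_responses)
print(comments)
```

Non-English comments are left untouched here, since the sample output suggests the analyzer handles language detection and translation itself (the comment_lang and comments_clean columns).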

Output: result

  • sentimentScore: polarity score in the range [-1, 1], where 1 indicates a positive statement and -1 a negative statement.
  • subjectiveScore: subjectivity score in the range [0, 1], where 1 refers to personal opinion, emotion, or judgment and 0 to factual information.
  • clusterByKMeans: cluster number assigned to each comment by KMeans.
  • clusterByHDBSCAN: cluster number assigned to each comment by HDBSCAN.
output_result.sample(2)
     comments                                           comment_lang  comments_clean                                     sentimentScore  subjectiveScore  clusterByKMeans  clusterByHDBSCAN
50   sondage parfait                                    fr            perfect poll                                       1.00            1.000000         2                1
875  it wasn't very clear what the purpose of the f...  en            it wasn't very clear what the purpose of the f...  0.19            0.415833         1                1
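The score ranges described above can be turned into readable labels. This is a minimal sketch under stated assumptions: the functions `label_polarity` and `label_subjectivity`, the neutral band of 0.1, and the cutoff of 0.5 are all hypothetical choices made for illustration, not thresholds defined by the package.

```python
def label_polarity(score: float, neutral_band: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to a label (threshold is an assumption)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity score must lie in [-1, 1]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def label_subjectivity(score: float, cutoff: float = 0.5) -> str:
    """Map a subjectivity score in [0, 1] to a label (cutoff is an assumption)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("subjectivity score must lie in [0, 1]")
    return "opinion" if score >= cutoff else "factual"


# Scores taken from the two sample rows shown in the output table.
rows = [(1.00, 1.000000), (0.19, 0.415833)]
labels = [(label_polarity(p), label_subjectivity(s)) for p, s in rows]
print(labels)
```

Different thresholds may suit different surveys, so both are exposed as parameters rather than fixed.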

Visualization: how good are our clusters? HDBSCAN and KMeans

analyzer.plot_output()

[Figures: cluster visualizations for HDBSCAN and KMeans]
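Besides the plots, one simple way to see how the two clusterings relate is to count how often each pair of labels occurs together. The sketch below is not part of the package: `cross_tabulate` is a hypothetical helper, and the label lists are made-up values standing in for the clusterByKMeans and clusterByHDBSCAN columns. The use of -1 as a noise label follows the convention of the hdbscan library; whether this package keeps that convention in its output is an assumption.

```python
from collections import Counter
from typing import Sequence


def cross_tabulate(kmeans_labels: Sequence[int], hdbscan_labels: Sequence[int]) -> Counter:
    """Count how many comments fall into each (KMeans, HDBSCAN) label pair."""
    if len(kmeans_labels) != len(hdbscan_labels):
        raise ValueError("both label sequences must have the same length")
    return Counter(zip(kmeans_labels, hdbscan_labels))


# Hypothetical labels for six comments (illustrative only).
kmeans_labels = [2, 1, 1, 0, 3, 2]
hdbscan_labels = [1, 1, 1, 0, -1, 1]

table = cross_tabulate(kmeans_labels, hdbscan_labels)
for (k, h), n in sorted(table.items()):
    print(f"KMeans={k} HDBSCAN={h}: {n} comment(s)")
```

If most of the counts concentrate in a few pairs, the two methods largely agree; counts spread across many pairs suggest they partition the comments differently.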
