Visual Text Explorer tool. Enables text exploration and analysis through word frequency and named entity recognition in Jupyter Notebooks.
Project description
TextExplorer
VisualTextAnalyzer helps users understand text data. It provides word frequency analysis and named entity recognition, which together reveal the fundamental characteristics of a text dataset. The results are shown as bar charts integrated with the Jupyter Notebook environment.
Word frequency analysis is a common task in text analytics. It identifies the most frequently occurring words in a given text. Common stopwords such as ‘to’, ‘in’, and ‘for’ are removed before counting.
Named entity recognition is an information extraction method. The entities present in the text are classified into predefined entity types such as ‘Person’, ‘Organization’, and ‘City’. This gives users insight into the types of entities present in the given textual dataset.
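The word frequency step described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the package's implementation: the function name `word_frequencies`, the tokenization rule, and the short stopword list are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); the package's actual list may differ.
STOPWORDS = {"to", "in", "for", "the", "a", "and", "is", "was"}

def word_frequencies(texts, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

sample = [
    "The food was great",
    "Great service and great food",
    "Not worth going to for the price",
]
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('great', 3), ('food', 2)]
```

Counts like these are the kind of values a frequency bar chart would display, one bar per word.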
Text Exploration
In Jupyter Notebook:
import VisualTextAnalyzer
import pandas as pd
data = pd.read_csv('yelp_labelled_sample.csv')
VisualTextAnalyzer.plot_text_summary(data, category_column='category', text_column='comments')
Demo
In Jupyter Notebook:
import VisualTextAnalyzer
yelp_data = VisualTextAnalyzer.get_yelp_labelled_data()
VisualTextAnalyzer.plot_text_summary(yelp_data, category_column='category', text_column='comments')
Export Texts
You might want to export a subset of selected texts for further analysis. To do so, first export the texts through the UI, then run the following code:
obj_text = VisualTextAnalyzer.get_exported_texts()
The returned object has the following attributes:
- texts: List of texts.
- category: The category that all the texts belong to.
- word: The word that all the texts contain.
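To show how the three attributes listed above fit together, here is a small sketch built around a hypothetical stand-in class. `ExportedTexts` and `is_consistent` are not part of the package, and the sample values are invented; only the attribute names `texts`, `category`, and `word` come from the description above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExportedTexts:
    """Hypothetical stand-in for the object returned by get_exported_texts()."""
    texts: List[str]
    category: str
    word: str

def is_consistent(obj):
    """Check that every exported text contains the selected word (case-insensitive)."""
    return all(obj.word.lower() in t.lower() for t in obj.texts)

# Invented sample values, used only to demonstrate attribute access.
obj_text = ExportedTexts(
    texts=["Great food", "The service was great"],
    category="positive",
    word="great",
)

for text in obj_text.texts:
    print(f"[{obj_text.category}] {text}")
```

With the real object, the same attribute access should apply, assuming `texts` is a list of strings as described.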
Get Processed Data (Words and Entities)
You might want to get the processed data, which includes word and entity frequencies, before generating the visualization. To do so, compute it first and then pass it to the plotting function:
processed_data = VisualTextAnalyzer.get_words_entities(data, category_column='category', text_column='comments')
VisualTextAnalyzer.plot_text_summary(words_entities=processed_data)
The function 'get_words_entities' returns an object with the following attributes:
- words: Word frequencies.
- entities: Entity frequencies.
- raw_texts: All texts, separated into two categories: positive and negative.
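The exact structure of these attributes is not documented here. As a rough sketch of what per-category processing could look like, the hypothetical helper below groups raw texts by category and counts words within each group. The name `group_and_count` and the dictionary layout are assumptions, and entity extraction is left out because it depends on the NER model the package uses.

```python
from collections import Counter, defaultdict

def group_and_count(rows):
    """Group (category, text) pairs by category and count words per category."""
    raw_texts = defaultdict(list)
    words = defaultdict(Counter)
    for category, text in rows:
        raw_texts[category].append(text)
        # Simple whitespace tokenization; a real pipeline would also drop stopwords.
        words[category].update(text.lower().split())
    return dict(words), dict(raw_texts)

# Invented sample rows, used only for illustration.
rows = [
    ("positive", "great food"),
    ("negative", "slow service"),
    ("positive", "great staff"),
]
words, raw_texts = group_and_count(rows)
print(words["positive"].most_common(1))  # [('great', 2)]
print(raw_texts["negative"])             # ['slow service']
```

Computing this data once and reusing it avoids repeating the text processing each time the visualization is redrawn.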
File details
Details for the file visual-text-explorer-0.1.9.tar.gz.
File metadata
- Download URL: visual-text-explorer-0.1.9.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.48.2 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | c074ed2e653a38e58714ccbc56b996cd60c59348f711d740922fb68844c76b44
MD5 | d4e2ba3a955e447fdb6ecdc82e2db809
BLAKE2b-256 | 324485df2f9e691c54cf5d3f1e727063a075f6cf6dbc3b65614aa3e0c743346b
File details
Details for the file visual_text_explorer-0.1.9-py3-none-any.whl.
File metadata
- Download URL: visual_text_explorer-0.1.9-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.48.2 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1c22caeaa579eeb936d361dc01d2de206551405c9c0d3793890402fe8390f3f1
MD5 | 9184186c51d4fec7cd21c1187ebb4e27
BLAKE2b-256 | ad9fc37248f2ea6a8679fc859338bbecae07fa742f163f68ab340e602d5f5e10