Process reviews data, apply text preprocessing, and generate a chord plot visualization showing word co-occurrence patterns and sentiment analysis.

Chord Reviews

Overview

ChordReviewsVis is a Python package that processes review data and visualizes it as a chord plot. The plot illustrates word co-occurrence patterns and sentiment, giving insight into the text. It is built from the following elements:

  • Labels for each node: The top nouns and adjectives extracted from the reviews are displayed around the graphic.
  • Label bars: Below each label is a bar whose color shows the word's overall frequency in the reviews. The darker the color, the more frequent the word.
  • Edges: Lines connecting words that occur together.
  • Edge thickness: Shows how often the connected words appear in the same sentence. The more often they appear together, the thicker the line.
  • Edge color: Shows the overall sentiment of the connected words: red for negative, blue for neutral, and green for positive.
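
To make the edge thickness and edge color ideas concrete, here is a minimal sketch using only the standard library. It is not the package's implementation: the helper names cooccurrence_counts and edge_color, the polarity scale from -1 to 1, and the 0.05 threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair shares a sentence.

    Hypothetical helper for illustration, not part of ChordReviewsVis.
    """
    counts = Counter()
    for tokens in sentences:
        # A sorted set counts each pair once per sentence and makes
        # ("great", "actor") and ("actor", "great") the same key.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


def edge_color(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to an edge color.

    The score range and threshold are assumptions of this sketch.
    """
    if polarity > threshold:
        return "green"  # positive
    if polarity < -threshold:
        return "red"  # negative
    return "blue"  # neutral


# Each inner list stands for the tokens of one sentence.
sentences = [
    ["great", "plot", "actor"],
    ["great", "actor"],
    ["boring", "plot"],
]
counts = cooccurrence_counts(sentences)
print(counts[("actor", "great")])  # 2
```

In a plot, a pair with a higher count would be drawn with a thicker edge, and the color returned by edge_color would be applied to that edge.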

This package was developed by Felix Jose Funes as part of his master's dissertation at Universidade Nova de Lisboa, which was supervised by Prof. Nuno Antonio, PhD.

Installation

To install ChordReviewsVis, use pip:

pip install ChordReviewsVis

Usage

First, import the necessary libraries and the ChordReviews function:

import pandas as pd
from ChordReviewsVis import ChordReviews

Prepare the DataFrame with a text column containing review data. Then call the ChordReviews function:

# Load DataFrame
df = pd.read_csv("filepath")

# Generate chord plot
ChordReviews(df, 'review')

Some datasets that can be used for this purpose are:

Function Parameters

  • df (pandas.DataFrame): DataFrame containing review data.
  • text_column (str): Name of the column containing the text data.
  • size (int, optional): Size of the output chord plot. Default is 300.
  • stopwords_to_add (list, optional): Additional stopwords to include in the stop words set. Default is an empty list.
  • stemming (bool, optional): Whether to apply stemming to words. Default is False.
  • lemmatization (bool, optional): Whether to apply lemmatization to words. Default is True.
  • words_to_replace (dict, optional): A dictionary where keys are words to be replaced and values are the replacements. Default is an empty dictionary.
  • label_text_font_size (int, optional): Font size for the labels in the chord plot. Default is 12.
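
As a rough illustration of what stopwords_to_add and words_to_replace are meant to achieve, the sketch below applies both to a list of tokens. The helper apply_word_options is hypothetical, and the choice to replace words before filtering stop words is an assumption; the package may handle these steps differently.

```python
def apply_word_options(tokens, stopwords_to_add=None, words_to_replace=None):
    """Replace selected words, then drop the extra stop words.

    Hypothetical helper for illustration, not part of ChordReviewsVis.
    """
    stopwords = set(stopwords_to_add or [])
    replacements = words_to_replace or {}
    cleaned = []
    for token in tokens:
        # Unify synonyms first (assumed order), then filter stop words.
        token = replacements.get(token, token)
        if token not in stopwords:
            cleaned.append(token)
    return cleaned


tokens = ["movie", "wa", "great", "film", "ha", "flaws"]
result = apply_word_options(
    tokens,
    stopwords_to_add=["wa", "ha"],
    words_to_replace={"movie": "film"},
)
print(result)  # ['film', 'great', 'film', 'flaws']
```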

Returns

  • hv.Chord: A chord plot visualization of word co-occurrence patterns and sentiment analysis.

Examples

Basic Usage

# Import necessary libraries
import pandas as pd
from ChordReviewsVis import ChordReviews

# Load dataset
df = pd.read_csv("https://github.com/felix-funes/ChordReviewsVis/raw/main/Test%20Dataset%20-%20IMDB%20Movie%20Reviews.csv")

# Generate chord plot
ChordReviews(df, 'review')

Chord plot example

Custom Parameters

Lemmatization is applied by default, but stemming can be used instead if it suits the data better.

# Import necessary libraries
import pandas as pd
from ChordReviewsVis import ChordReviews

# Load dataset
df = pd.read_csv("https://github.com/felix-funes/ChordReviewsVis/raw/main/Test%20Dataset%20-%20IMDB%20Movie%20Reviews.csv")

# Generate chord plot
ChordReviews(df, 'review', stemming=True, lemmatization=False)

Chord plot example with stemming

To refine the visualization, use the "stopwords_to_add" parameter to remove irrelevant words and the "words_to_replace" parameter to unify terms with the same meaning.

# Import necessary libraries
import pandas as pd
from ChordReviewsVis import ChordReviews

# Load dataset
df = pd.read_csv("https://github.com/felix-funes/ChordReviewsVis/raw/main/Test%20Dataset%20-%20IMDB%20Movie%20Reviews.csv")

# Generate chord plot
ChordReviews(df, 'review', stemming=False, lemmatization=True, stopwords_to_add=["wa", "ha"], words_to_replace={"movie": "film"})

Chord plot using the words_to_replace parameter

Because the words "film" and "movie" are so prevalent in this dataset, they can be treated as stop words and removed with the "stopwords_to_add" parameter. For presentation purposes, the plot and its label text can also be resized with "size" and "label_text_font_size".

# Import necessary libraries
import pandas as pd
from ChordReviewsVis import ChordReviews

# Load dataset
df = pd.read_csv("https://github.com/felix-funes/ChordReviewsVis/raw/main/Test%20Dataset%20-%20IMDB%20Movie%20Reviews.csv")

# Generate chord plot
ChordReviews(df, 'review', stemming=False, lemmatization=True, stopwords_to_add=["wa", "ha", "movie", "film"], label_text_font_size=13, size=400)

Large chord plot with stop words

Dependencies

Ensure you have the following libraries installed:

  • pandas
  • numpy
  • nltk
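  • BeautifulSoup (installed as beautifulsoup4)
  • holoviews

The package also uses re, which is part of the Python standard library and needs no installation. The third-party libraries can be installed via pip:

pip install pandas numpy nltk beautifulsoup4 holoviews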
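
To check which of these libraries are already available in the current environment, a short script like the following may help. It is only a sketch: the helper missing_modules is hypothetical, and the list uses import names, which can differ from pip names (beautifulsoup4 is imported as bs4).

```python
from importlib.util import find_spec

# Import names of the third-party dependencies listed above.
REQUIRED_MODULES = ["pandas", "numpy", "nltk", "bs4", "holoviews"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every dependency can be imported.
print(missing_modules(REQUIRED_MODULES))
```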

License

This project is licensed under the MIT License.

Contact

For any issues or inquiries, please contact the package maintainer via LinkedIn.


By using this package, you agree to the terms outlined in the LICENSE file included in the repository.
