Predicting helpfulness of reviews
Helpful Review Predictor
The Helpful Review Predictor is a Python package that predicts the helpfulness of reviews using machine learning techniques. It takes textual reviews as input and provides a binary classification indicating whether the review is likely to be helpful or not. A prediction of 1 indicates a helpful review, while a prediction of 0 indicates a review that is not helpful.
The model's training process and methodology are documented in an academic research paper. To follow the latest developments and access the paper once it is published, I invite you to follow my LinkedIn profile: Mojtaba Maleki.
Dataset
The data used for training the model is sourced from the Amazon Electronics Reviews dataset available on Kaggle. This 5-core dataset contains product reviews from the Electronics category on Amazon from May 1996 to July 2014, totaling 1,689,188 entries.
The dataset is provided by Julian McAuley, UCSD, and is available here.
Features
- Preprocesses textual reviews, including lowercasing, punctuation removal, contractions expansion, and lemmatization.
- Utilizes TF-IDF vectorization to convert text data into numerical feature vectors.
- Addresses class imbalance using Random Over Sampling.
- Supports training and evaluation of multiple classifiers, including Gaussian Naive Bayes, Logistic Regression, and Decision Trees.
- Performs hyperparameter tuning using Grid Search and Stratified K-Fold Cross Validation.
- Provides visualization tools for comparing different classifiers and evaluating model performance.
- Saves the best model and TF-IDF vectorizer for future use.
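To make the first three steps above more concrete, here is a minimal standard-library sketch of text cleaning, TF-IDF weighting, and random oversampling. It is not the package's actual implementation: the function names, the tiny contraction table, and the whitespace tokenizer are assumptions made for illustration, and lemmatization is left out because it needs an NLP library. The TF-IDF weights use the smoothed form ln((1 + n) / (1 + df)) + 1 followed by L2 normalization.

```python
import math
import random
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny contraction table (illustration only).
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is"}


def preprocess(text):
    """Lowercase, expand a few contractions, and strip punctuation."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def tfidf(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the data used for evaluation.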
Installation
You can install the Helpful Review Predictor package using pip:
pip install helpful-review-predictor
Usage
from helpfulReviewPredictor import PredictHelpfulness
string_input = "Your input string here"
predictor = PredictHelpfulness(string_input)
result = predictor.get_result()
print(result) # Output: 1 for Helpful, 0 for Not Helpful
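Since the predictor takes one string at a time, a small helper can be handy for labelling several reviews. The sketch below assumes only what the usage example shows: a class constructed with the review text whose get_result() returns 1 or 0. The helper name label_reviews is hypothetical and not part of the package.

```python
def label_reviews(reviews, predictor_cls):
    """Map each review text to a readable label based on the 1/0 prediction."""
    labels = {}
    for text in reviews:
        # One predictor instance per review, as in the usage example.
        result = predictor_cls(text).get_result()
        labels[text] = "Helpful" if result == 1 else "Not Helpful"
    return labels
```

With the package installed, it could be called as label_reviews(["first review", "second review"], PredictHelpfulness) after importing PredictHelpfulness as shown above.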
Requirements
- joblib
- numpy
- scikit-learn (provides TfidfVectorizer from sklearn.feature_extraction.text)
- scipy
File details
Details for the file helpful_review_predictor-6.tar.gz.
File metadata
- Download URL: helpful_review_predictor-6.tar.gz
- Upload date:
- Size: 2.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0058c3d26b13f47d33568b5131952d975a9309a5f2098b12f0e55d49f34da412
MD5 | 2ffacc1c7f0e32c5fba25df01eb13bfd
BLAKE2b-256 | b3b2becda502e6d6e0bf9826e8dbc873a0ef8d662dafdddb817f427f6f848257
File details
Details for the file helpful_review_predictor-6-py3-none-any.whl.
File metadata
- Download URL: helpful_review_predictor-6-py3-none-any.whl
- Upload date:
- Size: 2.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 686c88a53d1ef225b493efccb55a10127ca14c4b52c2eb5059c37475b8aaedb9
MD5 | abd1f64c567b0c9513adb03b1914bd63
BLAKE2b-256 | aa9bb3569d1cda5e47faad9dd7277dd0a14a90ecf65e9f25bf412a94c8a8abe6
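
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's hashlib; the function names and the local file path in the usage note are assumptions, while the SHA256 value is the one listed above for the source distribution.

```python
import hashlib

# SHA256 digest listed above for helpful_review_predictor-6.tar.gz.
SDIST_SHA256 = "0058c3d26b13f47d33568b5131952d975a9309a5f2098b12f0e55d49f34da412"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("helpful_review_predictor-6.tar.gz", SDIST_SHA256) should return True for an intact copy of the source distribution saved under that name in the current directory.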