Skip to main content

Persian Sentiment Analysis Toolkit

Project description

PerSent (Persian Sentiment Analyzer)

Persian

PerSent Logo

Introduction

PerSent is a practical Python library designed for Persian sentiment analysis. The name stands for "Persian Sentiment Analyzer". Currently in its early testing phase, the library provides basic functionality and is available on PyPI. Install it using:

pip install PerSent

Current capabilities include:

  • Sentiment analysis of opinions/comments

  • Emotion analysis of texts (happiness, sadness, anger, surprise, fear, disgust, calmness)

  • Analysis of product/service reviews (recommended/not recommended/no idea)

  • Both single-text and batch CSV processing

  • Output displayed in terminal or saved to CSV with summary statistics

Initial repository that evolved into this library: Click here

We welcome user testing and feedback to improve the library. If you encounter bugs or have suggestions, please:

For installation issues due to dependency conflicts (especially with mingw-w64), consider using online platforms like DeepNote.com.

Structure

Comment Analysis Functions

train(train_csv, test_size=0.2, vector_size=100, window=5)

Parameter Data Type Default Value Description Optional/Required
train_csv str - Path to CSV file containing training data with body and recommendation_status columns Required
test_size float 0.2 Proportion of test data (between 0.0 and 1.0) Optional
vector_size int 100 Output vector dimension for Word2Vec model (embedding size) Optional
window int 5 Context window size for Word2Vec model Optional

recommendation_status must be one of:

  • no_idea

  • recommended

  • not_recommended

Null/NaN values are converted to no_idea, affecting model accuracy.

  • Returns test accuracy score.

analyzeText(text)

Parameter Data Type Description Optional/Required
text str The Persian text to be analyzed Required

The core function that analyzes a text and returns one of: "not_recommended", "recommended", or "no_idea".


saveModel()

loadModel()

Model persistence functions. Models are saved in the model directory.


analyzeCSV(input_csv, output_path, summary_path=None, text_column=0)

Parameter Data Type Default Value Description Optional/Required
input_csv str - Path to input CSV file containing comments to analyze Required
output_path str - Path where analyzed results CSV will be saved Required
summary_path str or None None Optional path to save summary statistics CSV Optional
text_column int or str 0 Column index (int) or name (str) containing the text to analyze Optional

Batch processes comments from a CSV file. For single-column files, text_column isn't needed. Otherwise specify column name/index (0-based, negative indices supported). Output contains:

1- Original text

2- Recommendation status Optional summary_path generates statistics:

  • Total count

  • Recommended count

  • Not recommended count

  • No idea count

  • Model accuracy (not implemented in current version)

Returns a DataFrame and saves results.


Emotion Analysis Functions

loadLex(csv_file, word_col=0, emotion_col=1, weight_col=2)

Parameter Data Type Default Value Description Optional/Required
csv_file str - Path to CSV lexicon file Required
word_col int or str 0 Column index (int) or name (str) containing words Optional
emotion_col int or str 1 Column index (int) or name (str) containing emotion labels Optional
weight_col int or str 2 Column index (int) or name (str) containing weight values Optional

Loads a CSV with three columns:

1- Keywords

2- Emotion (happiness, sadness, anger, fear, disgust, calmness)

3- Emotion weight (defaults to 1 if unspecified, affecting accuracy)

Column indices are optional.


train(train_csv, text_col='text', emotion_col='sentiment', weight_col='weight')

Parameter Data Type Default Value Description Optional/Required
train_csv str - Path to training CSV file Required
text_col str or int 'text' Column name/index containing text data Optional
emotion_col str or int 'emotion' Column name/index containing emotion labels Optional
weight_col str or int 'weight' Column name/index containing weight values Optional

Trains the emotion model using a CSV with specified column names (optional).


saveModel(model_name='weighted_sentiment_model')

Parameter Type Default Value Description Optional/Required
model_name str 'weighted_sentiment_model' Base filename for saving model (without extension) Optional

loadModel(model_name='weighted_sentiment_model')

Parameter Type Default Value Description Optional/Required
model_name str 'weighted_sentiment_model' Base filename of model to load (without extension) Optional

Model persistence functions (saved in model directory).


analyzeText(text)

Parameter Type Description Optional/Required
text str Persian text to analyze Required

Analyzes a single text, returning percentage scores for each emotion.


analyzeCSV(input_csv, output_csv, text_col='text', output_col='sentiment_analysis')

Parameter Type Default Value Description Optional/Required
input_csv str - Path to input CSV file containing text to analyze Required
output_csv str - Path to save analyzed results Required
text_col str/int 'text' Column name/index containing text to analyze Optional
output_col str 'sentiment_analysis' Column name for output results Optional

Batch processes texts from CSV. Returns True on success. Requires:

  • input_csv path

  • output_csv path Optional column names.


Installation

Install via pip:

pip install PerSent

For specific versions:

pip install PerSent==<VERSION_NUMBER>

Usage

  • Comment Analysis

Basic single-text analysis:

from PerSent import CommentAnalyzer

analyzer = CommentAnalyzer()

'''
Training (if you have data):
Requires CSV with comments and recommendation status columns
Status must be: recommended/not_recommended/no_idea
'''
analyzer.train("train.csv")

# Load pre-trained model
analyzer.loadModel()

# Predict
text = "کیفیت عالی داشت" # "Excellent quality"
result = analyzer.analyzeText(text)
print(f"Sentiment: {result}")  # Output: Sentiment: recommended

The included pre-trained model has ~70% accuracy. For better results, you can train with larger datasets. I've prepared a split dataset (due to size):

Download Here


Batch CSV processing:

from PerSent import CommentAnalyzer
analyzer = CommentAnalyzer()
analyzer.loadModel()

# Basic usage (single-column CSV)
analyzer.analyzeCSV(
    input_csv="comments.csv",
    output_path="results.csv"
)

# Alternative usage patterns:
# 1. Using column index (0-based)
analyzer.analyzeCSV("comments.csv", "results.csv", None, 0)

# 2. Negative indices (count from end)
analyzer.analyzeCSV("comments.csv", "results.csv", None, -1)

# 3. Column name
analyzer.analyzeCSV("comments.csv", "results.csv", None, "نظرات") # "Comments" column

# 4. With summary (single-column)
analyzer.analyzeCSV("comments.csv", "results.csv", "summary.csv")

# 5. With summary and column specification
analyzer.analyzeCSV("comments.csv", "results.csv", "summary.csv", 2)
  • Emotion Analysis

Single text analysis with pre-trained model:

from PerSent import SentimentAnalyzer

analyzer = SentimentAnalyzer()
analyzer.loadModel()

sample_text = "امتحانم رو خراب کردم. احساس می‌کنم یک شکست خورده‌ی تمام عیارم."
# "I failed my exam. I feel like a complete failure."

result = analyzer.analyzeText(sample_text)
for emotion, score in sorted(result.items(), key=lambda x: x[1], reverse=True):
    print(f"{emotion}: {score:.2f}%")

output :

غم: 36.00%                     #Sadness
عصبانیت: 36.00%                 #anger
ترس: 28.00%                    #fear
شادی: 0.00%                     #happiness
تنفر: 0.00%                      #disgust
شگفتی: 0.00%                    #surprise
آرامش: 0.00%                    #calmness

To train your own model:

analyzer.train('emotion_dataset.csv')

Required CSV columns:

1- Keywords

2- Emotion (happiness, sadness, anger, disgust, fear, calmness)

3- Emotion weight

Model persistence:

analyzer.saveModel("custom_model_name")
analyzer.loadModel("custom_model_name")

Batch CSV processing:

analyzer.analyzeCSV("input.csv", "output.csv")

Contribution

As mentioned, this library needs community collaboration. Please share suggestions, bugs, or feedback via:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

persent-1.3.3.tar.gz (27.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

persent-1.3.3-py3-none-any.whl (27.2 MB view details)

Uploaded Python 3

File details

Details for the file persent-1.3.3.tar.gz.

File metadata

  • Download URL: persent-1.3.3.tar.gz
  • Upload date:
  • Size: 27.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for persent-1.3.3.tar.gz
Algorithm Hash digest
SHA256 acb17d73167f491e253d013697e84a2bb1da97f935650b5c160aaa911f3d5934
MD5 f0e7176fc8282527715769a5c378fb4b
BLAKE2b-256 ba68007ca72522c50568e3209c21b01c7cad62754fefa591029b73b6f3f793bf

See more details on using hashes here.

File details

Details for the file persent-1.3.3-py3-none-any.whl.

File metadata

  • Download URL: persent-1.3.3-py3-none-any.whl
  • Upload date:
  • Size: 27.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for persent-1.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 c1d41111d0ccf0b2246bd4e49b87ae777b40066bdbda1c22053232bc7334310b
MD5 5920ad8d425671cc634b6018e7ba722e
BLAKE2b-256 b3990a91ebb6b402cda02fb47155957c40422ed93b74690457ddb820dd61b176

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page