
Persian Sentiment Analysis Toolkit

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


Persian Sentiment Analyzer


A Python library for sentiment analysis of Persian (Farsi) text, capable of classifying opinions as "recommended", "not_recommended", or "no_idea".

Features

  • Text Preprocessing: Normalization, tokenization, stemming, and stopword removal for Persian text
  • Word Embeddings: Built-in Word2Vec implementation for the Persian language
  • Sentiment Classification: Logistic Regression classifier trained on Persian sentiment data
  • Model Persistence: Save and load trained models for future use
  • Batch Processing: Analyze sentiment for multiple texts at once

Installation

pip install persian-sentiment-analyzer

Dependencies

  • Python 3.6+

  • hazm

  • gensim

  • scikit-learn

  • numpy

  • pandas

Usage

Basic Usage

from persian_sentiment_analyzer import SentimentAnalyzer

# Initialize with a pre-trained model
analyzer = SentimentAnalyzer(model_path="path/to/pretrained_model")

# Predict sentiment
result = analyzer.predict("این محصول بسیار عالی است")
print(result)  # Output: 'recommended'

Training Your Own Model

from persian_sentiment_analyzer import SentimentAnalyzer
import numpy as np  # needed for np.array when building the feature matrix
import pandas as pd

# Load your dataset
data = pd.read_csv("persian_reviews.csv")
texts = data['text'].tolist()
labels = data['label'].values  # 0: not_recommended, 1: recommended, 2: no_idea

# Initialize analyzer
analyzer = SentimentAnalyzer()

# Preprocess and tokenize texts
tokenized_texts = [analyzer.preprocessor.preprocess_text(text) for text in texts]

# Train Word2Vec model
analyzer.train_word2vec(tokenized_texts, vector_size=100)

# Prepare feature vectors
X = np.array([analyzer.sentence_vector(tokens) for tokens in tokenized_texts])

# Train classifier
analyzer.train_classifier(X, labels)

# Save the trained model
analyzer.save_model("my_persian_model")
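
The training example expects integer labels. If a dataset stores labels as strings instead, a small conversion step is needed first. The following is a minimal sketch, assuming the mapping given in the comment above (0: not_recommended, 1: recommended, 2: no_idea); `encode_labels` and `decode_labels` are hypothetical helpers and not part of the library.

```python
# Hypothetical helpers, not part of persian_sentiment_analyzer.
# The mapping mirrors the comment in the training example:
# 0: not_recommended, 1: recommended, 2: no_idea.
LABEL_TO_ID = {"not_recommended": 0, "recommended": 1, "no_idea": 2}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}


def encode_labels(names):
    """Convert string labels to integer ids; raise on unknown labels."""
    unknown = sorted(set(names) - set(LABEL_TO_ID))
    if unknown:
        raise ValueError(f"Unknown labels: {unknown}")
    return [LABEL_TO_ID[name] for name in names]


def decode_labels(ids):
    """Convert integer ids back to their string labels."""
    return [ID_TO_LABEL[i] for i in ids]
```

The encoded list can be passed wherever the training example uses `labels`. Failing early on an unknown label avoids silently training on a mislabeled class.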

Batch Processing

from persian_sentiment_analyzer import predict_sentiments_for_file

# Process a CSV file containing Persian comments
results_summary = predict_sentiments_for_file(
    analyzer,
    input_file="comments.csv",
    output_file="results.csv",
    summary_file="summary.csv"
)

print(results_summary)
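
The format of the returned summary is not documented here. As a rough illustration of what a per-label summary could contain, the sketch below counts predicted labels and computes each label's share. `summarize_predictions` is a hypothetical function, and its output shape is an assumption rather than the library's actual format.

```python
from collections import Counter


def summarize_predictions(predictions):
    """Return {label: (count, share)} for a list of predicted labels.

    Hypothetical helper: the real summary produced by the library
    may be structured differently.
    """
    total = len(predictions)
    if total == 0:
        return {}
    counts = Counter(predictions)
    return {label: (n, n / total) for label, n in counts.items()}
```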

Model Architecture

1- Text Preprocessing:

  • Normalization (Hazm)

  • Tokenization

  • Stemming

  • Stopword removal

2- Feature Extraction:

  • Word2Vec embeddings (100 dimensions)

  • Sentence vectors (average of word vectors)

3- Classification:

  • Logistic Regression with L2 regularization
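
The sentence-vector step in stage 2 can be sketched in plain Python. This is an illustration of averaging word vectors, not the library's own code: `average_word_vectors` is a hypothetical name, and both skipping out-of-vocabulary tokens and returning a zero vector when no token is known are assumptions.

```python
def average_word_vectors(tokens, word_vectors, dim):
    """Average the vectors of the tokens found in word_vectors.

    tokens: list of preprocessed tokens for one sentence.
    word_vectors: mapping from token to a list of floats of length dim.
    Unknown tokens are skipped (assumption); if none are known,
    a zero vector of length dim is returned (assumption).
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values of each dimension together.
    return [sum(column) / len(known) for column in zip(*known)]
```

With 100-dimensional embeddings, every sentence maps to a single 100-dimensional vector regardless of its length, which is what lets a linear classifier such as Logistic Regression consume it directly.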

Performance

The pre-trained model achieves the following performance on our test set:

Metric     Value
Accuracy   85.2%
Precision  84.7%
Recall     85.0%
F1-score   84.8%
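
For readers who want to compute the same kinds of metrics on their own test set, here is a minimal pure-Python sketch. It uses macro averaging over classes, which is an assumption: the averaging method behind the reported figures is not stated. `macro_metrics` is a hypothetical helper.

```python
def macro_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption;
    the reported figures may use a different averaging scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```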

License

This project is licensed under the MIT License. See the LICENSE file for details.

GitHub: RezaGooner
