Persian Sentiment Analysis Toolkit
This project has been archived by its maintainers. No new releases are expected.
Persian Sentiment Analyzer
A Python library for sentiment analysis of Persian (Farsi) text, capable of classifying opinions as "recommended", "not_recommended", or "no_idea".
Features
- Text Preprocessing: Normalization, tokenization, stemming, and stopword removal for Persian text
- Word Embeddings: Built-in Word2Vec implementation for Persian language
- Sentiment Classification: Logistic Regression classifier trained on Persian sentiment data
- Model Persistence: Save and load trained models for future use
- Batch Processing: Analyze sentiment for multiple texts at once
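To make the preprocessing feature more concrete, here is a minimal standard-library sketch of two of the steps named above: character normalization and stopword removal. It is not the library's implementation (which relies on Hazm). The names `normalize` and `remove_stopwords` and the tiny stopword set are assumptions made for illustration.

```python
# Illustrative stand-in for Hazm-based preprocessing; not the library's code.
from typing import List

# Arabic code points that are commonly mapped to their Persian equivalents.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf
})

# A tiny, hypothetical stopword set; a real list is much longer.
STOPWORDS = {"و", "در", "به", "از", "که", "این", "است"}


def normalize(text: str) -> str:
    """Unify character variants and collapse runs of whitespace."""
    return " ".join(text.translate(ARABIC_TO_PERSIAN).split())


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


# Whitespace splitting stands in for a real Persian tokenizer here.
tokens = remove_stopwords(normalize("این محصول  بسیار عالي است").split())
print(tokens)
```

Stemming is left out of the sketch because it depends on language-specific rules that Hazm provides.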
Installation
pip install persian-sentiment-analyzer
Dependencies
- Python 3.6+
- hazm
- gensim
- scikit-learn
- numpy
- pandas
Usage
Basic Usage
from persian_sentiment_analyzer import SentimentAnalyzer
# Initialize with a pre-trained model
analyzer = SentimentAnalyzer(model_path="path/to/pretrained_model")
# Predict sentiment
result = analyzer.predict("این محصول بسیار عالی است")
print(result) # Output: 'recommended'
Training Your Own Model
from persian_sentiment_analyzer import SentimentAnalyzer
import numpy as np
import pandas as pd
# Load your dataset
data = pd.read_csv("persian_reviews.csv")
texts = data['text'].tolist()
labels = data['label'].values # 0: not_recommended, 1: recommended, 2: no_idea
# Initialize analyzer
analyzer = SentimentAnalyzer()
# Preprocess and tokenize texts
tokenized_texts = [analyzer.preprocessor.preprocess_text(text) for text in texts]
# Train Word2Vec model
analyzer.train_word2vec(tokenized_texts, vector_size=100)
# Prepare feature vectors
X = np.array([analyzer.sentence_vector(tokens) for tokens in tokenized_texts])
# Train classifier
analyzer.train_classifier(X, labels)
# Save the trained model
analyzer.save_model("my_persian_model")
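The training example uses integer label codes, while `predict` returns strings. When a dataset stores string labels, a small mapping keeps the two in sync. `LABEL_NAMES`, `LABEL_IDS`, and `encode_labels` below are hypothetical helpers, not part of the package. The mapping itself is taken only from the comment in the example above, so whether the library uses it internally is an assumption.

```python
from typing import Dict, Iterable, List

# Mapping taken from the comment in the training example:
# 0: not_recommended, 1: recommended, 2: no_idea
LABEL_NAMES: Dict[int, str] = {0: "not_recommended", 1: "recommended", 2: "no_idea"}
LABEL_IDS: Dict[str, int] = {name: idx for idx, name in LABEL_NAMES.items()}


def encode_labels(names: Iterable[str]) -> List[int]:
    """Convert string labels to integer ids, failing loudly on unknown names."""
    ids = []
    for name in names:
        if name not in LABEL_IDS:
            raise ValueError(f"unknown label: {name!r}")
        ids.append(LABEL_IDS[name])
    return ids


print(encode_labels(["recommended", "no_idea", "not_recommended"]))  # [1, 2, 0]
```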
Batch Processing
from persian_sentiment_analyzer import SentimentAnalyzer, predict_sentiments_for_file
# Load a previously trained model
analyzer = SentimentAnalyzer(model_path="path/to/pretrained_model")
# Process a CSV file containing Persian comments
results_summary = predict_sentiments_for_file(
    analyzer,
    input_file="comments.csv",
    output_file="results.csv",
    summary_file="summary.csv",
)
print(results_summary)
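The format of the summary returned by `predict_sentiments_for_file` is not documented here. As a rough idea of what a per-label summary could look like, the sketch below counts predictions over one CSV column using only the standard library. The `comment` column name, the `summarize` helper, and the keyword-based `fake_predict` stand-in are all assumptions for illustration.

```python
import csv
import io
from collections import Counter
from typing import Callable, Dict


def summarize(csv_text: str, predict: Callable[[str], str],
              column: str = "comment") -> Dict[str, int]:
    """Count predicted labels for every row of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return dict(Counter(predict(row[column]) for row in reader))


def fake_predict(text: str) -> str:
    """Stand-in for analyzer.predict; a trained model would be used instead."""
    if "good" in text:
        return "recommended"
    if "bad" in text:
        return "not_recommended"
    return "no_idea"


sample = "comment\ngood product\nbad quality\nvery good\nit arrived\n"
print(summarize(sample, fake_predict))
# {'recommended': 2, 'not_recommended': 1, 'no_idea': 1}
```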
Model Architecture
1- Text Preprocessing:
  - Normalization (Hazm)
  - Tokenization
  - Stemming
  - Stopword removal
2- Feature Extraction:
  - Word2Vec embeddings (100 dimensions)
  - Sentence vectors (average of word vectors)
3- Classification:
  - Logistic Regression with L2 regularization
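The feature-extraction step, averaging word vectors into a single sentence vector, can be sketched without any dependencies. The `sentence_vector` function below is a standalone illustration that takes the embedding table as an argument. It uses toy 3-dimensional vectors rather than the 100 dimensions mentioned above. Returning a zero vector when no token is in the vocabulary is an assumption, since the handling of that case is not described here.

```python
from typing import Dict, List, Sequence


def sentence_vector(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


toy = {"good": [1.0, 0.0, 2.0], "product": [3.0, 2.0, 0.0]}
print(sentence_vector(["good", "product", "unseen"], toy, dim=3))  # [2.0, 1.0, 1.0]
print(sentence_vector(["unseen"], toy, dim=3))                      # [0.0, 0.0, 0.0]
```

Because every sentence maps to a vector of the same length, the result can be fed directly to a linear classifier such as logistic regression.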
Performance
The pre-trained model achieves the following performance on our test set:
| Metric | Value |
|---|---|
| Accuracy | 85.2% |
| Precision | 84.7% |
| Recall | 85.0% |
| F1-score | 84.8% |
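The figures above do not say how precision, recall, and F1 were averaged across the three classes. One common choice is macro-averaging, where each metric is computed per class and then averaged with equal weight. The sketch below shows that calculation on a toy example. `macro_metrics` is a hypothetical helper and may not match how the reported numbers were produced.

```python
from typing import Dict, List


def macro_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


result = macro_metrics([0, 1, 2, 1], [0, 1, 1, 1])
print(result)
```

In the toy data, class 2 is never predicted, so its precision, recall, and F1 are all counted as zero and pull the macro averages down even though accuracy is 0.75.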
License
This project is licensed under the MIT License. See the LICENSE file for details.
GitHub: RezaGooner
File details
Details for the file persian_sentiment-0.1.0.tar.gz.
File metadata
- Download URL: persian_sentiment-0.1.0.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7627d839b4d7116652afcd5a6f55916c15154b504b0c5738c96a2dc1bc7ac00d |
| MD5 | b6db3db6080f9b20bd9d38fd03f8deeb |
| BLAKE2b-256 | 90d8182736c56fcf515163988199a6d575c9185d3851ce7524dbb02db7134ec2 |
File details
Details for the file persian_sentiment-0.1.0-py3-none-any.whl.
File metadata
- Download URL: persian_sentiment-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5cd0d727ed6733d1e5d8d917c75ff74890a6f9982d40505588d8c6d5c503ed4f |
| MD5 | acece8390f893fb1bba6ab12aaa365a2 |
| BLAKE2b-256 | a404e0c09f7711e2cc02d4901e8d5b00b872e737ad9a0e8c2ed95d9bf7517fc8 |
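A downloaded file can be checked against the SHA256 digest listed above with the standard library. `sha256_of` is a hypothetical helper name, and the path in the commented example is an assumption about where the wheel was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for persian_sentiment-0.1.0-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = "5cd0d727ed6733d1e5d8d917c75ff74890a6f9982d40505588d8c6d5c503ed4f"

# Example check (adjust the path to the downloaded file):
# assert sha256_of("persian_sentiment-0.1.0-py3-none-any.whl") == EXPECTED_WHEEL_SHA256
```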