Chinese Herbal Medicine E-commerce Sentiment Analysis System
Chinese Herbal Medicine Sentiment Analysis System
A comprehensive Natural Language Processing (NLP) toolkit specifically designed for analyzing customer reviews and evaluating supply chain quality in Chinese herbal medicine e-commerce platforms.
🎯 Features
🔍 Sentiment Analysis
- Dictionary-based Analysis: Traditional sentiment analysis using Chinese sentiment dictionaries
- Machine Learning Models: SVM, Naive Bayes, and Logistic Regression classifiers
- Deep Learning Models: LSTM, TextCNN, and BERT-based sentiment analysis
- Graph-based Analysis: TextRank algorithm for sentiment analysis
🔑 Keyword Extraction
- TF-IDF: Term Frequency-Inverse Document Frequency for keyword extraction
- TextRank: Graph-based algorithm for keyword ranking
- LDA: Latent Dirichlet Allocation for topic-based keyword extraction
📊 Supply Chain Evaluation
- Multi-dimensional Analysis: Upstream (raw materials), midstream (processing), downstream (distribution)
- Quality Metrics: Comprehensive evaluation of supply chain quality indicators
- Visualization: Rich visualizations for analysis results
🛠️ Utility Features
- Data Processing: Efficient handling of large-scale review datasets
- Visualization Tools: Comprehensive plotting and charting capabilities
- Command-line Interface: Easy-to-use CLI for batch processing
- Modular Design: Flexible and extensible architecture
🚀 Quick Start
Installation
# Basic installation
pip install chinese-herbal-sentiment
# With deep learning support
pip install chinese-herbal-sentiment[deep_learning]
# With development tools
pip install chinese-herbal-sentiment[dev]
# Complete installation
pip install chinese-herbal-sentiment[all]
Basic Usage
import pandas as pd
from chinese_herbal_sentiment import SentimentAnalyzer, KeywordExtractor
# Sample data
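data = pd.DataFrame({
    '评论内容': [  # column name means "review content"
        '这个中药质量很好,效果不错',  # "This herbal medicine is good quality and works well"
        '包装很差,质量一般',  # "The packaging is poor and the quality is mediocre"
        '服务态度很好,物流快'  # "Friendly service and fast shipping"
    ]
})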
# Sentiment analysis
analyzer = SentimentAnalyzer()
sentiment_results = analyzer.analyze_all_methods(data)
# Keyword extraction
extractor = KeywordExtractor()
keyword_results = extractor.extract_all_methods(data, num_keywords=10)
print("Sentiment Results:", sentiment_results.head())
print("Keywords:", keyword_results.head())
Command Line Usage
# Analyze sentiment
chinese-herbal-analyze data/reviews.xlsx --method all --output results.csv
# Extract keywords
chinese-herbal-keywords data/reviews.xlsx --method tfidf --num_keywords 20
# Full analysis
chinese-herbal-full data/reviews.xlsx --mode all --output_dir results/
📚 Documentation
API Reference
SentimentAnalyzer
from chinese_herbal_sentiment import SentimentAnalyzer
analyzer = SentimentAnalyzer()
# Single method analysis
results = analyzer.analyze_sentiment(data, method='svm')
# All methods analysis
results = analyzer.analyze_all_methods(data)
Methods:
- dictionary: Dictionary-based sentiment analysis
- svm: Support Vector Machine classifier
- naive_bayes: Naive Bayes classifier
- logistic_regression: Logistic Regression classifier
- all: All available methods
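As a rough illustration of what a dictionary-based method involves, the sketch below scores a pre-tokenized review against a tiny hand-written lexicon. The word lists, the negation rule, and the function name `dictionary_sentiment` are assumptions made for this example; they are not the package's actual dictionaries or implementation.

```python
# Hypothetical mini-lexicons; a real system would load a full Chinese sentiment dictionary.
POSITIVE = {"好", "不错", "快", "满意"}
NEGATIVE = {"差", "慢", "失望"}
NEGATIONS = {"不", "没", "没有"}


def dictionary_sentiment(tokens):
    """Label a pre-tokenized review by counting lexicon hits.

    Each positive word adds 1, each negative word subtracts 1, and the sign
    is flipped when the token directly before it is a negation word.
    """
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Tokens are assumed to come from a Chinese word segmenter.
print(dictionary_sentiment(["质量", "很", "好", "效果", "不错"]))
print(dictionary_sentiment(["包装", "很", "差"]))
print(dictionary_sentiment(["物流", "不", "快"]))
```

A lexicon approach needs no training data, which is one reason it is commonly used as a baseline next to the trained classifiers.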
KeywordExtractor
from chinese_herbal_sentiment import KeywordExtractor
extractor = KeywordExtractor()
# Single method extraction
keywords = extractor.extract_keywords(data, method='tfidf', num_keywords=20)
# All methods extraction
keywords = extractor.extract_all_methods(data, num_keywords=20)
Methods:
- tfidf: TF-IDF keyword extraction
- textrank: TextRank algorithm
- lda: Latent Dirichlet Allocation
- all: All available methods
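To make the TF-IDF idea concrete, here is a minimal standard-library sketch that ranks terms across pre-tokenized reviews. The function name `tfidf_keywords`, the `+ 1.0` IDF offset, and the choice to sum scores across documents are assumptions for illustration; the package may weight and aggregate terms differently.

```python
import math
from collections import Counter


def tfidf_keywords(docs, num_keywords=3):
    """Rank terms by their TF-IDF score summed over pre-tokenized documents."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            # The +1 keeps terms that occur in every document from scoring zero.
            idf = math.log(n_docs / df[term]) + 1.0
            scores[term] += tf * idf
    return scores.most_common(num_keywords)


# Tokens are assumed to come from a Chinese word segmenter.
docs = [
    ["质量", "好", "效果", "不错"],
    ["包装", "差", "质量", "一般"],
    ["服务", "好", "物流", "快"],
]
for term, score in tfidf_keywords(docs, num_keywords=5):
    print(term, round(score, 3))
```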
Deep Learning Models
from chinese_herbal_sentiment import BERTSentimentAnalyzer, TextCNNSentimentAnalyzer
# BERT analysis
bert_analyzer = BERTSentimentAnalyzer()
bert_results = bert_analyzer.analyze_sentiment(data)
# TextCNN analysis
textcnn_analyzer = TextCNNSentimentAnalyzer()
textcnn_results = textcnn_analyzer.analyze_sentiment(data)
Advanced Usage
Custom Analysis Pipeline
from chinese_herbal_sentiment import (
    DataAnalyzer,
    KeywordExtractor,
    SentimentAnalyzer,
    Visualizer,
)
# Load and preprocess data
data_analyzer = DataAnalyzer()
data = data_analyzer.load_data('reviews.xlsx', sample_size=10000)
# Perform analysis
analyzer = SentimentAnalyzer()
extractor = KeywordExtractor()
sentiment_results = analyzer.analyze_all_methods(data)
keyword_results = extractor.extract_all_methods(data)
# Generate visualizations
visualizer = Visualizer()
visualizer.plot_sentiment_distribution(sentiment_results, save_path='sentiment.png')
visualizer.plot_keyword_cloud(keyword_results, save_path='keywords.png')
Supply Chain Quality Evaluation
from chinese_herbal_sentiment.utils.keyword_mapping import KeywordMapper
# Map keywords to supply chain dimensions
mapper = KeywordMapper()
supply_chain_results = mapper.map_keywords_to_dimensions(keyword_results)
# Analyze quality indicators
quality_metrics = mapper.calculate_quality_metrics(supply_chain_results)
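The general idea of mapping keywords to supply chain stages can be sketched without the package. In the example below, the seed vocabulary in `DIMENSION_KEYWORDS`, the function name `map_keywords`, and the use of a plain average as a per-stage score are all hypothetical; `KeywordMapper` may use different vocabularies and metrics.

```python
from collections import defaultdict

# Hypothetical seed vocabulary for each supply chain stage (illustrative only).
DIMENSION_KEYWORDS = {
    "upstream": {"原料", "产地", "药材"},    # raw materials
    "midstream": {"加工", "包装", "质量"},   # processing
    "downstream": {"物流", "服务", "配送"},  # distribution
}


def map_keywords(keyword_scores):
    """Group (keyword, score) pairs by stage and average the scores per stage.

    Keywords that appear in no stage vocabulary are ignored.
    """
    grouped = defaultdict(list)
    for keyword, score in keyword_scores:
        for dimension, vocab in DIMENSION_KEYWORDS.items():
            if keyword in vocab:
                grouped[dimension].append(score)
    return {dim: sum(vals) / len(vals) for dim, vals in grouped.items()}


scores = [("质量", 0.85), ("包装", 0.72), ("服务", 0.68), ("效果", 0.65), ("物流", 0.58)]
print(map_keywords(scores))
```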
📊 Output Examples
Sentiment Analysis Results
| 评论内容 | dictionary_sentiment | svm_sentiment | naive_bayes_sentiment | logistic_regression_sentiment |
|---|---|---|---|---|
| 质量很好,效果不错 | positive | positive | positive | positive |
| 包装很差,质量一般 | negative | negative | negative | negative |
| 服务态度很好 | positive | positive | positive | positive |
Keyword Extraction Results
| keyword | score | method |
|---|---|---|
| 质量 | 0.85 | TF-IDF |
| 包装 | 0.72 | TF-IDF |
| 服务 | 0.68 | TF-IDF |
| 效果 | 0.65 | TextRank |
| 物流 | 0.58 | TextRank |
🔧 Configuration
Data Format
The package expects data in the following format:
# Excel/CSV file with columns (only '评论内容' is required):
import pandas as pd

data = pd.DataFrame({
    '评论内容': ['review text 1', 'review text 2'],  # Review text (required)
    '评分': [5, 4],  # Rating (optional)
    '时间': ['2024-01-01', '2024-01-02'],  # Review date (optional)
    '用户ID': ['user1', 'user2']  # User ID (optional)
})
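Before running a long analysis, it can help to confirm that an input file has the expected header. The following standard-library sketch checks a CSV header against the column names listed above; the helper `check_columns` is hypothetical, and whether the package itself rejects or ignores extra columns is not stated here.

```python
import csv
import io

REQUIRED_COLUMNS = {"评论内容"}              # review text
OPTIONAL_COLUMNS = {"评分", "时间", "用户ID"}  # rating, date, user ID


def check_columns(csv_text):
    """Return (missing required columns, unrecognized columns) for a CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    unknown = header - REQUIRED_COLUMNS - OPTIONAL_COLUMNS
    return missing, unknown


sample = "评论内容,评分\n质量很好,5\n包装很差,2\n"
print(check_columns(sample))
```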
Model Configuration
# Custom model parameters
analyzer = SentimentAnalyzer(
    vectorizer_params={'max_features': 5000},
    classifier_params={'C': 1.0}
)
extractor = KeywordExtractor(
    tfidf_params={'max_features': 1000},
    textrank_params={'window_size': 4}
)
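The `window_size` setting is easiest to understand from how TextRank builds its graph: words that occur within the same sliding window are linked, and PageRank then scores each word by its connections. The sketch below is a generic, simplified TextRank written for illustration; the function name `textrank` and the `damping` and `iterations` defaults are assumptions, not the package's code.

```python
from collections import defaultdict


def textrank(tokens, window_size=4, damping=0.85, iterations=50):
    """Rank tokens with PageRank over an undirected co-occurrence graph."""
    # Link each token to the distinct tokens among the next (window_size - 1)
    # positions, so every window spans window_size tokens in total.
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + window_size]:
            if other != tok:
                neighbors[tok].add(other)
                neighbors[other].add(tok)

    scores = {tok: 1.0 for tok in neighbors}
    for _ in range(iterations):
        # Each node receives an equal share of every neighbor's current score.
        scores = {
            tok: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[tok])
            for tok in neighbors
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Tokens are assumed to come from a Chinese word segmenter with stop words removed.
tokens = ["质量", "好", "包装", "差", "质量", "一般", "物流", "快", "质量", "好"]
for word, score in textrank(tokens, window_size=2)[:3]:
    print(word, round(score, 3))
```

A larger window links words that are further apart, which produces a denser graph and tends to flatten the differences between scores.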
🧪 Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=chinese_herbal_sentiment
# Run specific test file
pytest tests/test_sentiment_analysis.py
📈 Performance
Accuracy Comparison
| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Dictionary | 0.72 | 0.71 | 0.72 | 0.71 |
| SVM | 0.85 | 0.84 | 0.85 | 0.84 |
| Naive Bayes | 0.82 | 0.81 | 0.82 | 0.81 |
| Logistic Regression | 0.87 | 0.86 | 0.87 | 0.86 |
| BERT | 0.91 | 0.90 | 0.91 | 0.90 |
| TextCNN | 0.89 | 0.88 | 0.89 | 0.88 |
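For readers who want to reproduce this kind of comparison on their own labeled data, the four metrics can be computed as in the sketch below. The table does not state how scores were averaged across classes, so this example treats one label as the positive class, which is an assumption; the function name `binary_metrics` and the sample labels are illustrative only.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall and F1 for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for five reviews, used only to exercise the function.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
print(binary_metrics(y_true, y_pred))
```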
Processing Speed
- Small dataset (< 1K reviews): ~1-2 seconds
- Medium dataset (1K-10K reviews): ~10-30 seconds
- Large dataset (> 10K reviews): ~2-5 minutes
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment
# Install in development mode
pip install -e .[dev]
# Run tests
pytest
# Format code
black chinese_herbal_sentiment tests
# Lint code
flake8 chinese_herbal_sentiment tests
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Research Foundation: Based on master's thesis research on Chinese herbal medicine e-commerce supply chain quality evaluation
- Open Source Libraries: Built on top of scikit-learn, transformers, PyTorch, and other excellent open-source projects
- Academic Community: Inspired by research in sentiment analysis and supply chain management
📞 Support
- Documentation: GitHub Wiki
- Issues: GitHub Issues
- Email: chenxingqiang@turingai.cc
🔄 Changelog
v0.1.0 (2024-12-XX)
- Initial release
- Basic sentiment analysis (dictionary, SVM, Naive Bayes, Logistic Regression)
- Keyword extraction (TF-IDF, TextRank, LDA)
- Deep learning models (BERT, TextCNN, LSTM) and graph-based TextRank sentiment analysis
- Command-line interface
- Comprehensive documentation and examples
Note: This package is designed specifically for Chinese herbal medicine e-commerce review analysis and supply chain quality evaluation. For general sentiment analysis tasks, consider using more general-purpose NLP libraries.