Chinese Herbal Medicine E-commerce Sentiment Analysis System
A comprehensive Natural Language Processing (NLP) toolkit specifically designed for analyzing customer reviews and evaluating supply chain quality in Chinese herbal medicine e-commerce platforms. This system includes advanced sentiment analysis, time series forecasting, regression analysis, and a complete REST API service.
🎯 Features
🔍 Sentiment Analysis
- Dictionary-based Analysis: Traditional sentiment analysis using Chinese sentiment dictionaries
- Machine Learning Models: SVM, Naive Bayes, and Logistic Regression classifiers
- Deep Learning Models: LSTM, TextCNN, and BERT-based sentiment analysis
- Graph-based Analysis: TextRank algorithm for sentiment analysis
🔑 Keyword Extraction
- TF-IDF: Term Frequency-Inverse Document Frequency for keyword extraction
- TextRank: Graph-based algorithm for keyword ranking
- LDA: Latent Dirichlet Allocation for topic-based keyword extraction
📊 Advanced Analytics ✨
- Regression Analysis: Multi-variable linear regression with statistical diagnostics
- Time Series Analysis: Trend analysis, seasonality detection, and forecasting
- Supply Chain Evaluation: Multi-dimensional quality assessment
- Prediction Services: Unified prediction interface with model management
🚀 API Services ✨
- REST API: FastAPI-based web service with automatic documentation
- Batch Processing: Handle large-scale data processing
- Real-time Analysis: Live sentiment analysis and keyword extraction
- Comprehensive Endpoints: Full coverage of all analysis features
🛠️ Utility Features
- Data Processing: Efficient handling of large-scale review datasets
- Visualization Tools: Comprehensive plotting and charting capabilities
- Command-line Interface: Easy-to-use CLI for batch processing
- Modular Design: Flexible and extensible architecture
📊 Dataset
Chinese Herbal Medicine Sentiment Dataset
We provide a comprehensive dataset of Chinese herbal medicine product reviews for research and development:
- 🔢 Scale: 234,879 reviews from 259 products
- 🌐 Platform: Hugging Face Hub
- 📅 Time Span: 14.5 years (2010-2024)
- 🏷️ Labels: Positive (75.8%), Neutral (11.5%), Negative (12.7%)
- 📄 License: MIT License
Quick Dataset Access
from datasets import load_dataset
# Load the complete dataset
dataset = load_dataset("xingqiang/chinese-herbal-medicine-sentiment")
# Access train and validation splits
train_data = dataset['train'] # 211,391 samples
val_data = dataset['validation'] # 23,488 samples
# View sample data
print(train_data[0])
Dataset Features
| Field | Type | Description | Example |
|---|---|---|---|
| username | string | Anonymized username | "用***客" |
| user_id | integer | Unique user identifier | 16788761848 |
| review_text | string | Chinese review content | "产品质量很好,效果明显" |
| review_time | datetime | Review timestamp | "2021-12-09 12:56:37" |
| rating | integer | Rating (1-5 scale) | 5 |
| product_id | string | Product identifier | "100001642346" |
| sentiment_label | string | Sentiment label | "positive", "neutral", "negative" |
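As a quick sanity check on the label balance reported above, the share of each `sentiment_label` value can be tallied with the standard library. This is a minimal sketch: `label_distribution` is a hypothetical helper, not part of the package, and the short list stands in for a real column such as `dataset['train']['sentiment_label']`.

```python
from collections import Counter

def label_distribution(labels):
    """Return each label's share of the total as a percentage (1 decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Toy stand-in for a real sentiment_label column
sample_labels = ["positive", "positive", "positive", "neutral", "negative"]
distribution = label_distribution(sample_labels)
print(distribution)
```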
📖 View Complete Dataset Documentation
🚀 Installation
From PyPI (Recommended)
# Basic installation
pip install chinese-herbal-sentiment
# Extras are quoted so that shells such as zsh do not expand the brackets
# With deep learning support
pip install "chinese-herbal-sentiment[deep_learning]"
# With API services
pip install "chinese-herbal-sentiment[api]"
# With development tools
pip install "chinese-herbal-sentiment[dev]"
# Complete installation (all features)
pip install "chinese-herbal-sentiment[all]"
From Source
# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment
# Install in development mode
pip install -e ".[all]"
🚀 Quick Start
Basic Usage
from chinese_herbal_sentiment import (
    SentimentAnalysis,
    KeywordExtraction,
    SupplyChainRegression,
    PredictionService,
    TimeSeriesAnalyzer
)
# Sample data
texts = [
    '这个中药质量很好,效果不错',
    '包装很差,质量一般',
    '服务态度很好,物流快'
]
# 1. Sentiment Analysis
analyzer = SentimentAnalysis()
sentiment_results = analyzer.analyze_batch(texts)
# 2. Keyword Extraction
extractor = KeywordExtraction()
keywords = extractor.tfidf_extraction(texts, top_k=10)
# 3. Unified Prediction Service
service = PredictionService()
comprehensive_results = service.analyze_comprehensive(
    texts=texts,
    include_sentiment=True,
    include_keywords=True
)
print("Comprehensive Results:", comprehensive_results)
Advanced Analytics
# 1. Regression Analysis
regressor = SupplyChainRegression()
# Generate sample supply chain data
data = regressor.generate_supply_chain_data(1000)
# Prepare features
feature_columns = ['material_quality', 'technology', 'delivery_speed']
X, y = regressor.prepare_data(data, 'service_quality', feature_columns)
# Train model
results = regressor.train(X, y)
print(f"Model R²: {results['test_r2']:.3f}")
# Generate analysis report
regressor.visualize_results('analysis_results.png')
regressor.generate_report('analysis_report.md')
# 2. Time Series Analysis
ts_analyzer = TimeSeriesAnalyzer()
# Load time series data
sample_data = ts_analyzer.generate_sample_data(periods=365)
ts_analyzer.load_data(sample_data, 'date', 'sentiment_score')
# Perform analysis
trend_results = ts_analyzer.trend_analysis()
forecast_results = ts_analyzer.forecast(periods=30)
anomalies = ts_analyzer.detect_anomalies()
print(f"Trend: {trend_results['trend_direction']}")
print(f"Forecast length: {len(forecast_results['predictions'])}")
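To make the idea behind a linear trend concrete, the sketch below fits an ordinary least-squares slope to a sequence of scores and maps it to a direction. It is an illustration only, not the package's implementation: `linear_trend`, the `tolerance` parameter, and the labels "increasing", "decreasing", and "stable" are assumptions, and the actual values of `trend_direction` may differ.

```python
def linear_trend(values, tolerance=1e-6):
    """Fit y = a + b*t by least squares (t = 0, 1, 2, ...) and classify the slope b."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        direction = "increasing"   # hypothetical label
    elif slope < -tolerance:
        direction = "decreasing"   # hypothetical label
    else:
        direction = "stable"       # hypothetical label
    return {"slope": slope, "trend_direction": direction}

rising = linear_trend([0.1, 0.2, 0.3, 0.4])
flat = linear_trend([0.5, 0.5, 0.5, 0.5])
print(rising["trend_direction"], flat["trend_direction"])
```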
API Services
# Start the API server
from chinese_herbal_sentiment.api import run_server
# Launch API service
run_server(host="0.0.0.0", port=8000)
# API will be available at:
# - Main service: http://localhost:8000
# - Documentation: http://localhost:8000/docs
# - Health check: http://localhost:8000/health
API Endpoints:
- POST /api/v1/sentiment/analyze - Sentiment analysis
- POST /api/v1/keywords/extract - Keyword extraction
- POST /api/v1/analyze/comprehensive - Comprehensive analysis
- GET /api/v1/models/info - Model information
- GET /api/v1/predictions/history - Prediction history
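A client call to the sentiment endpoint might look roughly like the sketch below, which uses only the standard library. The request body shape is an assumption: the `texts` field name is hypothetical and not confirmed by this page, so check the generated documentation at `/docs` for the real schema. `build_sentiment_request` is likewise an illustrative helper.

```python
import json
import urllib.request

def build_sentiment_request(texts, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the sentiment endpoint."""
    # Assumption: the endpoint accepts a JSON object with a "texts" list.
    payload = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/sentiment/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_sentiment_request(["产品质量很好"])
print(request.get_method(), request.full_url)

# With the API server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```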
Command Line Usage
# Run comprehensive demo
python examples/comprehensive_demo.py
# Start API server
python examples/comprehensive_demo.py --api
# Run specific analysis
python -c "
from chinese_herbal_sentiment import PredictionService
service = PredictionService()
result = service.predict_sentiment(['产品质量很好'])
print(result)
"
📚 Documentation
Core Classes
SentimentAnalysis
from chinese_herbal_sentiment import SentimentAnalysis
analyzer = SentimentAnalysis()
# Dictionary-based analysis
score = analyzer.dictionary_based_analysis("产品质量很好")
# Machine learning analysis (requires trained models)
ml_result = analyzer.machine_learning_analysis(["产品质量很好"])
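For intuition, dictionary-based scoring can be reduced to counting hits against positive and negative word lists. The sketch below is a simplified illustration rather than the package's algorithm: the tiny lexicons and the `dictionary_score` function are hypothetical, matching is by plain substring, and negation and degree words are ignored.

```python
# Hypothetical mini-lexicons; a real sentiment dictionary is far larger.
POSITIVE_WORDS = {"很好", "不错", "明显", "快"}
NEGATIVE_WORDS = {"很差", "一般", "慢"}

def dictionary_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    pos_hits = sum(text.count(word) for word in positive)
    neg_hits = sum(text.count(word) for word in negative)
    total = pos_hits + neg_hits
    if total == 0:
        return 0.0  # no lexicon word found: treat as neutral
    return (pos_hits - neg_hits) / total

for review in ["产品质量很好", "包装很差,质量一般", "今天收到货"]:
    print(review, dictionary_score(review))
```

Substring matching avoids the need for a word segmenter in this sketch; a production system would normally tokenize the text first.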
KeywordExtraction
from chinese_herbal_sentiment import KeywordExtraction
extractor = KeywordExtraction()
# Sample reviews (same as in Quick Start)
texts = [
    '这个中药质量很好,效果不错',
    '包装很差,质量一般',
    '服务态度很好,物流快'
]
# TF-IDF extraction
tfidf_keywords = extractor.tfidf_extraction(texts, top_k=10)
# TextRank extraction
textrank_keywords = extractor.textrank_extraction(texts, top_k=10)
# LDA topic modeling
lda_keywords, topics = extractor.lda_extraction(texts, n_topics=5)
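The TF-IDF idea itself fits in a few lines: a term scores highly in a document when it is frequent there but rare across the corpus. The following is a minimal sketch under stated assumptions, not the package's code: `tfidf_scores` and `top_keywords` are hypothetical helpers, the documents are assumed to be already segmented into words, and a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1 is used.

```python
import math
from collections import Counter

def tfidf_scores(tokenized_docs):
    """Return one {term: tf-idf} dict per document."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each term once per document
    all_scores = []
    for doc in tokenized_docs:
        scores = {}
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] = tf * idf
        all_scores.append(scores)
    return all_scores

def top_keywords(scores, top_k=2):
    """Return the top_k terms of one document, highest score first."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Pre-segmented toy reviews; "质量" appears in every document.
docs = [["质量", "很好", "效果"], ["质量", "一般", "包装"], ["质量", "物流", "快"]]
per_doc = tfidf_scores(docs)
print(top_keywords(per_doc[0]))
```

Because "质量" occurs in all three reviews, its IDF is the lowest possible and it drops out of each document's top keywords.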
PredictionService
from chinese_herbal_sentiment import PredictionService
service = PredictionService()
# Batch sentiment prediction
sentiment_results = service.predict_sentiment(
    texts=["产品不错", "质量一般"],
    methods=['dictionary', 'svm']
)
# Batch keyword extraction
keyword_results = service.extract_keywords_batch(
    texts=["产品不错", "质量一般"],
    methods=['tfidf', 'textrank'],
    top_k=10
)
# Model management
model_info = service.get_model_info()
history = service.get_prediction_history()
Advanced Features
Regression Analysis
from chinese_herbal_sentiment import SupplyChainRegression
# Initialize regressor
regressor = SupplyChainRegression(model_type='linear')
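# Generate or load data
data = regressor.generate_supply_chain_data(1000)
# Prepare features and target (same columns as in Quick Start)
feature_columns = ['material_quality', 'technology', 'delivery_speed']
X, y = regressor.prepare_data(data, 'service_quality', feature_columns)
# Train model with comprehensive diagnostics
results = regressor.train(X, y, test_size=0.2)
# Feature importance analysis
importance = regressor.feature_importance()
# Model predictions with confidence intervals
X_new = X[:5]  # stand-in for new samples with the same feature columns
predictions, lower, upper = regressor.predict(X_new, return_intervals=True)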
# Generate detailed reports
regressor.visualize_results('regression_results.png')
report = regressor.generate_report('regression_report.md')
Time Series Analysis
from chinese_herbal_sentiment import TimeSeriesAnalyzer
# Initialize analyzer
analyzer = TimeSeriesAnalyzer()
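# Load time series data (generated sample data here; substitute your own)
data = analyzer.generate_sample_data(periods=365)
success = analyzer.load_data(data, time_column='date', value_column='sentiment_score')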
# Trend analysis
trend_results = analyzer.trend_analysis(method='linear')
# Seasonal decomposition
seasonal_results = analyzer.seasonal_analysis()
# Forecasting
forecast_results = analyzer.forecast(periods=30, method='auto')
# Anomaly detection
anomalies = analyzer.detect_anomalies(method='iqr')
# Comprehensive visualization
analyzer.visualize_analysis(
    include_trend=True,
    include_seasonal=True,
    include_forecast=True,
    save_path='timeseries_analysis.png'
)
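The `method='iqr'` option above refers to the interquartile-range rule. As a rough illustration of that rule, and not of the package's internals, the sketch below flags values lying more than `k` IQRs outside the quartiles. `iqr_anomalies` and the default `k=1.5` are assumptions chosen for the example.

```python
import statistics

def iqr_anomalies(values, k=1.5):
    """Return (index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Daily average sentiment scores with one obvious spike at the end
scores = [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.95]
print(iqr_anomalies(scores))
```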
📊 Examples and Use Cases
E-commerce Review Analysis
import pandas as pd
from chinese_herbal_sentiment import PredictionService
# Load review data
df = pd.read_csv('herbal_reviews.csv')
# Initialize prediction service
service = PredictionService()
# Comprehensive analysis
results = service.analyze_comprehensive(
    texts=df['review_text'].tolist(),
    include_sentiment=True,
    include_keywords=True
)
# Extract insights
sentiment_distribution = results['results']['sentiment_analysis']
key_themes = results['results']['keyword_extraction']
print("Sentiment Distribution:", sentiment_distribution)
print("Key Themes:", key_themes)
Supply Chain Quality Assessment
from chinese_herbal_sentiment import SupplyChainRegression
# Initialize regression analyzer
regressor = SupplyChainRegression()
# Define quality features
quality_features = {
    'material_quality': 8.5,
    'technology': 7.8,
    'delivery_speed': 8.2,
    'after_sales_service': 7.5,
    'processing_environment': 7.9
}
# Predict quality score
# (the regressor must already be trained on these five features, in this order)
predicted_score = regressor.predict([list(quality_features.values())])
print(f"Predicted Quality Score: {predicted_score[0]:.2f}/10")
Market Trend Analysis
import numpy as np
from chinese_herbal_sentiment import TimeSeriesAnalyzer
# Load historical sentiment data
# (historical_data: your own data with 'date' and 'avg_sentiment' columns)
analyzer = TimeSeriesAnalyzer()
analyzer.load_data(historical_data, 'date', 'avg_sentiment')
# Analyze trends and patterns
trend_analysis = analyzer.trend_analysis()
seasonal_patterns = analyzer.seasonal_analysis()
# Forecast future sentiment
forecast = analyzer.forecast(periods=90) # 3 months ahead
# Detect unusual patterns
anomalies = analyzer.detect_anomalies()
print(f"Market Trend: {trend_analysis['trend_direction']}")
print(f"Forecast Average: {np.mean(forecast['predictions']):.3f}")
🧪 Testing
# Run all tests
python -m pytest tests/ -v
# Run specific test modules
python -m pytest tests/test_regression_analysis.py -v
python -m pytest tests/test_prediction_service.py -v
python -m pytest tests/test_time_series_analysis.py -v
# Run with coverage report
python -m pytest tests/ --cov=chinese_herbal_sentiment --cov-report=html
# Test API endpoints (requires FastAPI)
python -m pytest tests/test_api.py -v
📈 Performance Benchmarks
Model Accuracy
| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Dictionary | 0.72 | 0.71 | 0.72 | 0.71 |
| SVM | 0.85 | 0.84 | 0.85 | 0.84 |
| Naive Bayes | 0.82 | 0.81 | 0.82 | 0.81 |
| Logistic Regression | 0.87 | 0.86 | 0.87 | 0.86 |
| BERT | 0.91 | 0.90 | 0.91 | 0.90 |
| TextCNN | 0.89 | 0.88 | 0.89 | 0.88 |
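For readers who want to reproduce metrics of this kind on their own predictions, the sketch below computes accuracy together with precision, recall, and F1. The table does not say how the per-class values were averaged, so macro-averaging here is an assumption, and `classification_metrics` is a hypothetical helper rather than part of the package.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```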
Processing Speed
| Dataset Size | Processing Time | Memory Usage |
|---|---|---|
| < 1K reviews | ~1-2 seconds | ~50MB |
| 1K-10K reviews | ~10-30 seconds | ~200MB |
| 10K-100K reviews | ~2-5 minutes | ~1GB |
| > 100K reviews | ~10-30 minutes | ~2-4GB |
Regression Analysis Performance
| Features | R² Score | RMSE | Training Time |
|---|---|---|---|
| 5 features | 0.85 | 0.45 | ~1 second |
| 10 features | 0.89 | 0.38 | ~2 seconds |
| 15 features | 0.92 | 0.32 | ~3 seconds |
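The two quality measures in this table have standard definitions: R² is one minus the ratio of residual to total sum of squares, and RMSE is the square root of the mean squared residual. A minimal sketch follows, where `r2_and_rmse` is a hypothetical helper and the numbers are toy values, not results from the package.

```python
import math

def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Toy quality scores on a 10-point scale
r2, rmse = r2_and_rmse([7.0, 8.0, 9.0], [7.5, 8.0, 8.5])
print(f"R2={r2:.2f}, RMSE={rmse:.3f}")
```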
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Format code
black chinese_herbal_sentiment tests
# Lint code
flake8 chinese_herbal_sentiment tests
Contribution Areas
- 🔬 Algorithm Development: Improve existing algorithms or add new ones
- 📊 Dataset Enhancement: Contribute to the dataset or create new datasets
- 🔧 Feature Development: Add new features or improve existing ones
- 📝 Documentation: Improve documentation, examples, and tutorials
- 🐛 Bug Fixes: Report and fix bugs
- ⚡ Performance: Optimize performance and memory usage
📦 PyPI Publication
This package is published on PyPI for easy installation and distribution:
Package Information
- Package Name: chinese-herbal-sentiment
- PyPI URL: https://pypi.org/project/chinese-herbal-sentiment/
- Installation: pip install chinese-herbal-sentiment
Version Management
# Check current version
python -c "import chinese_herbal_sentiment; print(chinese_herbal_sentiment.__version__)"
# Build package
python setup.py sdist bdist_wheel
# Upload to PyPI (maintainers only)
twine upload dist/*
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📚 Citation
If you use this package or dataset in your research, please cite:
@software{chinese_herbal_sentiment_2024,
title={Chinese Herbal Medicine Sentiment Analysis System},
author={Chen, Xingqiang},
year={2024},
version={1.0.0},
url={https://github.com/chenxingqiang/chinese-herbal-sentiment},
note={A comprehensive NLP toolkit for Chinese herbal medicine e-commerce analysis}
}
@dataset{chinese_herbal_sentiment_dataset_2024,
title={Chinese Herbal Medicine Sentiment Analysis Dataset},
author={Chen, Xingqiang},
year={2024},
version={1.0.0},
url={https://huggingface.co/datasets/xingqiang/chinese-herbal-medicine-sentiment},
note={A comprehensive sentiment analysis dataset for Traditional Chinese Medicine product reviews}
}
🙏 Acknowledgments
- Research Foundation: Based on master's thesis research on Chinese herbal medicine e-commerce supply chain quality evaluation
- Dataset Contributors: Thanks to all users who provided review data and e-commerce platforms
- Open Source Libraries: Built on scikit-learn, transformers, PyTorch, FastAPI, and other excellent projects
- Academic Community: Inspired by research in sentiment analysis, supply chain management, and NLP
📞 Support
- 📖 Documentation: GitHub Wiki
- 🐛 Issues: GitHub Issues
- 📧 Email: chenxingqiang@turingai.cc
- 💬 Discussions: GitHub Discussions
🔄 Changelog
v1.0.0 (2025-08-26)
- ✨ New Features: Complete regression analysis module with statistical diagnostics
- ✨ New Features: Advanced time series analysis with forecasting capabilities
- ✨ New Features: Unified prediction service with model management
- ✨ New Features: REST API service with FastAPI and automatic documentation
- ✨ New Features: Comprehensive test suite with >90% coverage
- 📊 Dataset: Released Chinese Herbal Medicine Sentiment Dataset (234K+ reviews)
- 📦 PyPI: Initial PyPI publication with multiple installation options
- 🔧 Improvements: Enhanced error handling and graceful dependency management
- 📝 Documentation: Complete API documentation and usage examples
v0.1.0 (2024-12-XX)
- 🎉 Initial release
- ✅ Basic sentiment analysis (dictionary, SVM, Naive Bayes, Logistic Regression)
- ✅ Keyword extraction (TF-IDF, TextRank, LDA)
- ✅ Deep learning models (BERT, TextCNN, LSTM)
- ✅ Command-line interface
- ✅ Comprehensive documentation and examples
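📍 Repository: https://github.com/chenxingqiang/chinese-herbal-sentiment | 📦 PyPI: https://pypi.org/project/chinese-herbal-sentiment/ | 🤗 Dataset: https://huggingface.co/datasets/xingqiang/chinese-herbal-medicine-sentiment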
Note: This package is designed specifically for Chinese herbal medicine e-commerce review analysis and supply chain quality evaluation. The included dataset and models are optimized for Traditional Chinese Medicine domain terminology and sentiment expressions.