Chinese Herbal Medicine E-commerce Sentiment Analysis System

Chinese Herbal Medicine Sentiment Analysis System

Requires Python 3.8+ | License: MIT

A comprehensive Natural Language Processing (NLP) toolkit specifically designed for analyzing customer reviews and evaluating supply chain quality in Chinese herbal medicine e-commerce platforms. This system includes advanced sentiment analysis, time series forecasting, regression analysis, and a complete REST API service.

🎯 Features

🔍 Sentiment Analysis

  • Dictionary-based Analysis: Traditional sentiment analysis using Chinese sentiment dictionaries
  • Machine Learning Models: SVM, Naive Bayes, and Logistic Regression classifiers
  • Deep Learning Models: LSTM, TextCNN, and BERT-based sentiment analysis
  • Graph-based Analysis: TextRank algorithm for sentiment analysis
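
To make the dictionary-based approach concrete, here is a minimal sketch of lexicon scoring. It is not the package's implementation: the tiny word lists, the one-token negation rule, and the names `lexicon_score` and `label` are assumptions for illustration, and the input is assumed to be already segmented into tokens.

```python
# Hypothetical toy lexicon; a real Chinese sentiment dictionary is far larger.
POSITIVE = {"好", "不错", "明显", "快"}
NEGATIVE = {"差", "慢", "一般"}
NEGATIONS = {"不", "没有"}


def lexicon_score(tokens):
    """Return (positive hits - negative hits) / token count.

    A negation word flips the polarity of the token that directly follows it.
    """
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score / len(tokens) if tokens else 0.0


def label(score, threshold=0.0):
    """Map a numeric score to one of three sentiment labels."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(label(lexicon_score(["质量", "很", "好"])))
print(label(lexicon_score(["包装", "很", "差"])))
print(label(lexicon_score(["质量", "不", "好"])))
```

Normalizing by token count keeps long and short reviews comparable; a production lexicon would also need degree adverbs and domain terms.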

🔑 Keyword Extraction

  • TF-IDF: Term Frequency-Inverse Document Frequency for keyword extraction
  • TextRank: Graph-based algorithm for keyword ranking
  • LDA: Latent Dirichlet Allocation for topic-based keyword extraction
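
As an illustration of TF-IDF keyword ranking, the sketch below scores terms in pre-tokenized documents using only the standard library. The function name `tfidf_keywords` and the unsmoothed `log(N / df)` weighting are assumptions; the package's own `tfidf_extraction` may tokenize and weight differently.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank terms per document by TF-IDF; each doc is a non-empty token list."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    ranked = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then by term for a deterministic order.
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        ranked.append(top)
    return ranked


docs = [
    ["质量", "很", "好", "效果", "不错"],
    ["包装", "很", "差", "质量", "一般"],
    ["服务", "很", "好", "物流", "快"],
]
for keywords in tfidf_keywords(docs, top_k=2):
    print(keywords)
```

A term that occurs in every document (here "很") gets an inverse document frequency of zero, which is why it never ranks as a keyword.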

📊 Advanced Analytics

  • Regression Analysis: Multi-variable linear regression with statistical diagnostics
  • Time Series Analysis: Trend analysis, seasonality detection, and forecasting
  • Supply Chain Evaluation: Multi-dimensional quality assessment
  • Prediction Services: Unified prediction interface with model management

🚀 API Services

  • REST API: FastAPI-based web service with automatic documentation
  • Batch Processing: Handle large-scale data processing
  • Real-time Analysis: Live sentiment analysis and keyword extraction
  • Comprehensive Endpoints: Full coverage of all analysis features

🛠️ Utility Features

  • Data Processing: Efficient handling of large-scale review datasets
  • Visualization Tools: Comprehensive plotting and charting capabilities
  • Command-line Interface: Easy-to-use CLI for batch processing
  • Modular Design: Flexible and extensible architecture

📊 Dataset

Chinese Herbal Medicine Sentiment Dataset

We provide a comprehensive dataset of Chinese herbal medicine product reviews for research and development:

  • 🔢 Scale: 234,879 reviews from 259 products
  • 🌐 Platform: Hugging Face Hub
  • 📅 Time Span: 14.5 years (2010-2024)
  • 🏷️ Labels: Positive (75.8%), Neutral (11.5%), Negative (12.7%)
  • 📄 License: MIT License

Quick Dataset Access

from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("xingqiang/chinese-herbal-medicine-sentiment")

# Access train and validation splits
train_data = dataset['train']  # 211,391 samples
val_data = dataset['validation']  # 23,488 samples

# View sample data
print(train_data[0])
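
The dataset figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this section; the inverse-frequency class weights are a common heuristic for imbalanced labels and are an assumption here, not something the package is documented to apply.

```python
TOTAL = 234_879
SPLITS = {"train": 211_391, "validation": 23_488}
LABEL_SHARE = {"positive": 0.758, "neutral": 0.115, "negative": 0.127}

# The two splits should account for every review.
assert sum(SPLITS.values()) == TOTAL

for name, size in SPLITS.items():
    print(f"{name}: {size / TOTAL:.1%} of all reviews")

# Approximate per-class counts and "balanced" weights:
# weight = 1 / (n_classes * class_share), so rarer classes weigh more.
n_classes = len(LABEL_SHARE)
class_weights = {}
for lab, share in LABEL_SHARE.items():
    class_weights[lab] = 1 / (n_classes * share)
    print(f"{lab}: ~{round(TOTAL * share):,} reviews, weight {class_weights[lab]:.2f}")
```

With roughly three quarters of the reviews labeled positive, plain accuracy can look good for a classifier that rarely predicts the minority classes, so per-class metrics are worth reporting as well.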

Dataset Features

Field | Type | Description | Example
username | string | Anonymized username | "用***客"
user_id | integer | Unique user identifier | 16788761848
review_text | string | Chinese review content | "产品质量很好,效果明显"
review_time | datetime | Review timestamp | "2021-12-09 12:56:37"
rating | integer | Rating (1-5 scale) | 5
product_id | string | Product identifier | "100001642346"
sentiment_label | string | Sentiment label | "positive", "neutral", "negative"
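
The fields above map naturally onto a typed record. The following sketch assumes each row arrives as a plain dict and that `review_time` is a string in the format shown in the example column; the `Review` class and `from_row` helper are hypothetical and not part of the package or the dataset loader.

```python
from dataclasses import dataclass
from datetime import datetime

VALID_LABELS = {"positive", "neutral", "negative"}


@dataclass
class Review:
    username: str
    user_id: int
    review_text: str
    review_time: datetime
    rating: int
    product_id: str
    sentiment_label: str

    @classmethod
    def from_row(cls, row):
        """Build a Review from a dict, validating the rating and the label."""
        rating = int(row["rating"])
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        if row["sentiment_label"] not in VALID_LABELS:
            raise ValueError(f"unknown label: {row['sentiment_label']}")
        return cls(
            username=row["username"],
            user_id=int(row["user_id"]),
            review_text=row["review_text"],
            # Assumed timestamp format, taken from the example in the table.
            review_time=datetime.strptime(row["review_time"], "%Y-%m-%d %H:%M:%S"),
            rating=rating,
            product_id=str(row["product_id"]),
            sentiment_label=row["sentiment_label"],
        )


review = Review.from_row({
    "username": "用***客",
    "user_id": 16788761848,
    "review_text": "产品质量很好,效果明显",
    "review_time": "2021-12-09 12:56:37",
    "rating": 5,
    "product_id": "100001642346",
    "sentiment_label": "positive",
})
print(review.review_time.year, review.rating, review.sentiment_label)
```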

📖 View Complete Dataset Documentation

🚀 Installation

From PyPI (Recommended)

# Basic installation
pip install chinese-herbal-sentiment

# With deep learning support (quotes keep shells such as zsh from expanding the brackets)
pip install "chinese-herbal-sentiment[deep_learning]"

# With API services
pip install "chinese-herbal-sentiment[api]"

# With development tools
pip install "chinese-herbal-sentiment[dev]"

# Complete installation (all features)
pip install "chinese-herbal-sentiment[all]"

From Source

# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment

# Install in development mode
pip install -e ".[all]"

🚀 Quick Start

Basic Usage

from chinese_herbal_sentiment import (
    SentimentAnalysis, 
    KeywordExtraction,
    SupplyChainRegression,
    PredictionService,
    TimeSeriesAnalyzer
)

# Sample data
texts = [
    '这个中药质量很好,效果不错',
    '包装很差,质量一般',
    '服务态度很好,物流快'
]

# 1. Sentiment Analysis
analyzer = SentimentAnalysis()
sentiment_results = analyzer.analyze_batch(texts)

# 2. Keyword Extraction
extractor = KeywordExtraction()
keywords = extractor.tfidf_extraction(texts, top_k=10)

# 3. Unified Prediction Service
service = PredictionService()
comprehensive_results = service.analyze_comprehensive(
    texts=texts,
    include_sentiment=True,
    include_keywords=True
)

print("Comprehensive Results:", comprehensive_results)

Advanced Analytics

# 1. Regression Analysis
regressor = SupplyChainRegression()

# Generate sample supply chain data
data = regressor.generate_supply_chain_data(1000)

# Prepare features
feature_columns = ['material_quality', 'technology', 'delivery_speed']
X, y = regressor.prepare_data(data, 'service_quality', feature_columns)

# Train model
results = regressor.train(X, y)
print(f"Model R²: {results['test_r2']:.3f}")

# Generate analysis report
regressor.visualize_results('analysis_results.png')
regressor.generate_report('analysis_report.md')

# 2. Time Series Analysis
ts_analyzer = TimeSeriesAnalyzer()

# Load time series data
sample_data = ts_analyzer.generate_sample_data(periods=365)
ts_analyzer.load_data(sample_data, 'date', 'sentiment_score')

# Perform analysis
trend_results = ts_analyzer.trend_analysis()
forecast_results = ts_analyzer.forecast(periods=30)
anomalies = ts_analyzer.detect_anomalies()

print(f"Trend: {trend_results['trend_direction']}")
print(f"Forecast length: {len(forecast_results['predictions'])}")

API Services

# Start the API server
from chinese_herbal_sentiment.api import run_server

# Launch API service
run_server(host="0.0.0.0", port=8000)

# API will be available at:
# - Main service: http://localhost:8000
# - Documentation: http://localhost:8000/docs
# - Health check: http://localhost:8000/health

API Endpoints:

  • POST /api/v1/sentiment/analyze - Sentiment analysis
  • POST /api/v1/keywords/extract - Keyword extraction
  • POST /api/v1/analyze/comprehensive - Comprehensive analysis
  • GET /api/v1/models/info - Model information
  • GET /api/v1/predictions/history - Prediction history
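
A client call to the sentiment endpoint could look like the sketch below, which builds the request with the standard library but does not send it. The JSON body shape (`{"texts": [...]}`) is an assumption, since the request schema is not shown here; the interactive documentation at `/docs` on a running server is the place to confirm it. The helper name `build_sentiment_request` is hypothetical.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"


def build_sentiment_request(texts):
    """Build (but do not send) a POST request for the sentiment endpoint."""
    # Assumption: the endpoint accepts a JSON object with a "texts" list.
    body = json.dumps({"texts": texts}, ensure_ascii=False).encode("utf-8")
    return request.Request(
        url=f"{BASE_URL}/api/v1/sentiment/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sentiment_request(["产品质量很好"])
print(req.get_method(), req.full_url)

# To send it once the server is running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```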

Command Line Usage

# Run comprehensive demo
python examples/comprehensive_demo.py

# Start API server
python examples/comprehensive_demo.py --api

# Run specific analysis
python -c "
from chinese_herbal_sentiment import PredictionService
service = PredictionService()
result = service.predict_sentiment(['产品质量很好'])
print(result)
"

📚 Documentation

Core Classes

SentimentAnalysis

from chinese_herbal_sentiment import SentimentAnalysis

analyzer = SentimentAnalysis()

# Dictionary-based analysis
score = analyzer.dictionary_based_analysis("产品质量很好")

# Machine learning analysis (requires trained models)
ml_result = analyzer.machine_learning_analysis(["产品质量很好"])

KeywordExtraction

from chinese_herbal_sentiment import KeywordExtraction

extractor = KeywordExtraction()

# TF-IDF extraction
tfidf_keywords = extractor.tfidf_extraction(texts, top_k=10)

# TextRank extraction
textrank_keywords = extractor.textrank_extraction(texts, top_k=10)

# LDA topic modeling
lda_keywords, topics = extractor.lda_extraction(texts, n_topics=5)
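
For readers unfamiliar with TextRank, the idea is to run PageRank over a word co-occurrence graph. The sketch below is a generic, standard-library version on pre-tokenized input; the function name `textrank`, the window size, and the fixed iteration count are assumptions and do not describe how `textrank_extraction` is implemented.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score tokens with PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the next `window` words that follow it.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


tokens = ["质量", "好", "效果", "好", "物流", "好", "服务", "好"]
ranking = textrank(tokens, window=1)
print(ranking[0][0])
```

In this toy input every other word is adjacent to "好", so it forms the hub of the graph and ranks first.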

PredictionService

from chinese_herbal_sentiment import PredictionService

service = PredictionService()

# Batch sentiment prediction
sentiment_results = service.predict_sentiment(
    texts=["产品不错", "质量一般"],
    methods=['dictionary', 'svm']
)

# Batch keyword extraction
keyword_results = service.extract_keywords_batch(
    texts=["产品不错", "质量一般"],
    methods=['tfidf', 'textrank'],
    top_k=10
)

# Model management
model_info = service.get_model_info()
history = service.get_prediction_history()

Advanced Features

Regression Analysis

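from chinese_herbal_sentiment import SupplyChainRegression

# Initialize regressor
regressor = SupplyChainRegression(model_type='linear')

# Generate or load data
data = regressor.generate_supply_chain_data(1000)

# Prepare features and target (same columns as in the Quick Start example)
feature_columns = ['material_quality', 'technology', 'delivery_speed']
X, y = regressor.prepare_data(data, 'service_quality', feature_columns)

# Train model with comprehensive diagnostics
results = regressor.train(X, y, test_size=0.2)

# Feature importance analysis
importance = regressor.feature_importance()

# Model predictions with confidence intervals
# (X_new holds new observations with the same features; the first rows of X stand in here)
X_new = X[:5]
predictions, lower, upper = regressor.predict(X_new, return_intervals=True)

# Generate detailed reports
regressor.visualize_results('regression_results.png')
report = regressor.generate_report('regression_report.md')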

Time Series Analysis

from chinese_herbal_sentiment import TimeSeriesAnalyzer

# Initialize analyzer
analyzer = TimeSeriesAnalyzer()

# Load time series data (`data` is your own data and must provide the 'date' and 'score' columns named below)
success = analyzer.load_data(data, time_column='date', value_column='score')

# Trend analysis
trend_results = analyzer.trend_analysis(method='linear')

# Seasonal decomposition
seasonal_results = analyzer.seasonal_analysis()

# Forecasting
forecast_results = analyzer.forecast(periods=30, method='auto')

# Anomaly detection
anomalies = analyzer.detect_anomalies(method='iqr')

# Comprehensive visualization
analyzer.visualize_analysis(
    include_trend=True,
    include_seasonal=True,
    include_forecast=True,
    save_path='timeseries_analysis.png'
)
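
Two of the building blocks named above, linear trend estimation and IQR-based anomaly detection, are simple enough to sketch with the standard library. This is a generic illustration (a least-squares slope and Tukey's 1.5 x IQR rule); the function names are hypothetical and the package's `method='linear'` and `method='iqr'` options may be implemented differently.

```python
import statistics


def linear_trend(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def iqr_anomalies(values, k=1.5):
    """Return (index, value) pairs lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


scores = [0.70, 0.72, 0.71, 0.73, 0.20, 0.74, 0.75, 0.76]
print("slope:", linear_trend(scores))
print("anomalies:", iqr_anomalies(scores))
```

The sudden drop at index 4 falls well below the lower fence and is the only point flagged. Removing such points before fitting the trend usually gives a more representative slope.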

📊 Examples and Use Cases

E-commerce Review Analysis

import pandas as pd
from chinese_herbal_sentiment import PredictionService

# Load review data
df = pd.read_csv('herbal_reviews.csv')

# Initialize prediction service
service = PredictionService()

# Comprehensive analysis
results = service.analyze_comprehensive(
    texts=df['review_text'].tolist(),
    include_sentiment=True,
    include_keywords=True
)

# Extract insights
sentiment_distribution = results['results']['sentiment_analysis']
key_themes = results['results']['keyword_extraction']

print("Sentiment Distribution:", sentiment_distribution)
print("Key Themes:", key_themes)

Supply Chain Quality Assessment

from chinese_herbal_sentiment import SupplyChainRegression

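# Initialize regression analyzer
# Note: the regressor must be trained on these five features (see Regression Analysis) before predict() is called
regressor = SupplyChainRegression()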

# Define quality features
quality_features = {
    'material_quality': 8.5,
    'technology': 7.8,
    'delivery_speed': 8.2,
    'after_sales_service': 7.5,
    'processing_environment': 7.9
}

# Predict quality score
predicted_score = regressor.predict([list(quality_features.values())])
print(f"Predicted Quality Score: {predicted_score[0]:.2f}/10")

Market Trend Analysis

import numpy as np
from chinese_herbal_sentiment import TimeSeriesAnalyzer

# Load historical sentiment data
# (`historical_data` is your own data with 'date' and 'avg_sentiment' columns)
analyzer = TimeSeriesAnalyzer()
analyzer.load_data(historical_data, 'date', 'avg_sentiment')

# Analyze trends and patterns
trend_analysis = analyzer.trend_analysis()
seasonal_patterns = analyzer.seasonal_analysis()

# Forecast future sentiment
forecast = analyzer.forecast(periods=90)  # 3 months ahead

# Detect unusual patterns
anomalies = analyzer.detect_anomalies()

print(f"Market Trend: {trend_analysis['trend_direction']}")
print(f"Forecast Average: {np.mean(forecast['predictions']):.3f}")

🧪 Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test modules
python -m pytest tests/test_regression_analysis.py -v
python -m pytest tests/test_prediction_service.py -v
python -m pytest tests/test_time_series_analysis.py -v

# Run with coverage report
python -m pytest tests/ --cov=chinese_herbal_sentiment --cov-report=html

# Test API endpoints (requires FastAPI)
python -m pytest tests/test_api.py -v

📈 Performance Benchmarks

Model Accuracy

Method | Accuracy | Precision | Recall | F1-Score
Dictionary | 0.72 | 0.71 | 0.72 | 0.71
SVM | 0.85 | 0.84 | 0.85 | 0.84
Naive Bayes | 0.82 | 0.81 | 0.82 | 0.81
Logistic Regression | 0.87 | 0.86 | 0.87 | 0.86
BERT | 0.91 | 0.90 | 0.91 | 0.90
TextCNN | 0.89 | 0.88 | 0.89 | 0.88
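
For reference, the four metric columns can be computed from true and predicted labels as sketched below. The table does not state which averaging scheme was used for the three-class problem; macro averaging is assumed here, and the function name `macro_metrics` and the sample labels are illustrative only.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


y_true = ["positive", "positive", "positive", "neutral", "negative", "negative"]
y_pred = ["positive", "positive", "neutral", "neutral", "negative", "positive"]
for name, value in macro_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```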

Processing Speed

Dataset Size | Processing Time | Memory Usage
< 1K reviews | ~1-2 seconds | ~50MB
1K-10K reviews | ~10-30 seconds | ~200MB
10K-100K reviews | ~2-5 minutes | ~1GB
> 100K reviews | ~10-30 minutes | ~2-4GB
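
Since memory use grows with dataset size, one common way to keep it bounded is to feed reviews through the analyzer in fixed-size batches. The helper below is a generic sketch; the name `batched` and the batch size are assumptions, and whether the package batches internally is not stated here.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


reviews = (f"review {i}" for i in range(2500))  # stand-in for a large review stream
sizes = [len(batch) for batch in batched(reviews, 1000)]
print(sizes)
```

Each yielded batch could then be passed to a call such as `analyzer.analyze_batch(batch)` from the Quick Start, so that only one batch of texts is held in memory at a time.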

Regression Analysis Performance

Features | R² Score | RMSE | Training Time
5 features | 0.85 | 0.45 | ~1 second
10 features | 0.89 | 0.38 | ~2 seconds
15 features | 0.92 | 0.32 | ~3 seconds
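
The two quality measures in this table follow directly from their definitions: R² = 1 - SS_res / SS_tot and RMSE = sqrt(SS_res / n). The sketch below computes both for a small made-up example; the function name and the sample values are illustrative and are not taken from the benchmark.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R², RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


y_true = [7.0, 8.0, 9.0, 6.0]
y_pred = [7.5, 7.5, 8.5, 6.5]
r2, rmse = r2_and_rmse(y_true, y_pred)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

RMSE is expressed in the units of the target, so it should be read against the scale of the quality score, while R² is unitless.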

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

# Format code
black chinese_herbal_sentiment tests

# Lint code
flake8 chinese_herbal_sentiment tests

Contribution Areas

  • 🔬 Algorithm Development: Improve existing algorithms or add new ones
  • 📊 Dataset Enhancement: Contribute to the dataset or create new datasets
  • 🔧 Feature Development: Add new features or improve existing ones
  • 📝 Documentation: Improve documentation, examples, and tutorials
  • 🐛 Bug Fixes: Report and fix bugs
  • ⚡ Performance: Optimize performance and memory usage

📦 PyPI Publication

This package is published on PyPI for easy installation and distribution:

Version Management

# Check current version
python -c "import chinese_herbal_sentiment; print(chinese_herbal_sentiment.__version__)"

# Build package (direct setup.py invocation is deprecated; requires `pip install build`)
python -m build

# Upload to PyPI (maintainers only)
twine upload dist/*

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use this package or dataset in your research, please cite:

@software{chinese_herbal_sentiment_2024,
  title={Chinese Herbal Medicine Sentiment Analysis System},
  author={Chen, Xingqiang},
  year={2024},
  version={1.0.0},
  url={https://github.com/chenxingqiang/chinese-herbal-sentiment},
  note={A comprehensive NLP toolkit for Chinese herbal medicine e-commerce analysis}
}

@dataset{chinese_herbal_sentiment_dataset_2024,
  title={Chinese Herbal Medicine Sentiment Analysis Dataset},
  author={Chen, Xingqiang},
  year={2024},
  version={1.0.0},
  url={https://huggingface.co/datasets/xingqiang/chinese-herbal-medicine-sentiment},
  note={A comprehensive sentiment analysis dataset for Traditional Chinese Medicine product reviews}
}

🙏 Acknowledgments

  • Research Foundation: Based on master's thesis research on Chinese herbal medicine e-commerce supply chain quality evaluation
  • Dataset Contributors: Thanks to all users who provided review data and e-commerce platforms
  • Open Source Libraries: Built on scikit-learn, transformers, PyTorch, FastAPI, and other excellent projects
  • Academic Community: Inspired by research in sentiment analysis, supply chain management, and NLP

🔄 Changelog

v1.0.0 (2025-08-26)

  • New Features: Complete regression analysis module with statistical diagnostics
  • New Features: Advanced time series analysis with forecasting capabilities
  • New Features: Unified prediction service with model management
  • New Features: REST API service with FastAPI and automatic documentation
  • New Features: Comprehensive test suite with >90% coverage
  • 📊 Dataset: Released Chinese Herbal Medicine Sentiment Dataset (234K+ reviews)
  • 📦 PyPI: Initial PyPI publication with multiple installation options
  • 🔧 Improvements: Enhanced error handling and graceful dependency management
  • 📝 Documentation: Complete API documentation and usage examples

v0.1.0 (2024-12-XX)

  • 🎉 Initial release
  • ✅ Basic sentiment analysis (dictionary, SVM, Naive Bayes, Logistic Regression)
  • ✅ Keyword extraction (TF-IDF, TextRank, LDA)
  • ✅ Deep learning models (BERT, TextCNN) and graph-based TextRank analysis
  • ✅ Command-line interface
  • ✅ Comprehensive documentation and examples

📍 Repository: GitHub | 📦 PyPI: Package | 🤗 Dataset: Hugging Face

Note: This package is designed specifically for Chinese herbal medicine e-commerce review analysis and supply chain quality evaluation. The included dataset and models are optimized for Traditional Chinese Medicine domain terminology and sentiment expressions.

