Chinese Herbal Medicine E-commerce Sentiment Analysis System


Chinese Herbal Medicine Sentiment Analysis System

A comprehensive Natural Language Processing (NLP) toolkit specifically designed for analyzing customer reviews and evaluating supply chain quality in Chinese herbal medicine e-commerce platforms. This system includes advanced sentiment analysis, time series forecasting, regression analysis, and a complete REST API service.

🎯 Features

🔍 Sentiment Analysis

Dictionary-based Analysis: Traditional sentiment analysis using Chinese sentiment dictionaries
Machine Learning Models: SVM, Naive Bayes, and Logistic Regression classifiers
Deep Learning Models: LSTM, TextCNN, and BERT-based sentiment analysis
Graph-based Analysis: TextRank algorithm for sentiment analysis

🔑 Keyword Extraction

TF-IDF: Term Frequency-Inverse Document Frequency for keyword extraction
TextRank: Graph-based algorithm for keyword ranking
LDA: Latent Dirichlet Allocation for topic-based keyword extraction

📊 Advanced Analytics ✨

Regression Analysis: Multi-variable linear regression with statistical diagnostics
Time Series Analysis: Trend analysis, seasonality detection, and forecasting
Supply Chain Evaluation: Multi-dimensional quality assessment
Prediction Services: Unified prediction interface with model management

🚀 API Services ✨

REST API: FastAPI-based web service with automatic documentation
Batch Processing: Handle large-scale data processing
Real-time Analysis: Live sentiment analysis and keyword extraction
Comprehensive Endpoints: Full coverage of all analysis features

🛠️ Utility Features

Data Processing: Efficient handling of large-scale review datasets
Visualization Tools: Comprehensive plotting and charting capabilities
Command-line Interface: Easy-to-use CLI for batch processing
Modular Design: Flexible and extensible architecture

📊 Dataset

Chinese Herbal Medicine Sentiment Dataset

We provide a comprehensive dataset of Chinese herbal medicine product reviews for research and development:

🔢 Scale: 234,879 reviews from 259 products
🌐 Platform: Hugging Face Hub
📅 Time Span: 14.5 years (2010-2024)
🏷️ Labels: Positive (75.8%), Neutral (11.5%), Negative (12.7%)
📄 License: MIT License

Quick Dataset Access

from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("xingqiang/chinese-herbal-medicine-sentiment")

# Access train and validation splits
train_data = dataset['train']  # 211,391 samples
val_data = dataset['validation']  # 23,488 samples

# View sample data
print(train_data[0])
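To check the label proportions listed above against your own copy of a split, a small counting helper is enough. This is a sketch: `label_distribution` is a hypothetical helper, not part of the package, and the percentages you get depend on which split you pass in.

```python
from collections import Counter
from typing import Dict, Iterable


def label_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of each sentiment label, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Toy labels; with the real dataset you could pass train_data['sentiment_label']
sample = ["positive", "positive", "positive", "neutral", "negative"]
print(label_distribution(sample))  # {'positive': 60.0, 'neutral': 20.0, 'negative': 20.0}
```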

Dataset Features

Field	Type	Description	Example
`username`	string	Anonymized username	"用***客"
`user_id`	integer	Unique user identifier	16788761848
`review_text`	string	Chinese review content	"产品质量很好，效果明显"
`review_time`	datetime	Review timestamp	"2021-12-09 12:56:37"
`rating`	integer	Rating (1-5 scale)	5
`product_id`	string	Product identifier	"100001642346"
`sentiment_label`	string	Sentiment label	"positive", "neutral", "negative"
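The `review_time` field makes it possible to check the dataset's stated time span or to build per-period series for time series analysis. The sketch below counts reviews per year; it assumes timestamps arrive either as `datetime` objects or as strings in the layout of the example value above (`"2021-12-09 12:56:37"`). `review_year` and `reviews_per_year` are hypothetical helpers, not package functions.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Union

# Layout taken from the example value "2021-12-09 12:56:37"
REVIEW_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def review_year(ts: Union[str, datetime]) -> int:
    """Extract the year from a datetime object or a timestamp string."""
    if isinstance(ts, datetime):
        return ts.year
    return datetime.strptime(ts, REVIEW_TIME_FORMAT).year


def reviews_per_year(review_times: Iterable[Union[str, datetime]]) -> Dict[int, int]:
    """Count reviews per calendar year, sorted by year."""
    years = Counter(review_year(ts) for ts in review_times)
    return dict(sorted(years.items()))


times = ["2021-12-09 12:56:37", "2021-03-01 08:00:00", "2010-06-15 19:30:00"]
print(reviews_per_year(times))  # {2010: 1, 2021: 2}
```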

📖 View Complete Dataset Documentation

🚀 Installation

From PyPI (Recommended)

# Basic installation
pip install chinese-herbal-sentiment

# With deep learning support (quote the extras so shells such as zsh don't expand the brackets)
pip install "chinese-herbal-sentiment[deep_learning]"

# With API services
pip install "chinese-herbal-sentiment[api]"

# With development tools
pip install "chinese-herbal-sentiment[dev]"

# Complete installation (all features)
pip install "chinese-herbal-sentiment[all]"

From Source

# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment

# Install in development mode
pip install -e ".[all]"

🚀 Quick Start

Basic Usage

from chinese_herbal_sentiment import (
    SentimentAnalysis, 
    KeywordExtraction,
    SupplyChainRegression,
    PredictionService,
    TimeSeriesAnalyzer
)

# Sample data
texts = [
    '这个中药质量很好，效果不错',
    '包装很差，质量一般',
    '服务态度很好，物流快'
]

# 1. Sentiment Analysis
analyzer = SentimentAnalysis()
sentiment_results = analyzer.analyze_batch(texts)

# 2. Keyword Extraction
extractor = KeywordExtraction()
keywords = extractor.tfidf_extraction(texts, top_k=10)

# 3. Unified Prediction Service
service = PredictionService()
comprehensive_results = service.analyze_comprehensive(
    texts=texts,
    include_sentiment=True,
    include_keywords=True
)

print("Comprehensive Results:", comprehensive_results)

Advanced Analytics

# 1. Regression Analysis
regressor = SupplyChainRegression()

# Generate sample supply chain data
data = regressor.generate_supply_chain_data(1000)

# Prepare features
feature_columns = ['material_quality', 'technology', 'delivery_speed']
X, y = regressor.prepare_data(data, 'service_quality', feature_columns)

# Train model
results = regressor.train(X, y)
print(f"Model R²: {results['test_r2']:.3f}")

# Generate analysis report
regressor.visualize_results('analysis_results.png')
regressor.generate_report('analysis_report.md')

# 2. Time Series Analysis
ts_analyzer = TimeSeriesAnalyzer()

# Load time series data
sample_data = ts_analyzer.generate_sample_data(periods=365)
ts_analyzer.load_data(sample_data, 'date', 'sentiment_score')

# Perform analysis
trend_results = ts_analyzer.trend_analysis()
forecast_results = ts_analyzer.forecast(periods=30)
anomalies = ts_analyzer.detect_anomalies()

print(f"Trend: {trend_results['trend_direction']}")
print(f"Forecast length: {len(forecast_results['predictions'])}")
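How `trend_analysis()` derives `trend_direction` is not documented here. As an illustration of one common approach, the sketch below fits an ordinary least-squares line against the time index and classifies the slope. The labels `"increasing"`, `"decreasing"` and `"stable"` and the `tol` threshold are assumptions; the package may use different values.

```python
from typing import Sequence


def linear_trend_direction(values: Sequence[float], tol: float = 1e-3) -> str:
    """Classify a series by the slope of its OLS fit against index 0..n-1.

    `tol` is a hypothetical threshold below which a slope counts as flat.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Slope = covariance(x, y) / variance(x)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stable"


print(linear_trend_direction([0.60, 0.62, 0.65, 0.66, 0.70]))  # increasing
```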

API Services

# Start the API server
from chinese_herbal_sentiment.api import run_server

# Launch API service
run_server(host="0.0.0.0", port=8000)

# API will be available at:
# - Main service: http://localhost:8000
# - Documentation: http://localhost:8000/docs
# - Health check: http://localhost:8000/health

API Endpoints:

POST /api/v1/sentiment/analyze - Sentiment analysis
POST /api/v1/keywords/extract - Keyword extraction
POST /api/v1/analyze/comprehensive - Comprehensive analysis
GET /api/v1/models/info - Model information
GET /api/v1/predictions/history - Prediction history
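Any HTTP client can call these endpoints once the server is running. The sketch below builds a request for the sentiment endpoint using only the standard library. The JSON body shape `{"texts": [...]}` is an assumption, so check the server's `/docs` page for the actual request schema; `build_sentiment_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:8000"


def build_sentiment_request(texts: List[str]) -> urllib.request.Request:
    """Build a POST request for /api/v1/sentiment/analyze.

    The body shape {"texts": [...]} is assumed, not taken from the API schema.
    """
    payload = json.dumps({"texts": texts}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/sentiment/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sentiment_request(["产品质量很好"])
# With the server running, send the request and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```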

Command Line Usage

# Run comprehensive demo
python examples/comprehensive_demo.py

# Start API server
python examples/comprehensive_demo.py --api

# Run specific analysis
python -c "
from chinese_herbal_sentiment import PredictionService
service = PredictionService()
result = service.predict_sentiment(['产品质量很好'])
print(result)
"

📚 Documentation

Core Classes

SentimentAnalysis

from chinese_herbal_sentiment import SentimentAnalysis

analyzer = SentimentAnalysis()

# Dictionary-based analysis
score = analyzer.dictionary_based_analysis("产品质量很好")

# Machine learning analysis (requires trained models)
ml_result = analyzer.machine_learning_analysis(["产品质量很好"])
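Dictionary-based analysis scores a text by the polarity of the sentiment words it contains. The package's own dictionaries and word segmentation are not shown here, so the sketch below uses a tiny hypothetical lexicon and naive substring matching purely to illustrate the idea.

```python
from typing import Dict

# Tiny hypothetical lexicon: 好 good, 不错 not bad, 快 fast, 差 poor, 一般 mediocre.
# Real sentiment dictionaries are far larger and usually match segmented words.
LEXICON: Dict[str, int] = {"好": 1, "不错": 1, "快": 1, "差": -1, "一般": -1}


def lexicon_score(text: str, lexicon: Dict[str, int] = LEXICON) -> int:
    """Sum the polarity of every lexicon entry found in the text.

    Uses naive substring counting, so a negation such as "不好" (not good)
    is still scored as positive; real systems handle negation explicitly.
    """
    return sum(text.count(term) * weight for term, weight in lexicon.items())


print(lexicon_score("这个中药质量很好，效果不错"))  # 2
print(lexicon_score("包装很差，质量一般"))  # -2
```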

KeywordExtraction

from chinese_herbal_sentiment import KeywordExtraction

extractor = KeywordExtraction()

# Sample reviews to analyze
texts = ['这个中药质量很好，效果不错', '包装很差，质量一般']

# TF-IDF extraction
tfidf_keywords = extractor.tfidf_extraction(texts, top_k=10)

# TextRank extraction
textrank_keywords = extractor.textrank_extraction(texts, top_k=10)

# LDA topic modeling
lda_keywords, topics = extractor.lda_extraction(texts, n_topics=5)
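TF-IDF ranks a term highly when it is frequent in one review but rare across the collection. The exact variant behind `tfidf_extraction` (smoothing, normalization, Chinese segmentation) is not documented here, so the sketch below applies the textbook weighting, tf = count / document length and idf = log(N / document frequency), to pre-tokenized reviews whose segmentation is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_top_k(docs: List[List[str]], k: int = 3) -> List[List[Tuple[str, float]]]:
    """Return the k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:k])
    return results


# Hypothetical word segmentation of three short reviews
docs = [
    ["质量", "很", "好", "效果", "不错"],
    ["包装", "很", "差", "质量", "一般"],
    ["服务", "很", "好", "物流", "快"],
]
for top in tfidf_top_k(docs, k=2):
    print(top)
```

Because "很" (very) appears in every review, its idf is log(3/3) = 0 and it never ranks, which is the behavior that makes TF-IDF useful for filtering out common words.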

PredictionService

from chinese_herbal_sentiment import PredictionService

service = PredictionService()

# Batch sentiment prediction
sentiment_results = service.predict_sentiment(
    texts=["产品不错", "质量一般"],
    methods=['dictionary', 'svm']
)

# Batch keyword extraction
keyword_results = service.extract_keywords_batch(
    texts=["产品不错", "质量一般"],
    methods=['tfidf', 'textrank'],
    top_k=10
)

# Model management
model_info = service.get_model_info()
history = service.get_prediction_history()

Advanced Features

Regression Analysis

from chinese_herbal_sentiment import SupplyChainRegression

# Initialize regressor
regressor = SupplyChainRegression(model_type='linear')

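# Generate or load data
data = regressor.generate_supply_chain_data(1000)

# Select the target and feature columns
feature_columns = ['material_quality', 'technology', 'delivery_speed']
X, y = regressor.prepare_data(data, 'service_quality', feature_columns)

# Train model with comprehensive diagnostics
results = regressor.train(X, y, test_size=0.2)

# Feature importance analysis
importance = regressor.feature_importance()

# Model predictions with confidence intervals
X_new = X[:5]  # for illustration, reuse the first five rows as "new" observations
predictions, lower, upper = regressor.predict(X_new, return_intervals=True)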

# Generate detailed reports
regressor.visualize_results('regression_results.png')
report = regressor.generate_report('regression_report.md')
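With `model_type='linear'`, the model is presumably an ordinary least-squares fit. The sketch below shows how such a fit and the R² and RMSE metrics reported in the benchmarks can be computed with NumPy on synthetic data; the coefficients and noise level are made up, the metrics are in-sample rather than on a held-out test split, and this does not reproduce the package's statistical diagnostics.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray):
    """Fit y = b0 + X @ b by least squares; return coefficients, R² and RMSE."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot  # share of variance explained
    rmse = float(np.sqrt(ss_res / len(y)))  # typical prediction error
    return coef, r2, rmse


# Synthetic data: three quality scores on a 10-point scale (made-up coefficients)
rng = np.random.default_rng(0)
X = rng.uniform(5, 10, size=(200, 3))
y = 1.0 + X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.2, size=200)
coef, r2, rmse = fit_ols(X, y)
print(np.round(coef, 2), round(r2, 3), round(rmse, 3))
```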

Time Series Analysis

from chinese_herbal_sentiment import TimeSeriesAnalyzer

# Initialize analyzer
analyzer = TimeSeriesAnalyzer()

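# Load time series data (here, generated sample data with 'date' and 'sentiment_score' columns)
data = analyzer.generate_sample_data(periods=365)
success = analyzer.load_data(data, time_column='date', value_column='sentiment_score')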

# Trend analysis
trend_results = analyzer.trend_analysis(method='linear')

# Seasonal decomposition
seasonal_results = analyzer.seasonal_analysis()

# Forecasting
forecast_results = analyzer.forecast(periods=30, method='auto')

# Anomaly detection
anomalies = analyzer.detect_anomalies(method='iqr')

# Comprehensive visualization
analyzer.visualize_analysis(
    include_trend=True,
    include_seasonal=True,
    include_forecast=True,
    save_path='timeseries_analysis.png'
)
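The `method='iqr'` option suggests Tukey's fences: a point is flagged when it falls below Q1 − k·IQR or above Q3 + k·IQR. The sketch below implements that rule with the conventional k = 1.5; the multiplier and the quartile method (Python's default "exclusive") are assumptions, and the package's own thresholds may differ.

```python
import statistics
from typing import List, Sequence


def iqr_anomalies(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Daily average sentiment with one sharp drop
scores = [0.71, 0.69, 0.72, 0.70, 0.15, 0.73, 0.68, 0.71]
print(iqr_anomalies(scores))  # [4]
```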

📊 Examples and Use Cases

E-commerce Review Analysis

import pandas as pd
from chinese_herbal_sentiment import PredictionService

# Load review data
df = pd.read_csv('herbal_reviews.csv')

# Initialize prediction service
service = PredictionService()

# Comprehensive analysis
results = service.analyze_comprehensive(
    texts=df['review_text'].tolist(),
    include_sentiment=True,
    include_keywords=True
)

# Extract insights
sentiment_distribution = results['results']['sentiment_analysis']
key_themes = results['results']['keyword_extraction']

print("Sentiment Distribution:", sentiment_distribution)
print("Key Themes:", key_themes)
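For very large review files, sending every review in one call can use a lot of memory (see the processing-speed figures in the benchmarks). One option is to process the list in fixed-size chunks. The helper below is a hypothetical sketch, not a package function; Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists. Note that corpus-level statistics such as TF-IDF weights will differ when computed per chunk.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Hypothetical usage with the service shown above:
# for chunk in batched(df['review_text'].tolist(), 1000):
#     partial = service.analyze_comprehensive(texts=chunk, include_sentiment=True)
```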

Supply Chain Quality Assessment

from chinese_herbal_sentiment import SupplyChainRegression

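# Initialize regression analyzer
regressor = SupplyChainRegression()

# Define quality features (scores on a 10-point scale)
quality_features = {
    'material_quality': 8.5,
    'technology': 7.8,
    'delivery_speed': 8.2,
    'after_sales_service': 7.5,
    'processing_environment': 7.9
}

# The model must be trained on the same five features before predicting
# (assumes the generated data contains these columns)
data = regressor.generate_supply_chain_data(1000)
X, y = regressor.prepare_data(data, 'service_quality', list(quality_features.keys()))
regressor.train(X, y)

# Predict quality score
predicted_score = regressor.predict([list(quality_features.values())])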
print(f"Predicted Quality Score: {predicted_score[0]:.2f}/10")

Market Trend Analysis

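import numpy as np
import pandas as pd
from chinese_herbal_sentiment import TimeSeriesAnalyzer

# Load historical sentiment data (hypothetical CSV with 'date' and 'avg_sentiment' columns)
historical_data = pd.read_csv('sentiment_history.csv')
analyzer = TimeSeriesAnalyzer()
analyzer.load_data(historical_data, 'date', 'avg_sentiment')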

# Analyze trends and patterns
trend_analysis = analyzer.trend_analysis()
seasonal_patterns = analyzer.seasonal_analysis()

# Forecast future sentiment
forecast = analyzer.forecast(periods=90)  # 3 months ahead

# Detect unusual patterns
anomalies = analyzer.detect_anomalies()

print(f"Market Trend: {trend_analysis['trend_direction']}")
print(f"Forecast Average: {np.mean(forecast['predictions']):.3f}")

🧪 Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test modules
python -m pytest tests/test_regression_analysis.py -v
python -m pytest tests/test_prediction_service.py -v
python -m pytest tests/test_time_series_analysis.py -v

# Run with coverage report
python -m pytest tests/ --cov=chinese_herbal_sentiment --cov-report=html

# Test API endpoints (requires FastAPI)
python -m pytest tests/test_api.py -v

📈 Performance Benchmarks

Model Accuracy

Method	Accuracy	Precision	Recall	F1-Score
Dictionary	0.72	0.71	0.72	0.71
SVM	0.85	0.84	0.85	0.84
Naive Bayes	0.82	0.81	0.82	0.81
Logistic Regression	0.87	0.86	0.87	0.86
BERT	0.91	0.90	0.91	0.90
TextCNN	0.89	0.88	0.89	0.88
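The table does not state how precision, recall and F1 are averaged over the three classes. The Recall column equals the Accuracy column in every row, which is consistent with (though not proof of) support-weighted averaging: weighted recall is Σ_c (n_c / N)(TP_c / n_c) = Σ_c TP_c / N, which is exactly accuracy. The sketch below computes these weighted metrics in plain Python; it mirrors the idea behind scikit-learn's `average='weighted'` but is a standalone illustration, not the package's evaluation code.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # true count per class
    predicted = Counter(y_pred)  # predicted count per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for label, s in support.items():
        w = s / n  # class weight = its share of the true labels
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / s
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += w * p
        recall += w * r
        f1 += w * f
    accuracy = sum(tp.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["positive", "positive", "positive", "neutral", "negative", "negative"]
y_pred = ["positive", "positive", "neutral", "neutral", "negative", "positive"]
print(weighted_metrics(y_true, y_pred))
```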

Processing Speed

Dataset Size	Processing Time	Memory Usage
< 1K reviews	~1-2 seconds	~50MB
1K-10K reviews	~10-30 seconds	~200MB
10K-100K reviews	~2-5 minutes	~1GB
> 100K reviews	~10-30 minutes	~2-4GB

Regression Analysis Performance

Features	R² Score	RMSE	Training Time
5 features	0.85	0.45	~1 second
10 features	0.89	0.38	~2 seconds
15 features	0.92	0.32	~3 seconds

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Clone the repository
git clone https://github.com/chenxingqiang/chinese-herbal-sentiment.git
cd chinese-herbal-sentiment

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

# Format code
black chinese_herbal_sentiment tests

# Lint code
flake8 chinese_herbal_sentiment tests

Contribution Areas

🔬 Algorithm Development: Improve existing algorithms or add new ones
📊 Dataset Enhancement: Contribute to the dataset or create new datasets
🔧 Feature Development: Add new features or improve existing ones
📝 Documentation: Improve documentation, examples, and tutorials
🐛 Bug Fixes: Report and fix bugs
⚡ Performance: Optimize performance and memory usage

📦 PyPI Publication

This package is published on PyPI for easy installation and distribution:

Package Information

Package Name: chinese-herbal-sentiment
PyPI URL: https://pypi.org/project/chinese-herbal-sentiment/
Installation: pip install chinese-herbal-sentiment

Version Management

# Check current version
python -c "import chinese_herbal_sentiment; print(chinese_herbal_sentiment.__version__)"

# Build package (requires: pip install build)
python -m build

# Upload to PyPI (maintainers only)
twine upload dist/*

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use this package or dataset in your research, please cite:

@software{chinese_herbal_sentiment_2024,
  title={Chinese Herbal Medicine Sentiment Analysis System},
  author={Chen, Xingqiang},
  year={2024},
  version={1.0.0},
  url={https://github.com/chenxingqiang/chinese-herbal-sentiment},
  note={A comprehensive NLP toolkit for Chinese herbal medicine e-commerce analysis}
}

@dataset{chinese_herbal_sentiment_dataset_2024,
  title={Chinese Herbal Medicine Sentiment Analysis Dataset},
  author={Chen, Xingqiang},
  year={2024},
  version={1.0.0},
  url={https://huggingface.co/datasets/xingqiang/chinese-herbal-medicine-sentiment},
  note={A comprehensive sentiment analysis dataset for Traditional Chinese Medicine product reviews}
}

🙏 Acknowledgments

Research Foundation: Based on master's thesis research on Chinese herbal medicine e-commerce supply chain quality evaluation
Dataset Contributors: Thanks to all users who provided review data and to the e-commerce platforms
Open Source Libraries: Built on scikit-learn, transformers, PyTorch, FastAPI, and other excellent projects
Academic Community: Inspired by research in sentiment analysis, supply chain management, and NLP

📞 Support

📖 Documentation: GitHub Wiki
🐛 Issues: GitHub Issues
📧 Email: chenxingqiang@turingai.cc
💬 Discussions: GitHub Discussions

🔄 Changelog

v1.0.0 (2025-08-26)

✨ New Features: Complete regression analysis module with statistical diagnostics
✨ New Features: Advanced time series analysis with forecasting capabilities
✨ New Features: Unified prediction service with model management
✨ New Features: REST API service with FastAPI and automatic documentation
✨ New Features: Comprehensive test suite with >90% coverage
📊 Dataset: Released Chinese Herbal Medicine Sentiment Dataset (234K+ reviews)
📦 PyPI: Initial PyPI publication with multiple installation options
🔧 Improvements: Enhanced error handling and graceful dependency management
📝 Documentation: Complete API documentation and usage examples

v0.1.0 (2024-12-XX)

🎉 Initial release
✅ Basic sentiment analysis (dictionary, SVM, Naive Bayes, Logistic Regression)
✅ Keyword extraction (TF-IDF, TextRank, LDA)
✅ Deep learning models (BERT, TextCNN) and TextRank-based analysis
✅ Command-line interface
✅ Comprehensive documentation and examples

📍 Repository: GitHub | 📦 PyPI: Package | 🤗 Dataset: Hugging Face

Note: This package is designed specifically for Chinese herbal medicine e-commerce review analysis and supply chain quality evaluation. The included dataset and models are optimized for Traditional Chinese Medicine domain terminology and sentiment expressions.

Release history

1.0.0 (this version): Aug 26, 2025

0.1.0: Aug 25, 2025

Download files

Source Distribution

chinese_herbal_sentiment-1.0.0.tar.gz (178.3 kB)

Uploaded Aug 26, 2025 Source

Built Distribution

chinese_herbal_sentiment-1.0.0-py3-none-any.whl (147.0 kB)

Uploaded Aug 26, 2025 Python 3

File details

Details for the file chinese_herbal_sentiment-1.0.0.tar.gz.

File metadata

Download URL: chinese_herbal_sentiment-1.0.0.tar.gz
Upload date: Aug 26, 2025
Size: 178.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chinese_herbal_sentiment-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`173bf15f949b1492c5e1955bdca264c1c0b9000ddde3b397996d4aefd527697f`
MD5	`d0ea36d12154e718914cab3715918f17`
BLAKE2b-256	`a2fe197909c56ddf1678bba66b5bb1303a546631db843b4315c2d70436d18790`

File details

Details for the file chinese_herbal_sentiment-1.0.0-py3-none-any.whl.

File metadata

Download URL: chinese_herbal_sentiment-1.0.0-py3-none-any.whl
Upload date: Aug 26, 2025
Size: 147.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chinese_herbal_sentiment-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`156b0aa92abcd876915bb92309a0488d981b1bdf5cf71f1880755458c7b0ef6d`
MD5	`745d349bb297daed8ca6aec5aea08646`
BLAKE2b-256	`e5e6af71ce0dc0473a9175e7a127b71b9eecb68d336853d7da563ad20892b0b4`
