A comprehensive pipeline for sentiment analysis using deep learning models
NLP Sentiment Analysis Pipeline
A comprehensive, modular pipeline for sentiment analysis using deep learning models. This package provides tools for data extraction, preprocessing, model training, and evaluation.
Features
- Data Preparation: Extract and preprocess text data for sentiment analysis
- Modeling: Baseline models with TF-IDF vectorization and neural networks
- Evaluation: Comprehensive model evaluation utilities
Installation
From PyPI
pip install nlp-sentiment-pipeline
From Source
git clone https://github.com/FranzCastillo/NLP-Tweets-Sentiment-Analysis-DL-Models
cd NLP-Tweets-Sentiment-Analysis-DL-Models/pipeline
pip install -e .
For Development
pip install -e ".[dev]"
Usage
As a Python Package
from pipeline.data_preparation import DataExtractor, TextPreprocessor, DataSplitter
from pipeline.modeling import BaselineModel, TfidfVectorizerWrapper
from pipeline.evaluation import ModelEvaluator
# Extract data
extractor = DataExtractor(split="train")
df = extractor.extract()
# Preprocess text
preprocessor = TextPreprocessor(remove_stopwords=True)
df['clean_text'] = df['text'].apply(preprocessor.preprocess)
# Train model
model = BaselineModel()
# ... training code ...
# Evaluate on held-out data (X_test and y_test are assumed to come from your own train/test split)
evaluator = ModelEvaluator(model, X_test, y_test)
results = evaluator.evaluate()
As a Command-Line Tool
nlp-sentiment-pipeline
Package Structure
pipeline/
├── __init__.py
├── main.py
├── data_preparation/ # Data extraction and preprocessing
│ ├── __init__.py
│ ├── extraction.py
│ ├── preprocessing.py
│ └── data_splitter.py
├── modeling/ # Model definitions and utilities
│ ├── __init__.py
│ ├── baseline.py
│ ├── vectorizer.py
│ └── model_evaluator.py
└── evaluation/ # Evaluation utilities
  ├── __init__.py
  ├── evaluator.py
  └── model_evaluator.py
Subpackages
data_preparation
Tools for data extraction and preprocessing:
- DataExtractor: Extract datasets from various sources
- TextPreprocessor: Clean and preprocess text data
- DataSplitter: Split data into train/validation/test sets
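To make the preprocessing and splitting steps concrete, here is a minimal, dependency-free sketch of what such tools typically do. It is not the package's implementation: `clean_text`, `split_indices`, the tiny `STOPWORDS` set, and the default fractions are all hypothetical names and values chosen for illustration, and the real `TextPreprocessor` and `DataSplitter` may behave differently.

```python
import random
import re

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real pipeline would more likely use the lists shipped with NLTK or spaCy.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text, remove_stopwords=True):
    """Lowercase, strip URLs, mentions and non-letters, optionally drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = text.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


def split_indices(n, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test
```

Fixing the seed keeps the three subsets stable between runs, which matters when comparing models trained at different times.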
modeling
Model implementations and utilities:
- BaselineModel: Baseline neural network model
- TfidfVectorizerWrapper: TF-IDF vectorization wrapper
- Various deep learning models
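For readers unfamiliar with TF-IDF, the sketch below shows what the weighting computes, using only the standard library. It assumes the smoothed IDF formula and L2 normalization that scikit-learn's `TfidfVectorizer` applies by default; whether `TfidfVectorizerWrapper` uses those settings is not stated here, and `fit_idf` and `transform` are hypothetical helper names.

```python
import math
from collections import Counter


def fit_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no term gets a zero weight.
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def transform(tokens, idf):
    """Map one tokenized document to an L2-normalized {term: tf-idf} dict."""
    # Terms never seen while fitting are ignored.
    counts = Counter(t for t in tokens if t in idf)
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in weights.items()}
```

Rare terms receive a larger IDF than terms that appear in most documents, so they dominate the resulting vector; this is what lets a simple baseline classifier focus on distinctive words.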
evaluation
Model evaluation tools:
- ModelEvaluator: Comprehensive model evaluation
- evaluate_model: Quick evaluation function
- print_evaluation_results: Pretty-print evaluation metrics
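As a rough illustration of the metrics an evaluation step usually reports for binary sentiment labels, the following standard-library sketch computes accuracy, precision, recall, and F1 and formats them for printing. The functions `evaluate_predictions` and `format_results` are hypothetical stand-ins written for this example; they are not the signatures of `evaluate_model` or `print_evaluation_results`, which may report different metrics.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred, positive_label=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    outcomes = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive_label:
            outcomes["tp" if truth == positive_label else "fp"] += 1
        else:
            outcomes["fn" if truth == positive_label else "tn"] += 1
    tp, fp, fn, tn = (outcomes[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def format_results(results):
    """Render the metrics dict as aligned 'name : value' lines."""
    width = max(len(name) for name in results)
    return "\n".join(
        f"{name:<{width}} : {value:.3f}" for name, value in results.items()
    )
```

Reporting precision and recall alongside accuracy is useful when the sentiment classes are imbalanced, since accuracy alone can look good for a model that mostly predicts the majority class.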
Requirements
- Python >= 3.8 (3.9 or newer in practice: the pandas and TensorFlow versions listed below require Python 3.9+)
- TensorFlow >= 2.15.0
- pandas >= 2.3.1
- scikit-learn >= 1.5.2
- nltk >= 3.9.1
- spacy >= 3.8.7
See requirements.txt for a complete list of dependencies.
Development
Running Tests
pytest
Code Formatting
black pipeline/
Type Checking
mypy pipeline/
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Authors
Francisco Castillo - cas21562@uvg.edu.gt
Changelog
0.1.0 (2025-11-14)
- Initial release
- Data preparation subpackage
- Modeling subpackage
- Evaluation subpackage