
A comprehensive pipeline for sentiment analysis using deep learning models

NLP Sentiment Analysis Pipeline

A comprehensive, modular pipeline for sentiment analysis using deep learning models. This package provides tools for data extraction, preprocessing, model training, and evaluation.

Features

  • Data Preparation: Extract and preprocess text data for sentiment analysis
  • Modeling: Baseline models with TF-IDF vectorization and neural networks
  • Evaluation: Comprehensive model evaluation utilities

Installation

From PyPI

pip install nlp-sentiment-pipeline

From Source

git clone https://github.com/FranzCastillo/NLP-Tweets-Sentiment-Analysis-DL-Models
cd NLP-Tweets-Sentiment-Analysis-DL-Models/pipeline
pip install -e .

For Development

pip install -e ".[dev]"

Usage

As a Python Package

from pipeline.data_preparation import DataExtractor, TextPreprocessor, DataSplitter
from pipeline.modeling import BaselineModel, TfidfVectorizerWrapper
from pipeline.evaluation import ModelEvaluator

# Extract data
extractor = DataExtractor(split="train")
df = extractor.extract()

# Preprocess text
preprocessor = TextPreprocessor(remove_stopwords=True)
df['clean_text'] = df['text'].apply(preprocessor.preprocess)

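# Train model (training code omitted)
model = BaselineModel()
# ... training code ...

# Evaluate on held-out data; X_test and y_test are the test features and
# labels produced by the omitted splitting/vectorization step
evaluator = ModelEvaluator(model, X_test, y_test)
results = evaluator.evaluate()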
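
The TextPreprocessor(remove_stopwords=True) call above does not spell out its cleaning rules. As an illustration only, here is a minimal standard-library sketch of the kind of tweet cleaning such a step commonly performs. The function name preprocess, the regular expressions and the deliberately tiny STOPWORDS set are assumptions, not the package's implementation.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real pipeline would typically use a fuller list such as NLTK's.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "i", "it", "this"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(text: str, remove_stopwords: bool = True) -> str:
    """Lowercase, strip URLs, @mentions and non-letters, optionally drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links before punctuation is removed
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = NON_ALPHA_RE.sub(" ", text)  # keep only lowercase letters and whitespace
    tokens = text.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("@bob This movie is GREAT!!! https://t.co/xyz"))  # → movie great
```

Stopword choice matters for sentiment: NLTK's English stopword list, for example, includes "not" and "no", so removing stopwords with it can turn "not good" into "good". It is worth checking the list before enabling remove_stopwords.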

As a Command-Line Tool

nlp-sentiment-pipeline

Package Structure

pipeline/
├── __init__.py
├── main.py
├── data_preparation/      # Data extraction and preprocessing
│   ├── __init__.py
│   ├── extraction.py
│   ├── preprocessing.py
│   └── data_splitter.py
├── modeling/              # Model definitions and utilities
│   ├── __init__.py
│   ├── baseline.py
│   ├── vectorizer.py
│   └── model_evaluator.py
└── evaluation/            # Evaluation utilities
    ├── __init__.py
    ├── evaluator.py
    └── model_evaluator.py

Subpackages

data_preparation

Tools for data extraction and preprocessing:

  • DataExtractor: Extract datasets from various sources
  • TextPreprocessor: Clean and preprocess text data
  • DataSplitter: Split data into train/validation/test sets
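
DataSplitter's parameters are not documented here. A plain, reproducible random three-way split can be sketched as follows; train_val_test_split and its val_frac, test_frac and seed parameters are hypothetical names chosen for illustration, not the package's API.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    items: Sequence[T],
    val_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into three disjoint slices."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be non-negative and sum to < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over goes to training
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

This split is not stratified. For imbalanced sentiment labels, scikit-learn's train_test_split with its stratify argument (applied twice: once to carve out the test set, once for validation) keeps class proportions similar across the three sets.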

modeling

Model implementations and utilities:

  • BaselineModel: Baseline neural network model
  • TfidfVectorizerWrapper: TF-IDF vectorization wrapper
  • Various deep learning models
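
TfidfVectorizerWrapper is described only as a TF-IDF wrapper. Assuming it wraps scikit-learn's TfidfVectorizer with default settings, each document vector is the raw term count times a smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of training documents and df(t) the number containing t, followed by L2 normalization. The standard-library sketch below reproduces that weighting on pre-tokenized text; fit_idf and transform are illustrative names, not the package's API.

```python
import math
from collections import Counter
from typing import Dict, List


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, learned from training docs only."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def transform(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Raw counts times IDF, then L2-normalized; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


train_docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
idf = fit_idf(train_docs)
vec = transform(["good", "movie", "unknown"], idf)
print({t: round(w, 3) for t, w in vec.items()})  # {'good': 0.707, 'movie': 0.707}
```

Fitting the IDF on the training split only keeps test documents from influencing the vocabulary or the weights.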

evaluation

Model evaluation tools:

  • ModelEvaluator: Comprehensive model evaluation
  • evaluate_model: Quick evaluation function
  • print_evaluation_results: Pretty-print evaluation metrics
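
The exact metrics returned by evaluate_model are not listed here. For a binary sentiment task, the usual core metrics can be computed from the confusion counts as in the sketch below; binary_metrics and print_metrics are hypothetical stand-ins written for illustration, not the package's evaluate_model or print_evaluation_results.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, plus precision, recall and F1 for the given positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def print_metrics(metrics: Dict[str, float]) -> None:
    """Pretty-print one metric per line with aligned names."""
    for name, value in metrics.items():
        print(f"{name:<10} {value:.3f}")


print_metrics(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
# accuracy   0.600
# precision  0.667
# recall     0.667
# f1         0.667
```

With more than two sentiment classes, precision, recall and F1 are usually computed per class and then macro- or weighted-averaged.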

Requirements

  • Python >= 3.8 (in practice, the pandas and scikit-learn minimums below require Python >= 3.9)
  • TensorFlow >= 2.15.0
  • pandas >= 2.3.1
  • scikit-learn >= 1.5.2
  • nltk >= 3.9.1
  • spacy >= 3.8.7

See requirements.txt for a complete list of dependencies.

Development

Running Tests

pytest

Code Formatting

black pipeline/

Type Checking

mypy pipeline/

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Authors

Francisco Castillo - cas21562@uvg.edu.gt

Changelog

0.1.2 (2025-11-15)

  • Second test release for PyPI publishing
  • CI/CD pipeline testing

0.1.1 (2025-11-15)

  • Test release for CI/CD pipeline
  • Updated GitHub Actions workflows

0.1.0 (2025-11-14)

  • Initial release
  • Data preparation subpackage
  • Modeling subpackage
  • Evaluation subpackage

Download files

Source Distribution

nlp_sentiment_pipeline-0.1.2.tar.gz (4.5 kB)

Built Distribution

nlp_sentiment_pipeline-0.1.2-py3-none-any.whl (4.1 kB)

File details

Details for the file nlp_sentiment_pipeline-0.1.2.tar.gz.

File metadata

  • Download URL: nlp_sentiment_pipeline-0.1.2.tar.gz
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlp_sentiment_pipeline-0.1.2.tar.gz:

  • SHA256: 760e3b787e2c0ff81ec2c69b29dfd74036c4c35e661c3422442c9d6a026e61ce
  • MD5: 3189e054e91c350051630729be2fc1c2
  • BLAKE2b-256: 05f5c6e2c67b416b87202d7816f5413f01ab7630296a90907cdf7a66aed23585

File details

Details for the file nlp_sentiment_pipeline-0.1.2-py3-none-any.whl.

File hashes

Hashes for nlp_sentiment_pipeline-0.1.2-py3-none-any.whl:

  • SHA256: 42e43b0f8b13507e642c3a3cd1b37c6c346a729bdd410de8e61fe2ed07e8e620
  • MD5: 1c0411f223cc9204affef8d1b1d1a05b
  • BLAKE2b-256: a2e59efa8ca770d6e64cefeb5fffa7a2ae0541de1abd5c8295be8a3da74f8b15
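
A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. The sketch below streams the file in chunks; the helper names sha256_of and verify are illustrative, and it assumes the source distribution has been saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest, ignoring letter case."""
    return sha256_of(path) == expected_sha256.lower()


# SHA256 digest listed above for the 0.1.2 source distribution.
EXPECTED = "760e3b787e2c0ff81ec2c69b29dfd74036c4c35e661c3422442c9d6a026e61ce"

if __name__ == "__main__":
    archive = Path("nlp_sentiment_pipeline-0.1.2.tar.gz")  # assumed local file name
    if archive.exists():
        print("OK" if verify(archive, EXPECTED) else "MISMATCH")
```

pip can enforce the same check at install time: a requirements line such as nlp-sentiment-pipeline==0.1.2 --hash=sha256:<digest> turns on hash-checking mode, which then requires hashes for every package that gets installed, dependencies included.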
