Text Classification with Transformers
Overview
NLPipes is for people unfamiliar with Transformers who want an end-to-end solution to practical text classification problems, including:
- Single-label classification: A typical use case is sentiment detection, where one wants to detect the overall sentiment polarity (e.g., positive, neutral, negative) of a review.
- Multi-label classification: A typical use case is aspect category detection, where one wants to detect the multiple aspects mentioned in a review (e.g., product_quality, delivery_time, price, ...).
- Class-label classification: A typical use case is aspect-based sentiment analysis, where one wants to detect the sentiment polarity associated with each aspect category mentioned in a review (e.g., product_quality: neutral, delivery_time: negative, price: positive, ...).
NLPipes exposes a Model API that provides a single, simple abstraction for all tasks.
The library maintains a common usage pattern across models (train, evaluate, predict, save) and a clear, consistent data structure (Python lists as input and output data).
Built with
NLPipes is built with TensorFlow and HuggingFace Transformers:
- TensorFlow: An end-to-end open-source deep learning framework
- Transformers: A general-purpose open-source library for transformer-based architectures
Getting Started
Installation
- Create a virtual environment
python3 -m venv nlpipesenv
source nlpipesenv/bin/activate
- Install the package
pip install nlpipes
Tasks
To train a model for a specific task, first load a backbone model. The train method takes at least two parameters, X and Y, where X is a list of texts to train on and Y is the list of training targets.
The expected format of the training targets depends on the task you want to solve:
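The three target formats described in the sections that follow can be checked before training with a small helper. This is a minimal sketch in plain Python: `check_targets` is a hypothetical function written for illustration and is not part of NLPipes. It assumes the task names used in this README and, for the class-label task, that each target is a list of [aspect, label] pairs whose label must appear in `all_labels`.

```python
def check_targets(task, X, Y, all_labels):
    """Raise ValueError if Y does not match the format expected for `task`.

    Hypothetical helper for illustration; not part of the NLPipes API.
    """
    if len(X) != len(Y):
        raise ValueError(f"X has {len(X)} texts but Y has {len(Y)} targets")
    known = list(all_labels)
    for i, target in enumerate(Y):
        if task == "single-label-classification":
            # One label name per text.
            if not isinstance(target, str):
                raise ValueError(f"Y[{i}] must be a single label name")
            labels = [target]
        elif task == "multi-label-classification":
            # A (possibly empty) list of label names per text.
            if not isinstance(target, list):
                raise ValueError(f"Y[{i}] must be a list of label names")
            labels = target
        elif task == "class-label-classification":
            # A list of [aspect, label] pairs per text.
            if not isinstance(target, list) or not all(
                isinstance(pair, (list, tuple)) and len(pair) == 2
                for pair in target
            ):
                raise ValueError(f"Y[{i}] must be a list of [aspect, label] pairs")
            labels = [label for _aspect, label in target]
        else:
            raise ValueError(f"unknown task: {task!r}")
        unknown = [label for label in labels if label not in known]
        if unknown:
            raise ValueError(f"Y[{i}] contains unknown labels: {unknown}")
```

Running such a check before calling train surfaces a malformed target as a clear error message up front.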
Single Label Classification:
Give one label name for each text in X:
from nlpipes import Model  # assumed import path; not shown in the original
model = Model("albert-base-v2",
              task='single-label-classification',
              all_labels=["NEG", "NEU", "POS"],
              )
X = ["This was bad.", "This was great!"]
Y = ["NEG", "POS"]
model.train(X, Y)
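Classifiers usually operate on integer class indices, not label names. The sketch below shows how a label list such as all_labels could map to indices and back. It reflects common practice and is not a description of NLPipes internals; `encode_single_label` is a hypothetical helper.

```python
all_labels = ["NEG", "NEU", "POS"]

# Position in all_labels defines the class index (an assumption for illustration).
label_to_id = {label: i for i, label in enumerate(all_labels)}
id_to_label = {i: label for label, i in label_to_id.items()}


def encode_single_label(Y, label_to_id):
    """Map each label name in Y to its integer class index."""
    return [label_to_id[y] for y in Y]


encoded = encode_single_label(["NEG", "POS"], label_to_id)
print(encoded)  # [0, 2]
```

Under this scheme the order of all_labels matters: changing it changes which index each label maps to.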
Multiple Label Classification:
Give a list of label names for each text in X (the list may be empty):
all_labels = ["billing", "tech support"]  # label set inferred from Y below
model = Model("albert-base-v2",
              task='multi-label-classification',
              all_labels=all_labels,
              )
X = ["I want a refund!",
     "The bill I got is not correct and I also have technical issues",
     "All good"]
Y = [
    ["billing"],
    ["billing", "tech support"],
    []
]
model.train(X, Y)
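Multi-label targets are commonly turned into multi-hot vectors, with one position per label and a 1 wherever the label applies. The sketch below illustrates that encoding on the example targets. `encode_multi_hot` is a hypothetical helper, the label set is an assumption taken from the labels that appear in Y, and NLPipes may encode targets differently internally.

```python
def encode_multi_hot(Y, all_labels):
    """Turn lists of label names into fixed-length 0/1 vectors."""
    index = {label: i for i, label in enumerate(all_labels)}
    vectors = []
    for labels in Y:
        row = [0] * len(all_labels)
        for label in labels:
            row[index[label]] = 1
        vectors.append(row)
    return vectors


all_labels = ["billing", "tech support"]  # assumed label set
Y = [["billing"], ["billing", "tech support"], []]
print(encode_multi_hot(Y, all_labels))  # [[1, 0], [1, 1], [0, 0]]
```

An empty label list becomes an all-zero vector, which is how a text with no applicable label is represented.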
Aspect Based Classification:
Give a list of [aspect, label] pairs for each text in X:
model = Model("albert-base-v2",
              task='class-label-classification',
              all_labels=["NEG", "NEU", "POS"],
              )
X = ["The room was nice.",
     "The food was great, but the staff was unfriendly.",
     "The room was horrible, but the waiters were welcoming"]
Y = [
    [["room", "POS"]],
    [["food", "POS"], ["staff", "NEG"]],
    [["room", "NEG"], ["staff", "POS"]],
]
model.train(X, Y)
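The nested pair format is easy to get wrong (a missing bracket changes its shape), so it can help to view each target as a mapping from aspect to label. The sketch below does that conversion. `pairs_to_dicts` is a hypothetical helper, and the rule that an aspect appears at most once per text is an assumption made for illustration, not something NLPipes is known to enforce.

```python
def pairs_to_dicts(Y):
    """Convert each list of [aspect, label] pairs into an {aspect: label} dict."""
    result = []
    for pairs in Y:
        aspects = {}
        for aspect, label in pairs:
            # Assumed rule: each aspect is labeled at most once per text.
            if aspect in aspects:
                raise ValueError(f"aspect {aspect!r} is labeled twice")
            aspects[aspect] = label
        result.append(aspects)
    return result


Y = [
    [["room", "POS"]],
    [["food", "POS"], ["staff", "NEG"]],
    [["room", "NEG"], ["staff", "POS"]],
]
for row in pairs_to_dicts(Y):
    print(row)
```

If a target is not a list of two-element pairs, the tuple unpacking in the inner loop raises an error, which flags a malformed entry early.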
Examples
Here are some examples on open datasets that show how to use NLPipes on different tasks:
Name | Notebook | Description | Task | Size | Memory | Speed |
---|---|---|---|---|---|---|
Google Play Sentiment Detection | Available | Train a model to detect sentiment polarity in Google Play store reviews | Single-label classification | | | |
Stack Overflow Tags Detection | Available | Train a model to detect tags for Stack Overflow questions | Multi-label classification | | | |
Amazon Aspect-Based Sentiment Detection | Available | Train a model to detect aspect-based sentiment polarity in Amazon laptop reviews | Class-label classification | | | |
Notices
NLPipes is still at an early stage. The library comes with no warranty, and future releases could bring substantial API and behavior changes.