Thin wrapper around HuggingFace Transformers sequence classification models for ease of use
Project description
OP Text
A wrapper around the popular transformers machine learning library by the HuggingFace team. OP Text provides a simplified, Keras-like interface for fine-tuning, evaluating, and running inference with popular pretrained BERT models.
Installation
PyTorch is required as a prerequisite before installing OP Text. Head on over to the getting started page of their website and follow the installation instructions for your version of Python.
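For reference, a CPU-only PyTorch install typically looks like the command below; check the PyTorch site for the exact command for your platform and CUDA version:
pip install torch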
Note: currently only Python versions 3.6 and above are supported.
Use one of the following commands to install OP Text:
with pip
pip install op_text
with anaconda
conda install op_text
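A quick way to confirm the install worked is to import the package from the command line (this only checks that op_text imports; it is not part of the package's documented workflow):
python -c "import op_text"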
Usage
The entire purpose of this package is to allow users to leverage the power of the transformer models available in HuggingFace's library without needing to understand how to use PyTorch.
Model Loading
Currently the available models are:
- BERT, from the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (https://arxiv.org/abs/1810.04805), released by Google.
- RoBERTa, from the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach", released by Facebook.
- DistilBERT, from the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", released by HuggingFace.
Each model class contains a list of its available pretrained models. Use the DOWNLOADABLES property to access them.
Bert.DOWNLOADABLES
>> ['bert-base-uncased','bert-large-uncased']
Roberta.DOWNLOADABLES
>> ['roberta-base','roberta-large']
DistilBert.DOWNLOADABLES
>> ['distilbert-base-uncased']
Loading a model is achieved in one line of code. The string can be either the name of a pretrained model or the path to a fine-tuned model on disk.
from op_text.models import Bert
# Loading a pretrained model
model = Bert("bert-base-uncased", num_labels=2)
# Loading a fine-tuned model
model = Bert("path/to/local/model/")
Supply num_labels when loading a pretrained model, as an untrained classification head is added when you use one of the DOWNLOADABLES strings.
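The other model classes are loaded in exactly the same way. A minimal sketch, assuming Roberta and DistilBert share Bert's constructor signature:
from op_text.models import Roberta, DistilBert
# Load pretrained models with untrained classification heads
roberta = Roberta("roberta-base", num_labels=2)
distilbert = DistilBert("distilbert-base-uncased", num_labels=2)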
Fine-tuning
Fine-tuning a model is as simple as loading a dataset, instantiating a model, and passing the data to the model's fit function.
from op_text.models import Bert
X_train = [
  "Example sentence 1",
  "Example sentence 2",
  "Today was a horrible day"
]
y_train = [1,1,0]
model = Bert('bert-base-uncased', num_labels=2)
model.fit(X_train, y_train)
Saving
At the conclusion of training you will most likely want to save your model to disk. Simply call the model's save function and supply an output directory and a name for the model.
from op_text.models import Bert
model = Bert('bert-base-uncased', num_labels=2)
model.save("path/to/output/dir/", "example_save_name")
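The saved model can then be reloaded by passing its directory to the constructor, as in the Model Loading section. A sketch, assuming the save function writes the model to a folder named after the supplied save name inside the output directory:
from op_text.models import Bert
# Path is illustrative; use wherever your model was actually saved
model = Bert("path/to/output/dir/example_save_name/")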
Evaluation
Model evaluation works much like training: load a dataset and instantiate a model, but call the evaluate function instead. This returns a number between 0 and 1, the proportion of predictions the model got correct.
model.evaluate(X_test, y_test)
>> 0.8 # 80% correct predictions
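Put together, an evaluation run looks like the sketch below; X_test and y_test are illustrative and can be any list of strings with matching numerical labels:
X_test = [
  "What a fantastic day",
  "I did not enjoy this at all"
]
y_test = [1, 0]
accuracy = model.evaluate(X_test, y_test)
print(accuracy)  # e.g. 0.8 means 80% of predictions were correct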
Prediction
Predict the labels of one or more pieces of text by passing a list of strings to the model's predict function. This returns a list of tuples, one for each piece of text. Each tuple contains the model's confidence scores for each class and the numerical label of the predicted class. If a label converter is supplied, a string label of the predicted class is also included in each output tuple.
from op_text.utils import LabelConverter
converter = LabelConverter({0: "negative", 1: "positive"})
to_predict = ["Today was a great day!"]
model.predict(to_predict, label_converter=converter)
>> [([0.02, 0.98], 1, "positive")]
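If no converter is supplied, the output tuples simply omit the string label (a sketch based on the behaviour described above):
model.predict(to_predict)
>> [([0.02, 0.98], 1)]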
Citation
Paper you can cite for the Transformers library:
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew},
journal={ArXiv},
year={2019},
volume={abs/1910.03771}
}
File details
Details for the file op_text-0.2.0.tar.gz.
File metadata
- Download URL: op_text-0.2.0.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.22.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 17dc44493933570dda22c08f895431a818df0d1d5f55073bd7df7ab913db0e8e
MD5 | e4cdf237b3e0b1641d0c9c9233d30d19
BLAKE2b-256 | 75555941a124af4aa54b4e50a60a8192cf470e328075106543765a0dcca51f42
File details
Details for the file op_text-0.2.0-py3-none-any.whl.
File metadata
- Download URL: op_text-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.22.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | b8e0ae445201ed2ff6f6aee5e093bd5f6668ef4f23e6e11033fa35dd1e26f04b
MD5 | bc787066140a202af3b017aaf9e1664f
BLAKE2b-256 | b0f8e27daa6a5a245424e078bd351d7e6bd158d90e8139a706bdf067026308f2