Thin wrapper around HuggingFace Transformers sequence classification models for ease of use

Project description

OP Text

A wrapper around the popular transformers machine learning library by the HuggingFace team. OP Text provides a simplified, Keras-like interface for fine-tuning, evaluating, and running inference with popular pretrained BERT models.

Installation

PyTorch is required as a prerequisite before installing OP Text. Head on over to the getting started page of their website and follow the installation instructions for your version of Python.

Note: currently only Python versions 3.6 and above are supported.

Use one of the following commands to install OP Text:

with pip

pip install op_text

with anaconda

conda install op_text
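
Since PyTorch is an external prerequisite, it can be worth confirming that it is importable before using OP Text. A small sanity-check sketch (not part of OP Text itself):

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU can be used for training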

Usage

The entire purpose of this package is to allow users to leverage the power of the transformer models available in HuggingFace's library without needing to understand how to use PyTorch.

Model Loading

Currently the available models are:

  1. BERT from the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/abs/1810.04805), released by Google.

  2. RoBERTa from the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach (https://arxiv.org/abs/1907.11692), released by Facebook.

  3. DistilBERT from the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (https://arxiv.org/abs/1910.01108), released by HuggingFace.

Each model class contains a list of its available pretrained models. Use the DOWNLOADABLES property to access them.

Bert.DOWNLOADABLES
>> ['bert-base-uncased','bert-large-uncased']

Roberta.DOWNLOADABLES
>> ['roberta-base','roberta-large']

DistilBert.DOWNLOADABLES
>> ['distilbert-base-uncased']
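
The checkpoints can also be listed programmatically. The sketch below assumes Roberta and DistilBert are importable from op_text.models in the same way as Bert:

from op_text.models import Bert, Roberta, DistilBert

# Print the pretrained checkpoints each model class can download
for model_cls in (Bert, Roberta, DistilBert):
    print(model_cls.__name__, model_cls.DOWNLOADABLES)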

Loading a model is achieved in one line of code. The string can either be the name of a pretrained model or a path to a local fine-tuned model on disk.

from op_text.models import Bert

# Loading a pretrained model 
model = Bert("bert-base-uncased", num_labels=2)

# Loading a fine-tuned model
model = Bert("path/to/local/model/")

Supply num_labels when loading a pretrained model: an untrained classification head with num_labels outputs is added whenever one of the DOWNLOADABLES strings is used.
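
The same call works for tasks with more than two classes; only num_labels changes. A sketch for a hypothetical three-class setup:

from op_text.models import Bert

# Three-class problem (e.g. negative / neutral / positive):
# the classification head is created with three outputs
model = Bert("bert-base-uncased", num_labels=3)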

Fine-tuning

Fine-tuning a model is as simple as loading a dataset, instantiating a model and then passing the data to the model's fit function.

from op_text.models import Bert

X_train = [
	"Example sentence 1",
	"Example sentence 2",
	"Today was a horrible day"
]
y_train = [1, 1, 0]

model = Bert('bert-base-uncased', num_labels=2)
model.fit(X_train, y_train)
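
In practice the training data will usually come from a file rather than hard-coded lists. A minimal sketch, assuming a CSV file with text and label columns (pandas is used purely for convenience here and is not part of OP Text):

import pandas as pd
from op_text.models import Bert

# Hypothetical CSV with a "text" column and a binary "label" column (0 or 1)
df = pd.read_csv("train.csv")
X_train = df["text"].tolist()
y_train = df["label"].tolist()

model = Bert("bert-base-uncased", num_labels=2)
model.fit(X_train, y_train)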

Saving

At the conclusion of training you will most likely want to save your model to disk. Simply call the model's save function and supply an output directory and a name for the model.

from op_text.models import Bert
model = Bert('bert-base-uncased', num_labels=2)
model.save("path/to/output/dir/", "example_save_name")
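
A saved model can later be reloaded by passing its location back to the constructor, exactly as in the model loading section above. The path below assumes save writes the model files into a folder named after the save name inside the output directory; adjust it to wherever the files actually end up:

from op_text.models import Bert

# Reload the fine-tuned model; num_labels is not required here because
# the saved classification head is restored along with the weights
model = Bert("path/to/output/dir/example_save_name")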

Evaluation

Model evaluation works much like training: load a dataset and instantiate a model, but call the evaluate function instead. It returns a number between 0 and 1, the fraction of predictions the model got correct.

model.evaluate(X_test, y_test)
>> 0.8 # 80% correct predictions
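
Put together, a small held-out set can be scored like this (a sketch with made-up test sentences and a previously fine-tuned model):

from op_text.models import Bert

X_test = [
	"What a fantastic experience",
	"I would not recommend this to anyone"
]
y_test = [1, 0]

model = Bert("path/to/local/model/")
accuracy = model.evaluate(X_test, y_test)
print(accuracy)  # e.g. 1.0 if both predictions are correct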

Prediction

Predict the label of one or more pieces of text by passing a list of strings to the model's predict function. This returns a list of tuples, one for each input string. Each tuple contains the model's confidence scores for each class and the numerical label of the predicted class. If a label converter is supplied, a string label for the predicted class is also included in each output tuple.

from op_text.utils import LabelConverter

converter = LabelConverter({0: "negative", 1: "positive"})
to_predict = ["Today was a great day!"]
model.predict(to_predict, label_converter=converter)
>> [([0.02, 0.98], 1, "positive")]
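
The returned tuples can be unpacked directly. The sketch below follows the output format shown above and assumes label_converter is optional, as the description suggests:

to_predict = [
	"Today was a great day!",
	"The service was dreadful"
]

# With a converter each result is (scores, label_id, label_str)
for scores, label_id, label_str in model.predict(to_predict, label_converter=converter):
	print(label_str, label_id, scores)

# Without a converter each result is (scores, label_id)
for scores, label_id in model.predict(to_predict):
	print(label_id, scores)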

Citation

Paper you can cite for the Transformers library:

@article{Wolf2019HuggingFacesTS,
  title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
  author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.03771}
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

op_text-0.2.0.tar.gz (10.5 kB)

Uploaded Source

Built Distribution

op_text-0.2.0-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file op_text-0.2.0.tar.gz.

File metadata

  • Download URL: op_text-0.2.0.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.22.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.1

File hashes

Hashes for op_text-0.2.0.tar.gz
  • SHA256: 17dc44493933570dda22c08f895431a818df0d1d5f55073bd7df7ab913db0e8e
  • MD5: e4cdf237b3e0b1641d0c9c9233d30d19
  • BLAKE2b-256: 75555941a124af4aa54b4e50a60a8192cf470e328075106543765a0dcca51f42

See more details on using hashes here.
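
If you want to check a downloaded archive against the digests listed above, a small sketch using Python's standard hashlib (it assumes the archive sits in the current directory):

import hashlib

# Compute the local archive's SHA256 and compare it to the published digest
with open("op_text-0.2.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print(digest == "17dc44493933570dda22c08f895431a818df0d1d5f55073bd7df7ab913db0e8e")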

File details

Details for the file op_text-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: op_text-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.22.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.1

File hashes

Hashes for op_text-0.2.0-py3-none-any.whl
  • SHA256: b8e0ae445201ed2ff6f6aee5e093bd5f6668ef4f23e6e11033fa35dd1e26f04b
  • MD5: bc787066140a202af3b017aaf9e1664f
  • BLAKE2b-256: b0f8e27daa6a5a245424e078bd351d7e6bd158d90e8139a706bdf067026308f2

See more details on using hashes here.
