
Easy fine-tuning for BERT models


bert-for-sequence-classification

Pipeline for easy fine-tuning of BERT architecture for sequence classification

Quick Start

Installation

  1. Install the library
pip install bert-for-sequence-classification
  2. If you want to train your model on a GPU, install a PyTorch version compatible with your device.

To find the version compatible with the CUDA toolkit installed on your device, check the PyTorch website. You can find the installed CUDA version by typing nvidia-smi in a console, or !nvidia-smi in a notebook cell.
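To quickly verify the setup, a small check like the following can be run (the `torch_installed` flag is just for illustration; it works whether or not PyTorch is installed yet):

```python
import importlib.util

# Check whether PyTorch is installed and, if so, whether it can see a CUDA device.
torch_installed = importlib.util.find_spec("torch") is not None
if torch_installed:
    import torch
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
else:
    print("torch is not installed; install a build matching your CUDA version")
```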

CLI Use

bert-clf-train --path_to_config <path to yaml file>

An example config file can be found here

Jupyter notebook

An example notebook can be found here

Inference mode

How you load your trained model for inference depends on how you saved it.

If `path_to_state_dict` in the config is set to `false`, then, with the library installed, you can load the whole model object directly:

import torch
import pandas as pd

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.load("path_to_saved_model", map_location=device)

model.eval()

df = pd.read_csv("path_to_some_df")

df["target_column"] = df["text_column"].apply(model.predict)
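To illustrate the pattern above without a trained model, here is a sketch with a stub standing in for `model.predict` (which maps a single text to a predicted label); the column names and labels are just examples:

```python
import pandas as pd

# Stub standing in for a trained model's predict method (text -> label):
def predict(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

df = pd.DataFrame({"text_column": ["Good movie", "Terrible plot"]})
df["target_column"] = df["text_column"].apply(predict)
print(df[["text_column", "target_column"]])
```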

Otherwise:

import torch
import json
import pandas as pd
from bert_clf.src.models.BertCLF import BertCLF
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="pretrained_model_name_or_path"
)
model_bert = AutoModel.from_pretrained(
    pretrained_model_name_or_path="pretrained_model_name_or_path"
).to(device)

id2label = json.load(open("path/to/saved/mapper"))  # mapper is saved with the state dict

model = BertCLF(
    pretrained_model=model_bert,
    tokenizer=tokenizer,
    id2label=id2label,
    dropout=0.5,  # replace with the dropout rate used during training
    device=device
)

model.load_state_dict(
    torch.load(
        "path_to_state_dict", map_location=device
    ),
    strict=False
)

model.eval()

df = pd.read_csv("path_to_some_df")

df["target_column"] = df["text_column"].apply(model.predict)
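The id2label mapper above is assumed to be a small JSON object mapping class ids to label names, saved alongside the state dict. A minimal sketch of loading and using such a mapper (the contents below are hypothetical):

```python
import io
import json

# Hypothetical contents of the mapper file saved with the state dict:
mapper_file = io.StringIO('{"0": "negative", "1": "positive"}')
id2label = json.load(mapper_file)

print(id2label["1"])  # the label name for class id "1"
```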
