few-shot-learning-nlp

This library provides tools and utilities for Few Shot Learning in Natural Language Processing (NLP).

Overview

Few-shot learning in NLP means training and evaluating models on tasks with only a few labeled examples per class. This library provides implementations and utilities for that setting, listed under Supported Approaches.

Installation

You can install this library via pip:

pip install -U few-shot-learning-nlp

Documentation

The documentation for this library is available here.

Supported Approaches

Text Classification

  • Sentence Transformers Finetuning (SetFit)
  • Pattern Exploiting Training (PET)
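
As a rough illustration of the idea behind PET, the sketch below wraps a text in a cloze pattern and maps each label to a single "verbalizer" word, then picks the label whose word is most probable at the masked position. The pattern, the verbalizer words and the mock probabilities are all assumptions made for this example; no language model is called, and this is not the library's PET API.

```python
def apply_pattern(text):
    # Hypothetical cloze pattern: the masked word is what a language model would fill in.
    return f"{text} This article is about [MASK]."

# Hypothetical verbalizer: one word per class id.
VERBALIZER = {0: "world", 1: "sports", 2: "business", 3: "science"}

def predict_label(mask_token_probs, verbalizer):
    """Pick the label whose verbalizer word has the highest probability at the mask."""
    scores = {label: mask_token_probs.get(word, 0.0) for label, word in verbalizer.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no verbalizer word received any probability mass")
    # Renormalize over the verbalizer words only.
    normalized = {label: score / total for label, score in scores.items()}
    best_label = max(normalized, key=normalized.get)
    return best_label, normalized

prompt = apply_pattern("The striker scored twice in the final.")

# Mock probabilities standing in for a masked language model's output.
mock_probs = {"world": 0.05, "sports": 0.60, "business": 0.10, "science": 0.05, "the": 0.20}
label, probs = predict_label(mock_probs, VERBALIZER)
```

Full PET goes further than this scoring step, for example by combining several patterns; the sketch covers only the pattern and verbalizer mechanics.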

Named Entity Recognition for Image Documents

Classification Utils

Usage

To use this library, import the classes and functions you need and follow the documentation for each component.

Here is a short example of the SetFit implementation:

from datasets import load_dataset
import pandas as pd
from few_shot_learning_nlp.utils import stratified_train_test_split
from torch.utils.data import DataLoader
from few_shot_learning_nlp.few_shot_text_classification.setfit_dataset import SetFitDataset

# Load a dataset for text classification
ag_news_dataset = load_dataset("ag_news")

# Extract necessary information from the dataset
num_classes = len(ag_news_dataset['train'].features['label'].names)

# Build the few-shot splits by selecting a limited number of examples per class
n_shots = 50
train_validation, test_df = stratified_train_test_split(ag_news_dataset['train'], num_shots_per_class=n_shots)
train_df, val_df = stratified_train_test_split(pd.DataFrame(train_validation), num_shots_per_class=30)

# Create SetFitDataset objects for training and validation
set_fit_data_train = SetFitDataset(train_df['text'], train_df['label'], input_example_format=True)
set_fit_data_val = SetFitDataset(val_df['text'], val_df['label'], input_example_format=False)

# Create DataLoader objects for training and validation datasets
train_dataloader = DataLoader(set_fit_data_train.data, shuffle=False)
val_dataloader = DataLoader(set_fit_data_val)
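
To make the per-class sampling step concrete, here is a minimal standard-library sketch of drawing a fixed number of examples from each class. The function `sample_n_shots` is a hypothetical helper written for this illustration; it is not the library's `stratified_train_test_split`, whose internals are not shown here.

```python
import random
from collections import defaultdict

def sample_n_shots(texts, labels, n_shots, seed=0):
    """Split (texts, labels) into an n-shot subset and the remaining examples."""
    rng = random.Random(seed)

    # Group example indices by class label.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    # Draw n_shots indices from every class.
    chosen = set()
    for label, indices in indices_by_label.items():
        if len(indices) < n_shots:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        chosen.update(rng.sample(indices, n_shots))

    few_shot = [(texts[i], labels[i]) for i in range(len(texts)) if i in chosen]
    rest = [(texts[i], labels[i]) for i in range(len(texts)) if i not in chosen]
    return few_shot, rest

# Toy data: 3 classes with 5 examples each.
toy_labels = [i % 3 for i in range(15)]
toy_texts = [f"document {i}" for i in range(15)]
few_shot, rest = sample_n_shots(toy_texts, toy_labels, n_shots=2)
```

Applying the same function again to the few-shot subset would give a train/validation split analogous to the two-stage split above.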

Defining Classifier

import torch

class CLF(torch.nn.Module):
    def __init__(
        self,
        in_features : int,
        out_features : int, 
        *args, 
        **kwargs
    ) -> None:
        super().__init__(*args, **kwargs)

        self.layer1 = torch.nn.Linear(in_features, 128)
        self.relu = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(128, 32)
        self.layer3 = torch.nn.Linear(32, out_features)

    def forward(self, x : torch.Tensor):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        x = self.relu(x)
        return self.layer3(x)
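
To get a feel for the size of this classifier head, the following sketch counts its trainable parameters from the layer sizes alone. The 768-dimensional input is an assumption chosen for illustration (the actual value comes from the embedding model), and 4 is the number of classes in AG News.

```python
def linear_params(n_in, n_out):
    # A fully connected layer holds an (n_out x n_in) weight matrix plus n_out biases.
    return n_in * n_out + n_out

def clf_param_count(in_features, out_features):
    # Same layer sizes as the CLF class: in_features -> 128 -> 32 -> out_features.
    return (
        linear_params(in_features, 128)
        + linear_params(128, 32)
        + linear_params(32, out_features)
    )

# Assumed values: 768-dimensional embeddings, 4 output classes.
total = clf_param_count(in_features=768, out_features=4)
print(total)  # 102692
```

The ReLU activations add no parameters, so the count depends only on the three linear layers.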

Training the Embedding Model

import torch
from sentence_transformers import SentenceTransformer
from few_shot_learning_nlp.few_shot_text_classification.setfit import SetFitTrainer

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer("whaleloops/phrase-bert")

# Initialize the SetFitTrainer with embedding model and classifier
embedding_model = model.to("cuda")
in_features = embedding_model.get_sentence_embedding_dimension()
clf = CLF(in_features, num_classes).to("cuda")
trainer = SetFitTrainer(embedding_model, clf, num_classes)

# Train the embedding model
trainer.train_embedding(train_dataloader, val_dataloader, n_epochs=10)
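
SetFit-style embedding training is usually described as contrastive fine-tuning on sentence pairs: texts from the same class are pulled together, texts from different classes are pushed apart. The sketch below shows one simple way to build such pairs. `make_contrastive_pairs` is a hypothetical helper; whether `train_embedding` or `SetFitDataset` constructs pairs this way is an assumption, not something stated here.

```python
import itertools
import random

def make_contrastive_pairs(texts, labels, negatives_per_positive=1, seed=0):
    """Return (text_a, text_b, target) triples: 1.0 for same class, 0.0 otherwise."""
    rng = random.Random(seed)
    examples = list(zip(texts, labels))
    pairs = []
    for (text_a, label_a), (text_b, label_b) in itertools.combinations(examples, 2):
        if label_a != label_b:
            continue
        # Positive pair: two texts sharing a label.
        pairs.append((text_a, text_b, 1.0))
        # Negative pairs: the same anchor with texts drawn from other classes.
        other_class = [text for text, label in examples if label != label_a]
        k = min(negatives_per_positive, len(other_class))
        for negative in rng.sample(other_class, k):
            pairs.append((text_a, negative, 0.0))
    return pairs

toy_texts = ["stocks fall", "markets rally", "team wins cup", "striker scores"]
toy_labels = [0, 0, 1, 1]
pairs = make_contrastive_pairs(toy_texts, toy_labels)
```

With two examples per class, this yields two positive pairs and two negative pairs.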

Training the Classifier Model

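import numpy as np

# Count the examples per class (class_counts is not used further in this snippet)
_, class_counts = np.unique(train_df['label'], return_counts=True)

# Shuffle training data
# Note: shuffle_two_lists is not imported above; it is assumed to be a helper shipped with
# this library, so check the documentation for its module before running this snippet
X_train_shuffled, y_train_shuffled = shuffle_two_lists(train_df['text'], train_df['label'])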

# Train the classifier
history, embedding_model, clf = trainer.train_classifier(
    X_train_shuffled, y_train_shuffled, val_df['text'], val_df['label'],
    clf=CLF(in_features, num_classes),
    n_epochs=15,
    lr=1e-4
)
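
The shuffling step has to keep each text aligned with its label. A minimal sketch of shuffling two sequences in unison is shown below; `shuffle_in_unison` is a hypothetical stand-in written for this example and is not claimed to match the library's `shuffle_two_lists`.

```python
import random

def shuffle_in_unison(first, second, seed=None):
    """Shuffle two equal-length lists with the same permutation."""
    if len(first) != len(second):
        raise ValueError("both sequences must have the same length")
    # Shuffle positions once, then apply that order to both lists.
    order = list(range(len(first)))
    random.Random(seed).shuffle(order)
    return [first[i] for i in order], [second[i] for i in order]

texts = ["a", "b", "c", "d"]
labels = [0, 1, 2, 3]
shuffled_texts, shuffled_labels = shuffle_in_unison(texts, labels, seed=42)
```

The function indexes by position, so pandas columns would need to be converted with `list(...)` before being passed in.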

Testing the Models

y_true, y_pred = trainer.test(test_df)
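
Once true and predicted labels are available, simple metrics can be computed without extra dependencies. The sketch below assumes `y_true` and `y_pred` are plain sequences of class ids; if `trainer.test` returns tensors or arrays, they would need converting first (for example with `.tolist()`). The toy labels are made up for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_recall(y_true, y_pred):
    """For each class, the share of its examples that were predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in sorted(support)}

# Toy labels, not real results.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(toy_true, toy_pred)
recall = per_class_recall(toy_true, toy_pred)
```

Per-class recall is worth checking alongside accuracy in few-shot settings, since a single weak class can hide behind a good overall score.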
