Model hub for transformers.

Project description

Usage Sample
''''''''''''

.. code:: python

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import torch
    from transformers import BertTokenizer
    from nlpx.tokenize.utils import get_df_text_labels
    from nlpx.dataset import TextDataset, text_collate
    from transformers_model import (
        AutoCNNTextClassifier,
        AutoCNNTokenClassifier,
        BertDataset,
        BertCollator,
        BertTokenizeCollator,
    )
    from nlpx.model.wrapper import ClassifyModelWrapper
    
    ######################## AutoCNNTextClassifier classification ##########################
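    # Placeholder data: replace with your own class names, texts and integer labels.
    classes = ['class1', 'class2', 'class3', ...]
    texts = [[str],]
    labels = [0, 0, 1, 2, 1, ...]
    # Placeholder: path or name of a pretrained checkpoint (not specified in this sample).
    pretrained_path = ...
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')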
    train_texts, test_texts, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42)
    
    train_set = TextDataset(train_texts, y_train)
    test_set = TextDataset(test_texts, y_test)
    model = AutoCNNTextClassifier(pretrained_path, len(classes), device)
    wrapper = ClassifyModelWrapper(model, classes, device)
    _ = wrapper.train(train_set, test_set, collate_fn=text_collate)

    ######################### AutoCNNTokenClassifier classification ##########################
    tokenizer = BertTokenizer.from_pretrained(pretrained_path)

    ###################################### BertCollator ######################################
    # Tokenize up front: every training text is padded or truncated to 60 tokens.
    train_encodings = tokenizer.batch_encode_plus(
            train_texts,
            max_length=60,
            padding="max_length",
            truncation=True,
            return_token_type_ids=True,
            return_attention_mask=True,
            return_tensors="pt",
    )

    # Note: max_length here (256) differs from the training set (60).
    test_encodings = tokenizer.batch_encode_plus(
            test_texts,
            max_length=256,
            padding="max_length",
            truncation=True,
            return_token_type_ids=True,
            return_attention_mask=True,
            return_tensors="pt",
    )

    # Wrap the pre-tokenized tensors together with their labels.
    train_set = BertDataset(train_encodings, y_train)
    test_set = BertDataset(test_encodings, y_test)

    model = AutoCNNTokenClassifier(pretrained_path, len(classes), device)
    wrapper = ClassifyModelWrapper(model, classes, device)
    _ = wrapper.train(train_set, test_set, collate_fn=BertCollator())

    ################################ BertTokenizeCollator ################################
    train_set = TextDataset(train_texts, y_train)
    test_set = TextDataset(test_texts, y_test)
    model = AutoCNNTokenClassifier(pretrained_path, len(classes), device)
    wrapper = ClassifyModelWrapper(model, classes, device)
    _ = wrapper.train(train_set, test_set, collate_fn=BertTokenizeCollator(tokenizer, 60))
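
The sample relies on ``padding="max_length"`` and ``truncation=True`` to bring every sequence in a batch to the same length. The following is a minimal pure-Python sketch of that idea, not the library's implementation: ``pad_or_truncate``, ``toy_collate`` and ``PAD_ID`` are hypothetical names introduced for illustration, and the token ids are made up. Unlike a real BERT tokenizer, this toy version does not preserve the trailing special token when it truncates.

```python
from typing import Dict, List

PAD_ID = 0  # assumed padding id for this toy example


def pad_or_truncate(ids: List[int], max_length: int, pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Cut ids to max_length, right-pad with pad_id, and build the attention mask."""
    kept = ids[:max_length]            # truncation
    n_pad = max_length - len(kept)     # number of padding positions needed
    return {
        "input_ids": kept + [pad_id] * n_pad,
        "attention_mask": [1] * len(kept) + [0] * n_pad,  # 1 = real token, 0 = padding
    }


def toy_collate(batch: List[List[int]], max_length: int) -> Dict[str, List[List[int]]]:
    """Apply pad_or_truncate to each sequence and stack the results per field."""
    rows = [pad_or_truncate(ids, max_length) for ids in batch]
    return {
        "input_ids": [row["input_ids"] for row in rows],
        "attention_mask": [row["attention_mask"] for row in rows],
    }


# One short and one long sequence, both forced to length 5.
batch = toy_collate([[101, 7, 8, 102], [101, 5, 6, 7, 8, 9, 102]], max_length=5)
print(batch["input_ids"])       # [[101, 7, 8, 102, 0], [101, 5, 6, 7, 8]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
```

Doing this once over the whole dataset (the ``BertDataset`` plus ``BertCollator`` route) or per batch inside the collate function (the ``BertTokenizeCollator`` route) gives the model inputs of the same shape; the difference is when the tokenization cost is paid.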


Source Distribution

transformers-model-0.0.2.tar.gz

File metadata

  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

  • SHA256: 6ca5005427cf79d0fa64e276b7a466ab44f535dd9c080393bb59a733f9048942
  • MD5: b49f9862555526e4bd352e111105f205
  • BLAKE2b-256: 122a3a07a5fc6c3aa77152230a105d827e585dffdb82e431c807b780e40974b0
