Utilities for training and working with NLP models in PyTorch

xt-nlp

Description

This repo contains common NLP pre- and post-processing functions, loss functions, metrics, and helper functions.

Installation

From PyPI:

pip install xt-nlp

From source:

git clone https://github.com/XtractTech/xt-nlp.git
pip install ./xt-nlp

Usage

To see documentation for a specific class or function, use Python's built-in help, e.g., help(SESLoss).

Defining SES Metrics and Loss

from xt_nlp.metrics import SESF1
from xt_nlp.metrics import SESLoss

eval_metrics = {
    'f1': SESF1(threshold=0.8)
}
loss_fn = SESLoss()

Reading BRAT annotations for sequence extraction into a data loader

from xt_nlp.utils import get_brat_examples, split_examples, get_features, build_ses_dataloader

# Define the following before running this example:
# tokenizer = 
# max_sequence_length = 
# doc_stride = 
# class_dict = dictionary mapping each class name to the list of classes to group into it
# classes = 
# batch_size = 
# workers = 

examples = get_brat_examples(
    datadir='./data/datadir',
    classes=classes
)

train_examples, val_examples = split_examples(examples, train_prop=.9, seed=4000)

train_features = get_features(
    examples=train_examples, 
    tokenizer=tokenizer, 
    all_ans_types=classes, 
    max_seq_len=max_sequence_length,
    doc_stride=doc_stride,
    mode='train'
)

train_loader = build_ses_dataloader(
    train_features, 
    classes, 
    class_dict, 
    batch_size=batch_size,
    workers=workers,
    max_seq_length=max_sequence_length,
    shuffle=True
)
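
The class_dict placeholder in the example is described only as a mapping from a class name to the list of classes grouped into it. A minimal sketch of what such a dictionary could look like, together with a reverse lookup built from it, follows. The label names and the helper invert_class_dict are hypothetical and not part of xt-nlp; how the library uses class_dict internally is not documented here.

```python
def invert_class_dict(class_dict):
    """Map each fine-grained label to the grouped class that contains it."""
    label_to_group = {}
    for group, members in class_dict.items():
        for label in members:
            # A label listed under two groups would make the grouping ambiguous.
            if label in label_to_group:
                raise ValueError(f"label {label!r} appears in more than one group")
            label_to_group[label] = group
    return label_to_group


# Hypothetical annotation labels, grouped into two coarser classes.
class_dict = {
    "ORG": ["Company", "Agency"],
    "PERSON": ["Employee", "Customer"],
}

label_to_group = invert_class_dict(class_dict)
```

The reverse lookup makes it easy to check, before building a loader, that every annotated label belongs to exactly one grouped class.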

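The max_sequence_length and doc_stride placeholders suggest that long documents are split into overlapping windows, as is common in BERT-style span extraction pipelines. The sketch below illustrates that general idea on a plain token list. It is an assumption about what these parameters mean, not the implementation of get_features, and the helper window_spans is hypothetical.

```python
def window_spans(num_tokens, max_len, stride):
    """Return (start, end) token spans of at most max_len tokens, one every `stride` tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    spans = []
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        spans.append((start, end))
        if end == num_tokens:
            # The last window reaches the end of the document.
            break
        start += stride
    return spans


tokens = [f"tok{i}" for i in range(10)]
spans = window_spans(len(tokens), max_len=4, stride=2)
windows = [tokens[s:e] for s, e in spans]
```

With a stride smaller than the window length, consecutive windows overlap, so an answer span cut off at the edge of one window can still appear whole in a neighbouring one.
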
Download files

Source Distribution

xt-nlp-0.2.7.tar.gz (19.1 kB)

Uploaded Source

Built Distribution

xt_nlp-0.2.7-py3-none-any.whl (21.5 kB)

Uploaded Python 3

File details

Details for the file xt-nlp-0.2.7.tar.gz.

File metadata

  • Download URL: xt-nlp-0.2.7.tar.gz
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0.post20200106 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6

File hashes

Hashes for xt-nlp-0.2.7.tar.gz
Algorithm Hash digest
SHA256 b6016607478ad1414c42501947a2ac47060d795e54799f0a2f331453cfe6b1b0
MD5 d1f3a7968d44cf3be7a43e621b822f14
BLAKE2b-256 a9861fb2595aa73ff6c03cd2f58aac698a96a55394c2d9a3e9eb5338a49adaa5

File details

Details for the file xt_nlp-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: xt_nlp-0.2.7-py3-none-any.whl
  • Size: 21.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0.post20200106 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6

File hashes

Hashes for xt_nlp-0.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 3599c4d8429258b2cbc6b91544f3156388841bc65cb09114466911dc9bcd8dff
MD5 c0eced4f2c4fe83087db68ed962c895c
BLAKE2b-256 955e7793f4745de5a3d594103b7992f83a032f8e170900b79525a3dd05db0898
