
Utilities for training and working with NLP models in PyTorch


xt-nlp

Description

This repo contains common NLP pre- and post-processing functions, loss functions, metrics, and helper functions.

Installation

From PyPI:

pip install xt-nlp

From source:

git clone https://github.com/XtractTech/xt-nlp.git
pip install ./xt-nlp
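To confirm that the install worked and see which version pip resolved, you can query the installed distribution metadata. This is a minimal sketch using only the standard library (importlib.metadata, Python 3.8+). It is not part of xt-nlp, and installed_version is a hypothetical helper name.

```python
# Post-install check using only the standard library (Python 3.8+); not part of xt-nlp.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "xt-nlp") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"xt-nlp {v}" if v else "xt-nlp is not installed")
```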

Usage

Use Python's built-in help() to see the documentation for a specific class or function, e.g., help(SESLoss).

Defining SES Metrics and Loss

from xt_nlp.metrics import SESF1, SESLoss

# Evaluation metrics keyed by name; see help(SESF1) for what threshold controls
eval_metrics = {
    'f1': SESF1(threshold=0.8)
}

# Loss function used during training
loss_fn = SESLoss()
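This README does not say what SESF1's threshold controls. A common pattern, assumed here, is a probability cutoff: scores at or above it become positive predictions before precision and recall are computed. The sketch below is a generic thresholded binary F1 in plain Python. thresholded_f1 is a hypothetical helper, not xt-nlp's implementation, and SESF1 may instead score whole spans; run help(SESF1) for its actual behavior.

```python
# Hypothetical sketch of a thresholded F1; NOT the xt-nlp SESF1 implementation.
from typing import Sequence


def thresholded_f1(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.8) -> float:
    """Binary F1 after turning probabilities into 0/1 predictions with a cutoff.

    probs:  predicted probability that each position (e.g. token) is positive
    labels: gold 0/1 labels, same length as probs
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # Assumed semantics: a score at or above the threshold counts as a positive prediction.
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    probs, labels = [0.9, 0.85, 0.3, 0.95], [1, 0, 1, 1]
    print(thresholded_f1(probs, labels, threshold=0.8))  # ≈ 0.667 (precision 2/3, recall 2/3)
    print(thresholded_f1(probs, labels, threshold=0.9))  # ≈ 0.8 (precision 1.0, recall 2/3)
```

Raising the cutoff trades recall for precision: in the example, moving from 0.8 to 0.9 drops the one false positive without losing a true positive.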

Reading BRAT Annotations for Sequence Extraction into a Data Loader
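BRAT stores annotations in standoff form: each document has a .txt file plus an .ann file whose text-bound lines look like T1<TAB>Person 0 4<TAB>John, where the two numbers are character offsets into the .txt file. The directory layout that get_brat_examples expects is not documented here, so the sketch below only illustrates the format. Entity, parse_ann, and the groups argument are hypothetical names; groups mirrors the class_dict idea (class name mapped to the list of classes grouped into it) described in the code comments that follow.

```python
# Hypothetical sketch of reading BRAT text-bound annotations; NOT xt-nlp's get_brat_examples.
from typing import Dict, List, NamedTuple, Optional


class Entity(NamedTuple):
    label: str
    start: int  # character offset where the span begins
    end: int    # character offset one past the span's last character
    text: str


def parse_ann(ann_text: str, groups: Optional[Dict[str, List[str]]] = None) -> List[Entity]:
    """Parse the text-bound ("T") lines of a BRAT .ann file.

    groups (optional) maps a target class name to the raw BRAT labels merged into it;
    labels not mentioned in groups are kept unchanged.
    Discontinuous spans such as "0 4;14 20" are skipped to keep the sketch short.
    """
    # Invert {target: [raw, ...]} into {raw: target} for constant-time lookups.
    rename = {raw: target for target, raws in (groups or {}).items() for raw in raws}
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), events (E), attributes (A/M), notes (#), etc.
        _id, type_and_span, text = line.split("\t", 2)
        if ";" in type_and_span:
            continue  # discontinuous span: not handled in this sketch
        label, start, end = type_and_span.split(" ")
        entities.append(Entity(rename.get(label, label), int(start), int(end), text))
    return entities


if __name__ == "__main__":
    doc = "John lives in Boston"
    ann = "T1\tPerson 0 4\tJohn\nT2\tCity 14 20\tBoston\nR1\tLivesIn Arg1:T1 Arg2:T2\n"
    for ent in parse_ann(ann, groups={"Location": ["City", "Country"]}):
        print(ent.label, repr(doc[ent.start:ent.end]))
```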

from xt_nlp.utils import get_brat_examples, split_examples, get_features, build_ses_dataloader

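# Set these before running (values depend on your model and data):
# tokenizer = ...            # tokenizer matching your model
# max_sequence_length = ...  # maximum number of tokens per input sequence
# doc_stride = ...           # stride used when splitting long documents into windows
# class_dict = ...           # dict mapping each class name to the list of classes grouped into it
# classes = ...              # list of annotation classes to extract
# batch_size = ...           # number of examples per batch
# workers = ...              # number of data-loading worker processes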

examples = get_brat_examples(
    datadir='./data/datadir',
    classes=classes
)

train_examples, val_examples = split_examples(examples, train_prop=0.9, seed=4000)

train_features = get_features(
    examples=train_examples, 
    tokenizer=tokenizer, 
    all_ans_types=classes, 
    max_seq_len=max_sequence_length,
    doc_stride=doc_stride,
    mode='train'
)

train_loader = build_ses_dataloader(
    train_features, 
    classes, 
    class_dict, 
    batch_size=batch_size,
    workers=workers,
    max_seq_length=max_sequence_length,
    shuffle=True
)
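get_features takes both max_seq_len and doc_stride, which suggests that documents longer than the model's input limit are split into overlapping windows. The sketch below shows that general technique under one assumption: doc_stride is the step between consecutive window starts, as in the original BERT SQuAD script, so neighbouring windows overlap by max_len - doc_stride tokens. window_spans is a hypothetical helper, not xt-nlp's code; in practice each window usually also needs room for special tokens, which this sketch ignores.

```python
# Hypothetical sketch of sliding-window chunking; NOT xt-nlp's get_features.
from typing import List, Tuple


def window_spans(num_tokens: int, max_len: int, doc_stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) token ranges that cover a document with overlapping windows.

    Assumes doc_stride is the step between consecutive window starts.
    """
    if max_len <= 0 or doc_stride <= 0:
        raise ValueError("max_len and doc_stride must be positive")
    if num_tokens == 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        spans.append((start, end))
        if end == num_tokens:
            break  # the last window reaches the end of the document
        start += min(doc_stride, max_len)  # never skip tokens, even if doc_stride > max_len


    return spans


if __name__ == "__main__":
    print(window_spans(10, max_len=4, doc_stride=3))  # [(0, 4), (3, 7), (6, 10)]
```

A smaller doc_stride gives more overlap, so every token appears with more surrounding context, at the cost of more windows per document.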



Download files


Source Distribution

xt-nlp-0.2.2.tar.gz (18.4 kB)

Uploaded Source

Built Distribution

xt_nlp-0.2.2-py3-none-any.whl (20.8 kB)

Uploaded Python 3

File details

Details for the file xt-nlp-0.2.2.tar.gz.

File metadata

  • Download URL: xt-nlp-0.2.2.tar.gz
  • Upload date:
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0.post20200106 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6

File hashes

Hashes for xt-nlp-0.2.2.tar.gz

Algorithm     Hash digest
SHA256        19624595c7502f483a588bd77d568935dfd6ec0b2c062ba2c6fc7f4c1edaba4f
MD5           ecac8c85611ccce540de17bf591b7705
BLAKE2b-256   d50bee05531cb448e0f1b6f36dbcb32d65b39f45b6ee4ba229b47589ef541882


File details

Details for the file xt_nlp-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: xt_nlp-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0.post20200106 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6

File hashes

Hashes for xt_nlp-0.2.2-py3-none-any.whl

Algorithm     Hash digest
SHA256        8560a822aa0a08798b53f0415d03644740767a87ec6862f6ab37e1b51906758d
MD5           10dca0440c09f92963a47d91d7975986
BLAKE2b-256   0376d22fabe392952a5b73ee63598b4ba4bcf01f4ec3e42b09eba269148fe01e

