
Utilities for training and working with NLP models in PyTorch


xt-nlp

Description

This repo contains common NLP pre- and post-processing functions, loss functions, metrics, and helper functions.

Installation

From PyPI:

pip install xt-nlp

From source:

git clone https://github.com/XtractTech/xt-nlp.git
pip install ./xt-nlp

Usage

Use Python's built-in help to see documentation for a specific class or function, e.g. help(SESLoss).
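When working in a script rather than an interactive session, the same documentation can be captured as a string. The sketch below uses the standard-library pydoc module; OrderedDict is only a stand-in for an xt-nlp object such as SESLoss, which is assumed to carry a docstring.

```python
import pydoc
from collections import OrderedDict  # stand-in; swap in e.g. SESLoss from xt_nlp

# help(obj) sends its output to a pager; pydoc.render_doc returns the same
# documentation as a string, which is easier to log or search.
doc_text = pydoc.render_doc(OrderedDict, renderer=pydoc.plaintext)

# Show only the first few lines of the rendered documentation.
for line in doc_text.splitlines()[:5]:
    print(line)
```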

Defining SES Metrics and Loss

from xt_nlp.metrics import SESF1, SESLoss

eval_metrics = {
    'f1': SESF1(threshold=0.8)
}
loss_fn = SESLoss()
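The README does not say how SESF1 uses its threshold argument; help(SESF1) is the authoritative reference. As a rough illustration only, the sketch below shows what a thresholded F1 score generally computes, assuming the threshold is a probability cutoff for counting a prediction as positive. The function thresholded_f1 is hypothetical and is not part of xt-nlp.

```python
def thresholded_f1(probs, labels, threshold=0.8):
    """F1 over binary labels; a prediction is positive when prob >= threshold.

    Hypothetical helper for illustration, not xt-nlp's SESF1.
    """
    preds = [p >= threshold for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


probs = [0.95, 0.85, 0.60, 0.10]   # made-up model confidences
labels = [1, 0, 1, 0]              # made-up ground truth

strict = thresholded_f1(probs, labels, threshold=0.8)
lenient = thresholded_f1(probs, labels, threshold=0.5)
print(strict, lenient)
```

In this toy data, raising the threshold drops the 0.60 prediction, which costs recall without improving precision, so the stricter setting scores lower.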

Reading BRAT annotations for sequence extraction into a data loader

from xt_nlp.utils import get_brat_examples, split_examples, get_features, build_ses_dataloader

# Define the following before running the code below:
# tokenizer =
# max_sequence_length =
# doc_stride =
# class_dict = dictionary mapping each class name to the list of classes grouped into it
# classes =
# batch_size =
# workers =

examples = get_brat_examples(
    datadir='./data/datadir',
    classes=classes
)

train_examples, val_examples = split_examples(examples, train_prop=.9, seed=4000)

train_features = get_features(
    examples=train_examples, 
    tokenizer=tokenizer, 
    all_ans_types=classes, 
    max_seq_len=max_sequence_length,
    doc_stride=doc_stride,
    mode='train'
)

train_loader = build_ses_dataloader(
    train_features, 
    classes, 
    class_dict, 
    batch_size=batch_size,
    workers=workers,
    max_seq_length=max_sequence_length,
    shuffle=True
)
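Two of the parameters above are easier to picture with a small example. The sketch below is not xt-nlp code: group_classes and sliding_windows are hypothetical helpers, the label names are made up, and doc_stride is assumed to be the offset between the starts of consecutive windows (some libraries use the same name for the overlap instead, so check help(get_features)).

```python
def group_classes(label, class_dict):
    """Return the grouped class name for a raw label, or None if it is ungrouped."""
    for group_name, members in class_dict.items():
        if label in members:
            return group_name
    return None


def sliding_windows(tokens, max_seq_len, doc_stride):
    """Split tokens into windows of at most max_seq_len, one every doc_stride tokens."""
    if max_seq_len <= 0 or doc_stride <= 0:
        raise ValueError("max_seq_len and doc_stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_seq_len])
        if start + max_seq_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += doc_stride
    return windows


# class_dict maps each class name to the list of classes grouped into it.
class_dict = {"PERSON": ["Patient", "Doctor"], "DATE": ["AdmissionDate"]}
print(group_classes("Doctor", class_dict))

# Ten token ids, windows of four tokens, a new window every three tokens.
tokens = list(range(10))
windows = sliding_windows(tokens, max_seq_len=4, doc_stride=3)
print(windows)
```

Because doc_stride is smaller than max_seq_len here, neighbouring windows share a token, so an answer span that straddles a window boundary still appears whole in at least one window as long as it is shorter than the overlap.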
