Transformers kit - Multi-task QA/Tagging/Multi-label Multi-Class Classification/Generation with BERT/ALBERT/T5

TFKit lets you apply transformer architectures to many tasks and models with only small configuration changes.
It also supports multi-task, multi-model learning, and you can add your own datasets and tasks through simple modifications.

Features

  • One-click replacement of pre-trained models
  • Multi-model and multi-task support
  • Multi-label and multi-class classification
  • Unified input format across tasks
  • Separation of data reading and model architecture
  • Support for various loss functions and metrics

Supplement

  • Model list: supports BERT/GPT/GPT2/XLM/XLNet/RoBERTa/CTRL/ALBERT/...
  • NLPrep: download and preprocess data in one line
  • nlp2go: create a demo API as quickly as possible

Documentation

Learn more from the docs.

Quick Start

Installing via pip

pip install tfkit

Running TFKit to train an NER model

Install nlprep and nlp2go

pip install nlprep  nlp2go -U

Download the dataset using nlprep

nlprep --dataset tag_clner  --outdir ./clner_row --util s2t
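
Before training, it can help to look at what nlprep actually wrote. The following is a minimal sketch that prints the first few rows of a CSV file; it makes no assumption about the column layout, and `peek_csv` is a hypothetical helper, not part of nlprep or TFKit. The path is the one used in the training step of this walkthrough.

```python
import csv
from itertools import islice
from pathlib import Path


def peek_csv(path, n=3):
    """Return the first n rows of a CSV file as lists of strings."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), n))


if __name__ == "__main__":
    # Path taken from the training command in this walkthrough.
    sample = Path("./clner_row/clner-train.csv")
    if sample.exists():
        for row in peek_csv(sample):
            print(row)
    else:
        print(f"{sample} not found; run nlprep first")
```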

Train the model with ALBERT

tfkit-train --batch 20 \
--epoch 5 \
--lr 5e-5 \
--train ./clner_row/clner-train.csv \
--test ./clner_row/clner-test.csv \
--maxlen 512 \
--model tagRow \
--savedir ./albert_ner \
--config voidful/albert_chinese_small
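
If you launch training from a script, the same command can be assembled from a dictionary of options. This is only a sketch: `build_train_command` is a hypothetical helper, and it uses nothing beyond the flags and values shown in the command above. `shlex.join` requires Python 3.8 or later.

```python
import shlex


def build_train_command(options):
    """Turn {flag: value} pairs into an argv list for tfkit-train."""
    argv = ["tfkit-train"]
    for flag, value in options.items():
        argv.append(f"--{flag}")
        # Flags such as --train accept several values; pass each one separately.
        if isinstance(value, (list, tuple)):
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv


# Values copied from the command above.
options = {
    "batch": 20,
    "epoch": 5,
    "lr": "5e-5",
    "train": "./clner_row/clner-train.csv",
    "test": "./clner_row/clner-test.csv",
    "maxlen": 512,
    "model": "tagRow",
    "savedir": "./albert_ner",
    "config": "voidful/albert_chinese_small",
}

command = build_train_command(options)
print(shlex.join(command))
# To actually start training: subprocess.run(command, check=True)
```

Keeping the options in one dictionary makes it easy to change a single value, for example the pre-trained model passed to `--config`, without retyping the whole command.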

Evaluate the model

tfkit-eval --model ./albert_ner/3.pt --valid ./clner_row/validation.csv --metric clas
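
The command above evaluates `./albert_ner/3.pt`, which suggests that the save directory holds numbered checkpoint files. Assuming that naming pattern (it is an inference from this example, not a documented guarantee), a small hypothetical helper can list the checkpoints in numeric order so you can choose which one to evaluate:

```python
from pathlib import Path


def list_checkpoints(savedir):
    """Return the .pt files whose name is a plain number, sorted by that number."""
    numbered = [p for p in Path(savedir).glob("*.pt") if p.stem.isdigit()]
    # Sort numerically so that 10.pt comes after 9.pt, not after 1.pt.
    return sorted(numbered, key=lambda p: int(p.stem))


if __name__ == "__main__":
    # Save directory from the training command in this walkthrough.
    for checkpoint in list_checkpoints("./albert_ner"):
        print(checkpoint)
```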

Result

Task : default report 
TASK:  default 0
                precision    recall  f1-score   support

    B_Abstract       0.00      0.00      0.00         1
    B_Location       1.00      1.00      1.00         1
      B_Metric       1.00      1.00      1.00         1
B_Organization       0.00      0.00      0.00         1
      B_Person       1.00      1.00      1.00         1
    B_Physical       0.00      0.00      0.00         1
       B_Thing       1.00      1.00      1.00         1
        B_Time       1.00      1.00      1.00         1
    I_Abstract       1.00      1.00      1.00         1
    I_Location       1.00      1.00      1.00         1
      I_Metric       1.00      1.00      1.00         1
I_Organization       0.00      0.00      0.00         1
      I_Person       1.00      1.00      1.00         1
    I_Physical       0.00      0.00      0.00         1
       I_Thing       1.00      1.00      1.00         1
        I_Time       1.00      1.00      1.00         1
             O       1.00      1.00      1.00         1

     micro avg       1.00      0.71      0.83        17
     macro avg       0.71      0.71      0.71        17
  weighted avg       0.71      0.71      0.71        17
   samples avg       1.00      0.71      0.83        17
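
The averages at the bottom of the report can be reproduced from the per-label rows. The sketch below copies the f1-score column from the report above; the true-positive, false-positive, and false-negative counts are my reading of the table (each label has support 1, and a micro precision of 1.00 implies no false positives), not values printed by tfkit-eval.

```python
# f1-score column from the report above, in row order (B_Abstract ... O).
per_label_f1 = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0,
                1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]

# Macro average: unweighted mean over the 17 labels.
macro_f1 = sum(per_label_f1) / len(per_label_f1)

# Micro average: pool the counts over all labels first.
# Assumed counts: 12 labels found, 5 labels missed, nothing predicted wrongly.
tp, fp, fn = 12, 0, 5
micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"macro f1        {macro_f1:.2f}")
print(f"micro precision {micro_precision:.2f}")
print(f"micro recall    {micro_recall:.2f}")
print(f"micro f1        {micro_f1:.2f}")
```

Under these assumptions the script prints 0.71, 1.00, 0.71, and 0.83, matching the macro avg and micro avg rows. Because every label has the same support, the weighted average equals the macro average.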

Host a prediction service

nlp2go --model ./albert_ner/3.pt --api_path ner

You can also try TFKit in Google Colab.

Overview

Train

$ tfkit-train
Run training

arguments:
  --train TRAIN [TRAIN ...]     train dataset path
  --test TEST [TEST ...]        test dataset path
  --config CONFIG               distilbert-base-multilingual-cased/bert-base-multilingual-cased/voidful/albert_chinese_small
  --model {once,twice,onebyone,clas,tagRow,tagCol,qa,onebyone-neg,onebyone-pos,onebyone-both} [{once,twice,onebyone,clas,tagRow,tagCol,qa,onebyone-neg,onebyone-pos,onebyone-both} ...]
                                model task
  --savedir SAVEDIR     model saving dir, default /checkpoints

optional arguments:
  -h, --help            show this help message and exit
  --batch BATCH         batch size, default 20
  --lr LR [LR ...]      learning rate, default 5e-5
  --epoch EPOCH         epoch, default 10
  --maxlen MAXLEN       max tokenized sequence length, default 368
  --lossdrop            loss dropping for text generation
  --tag TAG [TAG ...]   tag to identify task in multi-task
  --seed SEED           random seed, default 609
  --worker WORKER       number of workers for pre-processing, default 8
  --grad_accum          gradient accumulation, default 1
  --tensorboard         Turn on tensorboard graphing
  --resume RESUME       resume training
  --cache               cache training data
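
The `--batch`, `--grad_accum`, and `--epoch` options together determine how many optimizer updates a run performs. The following sketch shows the usual arithmetic for gradient accumulation, assuming TFKit follows the standard scheme (one update per `grad_accum` batches); the defaults mirror the help text above, and the dataset size of 1000 is a made-up number for illustration.

```python
import math


def effective_batch_size(batch_size=20, grad_accum=1):
    """Number of examples that contribute to a single optimizer update."""
    return batch_size * grad_accum


def optimizer_steps(num_examples, batch_size=20, grad_accum=1, epochs=10):
    """Total optimizer updates, assuming a final partial group still triggers an update."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs


if __name__ == "__main__":
    n = 1000  # hypothetical dataset size
    print(effective_batch_size(20, 4), "examples per update")
    print(optimizer_steps(n, batch_size=20, grad_accum=4, epochs=10), "updates in total")
```

Raising `--grad_accum` is a common way to get a larger effective batch when memory limits the value you can pass to `--batch`.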

Eval

$ tfkit-eval
Run evaluation on different benchmarks

arguments:
  --model MODEL             model path
  --metric {emf1,nlg,clas}  evaluate metric
  --valid VALID             evaluate data path

optional arguments:
  -h, --help            show this help message and exit
  --print               print each pair of evaluate data
  --enable_arg_panel    enable panel to input argument

Contributing

Thanks for your interest. There are many ways to contribute to this project. Get started here.

License PyPI - License

Icons reference

Icons modified from Freepik, from www.flaticon.com
Icons modified from Nikita Golubev, from www.flaticon.com


This version

0.7.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tfkit-0.7.1.tar.gz (229.0 kB view details)

Uploaded Source

Built Distributions

tfkit-0.7.1-py3.7.egg (179.8 kB view details)

Uploaded Source

tfkit-0.7.1-py3-none-any.whl (80.6 kB view details)

Uploaded Python 3

File details

Details for the file tfkit-0.7.1.tar.gz.

File metadata

  • Download URL: tfkit-0.7.1.tar.gz
  • Upload date:
  • Size: 229.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.0.3 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for tfkit-0.7.1.tar.gz
Algorithm Hash digest
SHA256 3e70f134a0b6cb2192daa863f8215c01e49e88acad7ef93a880fb68ed68abc71
MD5 61bb999c8d8af9a752f9f07c0abedaf7
BLAKE2b-256 15d2f920db7c95c55ef8b60050e5df897604cde2e2ee0d32dd40e39fdedf3622

File details

Details for the file tfkit-0.7.1-py3.7.egg.

File metadata

  • Download URL: tfkit-0.7.1-py3.7.egg
  • Upload date:
  • Size: 179.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.0.3 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for tfkit-0.7.1-py3.7.egg
Algorithm Hash digest
SHA256 f76fd0e5bf5a40c3a931976fe8cd89b5c7ab44345c5e0e9c6feb486ffacc0552
MD5 aefb4f0a229e4534d47656955e21fd98
BLAKE2b-256 5a06f2543e4b1f5482b81de7014133d7166a58136f9424b6d8a7344ea905d6a0

File details

Details for the file tfkit-0.7.1-py3-none-any.whl.

File metadata

  • Download URL: tfkit-0.7.1-py3-none-any.whl
  • Upload date:
  • Size: 80.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.0.3 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for tfkit-0.7.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e904bcd5d328b265f9ffe430787c042eb196ca9c35eaaa7f20afec7bc5e41a14
MD5 79d8e7c3b4d12a36256ccef58e38170a
BLAKE2b-256 f2b03b0bf05606aca170cf9eb586e9ba96489d4c37c69566e31d30967bf19466
