
Transformers kit - NLP library for different downstream tasks, built on huggingface project


🤖 TFKit - Transformer Kit 🤗

An NLP library for a wide variety of downstream tasks, built on the Hugging Face 🤗 project.

Read this in other languages: Traditional Chinese (under construction 👷).

Features

  • supports BERT/GPT/GPT2/XLM/XLNet/RoBERTa/CTRL/ALBERT
  • modular data loading
  • easy to modify
  • special loss functions for handling different cases: FocalLoss / FocalBCELoss / NegativeCrossEntropyLoss / SmoothCrossEntropyLoss
  • evaluation on different benchmarks: EM / F1 / BLEU / METEOR / ROUGE / CIDEr / Classification Report / ...
  • multi-class, multi-task, multi-label classifier
  • word-level and sentence-level text generation
  • beam search support for decoding
  • token tagging
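
The loss functions named in the list above are not documented on this page. As a rough illustration of the idea behind a focal loss, the sketch below implements the widely used formula -(1 - p)^gamma * log(p) for a single example in plain Python. The function names `focal_loss` and `cross_entropy` and the default `gamma=2.0` are assumptions made for this example, not TFKit's API.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss for one example, given the probability assigned to the true class."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    # (1 - p_true) ** gamma shrinks the loss of examples the model already gets right.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def cross_entropy(p_true: float) -> float:
    """Plain cross-entropy, which equals focal loss with gamma = 0."""
    return focal_loss(p_true, gamma=0.0)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, round(cross_entropy(p), 4), round(focal_loss(p), 4))
```

With `gamma` above zero, confidently correct examples contribute much less than they do under cross-entropy, so hard examples dominate the total.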

Package Overview

tfkit               NLP library for different downstream tasks, built on the Hugging Face project
tfkit.classifier    multi-class, multi-task, multi-label classifier
tfkit.gen_once      generates the whole text in a single pass, built on a masked language model
tfkit.gen_onebyone  generates text one word at a time, built on a masked language model
tfkit.tag           token tagging model
tfkit.train.py      runs training
tfkit.eval.py       runs evaluation

Installation

TFKit requires Python 3.6 or later.

Installing via pip

pip install tfkit

Running TFKit

Once you've installed TFKit, you can run train.py for training or eval.py for evaluation through the tfkit-train and tfkit-eval commands.

$ tfkit-train
Run training

arguments:
  --train       training data path
  --valid       validation data path
  --maxlen      maximum text length
  --model       type of model         ['once', 'onebyone', 'classify', 'tagRow', 'tagCol']
  --config      pre-trained model     bert-base-multilingual-cased... etc (you can find one on https://huggingface.co/models)

optional arguments:
  -h, --help    show this help message and exit
  --resume      resume from previous training
  --savedir     dir for model saving
  --worker      number of workers
  --batch       batch size
  --lr          learning rate
  --epoch       number of epochs
  --tensorboard enable tensorboard
  --cache       enable data caching
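
The flags listed above could be combined as in the following sketch, which only builds and prints a command line. The file names, the numeric values, and the `checkpoints` directory are hypothetical; the flag names and the `classify` and `bert-base-multilingual-cased` values come from the help text. It is assumed that `--maxlen`, `--batch`, and `--epoch` each take a single value.

```python
import shlex

# Hypothetical paths and values; only the flag names come from the help text.
train_args = [
    "tfkit-train",
    "--train", "train.csv",
    "--valid", "valid.csv",
    "--maxlen", "128",
    "--model", "classify",
    "--config", "bert-base-multilingual-cased",
    "--savedir", "checkpoints",
    "--batch", "16",
    "--epoch", "5",
]


def to_command_line(args):
    """Quote each argument so the result can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(to_command_line(train_args))
    # To launch training from Python instead of a shell, one option is
    # subprocess.run(train_args, check=True).
```

Keeping the arguments in a list avoids shell-quoting mistakes when a path contains spaces.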

$ tfkit-eval
Run evaluation on different benchmarks

arguments:
  --model       model to evaluate
  --valid       validation data path
  --metric      evaluation metric           ['em', 'nlg', 'classification']
  --config      pre-trained model           bert-base-multilingual-cased

optional arguments:
  -h, --help    show this help message and exit
  --batch       batch size
  --topk        select the top k results in classification tasks
  --outprint    print results to the console
  --beamsearch  enable beam search for text generation tasks
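
To make the 'em' metric above more concrete, here is a minimal sketch of exact match together with token-level F1, using common definitions over space-separated tokens. Whether tfkit-eval computes them in exactly this way is not stated on this page, and the function names `exact_match` and `token_f1` are assumptions made for this example.

```python
from collections import Counter


def exact_match(prediction: str, target: str) -> bool:
    """True when the two space-separated token sequences are identical."""
    return prediction.split() == target.split()


def token_f1(prediction: str, target: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    pred_tokens, gold_tokens = prediction.split(), target.split()
    if not pred_tokens or not gold_tokens:
        # Two empty sequences match perfectly; one empty sequence scores zero.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "我 坐 巴 士 上 學"
    print(exact_match("我 坐 巴 士", gold), token_f1("我 坐 巴 士", gold))
```

A prediction that stops early is not an exact match but still earns partial credit under F1.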

Dataset format

once

csv file with 2 columns - input, target
tokens separated by spaces
no header needed
Example:

"i go to school by bus","我 坐 巴 士 上 學"

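A file in the format described above could be produced with Python's standard csv module, as in this sketch. The helper name `write_pairs` is an assumption made for this example; quoting every field mirrors the example row.

```python
import csv
import io


def write_pairs(rows, handle):
    """Write (input, target) token lists as a header-less, fully quoted two-column CSV."""
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for source_tokens, target_tokens in rows:
        writer.writerow([" ".join(source_tokens), " ".join(target_tokens)])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_pairs([("i go to school by bus".split(), list("我坐巴士上學"))], buffer)
    print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.
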
onebyone

csv file with 2 columns - input, target
tokens separated by spaces
no header needed
Example:

"i go to school by bus","我 坐 巴 士 上 學"

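Before training, it can help to check that every row of such a file really has two columns. The sketch below reads the file back into token lists; the helper name `read_pairs` is an assumption made for this example.

```python
import csv
import io


def read_pairs(handle):
    """Yield (input_tokens, target_tokens) from a header-less two-column CSV."""
    for row_number, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"row {row_number}: expected 2 columns, got {len(row)}")
        source, target = row
        yield source.split(), target.split()


if __name__ == "__main__":
    sample = io.StringIO('"i go to school by bus","我 坐 巴 士 上 學"\n')
    for source_tokens, target_tokens in read_pairs(sample):
        print(source_tokens, target_tokens)
```
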
classify

csv file with a header
header - input,task1,task2...taskN
if a task has multiple labels, use / to separate them - label1/label2/label3
Example:

SENTENCE,LABEL,Task2
"The prospective ultrasound findings were correlated with the final diagnoses , laparotomy findings , and pathology findings .",outcome/other,1

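The header and label conventions above can be parsed as in the following sketch, which treats the first column as the input and every other column as a task. The helper name `read_classification` and the shape of the returned dictionaries are assumptions made for this example.

```python
import csv
import io


def read_classification(handle):
    """Return one dict per row: the input text plus a label list for each task column."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    if len(fieldnames) < 2:
        raise ValueError("expected a header with an input column and at least one task")
    input_column, *task_columns = fieldnames
    examples = []
    for row in reader:
        # A cell such as "outcome/other" carries several labels for one task.
        labels = {task: row[task].split("/") for task in task_columns}
        examples.append({"input": row[input_column], "labels": labels})
    return examples


if __name__ == "__main__":
    sample = io.StringIO('SENTENCE,LABEL,Task2\n"a short sentence .",outcome/other,1\n')
    print(read_classification(sample))
```
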
tagRow

csv file with 2 columns - input, target
tokens separated by spaces
no header needed
Example:

"在 歐 洲 , 梵 語 的 學 術 研 究 , 由 德 國 學 者 陸 特 和 漢 斯 雷 頓 開 創 。 後 來 威 廉 · 瓊 斯 發 現 印 歐 語 系 , 也 要 歸 功 於 對 梵 語 的 研 究 。 此 外 , 梵 語 研 究 , 也 對 西 方 文 字 學 及 歷 史 語 言 學 的 發 展 , 貢 獻 不 少 。 1 7 8 6 年 2 月 2 日 , 亞 洲 協 會 在 加 爾 各 答 舉 行 。 [SEP] 陸 特 和 漢 斯 雷 頓 開 創 了 哪 一 地 區 對 梵 語 的 學 術 研 究 ?",O A A O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O

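A quick consistency check for this format is to pair each token with its tag. The sketch below assumes the target carries exactly one tag per input token, which this page does not state explicitly; the helper name `align_tags` is likewise an assumption.

```python
def align_tags(text: str, tags: str):
    """Pair each space-separated token with its tag, or raise if the counts differ."""
    tokens, labels = text.split(), tags.split()
    if len(tokens) != len(labels):
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} tags")
    return list(zip(tokens, labels))


if __name__ == "__main__":
    for token, tag in align_tags("別 只 能 想 自 己", "O O R O O O"):
        print(token, tag)
```
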
tagCol

file with 2 columns - input, target
each line holds one token and its tag, separated by a space
no header needed
Example:

別 O
只 O
能 R
想 O
自 O
己 O
, O
想 M
你 M
周 O
圍 O
的 O
人 O
。 O
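
The column layout above can be read into (token, tag) pairs and, if needed, converted to the row layout used by tagRow, as in this sketch. Treating a blank line as a sentence boundary is an assumption borrowed from common tagging formats, not something this page states; the helper names `read_tag_columns` and `to_row_format` are also assumptions.

```python
def read_tag_columns(lines):
    """Group 'token tag' lines into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for raw in lines:
        parts = raw.split()
        if not parts:
            if current:
                sentences.append(current)
                current = []
            continue
        if len(parts) != 2:
            raise ValueError(f"expected 'token tag', got {raw!r}")
        current.append((parts[0], parts[1]))
    if current:
        sentences.append(current)
    return sentences


def to_row_format(sentence):
    """Turn one sentence of (token, tag) pairs into (input, target) strings."""
    tokens, tags = zip(*sentence)
    return " ".join(tokens), " ".join(tags)


if __name__ == "__main__":
    sample = ["別 O", "只 O", "能 R", "", "想 M", "你 M"]
    for sentence in read_tag_columns(sample):
        print(to_row_format(sentence))
```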


