Transformers kit - NLP library for different downstream tasks, built on huggingface project
🤖 TFKit - Transformer Kit 🤗
An NLP library built on the Hugging Face 🤗 project
for developing a wide variety of downstream NLP tasks.
Features
- supports BERT/GPT/GPT-2/XLM/XLNet/RoBERTa/CTRL/ALBERT
- modular data loading
- easy to modify
- special loss functions for handling different cases: FocalLoss / FocalBCELoss / NegativeCrossEntropyLoss / SmoothCrossEntropyLoss
- evaluation on different benchmarks: EM / F1 / BLEU / METEOR / ROUGE / CIDEr / classification report / ...
- multi-class, multi-task, multi-label classifier
- word-level and sentence-level text generation
- beam search decoding
- token tagging
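The loss functions are listed by name only. As a rough illustration of two of them, the standard focal loss and label-smoothed cross-entropy (which SmoothCrossEntropyLoss presumably implements) can be written for a single example as below. This is a plain-Python sketch of the textbook formulas, not tfkit's own implementation; the `gamma`, `alpha`, and `eps` defaults are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one example: -alpha * (1 - p_t) ** gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    # (1 - p_t) ** gamma down-weights examples the model already gets right.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_cross_entropy(logits: Sequence[float], target: int,
                         eps: float = 0.1) -> float:
    """Cross-entropy against a label-smoothed target distribution."""
    probs = softmax(logits)
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # True class gets 1 - eps + eps / k; every other class gets eps / k.
        q = 1.0 - eps + eps / k if i == target else eps / k
        loss -= q * math.log(p)
    return loss


logits = [2.0, 0.5, -1.0]
print(focal_loss(logits, 0), smooth_cross_entropy(logits, 0))
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, and with `eps=0` so does the smoothed version, which is a handy sanity check.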
Package Overview
Module | Description |
---|---|
tfkit | NLP library for different downstream tasks, built on huggingface project |
tfkit.classifier | multi-class, multi-task, multi-label classifier |
tfkit.gen_once | text generation in a single pass, built on a masked language model |
tfkit.gen_onebyone | text generation one word at a time, built on a masked language model |
tfkit.tag | token tagging model |
tfkit.train.py | run training |
tfkit.eval.py | run evaluation |
Installation
TFKit requires Python 3.6 or later.
Installing via pip
pip install tfkit
Running TFKit
Once you've installed TFKit, you can run training (train.py) with the tfkit-train command and evaluation (eval.py) with tfkit-eval.
$ tfkit-train
Run training
arguments:
--train path to the training data
--valid path to the validation data
--maxlen maximum text length
--model type of model ['once', 'onebyone', 'classify', 'tagRow', 'tagCol']
--config pretrained model to use, e.g. bert-base-multilingual-cased
optional arguments:
-h, --help show this help message and exit
--resume resume a previous training run
--savedir directory for saving the model
--worker number of workers
--batch batch size
--lr learning rate
--epoch number of training epochs
--tensorboard enable TensorBoard logging
--cache enable data caching
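One way to script a training run is to assemble the tfkit-train argument list from a dictionary, as in the sketch below. Only options documented above are used; the file names and values are hypothetical, and treating --tensorboard and --cache as bare switches that take no value is an assumption based on their "enable" descriptions.

```python
import shlex
from typing import Dict, List, Union

# Assumed to be value-less switches, based on their "enable ..." help text.
SWITCHES = {"tensorboard", "cache"}


def build_train_command(options: Dict[str, Union[str, int, float, bool]]) -> List[str]:
    """Turn {option: value} pairs into an argv list for tfkit-train."""
    argv = ["tfkit-train"]
    for name, value in options.items():
        if name in SWITCHES:
            if value:  # emit the bare flag only when the switch is turned on
                argv.append("--" + name)
        else:
            argv.extend(["--" + name, str(value)])
    return argv


# Hypothetical paths and hyperparameters, for illustration only.
cmd = build_train_command({
    "train": "train.csv",
    "valid": "valid.csv",
    "maxlen": 512,
    "model": "once",
    "config": "bert-base-multilingual-cased",
    "batch": 8,
    "epoch": 10,
    "savedir": "checkpoints",
    "tensorboard": True,
    "cache": False,
})
print(" ".join(shlex.quote(arg) for arg in cmd))
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell-quoting problems when paths contain spaces.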
$ tfkit-eval
Run evaluation on different benchmarks
arguments:
--model model to evaluate
--valid path to the validation data
--metric evaluation metric ['em', 'nlg', 'classification']
--config pretrained model to use, e.g. bert-base-multilingual-cased
optional arguments:
-h, --help show this help message and exit
--batch batch size
--topk number of top results to select in classification tasks
--outprint print results to the console
--beamsearch enable beam search for text generation tasks
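The `em` metric and the F1 score listed under Features are commonly computed SQuAD-style. The sketch below assumes space-separated tokens, as in the dataset formats, and only normalizes whitespace; many implementations also lowercase and strip punctuation, and tfkit's exact normalization is not documented here.

```python
from collections import Counter
from typing import List


def _normalize(text: str) -> str:
    # Collapse runs of whitespace; no lowercasing or punctuation stripping here.
    return " ".join(text.split())


def exact_match(prediction: str, references: List[str]) -> float:
    """1.0 if the prediction equals any reference after normalization, else 0.0."""
    pred = _normalize(prediction)
    return float(any(pred == _normalize(ref) for ref in references))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall over shared tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("我 坐 巴 士", ["我 坐 巴 士 上 學"]))  # 0.0
print(token_f1("我 坐 巴 士", "我 坐 巴 士 上 學"))
```

EM gives no credit for a partially correct generation, while token F1 rewards the four matching tokens.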
Dataset format
once
CSV file with 2 columns: input, target
tokens separated by spaces
no header
Example:
"i go to school by bus","我 坐 巴 士 上 學"
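When preparing data in Python, the standard csv module produces the quoting shown in the example. The sketch below writes header-less (input, target) rows. Splitting Chinese text into single characters with `space_chars` is one simple way to satisfy the space-separated token rule; it is an assumption for illustration, not a tfkit requirement. The same layout serves the onebyone format.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def space_chars(text: str) -> str:
    """Split text into single characters separated by spaces (e.g. for Chinese)."""
    return " ".join(ch for ch in text if not ch.isspace())


def write_pairs(pairs: Iterable[Tuple[str, str]], stream: TextIO) -> None:
    """Write header-less, fully quoted (input, target) rows."""
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL)
    for source, target in pairs:
        writer.writerow([source, target])


buf = io.StringIO()
write_pairs([("i go to school by bus", space_chars("我坐巴士上學"))], buf)
print(buf.getvalue(), end="")
# For a real file (hypothetical name):
# open("train.csv", "w", newline="", encoding="utf-8")
```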
onebyone
CSV file with 2 columns: input, target
tokens separated by spaces
no header
Example:
"i go to school by bus","我 坐 巴 士 上 學"
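The once and onebyone formats are identical on disk; per the package overview, the difference is that once produces the whole target in a single pass, while onebyone generates it one word at a time. Assuming left-to-right decoding in which each step predicts one masked position given the input plus the target prefix so far (an assumption drawn from the package description, not from tfkit's source), a single row expands into per-step examples as sketched below. The `[MASK]` placeholder and the `[SEP]` end marker are hypothetical choices.

```python
from typing import List, Tuple

MASK = "[MASK]"  # assumed placeholder for the position being predicted
END = "[SEP]"    # hypothetical end-of-generation marker


def expand_onebyone(source: str, target: str) -> List[Tuple[List[str], str]]:
    """Expand one (input, target) row into left-to-right (context, next_token) steps."""
    src = source.split()
    tgt = target.split() + [END]
    steps = []
    for i, next_token in enumerate(tgt):
        # Context: input tokens, the target prefix produced so far, then one mask slot.
        steps.append((src + tgt[:i] + [MASK], next_token))
    return steps


for context, answer in expand_onebyone("i go to school by bus", "我 坐 巴 士 上 學")[:2]:
    print(" ".join(context), "->", answer)
```

A six-token target yields seven steps: one per target token plus the end marker.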
classify
CSV file with a header row
header: input,task1,task2...taskN
if a task has multiple labels, separate them with /, e.g. label1/label2/label3
Example:
SENTENCE,LABEL,Task2
"The prospective ultrasound findings were correlated with the final diagnoses , laparotomy findings , and pathology findings .",outcome/other,1
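A sketch of loading this format with Python's standard csv module: the first column is the input, every later column is treated as a separate task, and "/"-separated cells are split into label lists. The `multi_hot` helper and the label vocabulary in the test are hypothetical, showing one common way to turn a multi-label cell into a classifier target; tfkit's internal encoding may differ.

```python
import csv
import io
from typing import Dict, List


def read_classify(csv_text: str) -> List[dict]:
    """Parse classify-format CSV text: column 0 is the input, each later column is a task."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    task_names = header[1:]
    examples = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Multi-label cells use "/" between labels; single labels become 1-element lists.
        labels: Dict[str, List[str]] = {
            task: row[i + 1].split("/") for i, task in enumerate(task_names)
        }
        examples.append({"input": row[0], "labels": labels})
    return examples


def multi_hot(labels: List[str], label_set: List[str]) -> List[int]:
    """Encode labels as a 0/1 vector over label_set, as a multi-label target."""
    return [1 if name in labels else 0 for name in label_set]


sample = (
    "SENTENCE,LABEL,Task2\n"
    '"The prospective ultrasound findings were correlated with the final diagnoses , '
    'laparotomy findings , and pathology findings .",outcome/other,1\n'
)
examples = read_classify(sample)
print(examples[0]["labels"])  # {'LABEL': ['outcome', 'other'], 'Task2': ['1']}
```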
tagRow
CSV file with 2 columns: input, target (one tag per input token)
tokens and tags separated by spaces
no header
Example:
"在 歐 洲 , 梵 語 的 學 術 研 究 , 由 德 國 學 者 陸 特 和 漢 斯 雷 頓 開 創 。 後 來 威 廉 · 瓊 斯 發 現 印 歐 語 系 , 也 要 歸 功 於 對 梵 語 的 研 究 。 此 外 , 梵 語 研 究 , 也 對 西 方 文 字 學 及 歷 史 語 言 學 的 發 展 , 貢 獻 不 少 。 1 7 8 6 年 2 月 2 日 , 亞 洲 協 會 在 加 爾 各 答 舉 行 。 [SEP] 陸 特 和 漢 斯 雷 頓 開 創 了 哪 一 地 區 對 梵 語 的 學 術 研 究 ?",O A A O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
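Because each tag lines up with one input token, a quick consistency check before training can catch misaligned rows. The sketch below pairs tokens with tags and pulls out the tokens carrying a given tag. In the example above, A appears to mark the answer span (歐 洲, "Europe") to the question after [SEP], with O elsewhere; that reading of the tag set is inferred from the example, and the shortened row in the usage line is hypothetical.

```python
from typing import List, Tuple


def parse_tag_row(source: str, target: str) -> List[Tuple[str, str]]:
    """Pair each space-separated input token with its tag; lengths must match."""
    tokens = source.split()
    tags = target.split()
    if len(tokens) != len(tags):
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    return list(zip(tokens, tags))


def tokens_with_tag(pairs: List[Tuple[str, str]], tag: str) -> List[str]:
    """Return the tokens whose tag equals `tag`, in order."""
    return [token for token, t in pairs if t == tag]


pairs = parse_tag_row("在 歐 洲 , 梵 語 [SEP] 哪 一 地 區 ?", "O A A O O O O O O O O O")
print(tokens_with_tag(pairs, "A"))  # ['歐', '洲']
```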
tagCol
file with 2 columns: input token, target tag
one token per line, followed by its tag, separated by a space
no header
Example:
別 O
只 O
能 R
想 O
自 O
己 O
, O
想 M
你 M
周 O
圍 O
的 O
人 O
。 O
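The column layout carries the same information as tagRow, one token per line. The sketch below gathers such lines into parallel token and tag lists, which can then be joined with spaces to form a tagRow-style (input, target) pair. Skipping blank lines is an assumption; the format description does not say how sentence boundaries are marked.

```python
from typing import Iterable, List, Tuple


def read_tag_col(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect 'token tag' lines into parallel token and tag lists."""
    tokens: List[str] = []
    tags: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no token
        token, tag = line.rsplit(maxsplit=1)  # tag is the last whitespace-separated field
        tokens.append(token)
        tags.append(tag)
    return tokens, tags


example = ["別 O", "只 O", "能 R", "想 O", "自 O", "己 O", ", O",
           "想 M", "你 M", "周 O", "圍 O", "的 O", "人 O", "。 O"]
tokens, tags = read_tag_col(example)
row = (" ".join(tokens), " ".join(tags))  # tagRow-style (input, target)
print(row[1])  # O O R O O O O M M O O O O O
```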
File details
Details for the file tfkit-0.0.7.tar.gz.
File metadata
- Download URL: tfkit-0.0.7.tar.gz
- Size: 18.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2b1b31ea05914f9bdf7eb1dad57a0af70b8cc4b6c95d4d2524edfb730f198ea4 |
MD5 | 45e255df17f52a4e9c2d2042c1e7e272 |
BLAKE2b-256 | dfb462dd4d53a1ea18cef090894828e9b54f0f1c924b61c07a76d8f011389871 |
File details
Details for the file tfkit-0.0.7-py3-none-any.whl.
File metadata
- Download URL: tfkit-0.0.7-py3-none-any.whl
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2841a483d211174f6dd5ef30109ca368d85e800b10656f3a35d6cf58ae2fe14e |
MD5 | 6854735f444cd6a7aec9c0859b20edab |
BLAKE2b-256 | a054a42c4143bcabf13906d74908cade444865b8b6bfe261b762d41740e7dc09 |