A Keras-based language model toolkit with a TensorFlow backend.
Project description
LangML (Language ModeL) is a Keras-based language model toolkit with a TensorFlow backend. It provides mainstream pre-trained language models, e.g. BERT/RoBERTa/ALBERT, and their downstream application models.
Features
- Common and widely used Keras layers: CRF, Transformer, and attention layers (Additive, ScaledDot, MultiHead, GatedAttentionUnit), among others.
- Pretrained language models: BERT, RoBERTa, ALBERT, with friendly interfaces that make it easy to build downstream singleton, shared/unshared two-tower, or multi-tower models.
- Tokenizers: WPTokenizer (WordPiece), SPTokenizer (SentencePiece)
- Baseline models: Text Classification, Named Entity Recognition, Contrastive Learning, and Text Matching. No code is required: preprocess the data into the expected format and use langml-cli to train the various baseline models.
- Prompt-Based Tuning: PTuning
Installation
You can install or upgrade langml/langml-cli via the following command:
pip install -U langml
Quick Start
Specify the Keras variant
- Use pure Keras (default setting)
export TF_KERAS=0
- Use TensorFlow Keras
export TF_KERAS=1
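The same switch can be made from inside Python, as long as the variable is set before langml is imported; a minimal sketch (nothing langml-specific beyond the TF_KERAS variable):

import os

# Choose the Keras variant before importing langml:
# '0' = pure Keras (default), '1' = tf.keras
os.environ['TF_KERAS'] = '1'

import langml  # imported only after the environment variable is set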
Load pretrained language models
from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# placeholder paths to the released checkpoint files
config_path = '/path/to/bert_config.json'
checkpoint_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'
lowercase = True

# load bert / roberta plm
bert_model, bert = load_bert(config_path, checkpoint_path)
# load albert plm
albert_model, albert = load_albert(config_path, checkpoint_path)
# load wordpiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load sentencepiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)
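The objects returned by load_bert and load_albert behave like ordinary Keras models, so a forward pass works the usual way. A rough sketch with dummy inputs (the two-input layout of token ids plus segment ids is an assumption about the model signature; in practice the ids come from the tokenizers above):

import numpy as np

# Dummy batch: one sequence of length 16. Real ids come from wp_tokenizer / sp_tokenizer.
token_ids = np.zeros((1, 16), dtype='int32')
segment_ids = np.zeros((1, 16), dtype='int32')

# Forward pass; the output contains the contextual token representations.
embeddings = bert_model.predict([token_ids, segment_ids])
print(embeddings.shape)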
Finetune a model
from langml import keras, L
from langml import load_bert
config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'
bert_model, bert_instance = load_bert(config_path, ckpt_path)
# get CLS representation
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
output = L.Dense(2, activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(1e-5))
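After compiling, the model trains like any other Keras model. A toy fit call with placeholder arrays, just to illustrate the expected shapes (real token/segment ids come from the tokenizer, and labels are one-hot encoded over the two classes):

import numpy as np

# Placeholder batch: 8 sequences of length 32, binary labels one-hot encoded.
token_ids = np.zeros((8, 32), dtype='int32')
segment_ids = np.zeros((8, 32), dtype='int32')
labels = keras.utils.to_categorical(np.random.randint(0, 2, size=(8,)), num_classes=2)

train_model.fit([token_ids, segment_ids], labels, batch_size=4, epochs=1)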
Use langml-cli to train baseline models
- Text Classification
$ langml-cli baseline clf --help
Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...
classification command line tools
Options:
--help Show this message and exit.
Commands:
bert
bilstm
textcnn
- Named Entity Recognition
$ langml-cli baseline ner --help
Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...
ner command line tools
Options:
--help Show this message and exit.
Commands:
bert-crf
lstm-crf
- Contrastive Learning
$ langml-cli baseline contrastive --help
Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...
contrastive learning command line tools
Options:
--help Show this message and exit.
Commands:
simcse
- Text Matching
$ langml-cli baseline matching --help
Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...
text matching command line tools
Options:
--help Show this message and exit.
Commands:
sbert
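Each model listed under Commands is itself a subcommand with its own training options; appending --help one level deeper prints them (output omitted here). For example:

$ langml-cli baseline clf bert --help
$ langml-cli baseline ner bert-crf --help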
Documentation
Please visit langml.readthedocs.io for the latest documentation.
Reference
The implementation of the pretrained language models is inspired by CyberZHG/keras-bert and bojone/bert4keras.
Download files
Source Distribution: langml-0.4.2.tar.gz
Built Distribution: langml-0.4.2-py3-none-any.whl
File details
Details for the file langml-0.4.2.tar.gz.
File metadata
- Download URL: langml-0.4.2.tar.gz
- Upload date:
- Size: 53.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | edee03ff6bbd1cf27f9ac4218a606d94720dcaffb461226fbdde477a1b501e02 |
| MD5 | 05959481fa6344c95dc2bc214f8e6ef6 |
| BLAKE2b-256 | f6c161020c2791b70c76d914cc1db819597af60b96b41eedaecbd4feb5894fc7 |
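The published digests can be re-checked locally after downloading; a small sketch using Python's standard hashlib (the filename is assumed to match the one given under File metadata):

import hashlib

# Compare the SHA256 of the downloaded source distribution with the published value.
with open('langml-0.4.2.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = 'edee03ff6bbd1cf27f9ac4218a606d94720dcaffb461226fbdde477a1b501e02'
print('OK' if digest == expected else 'MISMATCH')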
File details
Details for the file langml-0.4.2-py3-none-any.whl.
File metadata
- Download URL: langml-0.4.2-py3-none-any.whl
- Upload date:
- Size: 78.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 769d1b97d59a521efa1c7ad91145276cfac3ff8fbbf5730b3ea82c7ab2c13283 |
| MD5 | c400774a50d79d29d43ae3cca5017f26 |
| BLAKE2b-256 | ac27a5a37a02f14d297b5c98bed44023ceb7198083b8d1b40f31229e198ed51b |