A Keras-based and TensorFlow-backend language model toolkit.

LangML (Language ModeL) is a Keras-based language model toolkit with a TensorFlow backend. It provides mainstream pretrained language models, e.g., BERT, RoBERTa, and ALBERT, along with their downstream application models.

Features

  • Common and widely used Keras layers: CRF, Transformer, and attention layers (Additive, ScaledDot, MultiHead, GatedAttentionUnit, and so on).
  • Pretrained language models: BERT, RoBERTa, ALBERT. Their interfaces are designed to make it easy to implement downstream singleton, shared/unshared two-tower, or multi-tower models.
  • Tokenizers: WPTokenizer (WordPiece), SPTokenizer (SentencePiece).
  • Baseline models: Text Classification, Named Entity Recognition, Contrastive Learning. No code is required: preprocess the data into the expected format and use "langml-cli" to train the baseline models.
  • Prompt-based tuning: PTuning

Installation

You can install or upgrade langml/langml-cli via the following command:

pip install -U langml

Quick Start

Specify the Keras variant

  1. Use pure Keras (default setting)
export TF_KERAS=0
  2. Use TensorFlow Keras
export TF_KERAS=1
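
When exporting a shell variable is inconvenient (in a notebook, for example), the same switch can be set from Python. The sketch below assumes LangML reads TF_KERAS when it is first imported, so the variable is set before any import. `keras_variant` is a hypothetical helper for illustration and is not part of LangML.

```python
import os

# Assumption: LangML reads TF_KERAS at import time, so set it first.
os.environ["TF_KERAS"] = "1"  # use "0" to select pure Keras


def keras_variant() -> str:
    """Hypothetical helper: report which variant the variable selects."""
    return "tf.keras" if os.environ.get("TF_KERAS", "0") == "1" else "keras"


print(keras_variant())
# import langml  # import only after the variable is set
```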

Load pretrained language models

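from langml import WPTokenizer, SPTokenizer
from langml import load_bert, load_albert

# placeholder paths; point them at the files of the model you are loading
config_path = '/path/to/config.json'
checkpoint_path = '/path/to/model.ckpt'
vocab_path = '/path/to/vocab'
lowercase = True  # set to match the casing of the pretrained vocabulary

# load a BERT / RoBERTa PLM
bert_model, bert = load_bert(config_path, checkpoint_path)
# load an ALBERT PLM (use the ALBERT config and checkpoint here)
albert_model, albert = load_albert(config_path, checkpoint_path)
# load the WordPiece tokenizer
wp_tokenizer = WPTokenizer(vocab_path, lowercase)
# load the SentencePiece tokenizer
sp_tokenizer = SPTokenizer(vocab_path, lowercase)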
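
For readers unfamiliar with WordPiece, the following is a minimal sketch of the greedy longest-match-first splitting that WordPiece tokenizers are generally based on. It is not LangML's implementation: the function name `wordpiece_split` and the toy vocabulary are assumptions made for illustration, and a real tokenizer also handles casing, punctuation, and special tokens.

```python
def wordpiece_split(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces are prefixed
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Toy vocabulary, for illustration only.
toy_vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece_split("unaffable", toy_vocab))
print(wordpiece_split("playing", toy_vocab))
```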

Finetune a model

from langml import keras, L
from langml import load_bert

config_path = '/path/to/bert_config.json'
ckpt_path = '/path/to/bert_model.ckpt'
vocab_path = '/path/to/vocab.txt'

bert_model, bert_instance = load_bert(config_path, ckpt_path)
# take the [CLS] representation (first position of the sequence output)
cls_output = L.Lambda(lambda x: x[:, 0])(bert_model.output)
# two-class softmax head on top of the [CLS] vector
output = L.Dense(2, activation='softmax',
                 kernel_initializer=bert_instance.initializer)(cls_output)
# build the model on the classifier output, not on the raw [CLS] vector
train_model = keras.Model(bert_model.input, output)
train_model.summary()
train_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(1e-5))
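
Because the model is compiled with categorical cross-entropy over a two-unit softmax, it expects one-hot labels rather than integer class ids. The standard-library sketch below works through what that head and loss compute for a single example. The helper names are illustrative and not part of LangML or Keras, and Keras additionally clips probabilities and averages over the batch.

```python
import math


def one_hot(label, num_classes):
    """Turn an integer class id into a one-hot target vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


probs = softmax([2.0, 0.0])  # example logits for the two classes
loss = categorical_crossentropy(one_hot(0, 2), probs)
print(probs, loss)
```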

Use langml-cli to train baseline models

  1. Text Classification
$ langml-cli baseline clf --help
Usage: langml baseline clf [OPTIONS] COMMAND [ARGS]...

  classification command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert
  bilstm
  textcnn
  2. Named Entity Recognition
$ langml-cli baseline ner --help
Usage: langml baseline ner [OPTIONS] COMMAND [ARGS]...

  ner command line tools

Options:
  --help  Show this message and exit.

Commands:
  bert-crf
  lstm-crf
  3. Contrastive Learning
$ langml-cli baseline contrastive --help
Usage: langml baseline contrastive [OPTIONS] COMMAND [ARGS]...

  contrastive learning command line tools

Options:
  --help  Show this message and exit.

Commands:
  simcse
  4. Text Matching
$ langml-cli baseline matching --help
Usage: langml baseline matching [OPTIONS] COMMAND [ARGS]...

  text matching command line tools

Options:
  --help  Show this message and exit.

Commands:
  sbert
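
The help output above lists the available subcommands but not the options of each model. The sketch below builds the command for a given model's help page and runs it only when "langml-cli" is installed. It assumes each model subcommand accepts --help, as the group commands do. `BASELINES` and `help_command` are hypothetical names introduced for this example.

```python
import shutil
import subprocess

# Subcommands copied from the help output shown above.
BASELINES = {
    "clf": ["bert", "bilstm", "textcnn"],
    "ner": ["bert-crf", "lstm-crf"],
    "contrastive": ["simcse"],
    "matching": ["sbert"],
}


def help_command(task, model):
    """Build the argv that asks one baseline model for its help page."""
    if model not in BASELINES.get(task, []):
        raise ValueError(f"unknown baseline: {task} {model}")
    # Assumption: model subcommands accept --help like the groups do.
    return ["langml-cli", "baseline", task, model, "--help"]


cmd = help_command("ner", "bert-crf")
if shutil.which("langml-cli"):  # run only when the CLI is on PATH
    subprocess.run(cmd, check=False)
else:
    print("would run:", " ".join(cmd))
```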

Documentation

Please visit langml.readthedocs.io for the latest documentation.

Reference

The implementation of the pretrained language models is inspired by CyberZHG/keras-bert and bojone/bert4keras.
