A simple, Keras-powered multilingual NLP framework that lets you build models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT, GPT-2 and word2vec embeddings.

Kashgari

Overview | Performance | Quick start | Documentation | Contributing

Overview

Kashgari is a simple and powerful NLP framework that lets you build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks.

  • Human-friendly. Kashgari's code is straightforward, well documented and tested, which makes it easy to understand and modify.
  • Powerful and simple. Kashgari allows you to apply state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS) and classification.
  • Keras based. Kashgari builds directly on Keras, making it easy to train your models and experiment with new approaches using different embeddings and model structures.
  • Built-in transfer learning. Kashgari has built-in support for pre-trained BERT and Word2vec embedding models, which makes it simple to use transfer learning when training your model.
  • Fully scalable. Kashgari provides a simple, fast, and scalable environment for rapid experimentation.

Performance

Task | Language | Dataset | Score | Detail
Named Entity Recognition | Chinese | People's Daily NER Corpus | 92.20 (F1) | Chinese Named Entity Recognition Based on BERT
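
For readers unfamiliar with how an F1 score for NER is typically obtained, here is a minimal sketch of an entity-level F1 computation over BIO-tagged sequences. The helper names `extract_entities` and `entity_f1` are made up for this illustration and are not part of Kashgari; the score reported above may have been computed differently.

```python
from typing import List, Set, Tuple

# (sentence index, start position, end position (exclusive), entity type)
Entity = Tuple[int, int, int, str]


def extract_entities(tags_batch: List[List[str]]) -> Set[Entity]:
    """Collect entity spans from BIO tag sequences (hypothetical helper)."""
    entities = set()
    for sent_idx, tags in enumerate(tags_batch):
        start, ent_type = None, None
        # A trailing 'O' sentinel flushes an entity that ends the sentence.
        for pos, tag in enumerate(tags + ['O']):
            inside = tag.startswith('I-') and ent_type == tag[2:]
            if start is not None and not inside:
                entities.add((sent_idx, start, pos, ent_type))
                start, ent_type = None, None
            if tag.startswith('B-'):
                start, ent_type = pos, tag[2:]
    return entities


def entity_f1(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    true_entities = extract_entities(y_true)
    pred_entities = extract_entities(y_pred)
    correct = len(true_entities & pred_entities)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_entities)
    recall = correct / len(true_entities)
    return 2 * precision * recall / (precision + recall)


# One of two gold entities is found: precision 1.0, recall 0.5.
gold = [['B-PER', 'I-PER', 'O', 'B-LOC']]
pred = [['B-PER', 'I-PER', 'O', 'O']]
print(round(entity_f1(gold, pred), 4))  # 0.6667
```

An entity counts as correct here only if its boundaries and type both match exactly, which is the stricter of the common conventions.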

Quick start

Requirements and Installation

The project is based on TensorFlow 1.14.0 and Python 3.6+, because it is 2019 and type hints are cool.

pip install kashgari-tf
# CPU
pip install tensorflow==1.14.0
# GPU
pip install tensorflow-gpu==1.14.0
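
After installing, a quick sanity check of the environment can save a confusing import error later. The following is only a sketch: `check_environment` is a hypothetical helper, not something Kashgari provides, and it only checks that a `tensorflow` package can be found, not which version it is.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Return a list of problems found; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            'Python %d.%d+ is required, found %d.%d'
            % (min_python + tuple(sys.version_info[:2]))
        )
    # Both the CPU and GPU wheels are imported under the name "tensorflow".
    if importlib.util.find_spec('tensorflow') is None:
        problems.append('TensorFlow is not installed')
    return problems


for problem in check_environment():
    print('WARNING:', problem)
```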

Example Usage

Let's run a NER labeling model with the BiLSTM model.

from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model

train_x, train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

model = BiLSTM_Model()
model.fit(train_x, train_y, valid_x, valid_y, epochs=50)

"""
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 97)                0
_________________________________________________________________
layer_embedding (Embedding)  (None, 97, 100)           320600
_________________________________________________________________
layer_blstm (Bidirectional)  (None, 97, 256)           235520
_________________________________________________________________
layer_dropout (Dropout)      (None, 97, 256)           0
_________________________________________________________________
layer_time_distributed (Time (None, 97, 8)             2056
_________________________________________________________________
activation_7 (Activation)    (None, 97, 8)             0
=================================================================
Total params: 558,176
Trainable params: 558,176
Non-trainable params: 0
_________________________________________________________________
Train on 20864 samples, validate on 2318 samples
Epoch 1/50
20864/20864 [==============================] - 9s 417us/sample - loss: 0.2508 - acc: 0.9333 - val_loss: 0.1240 - val_acc: 0.9607

"""
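
The parameter counts in the summary above can be checked by hand. The sketch below reproduces them under a few inferred assumptions that the output does not state directly: a vocabulary of 3206 entries (320600 / 100), 128 LSTM units per direction (half of the 256-wide output), and two bias vectors per gate, as in cuDNN-style LSTM layers. With a single bias vector the BiLSTM count would be 234496 instead.

```python
def embedding_params(vocab_size, dim):
    # One dense vector of length `dim` per vocabulary entry.
    return vocab_size * dim


def bilstm_params(input_dim, units, bias_vectors=2):
    # Four gates, each with an input kernel, a recurrent kernel and bias terms.
    per_direction = 4 * (units * (input_dim + units) + bias_vectors * units)
    # Forward and backward directions have separate weights.
    return 2 * per_direction


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output unit.
    return input_dim * output_dim + output_dim


embedding = embedding_params(3206, 100)
blstm = bilstm_params(100, 128)
dense = dense_params(256, 8)
print(embedding, blstm, dense, embedding + blstm + dense)
# 320600 235520 2056 558176
```

The dropout and activation layers have no weights, which is why they show 0 parameters.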

Run with GPT-2 Embedding

from kashgari.embeddings import GPT2Embedding
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiGRU_Model

train_x, train_y = ChineseDailyNerCorpus.load_data('train')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

gpt2_embedding = GPT2Embedding('<path-to-gpt-model-folder>', sequence_length=30)
model = BiGRU_Model(gpt2_embedding)
model.fit(train_x, train_y, valid_x, valid_y, epochs=50)

Run with BERT Embedding

from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.labeling import BiGRU_Model
from kashgari.corpus import ChineseDailyNerCorpus

bert_embedding = BERTEmbedding('<bert-model-folder>', sequence_length=30)
model = BiGRU_Model(bert_embedding)

train_x, train_y = ChineseDailyNerCorpus.load_data()
model.fit(train_x, train_y)

Run with Word2vec Embedding

from kashgari.embeddings import WordEmbedding
from kashgari.tasks.labeling import BiLSTM_CRF_Model
from kashgari.corpus import ChineseDailyNerCorpus

word2vec_embedding = WordEmbedding('<Gensim embedding file>', sequence_length=30)
model = BiLSTM_CRF_Model(word2vec_embedding)
train_x, train_y = ChineseDailyNerCorpus.load_data()
model.fit(train_x, train_y)
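
All three embedding examples above pass `sequence_length=30`. A fixed sequence length generally means that longer inputs are cut off and shorter ones are padded. The sketch below illustrates that idea only; `fit_to_length` and the `'[PAD]'` token are hypothetical, and Kashgari's actual handling (for example, positions reserved for BERT's special tokens) may differ.

```python
from typing import List


def fit_to_length(tokens: List[str], sequence_length: int,
                  pad_token: str = '[PAD]') -> List[str]:
    """Truncate or right-pad a token list to exactly `sequence_length` items."""
    truncated = tokens[:sequence_length]
    return truncated + [pad_token] * (sequence_length - len(truncated))


short = fit_to_length(['a', 'b', 'c'], 5)
long = fit_to_length(list('abcdefgh'), 5)
print(short)  # ['a', 'b', 'c', '[PAD]', '[PAD]']
print(long)   # ['a', 'b', 'c', 'd', 'e']
```

If many of your sentences are longer than the chosen length, tokens past that point would not reach the model under this scheme, so it is worth checking the length distribution of your corpus before picking a value.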

Support for Training on Multiple GPUs

from kashgari.tasks.labeling import BiGRU_Model
from kashgari.corpus import ChineseDailyNerCorpus

model = BiGRU_Model()
train_x, train_y = ChineseDailyNerCorpus.load_data()
model.build_multi_gpu_model(gpus=2, 
                            cpu_merge=False, 
                            cpu_relocation=False,
                            x_train=train_x, 
                            y_train=train_y)

model.fit(train_x, train_y)
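
The parameter names `gpus`, `cpu_merge` and `cpu_relocation` match those of Keras's `multi_gpu_model` utility, which implements data parallelism: each batch is split into sub-batches, one per GPU, and the outputs are concatenated afterwards. Assuming `build_multi_gpu_model` follows the same scheme, the splitting step can be pictured with this plain-Python sketch (`split_batch` and `merge_outputs` are illustrative names, not Kashgari functions):

```python
def split_batch(batch, gpus):
    """Split a batch into `gpus` contiguous slices; the last takes any remainder."""
    step = len(batch) // gpus
    slices = []
    for i in range(gpus):
        start = i * step
        end = len(batch) if i == gpus - 1 else start + step
        slices.append(batch[start:end])
    return slices


def merge_outputs(outputs):
    """Concatenate per-GPU outputs back into a single batch, preserving order."""
    return [item for part in outputs for item in part]


parts = split_batch(list(range(5)), gpus=2)
print(parts)                 # [[0, 1], [2, 3, 4]]
print(merge_outputs(parts))  # [0, 1, 2, 3, 4]
```

Because each GPU sees only a fraction of the batch, a larger overall batch size is often used when training on several GPUs.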

Contributing

Thanks for your interest in contributing! There are many ways to get involved; start with the contributor guidelines and then check these open issues for specific tasks.

Download files

Source distribution: kashgari-tf-0.5.0a1.tar.gz (36.0 kB)
Built distribution: kashgari_tf-0.5.0a1-py3-none-any.whl (69.3 kB, Python 3)
