Skip to main content

Simple, Keras-powered multilingual NLP framework, allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT, GPT-2 and word2vec embedding.

Project description

Kashgari

GitHub Coverage Status PyPI

Overview | Performance | Quick start | Documentation | 中文文档 | Contributing

🎉🎉🎉 We are proud to announce that we entirely rewrote Kashgari with tf.keras, now Kashgari comes with easier to understand API and is faster! 🎉🎉🎉

Overview

Kashgari is a simple and powerful NLP Transfer learning framework, build a state-of-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.

  • Human-friendly. Kashgari's code is straightforward, well documented and tested, which makes it very easy to understand and modify.
  • Powerful and simple. Kashgari allows you to apply state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS) and classification.
  • Built-in transfer learning. Kashgari built-in pre-trained BERT and Word2vec embedding models, which makes it very simple to transfer learning to train your model.
  • Fully scalable. Kashgari provides a simple, fast, and scalable environment for fast experimentation, train your models and experiment with new approaches using different embeddings and model structure.
  • Production Ready. Kashgari could export model with SavedModel format for tensorflow serving, you could directly deploy it on the cloud.

Our Goal

  • Academic users Easier experimentation to prove their hypothesis without coding from scratch.
  • NLP beginners Learn how to build an NLP project with production level code quality.
  • NLP developers Build a production level classification/labeling model within minutes.

Performance

Task Language Dataset Score Detail
Named Entity Recognition Chinese People's Daily Ner Corpus 94.46 (F1) Text Labeling Performance Report

Tutorials

Here is a set of quick tutorials to get you started with the library:

There are also articles and posts that illustrate how to use Kashgari:

Quick start

Requirements and Installation

🎉🎉🎉 We renamed again for consistency and clarity. From now on, it is all kashgari. 🎉🎉🎉

The project is based on Python 3.6+, because it is 2019 and type hinting is cool.

Backend pypi version desc
TensorFlow 2.x pip install 'kashgari>=2.0.0' coming soon
TensorFlow 1.14+ pip install 'kashgari>=1.0.0,<2.0.0' current version
Keras pip install 'kashgari<1.0.0' legacy version

Find more info about the name changing.

Example Usage

Let's run an NER labeling model with Bi_LSTM Model.

from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model

train_x, train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

model = BiLSTM_Model()
model.fit(train_x, train_y, valid_x, valid_y, epochs=50)

"""
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 97)                0
_________________________________________________________________
layer_embedding (Embedding)  (None, 97, 100)           320600
_________________________________________________________________
layer_blstm (Bidirectional)  (None, 97, 256)           235520
_________________________________________________________________
layer_dropout (Dropout)      (None, 97, 256)           0
_________________________________________________________________
layer_time_distributed (Time (None, 97, 8)             2056
_________________________________________________________________
activation_7 (Activation)    (None, 97, 8)             0
=================================================================
Total params: 558,176
Trainable params: 558,176
Non-trainable params: 0
_________________________________________________________________
Train on 20864 samples, validate on 2318 samples
Epoch 1/50
20864/20864 [==============================] - 9s 417us/sample - loss: 0.2508 - acc: 0.9333 - val_loss: 0.1240 - val_acc: 0.9607

"""

Run with GPT-2 Embedding

from kashgari.embeddings import GPT2Embedding
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiGRU_Model

train_x, train_y = ChineseDailyNerCorpus.load_data('train')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

gpt2_embedding = GPT2Embedding('<path-to-gpt-model-folder>', sequence_length=30)
model = BiGRU_Model(gpt2_embedding)
model.fit(train_x, train_y, valid_x, valid_y, epochs=50)

Run with Bert Embedding

from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.labeling import BiGRU_Model
from kashgari.corpus import ChineseDailyNerCorpus

bert_embedding = BERTEmbedding('<bert-model-folder>', sequence_length=30)
model = BiGRU_Model(bert_embedding)

train_x, train_y = ChineseDailyNerCorpus.load_data()
model.fit(train_x, train_y)

Sponsors

Support this project by becoming a sponsor. Your issues and feature request will be prioritized.[Become a sponsor]

Contributing

Thanks for your interest in contributing! There are many ways to get involved; start with the contributor guidelines and then check these open issues for specific tasks.

Feel free to join the WeChat group if you want to more involved in Kashgari's development.

Reference

This library is inspired by and references following frameworks and papers.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kashgari-tf-0.5.5.tar.gz (41.2 kB view details)

Uploaded Source

Built Distribution

kashgari_tf-0.5.5-py3-none-any.whl (75.1 kB view details)

Uploaded Python 3

File details

Details for the file kashgari-tf-0.5.5.tar.gz.

File metadata

  • Download URL: kashgari-tf-0.5.5.tar.gz
  • Upload date:
  • Size: 41.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for kashgari-tf-0.5.5.tar.gz
Algorithm Hash digest
SHA256 c9a5be9d197fb74caa75d39ee337f923d5a94e126272ed27cfaa80891fba597b
MD5 94bcf3ded15ad2219f774001bbb6b892
BLAKE2b-256 36e65c2b7548ff5f56c3224af30f65a238321725df1e91517c6e3b17ef2c8b19

See more details on using hashes here.

File details

Details for the file kashgari_tf-0.5.5-py3-none-any.whl.

File metadata

  • Download URL: kashgari_tf-0.5.5-py3-none-any.whl
  • Upload date:
  • Size: 75.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for kashgari_tf-0.5.5-py3-none-any.whl
Algorithm Hash digest
SHA256 92d6561dd9d6b743791c2aba6aac7e1ec4902ec7e2e09cbf562f3c9d0730a489
MD5 071ccdb11813d462d55c3ffdeebfc633
BLAKE2b-256 2d42804e5199f4c9c2a461ff05773f61abe2094f2d430dac258a3a49a4b2a32d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page