Simple and powerful state-of-the-art NLP framework with pre-trained word2vec and BERT embeddings.

Project description

Kashgari


Simple and powerful NLP framework: build your own state-of-the-art model in 5 minutes.

Kashgari is:

  • A human-friendly framework. Kashgari's code is simple, well documented and tested, which makes it very easy to understand and modify.
  • A powerful and simple NLP library. Kashgari lets you apply state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS) and text classification.
  • A Keras NLP framework. Kashgari builds directly on Keras, making it easy to train your own models and experiment with new approaches using different embeddings and model structures.

Feature List

  • Embedding support
    • Classic word2vec embedding
    • BERT embedding
  • Text Classification Models
    • CNN Classification Model
    • CNN LSTM Classification Model
    • Bidirectional LSTM Classification Model
  • Text Labeling Models (NER, PoS)
    • Bidirectional LSTM Labeling Model
    • Bidirectional LSTM CRF Labeling Model
    • CNN LSTM Labeling Model
  • Model Training
  • Model Evaluation
  • GPU Support
  • Customize Model
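
The labeling models listed above consume sequence data as parallel lists of tokens and tags (a BIO-style scheme for NER). A minimal sketch of that data shape, independent of the library itself — the tokens and tags here are made-up examples, not from a real corpus:

```python
# Each sentence is a list of tokens, paired with one tag per token (BIO scheme).
x_data = [
    ['Kashgar', 'is', 'a', 'city', 'in', 'Xinjiang'],
]
y_data = [
    ['B-LOC', 'O', 'O', 'O', 'O', 'B-LOC'],
]

# Sanity check: every token must have exactly one tag.
for tokens, tags in zip(x_data, y_data):
    assert len(tokens) == len(tags)
```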

Roadmap

  • ELMo Embedding
  • Pre-trained models
  • More model structures

Tutorials

Quick start

Requirements and Installation

The project is based on Keras 2.2.0+ and Python 3.6+, because it is 2019 and type hints are cool.

pip install kashgari
# CPU
pip install tensorflow
# GPU
pip install tensorflow-gpu 

Example Usage

Let's run a text classification task with a CNN LSTM model on SMP 2017 ECDT Task 1.

>>> from kashgari.corpus import SMP2017ECDTClassificationCorpus
>>> from kashgari.tasks.classification import CNNLSTMModel

>>> x_data, y_data = SMP2017ECDTClassificationCorpus.get_classification_data()
>>> x_data[0]
['你', '知', '道', '我', '几', '岁']
>>> y_data[0]
'chat'

# Provided classification models: `CNNModel`, `BLSTMModel`, `CNNLSTMModel`
>>> classifier = CNNLSTMModel()
>>> classifier.fit(x_data, y_data)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 10)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 10, 100)           87500     
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 10, 32)            9632      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 5, 32)             0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_1 (Dense)              (None, 32)                3232      
=================================================================
Total params: 153,564
Trainable params: 153,564
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
 1/35 [..............................] - ETA: 32s - loss: 3.4652 - acc: 0.0469

... 

>>> x_test, y_test = SMP2017ECDTClassificationCorpus.get_classification_data('test')
>>> classifier.evaluate(x_test, y_test)
              precision    recall  f1-score   support

        calc       0.75      0.75      0.75         8
        chat       0.83      0.86      0.85       154
    contacts       0.54      0.70      0.61        10
    cookbook       0.97      0.94      0.95        89
    datetime       0.67      0.67      0.67         6
       email       1.00      0.88      0.93         8
         epg       0.61      0.56      0.58        36
      flight       1.00      0.90      0.95        21
...
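
The corpus helper above simply returns tokenized sentences and their labels as plain Python lists, so training on your own data means building the same structures — a sketch assuming `fit` accepts lists exactly as in the quick-start example (the second sample here is invented for illustration):

```python
# Classification data in the same shape as SMP2017ECDTClassificationCorpus:
# each sample is a list of tokens (characters, for Chinese) plus one label.
x_data = [
    ['你', '知', '道', '我', '几', '岁'],
    ['帮', '我', '查', '航', '班'],
]
y_data = ['chat', 'flight']

# One label per sample.
assert len(x_data) == len(y_data)
# classifier.fit(x_data, y_data)  # then train exactly as shown above
```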

Run with BERT Embedding

from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.classification import CNNLSTMModel
from kashgari.corpus import SMP2017ECDTClassificationCorpus

bert_embedding = BERTEmbedding('bert-base-chinese', sequence_length=30)                                   
model = CNNLSTMModel(bert_embedding)

train_x, train_y = SMP2017ECDTClassificationCorpus.get_classification_data()
model.fit(train_x, train_y)

Run with Word2vec Embedding

from kashgari.embeddings import WordEmbeddings
from kashgari.tasks.classification import CNNLSTMModel
from kashgari.corpus import SMP2017ECDTClassificationCorpus

word_embedding = WordEmbeddings('sgns.weibo.bigram', sequence_length=30)
model = CNNLSTMModel(word_embedding)
train_x, train_y = SMP2017ECDTClassificationCorpus.get_classification_data()
model.fit(train_x, train_y)
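
The SMP2017 corpus ships pre-tokenized into characters; your own Chinese text needs the same shape before it can be fed to `fit`. A minimal character tokenizer (an illustration, not part of Kashgari's API):

```python
def char_tokenize(text):
    """Split a string into a list of characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]

tokens = char_tokenize('你知道 我几岁')
# tokens == ['你', '知', '道', '我', '几', '岁']
```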

Reference

This library is inspired by and references the following frameworks and papers.
