Simple, Keras-powered multilingual NLP framework that allows you to build models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks. Includes BERT, GPT-2, and word2vec embeddings.
Kashgari
🎉🎉🎉 We have released version 2.0.0 with TF2 support. 🎉🎉🎉
If you use this project for your research, please cite:
@misc{Kashgari,
author = {Eliyar Eziz},
title = {Kashgari},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/BrikerMan/Kashgari}}
}
Overview
Kashgari is a simple and powerful NLP transfer learning framework that lets you build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
- Human-friendly. Kashgari's code is straightforward, well documented and tested, which makes it very easy to understand and modify.
- Powerful and simple. Kashgari allows you to apply state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS) and classification.
- Built-in transfer learning. Kashgari has built-in pre-trained BERT and Word2vec embedding models, which make it very simple to apply transfer learning when training your model.
- Fully scalable. Kashgari provides a simple, fast, and scalable environment for fast experimentation: train your models and experiment with new approaches using different embeddings and model structures.
- Production ready. Kashgari can export models in the SavedModel format for TensorFlow Serving, so you can deploy them directly to the cloud.
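Once a model has been exported as a SavedModel and loaded into TensorFlow Serving, clients call it over HTTP. The sketch below builds a request for TensorFlow Serving's REST `:predict` endpoint (default REST port 8501, `instances` row format). It is not part of Kashgari's API and rests on assumptions: a single-input model that takes equal-length integer token-ID sequences, right-padded with ID 0. The model name `kashgari_clf` and the token IDs are hypothetical; in practice the IDs come from the vocabulary used at training time, and whether padding goes on the left or right depends on how the model was trained.

```python
import json
from typing import List, Optional, Tuple


def build_predict_request(
    model_name: str,
    token_ids: List[List[int]],
    max_len: Optional[int] = None,
    pad_id: int = 0,
    host: str = "localhost",
    port: int = 8501,
) -> Tuple[str, str]:
    """Return (url, json_body) for a TensorFlow Serving REST :predict call.

    Assumption: the served SavedModel has a single input that accepts a
    batch of equal-length integer token-ID sequences.
    """
    if max_len is None:
        # Default to the longest sequence in the batch.
        max_len = max(len(seq) for seq in token_ids)
    rows = []
    for seq in token_ids:
        clipped = seq[:max_len]  # truncate sequences that are too long
        rows.append(clipped + [pad_id] * (max_len - len(clipped)))  # right-pad short ones
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": rows})
    return url, body


url, body = build_predict_request("kashgari_clf", [[12, 7, 99], [5]])
print(url)   # http://localhost:8501/v1/models/kashgari_clf:predict
print(body)  # {"instances": [[12, 7, 99], [5, 0, 0]]}
```

The function only builds the request so it can be checked without a running server. To send it, pass `url` and `body.encode("utf-8")` to `urllib.request.Request` with a `Content-Type: application/json` header; TensorFlow Serving answers row-format requests with a JSON object whose `predictions` field holds one result per instance.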
Our Goal
- Academic users: easier experimentation to validate their hypotheses without coding from scratch.
- NLP beginners: learn how to build an NLP project with production-level code quality.
- NLP developers: build a production-level classification/labeling model within minutes.
Performance
Performance reports are welcome; feel free to contribute your own results.
Task | Language | Dataset | Score |
---|---|---|---|
Named Entity Recognition | Chinese | People's Daily Ner Corpus | 95.57 |
Text Classification | Chinese | SMP2018ECDTCorpus | 94.57 |
Installation
The project is based on Python 3.6+, because it is 2019 and type hinting is cool.
Backend | kashgari version | desc |
---|---|---|
TensorFlow 2.2+ | pip install 'kashgari>=2.0.2' | TF2.2+ with tf.keras |
TensorFlow 1.14+ | pip install 'kashgari>=1.0.0,<2.0.0' | TF1.14+ with tf.keras |
Keras | pip install 'kashgari<1.0.0' | keras version |
You also need to install tensorflow_addons alongside TensorFlow, choosing the version that matches your TensorFlow release:
TensorFlow Version | tensorflow_addons version |
---|---|
TensorFlow 2.1 | pip install tensorflow_addons==0.9.1 |
TensorFlow 2.2 | pip install tensorflow_addons==0.11.2 |
TensorFlow 2.3, 2.4, 2.5 | pip install tensorflow_addons==0.13.0 |
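The two installation tables above can be combined into a small lookup that turns an installed TensorFlow version into the matching pip commands. This is a hypothetical helper, not something shipped with Kashgari; the pins are transcribed from the tables, only the TensorFlow backends are covered (the standalone Keras row is out of scope), and versions the tables do not list are rejected rather than guessed. TensorFlow 2.1 appears in the add-ons table but is below the 2.2+ floor for kashgari 2.x, so the sketch rejects it as well.

```python
from typing import Dict, List, Tuple

# Pins transcribed from the installation tables (hypothetical helper).
TFA_PINS: Dict[Tuple[int, int], str] = {
    (2, 1): "0.9.1",
    (2, 2): "0.11.2",
    (2, 3): "0.13.0",
    (2, 4): "0.13.0",
    (2, 5): "0.13.0",
}


def parse_tf_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as "2.4.1" into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def kashgari_spec(tf_version: str) -> str:
    """Pick the kashgari requirement that matches a TensorFlow backend."""
    v = parse_tf_version(tf_version)
    if v >= (2, 2):
        return "kashgari>=2.0.2"
    if (1, 14) <= v < (2, 0):
        return "kashgari>=1.0.0,<2.0.0"
    raise ValueError(f"no kashgari release listed for TensorFlow {tf_version}")


def pip_commands(tf_version: str) -> List[str]:
    """Build the pip commands for a given TensorFlow version."""
    v = parse_tf_version(tf_version)
    commands = [f"pip install '{kashgari_spec(tf_version)}'"]
    if v[0] == 2:
        # Only TensorFlow 2.x releases have a tensorflow_addons pin in the table.
        if v not in TFA_PINS:
            raise ValueError(f"no tensorflow_addons pin listed for TensorFlow {tf_version}")
        commands.append(f"pip install tensorflow_addons=={TFA_PINS[v]}")
    return commands


for cmd in pip_commands("2.4.1"):
    print(cmd)
# pip install 'kashgari>=2.0.2'
# pip install tensorflow_addons==0.13.0
```

Failing loudly on unlisted versions (for example TensorFlow 2.6) keeps the helper honest: the tables say nothing about those releases, so any pin would be a guess.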
Tutorials
Here is a set of quick tutorials to get you started with the library:
- Tutorial 1: Text Classification
- Tutorial 2: Text Labeling
- Tutorial 3: Seq2Seq
- Tutorial 4: Language Embedding
There are also articles and posts that illustrate how to use Kashgari:
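- Short Text Classification with Kashgari 2: Data Analysis and Preprocessing (in Chinese)
- Short Text Classification with Kashgari 2: Model Training and Tuning (in Chinese)
- Short Text Classification with Kashgari 2: Model Deployment (in Chinese)
- Building a Chinese Text Classification Model in 15 Minutes (in Chinese)
- BERT-Based Chinese Named Entity Recognition (NER) (in Chinese)
- BERT/ERNIE Text Classification and Deployment (in Chinese)
- Building a BERT-Based NER Model in Five Minutes (in Chinese)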
- Multi-Class Text Classification with Kashgari in 15 minutes
Contributors ✨
Thanks go to these wonderful people. There are many ways to get involved: start with the contributor guidelines, then check these open issues for specific tasks.
Download files
Source distribution: kashgari-2.0.2.tar.gz
Built distribution: kashgari-2.0.2-py3-none-any.whl
File details
Details for the file kashgari-2.0.2.tar.gz.
File metadata
- Download URL: kashgari-2.0.2.tar.gz
- Upload date:
- Size: 53.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d67bc6852b63d8fe6722c25d77107042874b83b35c1c8adbf34196effa2ebc1d |
MD5 | 45090f7d0119a87c5783a14c068be176 |
BLAKE2b-256 | 93af7aff2d842f86527e293fe0671a2cb06a68109740d36275a86e8f8b845476 |
File details
Details for the file kashgari-2.0.2-py3-none-any.whl.
File metadata
- Download URL: kashgari-2.0.2-py3-none-any.whl
- Upload date:
- Size: 89.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 528999057f3d1d02490643e446a6a4982cd0046171ce6122921a23e58aa381b0 |
MD5 | 480fba36c21653a4a88c90bacb0b6adf |
BLAKE2b-256 | a06ee123cf5a883dabbec608192bdf52a3d4ba6e0796de5cf47b5fd57cc0e49c |
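The SHA256 digests published above let you confirm that a downloaded file is intact before installing it. A minimal sketch using only the standard library's `hashlib`; it assumes the file keeps its original name (`kashgari-2.0.2.tar.gz` or `kashgari-2.0.2-py3-none-any.whl`) so the expected digest can be looked up, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "kashgari-2.0.2.tar.gz": "d67bc6852b63d8fe6722c25d77107042874b83b35c1c8adbf34196effa2ebc1d",
    "kashgari-2.0.2-py3-none-any.whl": "528999057f3d1d02490643e446a6a4982cd0046171ce6122921a23e58aa381b0",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise ValueError(f"no published SHA256 digest for {name}")
    return sha256_of(path) == expected
```

The same check can happen at install time through pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file, enforced with `--require-hashes`). For the BLAKE2b-256 digests, `hashlib.blake2b(digest_size=32)` would be the matching constructor.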