An Embedded CKIP Rasa NLU Components

rukip

Introduction

This open-source library implements Rasa custom components.

It offers a tokenizer and a featurizer, both powered by ckiptagger, as components for a Rasa NLU pipeline.

Installation

pip install rukip

rukip is a Python library hosted on PyPI. Requirements:

  • python >= 3.6
  • rasa >= 1.4.0
  • ckiptagger[tf] >= 0.0.19

Usage

Download model files

The model files are available on several mirror sites.

You can download and extract them to the desired path with the following steps:

  1. Download the archive to ./data.zip
  2. Extract it to ./data/
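
The two steps above can be sketched with the Python standard library alone. This is a minimal sketch, not rukip's own download helper: MODEL_URL is a placeholder to be replaced with one of the mirror links, the function names download_model and extract_model are hypothetical, and the sketch assumes the archive already contains a top-level data/ folder, so that extracting into the current directory yields ./data/.

```python
import urllib.request
import zipfile
from pathlib import Path

# Placeholder (assumption): substitute one of the mirror links for the model files.
MODEL_URL = "https://example.com/data.zip"


def download_model(url: str = MODEL_URL, zip_path: str = "./data.zip") -> Path:
    """Step 1: download the model archive to zip_path."""
    target = Path(zip_path)
    urllib.request.urlretrieve(url, str(target))
    return target


def extract_model(zip_path: str = "./data.zip", dest_dir: str = ".") -> list:
    """Step 2: extract the archive into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling download_model() and then extract_model() from the project root would leave the model files where model_path: "./data" expects them, provided the archive layout matches the assumption above.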

Add the CKIPTokenizer component to the Rasa NLU pipeline and set model_path to ./data.

The following is an example Rasa NLU config file:

 language: "zh"
 pipeline:
   - name: "rukip.tokenizer.CKIPTokenizer"
     model_path: "./data"
   - name: "rukip.featurizer.CKIPFeaturizer"
     model_path: "./data"
     token_features: ["word", "pos"]
   - name: "CRFEntityExtractor"
     features: [["ner_features"], ["ner_features"], ["ner_features"]]
   - name: "CountVectorsFeaturizer"
   - name: "EmbeddingIntentClassifier"

Components

CKIPTokenizer

This component has one required field (model_path) and two optional fields that let the user supply dictionaries.

  • recommend_dict_path is the path to a file containing a list of user-defined recommended words. Default is None.
  • cooerce_dict_path is the path to a file containing a list of must-words. Default is None.

The following is an example of a user-defined dictionary. Each line holds one word and its weight, separated by a space.

土地公 1
土地婆 1
公有 2
來亂的 1
緯來體育台 1
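
To make the file format concrete, the lines above can be read into a word-to-weight mapping, which is the shape that ckiptagger's construct_dictionary accepts. This is a minimal sketch under assumptions: parse_user_dictionary is a hypothetical helper, and whether rukip parses the file in exactly this way is not confirmed here.

```python
from typing import Dict, Iterable


def parse_user_dictionary(lines: Iterable[str]) -> Dict[str, float]:
    """Turn "word weight" lines into a {word: weight} mapping."""
    word_to_weight = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the weight is always the final field.
        word, weight = line.rsplit(None, 1)
        word_to_weight[word] = float(weight)
    return word_to_weight


sample = ["土地公 1", "土地婆 1", "公有 2", "來亂的 1", "緯來體育台 1"]
print(parse_user_dictionary(sample))
```

A line that lacks a weight raises a ValueError in this sketch, which surfaces malformed entries early instead of silently dropping them.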

CKIPFeaturizer

This component has one required field (model_path) and one optional field.

  • token_features is the list of features extracted from ckiptagger to generate ner_features. Default is ["word", "pos"].
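
As a rough illustration of what selecting token_features means, the sketch below builds one feature record per token from parallel lists of words and part-of-speech tags. It is not rukip's implementation: build_token_features is a hypothetical helper, the record layout is an assumption, and the sample words and tags are illustrative values rather than actual ckiptagger output.

```python
from typing import Dict, List, Sequence


def build_token_features(
    words: Sequence[str],
    pos_tags: Sequence[str],
    token_features: Sequence[str] = ("word", "pos"),
) -> List[Dict[str, str]]:
    """Keep only the requested per-token features, one dict per token."""
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")
    available = {"word": words, "pos": pos_tags}
    unknown = [name for name in token_features if name not in available]
    if unknown:
        raise ValueError("unsupported token_features: %s" % unknown)
    return [
        {name: available[name][i] for name in token_features}
        for i in range(len(words))
    ]


# Illustrative input only (assumption), not real tagger output.
words = ["土地公", "有", "政策"]
pos_tags = ["Nb", "V", "Na"]
features = build_token_features(words, pos_tags, token_features=["pos"])
```

With token_features=["pos"], each record carries only the part-of-speech tag, which mirrors how narrowing the option would shrink the features passed on as ner_features.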

Development

$> git clone git@github.com:circlelychen/rukip.git
$> cd rukip
$> pip install -r requirements-to-freeze.dev.txt
$> make test

License

Licensed under the GNU General Public License v3.0.
