rasa-nlu-gao

Rasa NLU a natural language parser for bots

These details have not been verified by PyPI

Project links

Project description

# Rasa NLU GQ
Rasa NLU (Natural Language Understanding) 是一个自然语义理解的工具，举个官网的例子如下：

> *"I'm looking for a Mexican restaurant in the center of town"*

And returning structured data like:

```
intent: search_restaurant
entities:
- cuisine : Mexican
- location : center
```

## Intent of this project
这个项目的目的和初衷，是由于官方的rasa nlu里面提供的components和models并不能满足实际需求。所以我自定义了一些components，并发布到Pypi上。可以通过`pip install rasa-nlu-gao`下载。后续会不断往里面填充和优化组件，也欢迎大家贡献。

## New features
目前新增的特性如下（请下载最新的rasa-nlu-gao版本）：
- 新增了实体识别的模型，一个是bilstm+crf，一个是idcnn+crf膨胀卷积模型，对应的yml文件配置如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "intent_featurizer_count_vectors"
token_pattern: "(?u)\b\w+\b"
- name: "intent_classifier_tensorflow_embedding"
- name: "ner_bilstm_crf"
lr: 0.001
char_dim: 100
lstm_dim: 100
batches_per_epoch: 10
seg_dim: 20
num_segs: 4
batch_size: 200
tag_schema: "iobes"
model_type: "bilstm" # 模型支持两种idcnn膨胀卷积模型或bilstm双向lstm模型
clip: 5
optimizer: "adam"
dropout_keep: 0.5
steps_check: 100
```
- 新增了jieba词性标注的模块，可以方便识别名字，地名，机构名等等jieba能够支持的词性，对应的yml文件配置如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "ner_crf"
- name: "jieba_pseg_extractor"
part_of_speech: ["nr", "ns", "nt"]
- name: "intent_featurizer_count_vectors"
OOV_token: oov
token_pattern: "(?u)\b\w+\b"
- name: "intent_classifier_tensorflow_embedding"
```
- 新增了根据实体反向修改意图，对应的文件配置如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "ner_crf"
- name: "jieba_pseg_extractor"
- name: "intent_featurizer_count_vectors"
OOV_token: oov
token_pattern: '(?u)\b\w+\b'
- name: "intent_classifier_tensorflow_embedding"
- name: "entity_edit_intent"
entity: ["nr"]
intent: ["enter_data"]
min_confidence: 0
```
- 新增了word2vec提取词向量特征，对应的配置文件如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "intent_featurizer_wordvector"
vector: "data/vectors.txt"
- name: "intent_classifier_tensorflow_embedding"
- name: "ner_crf"
- name: "jieba_pseg_extractor"
```
- 新增了bert模型提取词向量特征，对应的配置文件如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "bert_vectors_featurizer"
ip: '172.16.10.46'
port: 5555
port_out: 5556
show_server_config: True
timeout: 10000
- name: "intent_classifier_tensorflow_embedding"
- name: "ner_crf"
- name: "jieba_pseg_extractor"
```
- 新增了对CPU和GPU的利用率的配置，主要是`intent_classifier_tensorflow_embedding`和`ner_bilstm_crf`这两个使用到tensorflow的组件，配置如下（当然config_proto可以不配置，默认值会将资源全部利用）：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "intent_featurizer_count_vectors"
token_pattern: '(?u)\b\w+\b'
- name: "intent_classifier_tensorflow_embedding"
config_proto: {
"device_count": 4,
"inter_op_parallelism_threads": 0,
"intra_op_parallelism_threads": 0,
"allow_growth": True
}
- name: "ner_bilstm_crf"
config_proto: {
"device_count": 4,
"inter_op_parallelism_threads": 0,
"intra_op_parallelism_threads": 0,
"allow_growth": True
}
```
- 新增了`embedding_bert_intent_classifier`分类器，对应的配置文件如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "bert_vectors_featurizer"
ip: '172.16.10.46'
port: 5555
port_out: 5556
show_server_config: True
timeout: 10000
- name: "intent_classifier_tensorflow_embedding_bert"
- name: "ner_crf"
- name: "jieba_pseg_extractor"
```

- 在基础词向量使用bert的情况下，后端的分类器使用tensorflow高级api完成，tf.estimator,tf.data,tf.example,tf.saved_model
`intent_estimator_classifier_tensorflow_embedding_bert`分类器，对应的配置文件如下：
```
language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "bert_vectors_featurizer"
ip: '127.0.0.1'
port: 5555
port_out: 5556
show_server_config: True
timeout: 10000
- name: "intent_estimator_classifier_tensorflow_embedding_bert"
- name: "nlp_spacy"
- name: "ner_crf"
```

## Quick Install
```
pip install rasa-nlu-gao
```

## 🤖 Running of the bot
To train the NLU model:
```
python3 -m rasa_nlu_gao.train -c sample_configs/config_embedding_bilstm.yml --data data/examples/rasa/rasa_dataset_training.json --path models
```

To run the NLU model:
```
python3 -m rasa_nlu_gao.server -c sample_configs/config_embedding_bilstm.yml -P 8000 --path models
```

## Some Examples
具体的例子请看[rasa_chatbot_cn](https://github.com/GaoQ1/rasa_chatbot_cn)

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.3

Oct 30, 2019

1.0.2

Oct 10, 2019

1.0.1

Oct 10, 2019

1.0.0

Oct 1, 2019

0.3.4

Jul 9, 2019

0.3.3

Jul 8, 2019

0.3.2

Jul 8, 2019

0.3.1

Jun 26, 2019

0.3.0

Jun 24, 2019

0.2.7

Apr 17, 2019

0.2.6

Apr 15, 2019

This version

0.2.5

Apr 11, 2019

0.2.4

Apr 9, 2019

0.2.3

Apr 9, 2019

0.2.2

Mar 19, 2019

0.2.1

Dec 20, 2018

0.2.0

Dec 19, 2018

0.1.9

Dec 13, 2018

0.1.8

Dec 6, 2018

0.1.7

Dec 5, 2018

0.1.6

Nov 23, 2018

0.1.5

Nov 12, 2018

0.1.4

Nov 6, 2018

0.1.3

Nov 5, 2018

0.1.2

Oct 26, 2018

0.1.1

Oct 10, 2018

0.1.0

Oct 10, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rasa-nlu-gao-0.2.5.tar.gz (145.3 kB view details)

Uploaded Apr 11, 2019 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

rasa_nlu_gao-0.2.5-py2.py3-none-any.whl (204.9 kB view details)

Uploaded Apr 11, 2019 Python 2Python 3

File details

Details for the file rasa-nlu-gao-0.2.5.tar.gz.

File metadata

Download URL: rasa-nlu-gao-0.2.5.tar.gz
Upload date: Apr 11, 2019
Size: 145.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: Python-urllib/3.6

File hashes

Hashes for rasa-nlu-gao-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`b977099e5f2e5d9d96282f4bd8b8285ad9c5dc20413b5036445878fb7bc52dbf`
MD5	`6eada18b2ce60a96ae662286c59b2fc9`
BLAKE2b-256	`72229b6660242e7ce2437de9b60e9560b682ad9f1c419af56bfcfb790f000077`

See more details on using hashes here.

File details

Details for the file rasa_nlu_gao-0.2.5-py2.py3-none-any.whl.

File metadata

Download URL: rasa_nlu_gao-0.2.5-py2.py3-none-any.whl
Upload date: Apr 11, 2019
Size: 204.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: Python-urllib/3.6

File hashes

Hashes for rasa_nlu_gao-0.2.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`759621c4b472d78a384eaf80bb82b6a37c7b29048ed8d6a152113a76a11ffb09`
MD5	`18343c61a29f8ef315ca3f66a9d684cb`
BLAKE2b-256	`b3597bc9d09fa4183b2de6194849216d9a47abc3956bee768f9bf9cd3b014171`

See more details on using hashes here.

rasa-nlu-gao 0.2.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes