An Intent Classifier For Chatbot

Introduction

Intent recognition is a key component of a chatbot system. A user's intent can be recognized from what the user says and from the dialog context. This is an easy, everyday activity for human beings, but it is a very hard task for computers.

Intent recognition is treated as a multi-label classification problem. Concretely, the words and the context are the input, and the output is a set of labels, because what a user says may carry more than one intent.

input:
- words:
  A string of what the user says.
  Example: "I wanna know what time is it and how is the weather?"
- context (optional):
  A JSON string holding context information.
  Example: '{"timestamp": 1553154627, "location": "Shanghai"}'

The words and the context are transformed into a TF-IDF vector and a dict vector respectively, and the two vectors are then concatenated to form the final input vector.

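The vectorize-and-concatenate step can be illustrated with a small standard-library sketch. It is not the package's implementation: the function names, the toy corpus, the whitespace tokenizer, and the fixed feature list are all assumptions made for illustration. In scikit-learn, TfidfVectorizer and DictVectorizer play the corresponding roles.

```python
import json
import math
from collections import Counter


def tfidf_vector(text, corpus, vocab):
    """TF-IDF weights of `text` over `vocab`, using a smoothed IDF from `corpus`."""
    counts = Counter(text.lower().split())
    n_docs = len(corpus)
    vector = []
    for term in vocab:
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for doc in corpus if term in doc.lower().split())
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        vector.append(counts[term] * idf)
    return vector


def dict_vector(context_json, feature_names):
    """One-hot encode string values as "key=value"; pass numeric values through."""
    context = json.loads(context_json) if context_json else {}
    features = {}
    for key, value in context.items():
        if isinstance(value, str):
            features[f"{key}={value}"] = 1.0
        else:
            features[key] = float(value)
    return [features.get(name, 0.0) for name in feature_names]


# Hypothetical training corpus and context features, for illustration only.
corpus = ["what time is it", "how is the weather"]
vocab = sorted({token for doc in corpus for token in doc.split()})
feature_names = ["location=Shanghai", "timestamp"]

words = "what time is it and how is the weather"
context = '{"timestamp": 1553154627, "location": "Shanghai"}'

# The final input vector is the two vectors placed side by side.
input_vector = tfidf_vector(words, corpus, vocab) + dict_vector(context, feature_names)
```

The last two entries of input_vector come from the context, and all earlier entries come from the words.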
intent labels:
- multi-labels:
  A string of labels separated by ",", such as "time,weather".
- multi-levels:
  Similar intent labels are grouped under the same category, forming intents with multiple levels.
  Example:
  "news/sport_news/football_news",
  "news/sport_news/basketball_news"
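The two label conventions above can be sketched as follows. The helper names parse_labels and expand_levels are hypothetical and are not part of the package; the sketch only shows how a comma-separated label string and a "/"-separated multi-level label might be taken apart.

```python
def parse_labels(label_string):
    """Split a comma-separated label string into a list of labels."""
    return [label.strip() for label in label_string.split(",") if label.strip()]


def expand_levels(label):
    """Return every level of a multi-level label, from broadest to most specific."""
    parts = label.split("/")
    return ["/".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


labels = parse_labels("time,weather")
levels = expand_levels("news/sport_news/football_news")
```

Here labels holds the two separate intents, and levels holds the category path of a single intent.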

Dataset

Intent Dataset

The intent dataset can live in any storage, such as a MySQL database or a local CSV file. Two functions that load the dataset from MySQL and from a CSV file are implemented in dataset.py. To use them, simply pass the MySQL connection configuration or the CSV file path.
If the intent dataset is kept in a different storage, you can implement your own function modeled on utils.load_from_mysql/load_from_csv. Just remember that the function must return an instance of DataBunch, which defines the fields of the dataset:

import numpy as np


def load_from_xxx(xxx_params) -> DataBunch:
    words = []
    contexts = []
    intents = []

    # get words, contexts, intents
    ...

    # np.str was removed in NumPy 1.24; the built-in str is the equivalent dtype
    return DataBunch(words=np.array(words, dtype=str),
                     contexts=np.array(contexts, dtype=str),
                     intents=np.array(intents, dtype=str))
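As an illustration of a custom loader, here is a minimal sketch that reads JSON Lines records. Everything in it is an assumption: the record keys ("words", "context", "intent"), the function name load_from_jsonl, and DataBunchSketch, which is a stand-in for the package's DataBunch so that the example runs on its own. A real loader would return DataBunch with NumPy string arrays, as in the template above.

```python
import io
import json
from typing import List, NamedTuple


class DataBunchSketch(NamedTuple):
    """Hypothetical stand-in for DataBunch: three parallel columns."""
    words: List[str]
    contexts: List[str]
    intents: List[str]


def load_from_jsonl(lines) -> DataBunchSketch:
    """Build the three columns from an iterable of JSON Lines records."""
    words, contexts, intents = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        words.append(record["words"])
        # The context is stored as a JSON string; a missing context becomes "{}".
        contexts.append(json.dumps(record.get("context", {})))
        intents.append(record["intent"])
    return DataBunchSketch(words, contexts, intents)


sample = io.StringIO(
    '{"words": "what time is it", "context": {"location": "Shanghai"}, "intent": "time"}\n'
    '{"words": "how is the weather", "intent": "weather"}\n'
)
bunch = load_from_jsonl(sample)
```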

Rules

Rules are essentially a kind of dataset, just like the intent dataset. They can also be kept in any storage and fetched in the same way as the intent dataset:

def load_from_xxx(xxx_params) -> RuleBunch:
    words_rules = []
    context_rules = []
    intent_labels = []

    # get words rules, context rules, intent labels
    ...

    return RuleBunch(words_rules=words_rules, context_rules=context_rules,
                     intent_labels=intent_labels)

Classifiers

The classification mechanism consists of rule-based and model-based approaches:

rule-based classification:
The rule-based approach predicts intent labels by matching hand-written regular expressions against the words and context keys and values against the context. Unlike the model-based classifier, the rule-based classifier needs no training: load the rules from storage and create an instance directly from them:

import os

from intent_classifier import RuleClassifier, ModelClassifier, IntentClassifier
from intent_classifier.dataset import load_intents_from_mysql, load_rules_from_mysql

configs_mysql_rule = {"host": "xxx.xxx.xxx.xxx", "port": xxx,
                      "user": "xxxx", "password": "xxxx",
                      "db": "xxxx", "table": "xxxx",
                      "customer": "xxxx"}
rule_bunch = load_rules_from_mysql(configs_mysql_rule)

rule_classifier = RuleClassifier(rule_bunch)
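To make the matching idea concrete, here is a minimal sketch of rule-based prediction using only the standard library. The rule layout (a regular expression, a dict of required context values, and a label) and the function match_rules are assumptions for illustration; the actual structure of RuleBunch and the behavior of RuleClassifier may differ.

```python
import re

# Each hypothetical rule: (regex for the words, required context key/values, label).
RULES = [
    (r"\bwhat time\b", {}, "time"),
    (r"\bweather\b", {"location": "Shanghai"}, "weather"),
]


def match_rules(words, context, rules):
    """Return the labels of all rules whose regex and context conditions match."""
    labels = []
    for pattern, required_context, label in rules:
        words_ok = re.search(pattern, words, flags=re.IGNORECASE) is not None
        context_ok = all(context.get(key) == value
                         for key, value in required_context.items())
        if words_ok and context_ok and label not in labels:
            labels.append(label)
    return labels


utterance = "I wanna know what time is it and how is the weather?"
in_shanghai = match_rules(utterance, {"location": "Shanghai"}, RULES)
elsewhere = match_rules(utterance, {"location": "Beijing"}, RULES)
```

The second call drops the "weather" label because its context condition is not met.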

model-based classification:
The model-based approach predicts intent labels using machine learning models trained on the intent dataset. Assuming the models have already been trained and dumped, they can be used like this:

folder = "xxx/xxx/xxx"
model_classifier = ModelClassifier(folder=folder, customer="xxx", lang="en", 
                                   ner=None, n_jobs=-1)
model_classifier.load(clf_id=None)  # load models with maximum id

IntentClassifier:
IntentClassifier wraps RuleClassifier and ModelClassifier to offer a single integrated interface for predicting intent labels. IntentClassifier can be initialized with just a RuleClassifier, just a ModelClassifier, or both. If both classifiers are in use, the ModelClassifier might be skipped when the RuleClassifier has already produced intent labels.
Please note that you should avoid using too many rules to predict intent labels; the model-based classifier is the preferred approach. The rule-based classifier is an option only for very simple cases that need no model, or as a temporary solution while the model is not ready, for instance when the model must be retrained to add new intents.
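The rules-first, model-as-fallback behavior described above can be sketched like this. CascadeSketch and the two stub predictors are hypothetical; the precise condition under which IntentClassifier skips the model is not spelled out here, so the sketch assumes the model is skipped whenever the rules return at least one label.

```python
class CascadeSketch:
    """Hypothetical cascade: try the rule predictor first, then the model."""

    def __init__(self, rule_predict=None, model_predict=None):
        if rule_predict is None and model_predict is None:
            raise ValueError("at least one predictor is required")
        self.rule_predict = rule_predict
        self.model_predict = model_predict

    def predict(self, words, context=None):
        context = context or {}
        if self.rule_predict is not None:
            labels = self.rule_predict(words, context)
            if labels:
                return labels  # rules matched, so the model is skipped
        if self.model_predict is not None:
            return self.model_predict(words, context)
        return []


calls = []  # records which predictor ran, in order


def rule_predict(words, context):
    calls.append("rule")
    return ["time"] if "time" in words else []


def model_predict(words, context):
    calls.append("model")
    return ["weather"]


classifier = CascadeSketch(rule_predict, model_predict)
first = classifier.predict("what time is it")
second = classifier.predict("will it rain")
```

For the first utterance only the rule predictor runs; for the second, the rules return nothing and the model is consulted.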

Training

Load the intent dataset, create an instance of ModelClassifier, and then fit it on the dataset.

configs_mysql_model = {"host": "xxx.xxx.xxx.xxx", "port": xxx,
                       "user": "xxxx", "password": "xxxx",
                       "db": "xxxx", "table": "xxxx",
                       "customer": "xxxx"}
data_bunch = load_intents_from_mysql(configs_mysql_model)

folder = os.path.join(os.getcwd(), "models")
model_classifier = ModelClassifier(folder=folder, customer="xxx", lang="en", 
                                   ner=None, n_jobs=-1)
model_classifier.fit(data_bunch)

Note that the optional "ner" parameter is for named entity recognition, which supplies entity information found in the words.

Save Models

After fitting, run dump() to save the models in a sub-folder of the specified model folder. The sub-folder is named after the current datetime, and the models and the report are saved in it as "intent.model" and "report.txt" respectively.

model_classifier.dump()

Load Models

Run load() with a specific clf_id to load those models, or leave clf_id at its default to load the most recent models.

model_classifier.load(clf_id="20190321113421")  # if clf_id is None, the model 
                                                # with maximum id will be loaded
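The relationship between datetime-named sub-folders and "the model with maximum id" can be sketched as follows. The helpers make_clf_id and latest_clf_id are hypothetical, and the "%Y%m%d%H%M%S" format is inferred from the example id "20190321113421" rather than confirmed by the package. Because such ids have a fixed width, the largest id is also the most recent one.

```python
import os
import tempfile
from datetime import datetime


def make_clf_id(moment):
    """Format a datetime as a 14-digit id such as "20190321113421"."""
    return moment.strftime("%Y%m%d%H%M%S")


def latest_clf_id(folder):
    """Return the largest all-digit sub-folder name, or None if there is none."""
    ids = [name for name in os.listdir(folder)
           if name.isdigit() and os.path.isdir(os.path.join(folder, name))]
    return max(ids) if ids else None


with tempfile.TemporaryDirectory() as folder:
    # Simulate two dumps made at different times.
    for moment in (datetime(2019, 3, 21, 11, 34, 21), datetime(2019, 3, 29, 9, 0, 0)):
        os.makedirs(os.path.join(folder, make_clf_id(moment)))
    newest = latest_clf_id(folder)
```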

Create IntentClassifier

intent_classifier = IntentClassifier(rule_classifier=rule_classifier,
                                     model_classifier=model_classifier)

Predict

Predict intent labels by calling predict() with the words and the context. The return value is a list of intent labels.

intent_labels = intent_classifier.predict(
    word="I wanna know what time is it and how is the weather?",
    context={"timestamp": 1553154627, "location": "Shanghai"}
)

Requirements

Python 3.7
numpy
scipy
pandas
scikit-learn==0.20.3
joblib
pymysql (optional, for dataset from mysql)
jieba (optional, for Chinese tokenization)

Installation

pip install -e git+https://github.com/aitrek/intent_classifier.git#egg=intent_classifier

Release history

0.2.1 (this version): Mar 29, 2019
0.2.0: Mar 29, 2019
0.1.0: Mar 29, 2019

