A Rasa NLU component for composite entities

rasa_composite_entities

A Rasa NLU component for composite entities, developed to be used in the Dialogue Engine of Dialogue Technologies.

Installation

$ pip install rasa_composite_entities

The only external dependency is Rasa NLU itself, which you will already have installed if you want to use this component.

After installation, the component can be added to your pipeline like any other component:

language: "en_core_web_md"

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "intent_classifier_sklearn"
- name: "rasa_composite_entities.CompositeEntityExtractor"

Usage

Simply add another entry to your training file (in JSON format) defining composite patterns:

"composite_entities": [
  {
    "name": "product_with_attributes",
    "patterns": [
      "@color @product with @pattern",
      "@pattern @color @product"
    ]
  }
],
"common_examples": [
    ...
]

Every word starting with a "@" is treated as a placeholder for an entity with that name. The component is agnostic to the origin of entities: you can use anything that Rasa NLU returns as the "entity" field in its messages. This means you can use not only the entities defined in your common examples, but also entities from other extractors, such as numerical entities from duckling.

Longer patterns always take precedence over shorter patterns. If a shorter pattern matches entities that would also be matched by a longer pattern, the shorter pattern is ignored.
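
The following is a minimal sketch of how placeholder matching with precedence for longer matches could work. It is not the component's actual implementation: the helper names `build_placeholder_text` and `group_entities` are hypothetical, and the sketch assumes that "longer" refers to the length of the matched text.

```python
import re


def build_placeholder_text(text, entities):
    """Replace each entity span with '@<entity name>' (hypothetical helper).

    Returns the rewritten text and a list of (start, end, entity) tuples
    giving the position of every placeholder in the rewritten text.
    """
    parts, spans, cursor, length = [], [], 0, 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        gap = text[cursor:ent["start"]]
        token = "@" + ent["entity"]
        start = length + len(gap)
        parts += [gap, token]
        spans.append((start, start + len(token), ent))
        length = start + len(token)
        cursor = ent["end"]
    parts.append(text[cursor:])
    return "".join(parts), spans


def group_entities(text, entities, patterns):
    """Group entities by pattern; longer matches win over overlapping ones."""
    placeholder_text, spans = build_placeholder_text(text, entities)
    matches = [m.span() for p in patterns for m in re.finditer(p, placeholder_text)]
    # Assumption: precedence is decided by the length of the matched text.
    matches.sort(key=lambda span: span[1] - span[0], reverse=True)
    taken, groups = [], []
    for start, end in matches:
        if any(start < t_end and t_start < end for t_start, t_end in taken):
            continue  # overlaps a longer match that was already accepted
        taken.append((start, end))
        members = [ent for s, e, ent in spans if start <= s and e <= end]
        groups.append((start, members))
    groups.sort(key=lambda group: group[0])  # restore reading order
    return [members for _, members in groups]


text = "I am looking for a red shirt with stripes and checkered blue shoes."
entities = [
    {"start": 19, "end": 22, "value": "red", "entity": "color"},
    {"start": 23, "end": 28, "value": "shirt", "entity": "product"},
    {"start": 34, "end": 41, "value": "stripes", "entity": "pattern"},
    {"start": 46, "end": 55, "value": "checkered", "entity": "pattern"},
    {"start": 56, "end": 60, "value": "blue", "entity": "color"},
    {"start": 61, "end": 66, "value": "shoes", "entity": "product"},
]
patterns = [
    "@color @product with @pattern",
    "@pattern @color @product",
    "@color @product",  # shorter pattern, shadowed by the two above
]

groups = group_entities(text, entities, patterns)
for group in groups:
    print([ent["value"] for ent in group])
# ['red', 'shirt', 'stripes']
# ['checkered', 'blue', 'shoes']
```

In this sketch the short pattern "@color @product" also matches twice, but both matches overlap a longer match and are therefore dropped.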

Patterns are regular expressions! You can use patterns like

"composite_entities": [
  {
    "name": "product_with_attributes",
    "patterns": [
      "(?:@pattern\\s+)?(?:@color\\s+)?@product(?:\\s+with @[A-Z,a-z]+)?"
    ]
  }
]

to match different variations of entity combinations. Be aware that you may need to escape your regular expressions to produce valid JSON files (in this example, each backslash has to be escaped with another backslash).
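
To check the escaping, it can help to load the JSON fragment in Python and try the decoded pattern against a few placeholder strings. This is only a sketch using the standard `json` and `re` modules; the candidate strings are made up for illustration, and the component itself may apply the patterns differently.

```python
import json
import re

# Raw string, so the doubled backslashes reach the JSON parser unchanged.
raw_json = r'''
{
  "composite_entities": [
    {
      "name": "product_with_attributes",
      "patterns": [
        "(?:@pattern\\s+)?(?:@color\\s+)?@product(?:\\s+with @[A-Z,a-z]+)?"
      ]
    }
  ]
}
'''

# After decoding, each "\\s" in the file becomes the regex token "\s".
pattern = json.loads(raw_json)["composite_entities"][0]["patterns"][0]
regex = re.compile(pattern)

# Illustrative placeholder strings (not taken from real training data).
candidates = [
    "@product",
    "@color @product",
    "@pattern @color @product",
    "@color @product with @pattern",
    "@pattern",
]
for candidate in candidates:
    print(candidate, "->", bool(regex.fullmatch(candidate)))

# json.dumps shows how the pattern has to be written inside a JSON file.
print(json.dumps(pattern))
```

Building the training file with `json.dumps` instead of writing it by hand avoids the manual escaping step altogether.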

Explanation

Composite entities act as containers that group several entities into logical units. Consider the following example phrase:

I am looking for a red shirt with stripes and checkered blue shoes.

Properly trained, Rasa NLU could return entities like this:

"entities": [
  {
    "start": 19,
    "end": 22,
    "value": "red",
    "entity": "color",
    "confidence": 0.9419322376955782,
    "extractor": "ner_crf"
  },
  {
    "start": 23,
    "end": 28,
    "value": "shirt",
    "entity": "product",
    "confidence": 0.9435936216683031,
    "extractor": "ner_crf"
  },
  {
    "start": 34,
    "end": 41,
    "value": "stripes",
    "entity": "pattern",
    "confidence": 0.9233923349716401,
    "extractor": "ner_crf"
  },
  {
    "start": 46,
    "end": 55,
    "value": "checkered",
    "entity": "pattern",
    "confidence": 0.8877627536275875,
    "extractor": "ner_crf"
  },
  {
    "start": 56,
    "end": 60,
    "value": "blue",
    "entity": "color",
    "confidence": 0.6778344517453893,
    "extractor": "ner_crf"
  },
  {
    "start": 61,
    "end": 66,
    "value": "shoes",
    "entity": "product",
    "confidence": 0.536797743231954,
    "extractor": "ner_crf"
  }
]

It's hard to infer exactly what the user is looking for from this output alone. Are they looking for a striped and checkered shirt? Striped and checkered shoes? Or a striped shirt and checkered shoes?

By defining common patterns of entity combinations, we can automatically create entity groups. If we add the composite entity patterns from the usage example above, the output changes to this:

"entities": [
  {
    "confidence": 1.0,
    "entity": "product_with_attributes",
    "extractor": "composite",
    "contained_entities": [
      {
        "start": 19,
        "end": 22,
        "value": "red",
        "entity": "color",
        "confidence": 0.9419322376955782,
        "extractor": "ner_crf"
      },
      {
        "start": 23,
        "end": 28,
        "value": "shirt",
        "entity": "product",
        "confidence": 0.9435936216683031,
        "extractor": "ner_crf"
      },
      {
        "start": 34,
        "end": 41,
        "value": "stripes",
        "entity": "pattern",
        "confidence": 0.9233923349716401,
        "extractor": "ner_crf"
      }
    ]
  },
  {
    "confidence": 1.0,
    "entity": "product_with_attributes",
    "extractor": "composite",
    "contained_entities": [
      {
        "start": 46,
        "end": 55,
        "value": "checkered",
        "entity": "pattern",
        "confidence": 0.8877627536275875,
        "extractor": "ner_crf"
      },
      {
        "start": 56,
        "end": 60,
        "value": "blue",
        "entity": "color",
        "confidence": 0.6778344517453893,
        "extractor": "ner_crf"
      },
      {
        "start": 61,
        "end": 66,
        "value": "shoes",
        "entity": "product",
        "confidence": 0.536797743231954,
        "extractor": "ner_crf"
      }
    ]
  }
]
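
Downstream code can then treat every composite entity as one logical unit. The snippet below is a small sketch of that idea, assuming the output structure shown above; the function name `summarize_composites` is hypothetical, and the sample data is trimmed to the fields the sketch needs.

```python
def summarize_composites(entities):
    """Map each composite entity to a {entity name: value} dict.

    Entities without a "contained_entities" field are skipped. If a
    composite holds two entities with the same name, the later one wins.
    """
    summaries = []
    for entity in entities:
        contained = entity.get("contained_entities")
        if contained is None:
            continue
        summaries.append({c["entity"]: c["value"] for c in contained})
    return summaries


# Trimmed version of the output shown above.
entities = [
    {
        "entity": "product_with_attributes",
        "extractor": "composite",
        "contained_entities": [
            {"value": "red", "entity": "color"},
            {"value": "shirt", "entity": "product"},
            {"value": "stripes", "entity": "pattern"},
        ],
    },
    {
        "entity": "product_with_attributes",
        "extractor": "composite",
        "contained_entities": [
            {"value": "checkered", "entity": "pattern"},
            {"value": "blue", "entity": "color"},
            {"value": "shoes", "entity": "product"},
        ],
    },
]

for summary in summarize_composites(entities):
    print(summary)
```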

Example

See the example folder for a minimal example that can be trained and tested. To get the output from above, run:

$ python -m rasa_nlu.train --path . --data train.json --config config_with_composite.yml
$ python -m rasa_nlu.server --path . --config config_with_composite.yml
$ curl -XPOST localhost:5000/parse -d '{"q": "I am looking for a red shirt with stripes and checkered blue shoes"}'

If you want to compare this output to the normal Rasa NLU output, use the alternative config_without_composite.yml config file.

The component also works when training using the server API:

$ python -m rasa_nlu.server --path . --config config_with_composite.yml
$ curl --request POST --header 'content-type: application/x-yml' --data-binary @train_http.yml --url 'localhost:5000/train?project=test_project'
$ curl -XPOST localhost:5000/parse -d '{"q": "I am looking for a red shirt with stripes and checkered blue shoes", "project": "test_project"}'
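
The parse requests above can also be sent from Python. The following sketch uses only the standard library and assumes the server is reachable at `http://localhost:5000`, as in the curl commands; the helper names `build_parse_request` and `parse` are hypothetical.

```python
import json
import urllib.request


def build_parse_request(text, project=None, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for the /parse endpoint."""
    payload = {"q": text}
    if project is not None:
        payload["project"] = project
    return urllib.request.Request(
        base_url + "/parse",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


def parse(text, project=None):
    """Send the request and return the decoded JSON response.

    Requires a running Rasa NLU server, started as shown above.
    """
    with urllib.request.urlopen(build_parse_request(text, project)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `parse("I am looking for a red shirt with stripes and checkered blue shoes", project="test_project")` should correspond to the last curl command.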

Caveats

Rasa NLU strips training files of any custom fields, including our "composite_entities" field. For our component to access this information, we have to circumvent Rasa's train file loading process and get direct access to the raw data.

When training through Rasa's train script, the train file paths are fetched from the command line arguments.

When training through the HTTP server, we exploit the fact that Rasa NLU creates temporary files containing the raw train data. Be aware that this creates a possible race condition when multiple training processes are executed simultaneously. If a new train process is started before the previous process has reached the CompositeEntityExtractor, there is a chance that the wrong train data will be picked up.

Similar projects

There is a pull request on Rasa NLU's GitHub page trying to implement composite entities. The request was closed without merging. The underlying code is available as a Rasa component. However, the repository currently lacks documentation, and the implementation seems to be more limited than ours.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
