cross-words·PyPI

Chat bot sentences & story generator.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 1 - Planning
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python
- Python :: 3.6
Topic
- Communications

Project description

cross-words
==========================================

`cross-words` is a Python module that lets you easily create a corpus of documents with parameterized entities.

The main goal of `cross-words` is to offer an easy way to create either sentences or stories for chatbot training.
As of May 2018, it is mostly designed to be used with [Rasa NLU/Core](http://rasa.com/).

1. [Installation](#install)
2. [How to use this package](#usage)

# 1. Installation<a name="install"></a>

You can install it with pip:

```
pip install cross-words
```

Or directly from GitHub, if you want the latest development version:

```
pip install git+https://github.com/data-chirps/cross-words.git
```

# 2. How to use this package<a name="usage"></a>
## cross-words DSL
`cross-words` is based on a simple yet powerful domain-specific language (DSL).
When used with Rasa NLU/Core, it relies on three concepts:

- **intents:** the goal of the chatbot's user (e.g. asking to book a restaurant, confirming a chatbot inquiry, etc.)
- **entities:** specific parts of a sentence containing key information (e.g. which restaurant to book, how many people, etc.)
- **aliases:** lists of synonyms that can be used interchangeably

More details are available in the [Rasa NLU](https://nlu.rasa.com/tutorial.html) tutorial.

Given a configuration file (.txt) containing all of the above, `cross-words` is able to generate many training sentences/conversations using combinations of sentence parts.

`cross-words` configuration files look like this:

```
Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?

@[time_filter]
this month
this year
LTD
life to date
up to date
since release
since launch
since beginning of fiscal year

@[geo_filter]
France
Germany
US
United States
America
Canada
Italy

@[subject_filter]
birds
parrots
owl
dogs
cats
persian

~[owners]
owners
possessors
```
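To make the placeholder syntax concrete, here is a minimal sketch that finds the entity (`@[...]`) and alias (`~[...]`) placeholders in an intent sentence. It assumes placeholder names are made of word characters (letters, digits, underscores); `find_placeholders` is a hypothetical helper for illustration, not part of the `cross-words` API.

```python
import re

# Assumed patterns for the two placeholder kinds: @[name] for entities, ~[name] for aliases.
ENTITY_RE = re.compile(r"@\[(\w+)\]")
ALIAS_RE = re.compile(r"~\[(\w+)\]")


def find_placeholders(sentence):
    """Return (entity names, alias names) referenced by an intent sentence, in order."""
    return ENTITY_RE.findall(sentence), ALIAS_RE.findall(sentence)


sentence = "Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?"
entity_names, alias_names = find_placeholders(sentence)
print(entity_names)  # ['subject_filter', 'geo_filter', 'time_filter']
print(alias_names)   # ['owners']
```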

If asked for sentences, `cross-words` generates a .md file whose first lines look like this:

```
- Could I have the number of [birds](subject_filter) possessors in [Canada](geo_filter) [life to date](time_filter)?
- Could I have the number of [parrots](subject_filter) possessors in [United States](geo_filter) [since release](time_filter)?
- Could I have the number of [owl](subject_filter) possessors in [Italy](geo_filter) [up to date](time_filter)?
- Could I have the number of [owl](subject_filter) possessors in [Italy](geo_filter) [since release](time_filter)?
- Could I have the number of [dogs](subject_filter) owners in [United States](geo_filter) [LTD](time_filter)?
- Could I have the number of [dogs](subject_filter) owners in [Canada](geo_filter) [this year](time_filter)?
- Could I have the number of [cats](subject_filter) owners in [France](geo_filter) [this year](time_filter)?
- Could I have the number of [cats](subject_filter) owners in [US](geo_filter) [since release](time_filter)?
- Could I have the number of [cats](subject_filter) owners in [America](geo_filter) [this month](time_filter)?
- Could I have the number of [cats](subject_filter) owners in [Canada](geo_filter) [life to date](time_filter)?

```
This file is then ready to use as training input to Rasa NLU.
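The expansion behind these lines can be sketched as a cross product over the alternatives of every placeholder. This is a minimal illustration under stated assumptions, not the package's implementation: `render` and `expand` are hypothetical helpers, entity values are annotated as `[value](entity)` and alias values are inserted as plain text, as in the sample above. The sample output appears shuffled, whereas this sketch emits combinations in `itertools.product` order.

```python
import itertools
import re

# Matches both placeholder kinds, capturing the sigil ("@" entity, "~" alias) and the name.
PLACEHOLDER_RE = re.compile(r"([@~])\[(\w+)\]")


def render(intent, combo):
    """Fill the placeholders of `intent` left to right with the values in `combo`."""
    values = iter(combo)

    def fill(match):
        value = next(values)
        # Entities are annotated Rasa NLU style; aliases are inserted as plain text.
        return f"[{value}]({match.group(2)})" if match.group(1) == "@" else value

    return PLACEHOLDER_RE.sub(fill, intent)


def expand(intent, entities, aliases):
    """Yield one Rasa NLU markdown line per element of the cross product."""
    choices = [entities[name] if sigil == "@" else aliases[name]
               for sigil, name in PLACEHOLDER_RE.findall(intent)]
    for combo in itertools.product(*choices):
        yield "- " + render(intent, combo)


entities = {
    "subject_filter": ["birds", "cats"],
    "geo_filter": ["France", "Canada"],
    "time_filter": ["this year"],
}
aliases = {"owners": ["owners", "possessors"]}
intent = "Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?"

lines = list(expand(intent, entities, aliases))
print(len(lines))  # 8 (2 subjects x 2 aliases x 2 countries x 1 period)
print(lines[0])
# - Could I have the number of [birds](subject_filter) owners in [France](geo_filter) [this year](time_filter)?
```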

If asked for stories, it generates stories like these:

```
## Genereated Story 815310784239368
* acquisition{}
- utter_ask_time_filter
* acquisition{"time_filter": "since beginning of fiscal year"}
- slot{"time_filter": "since beginning of fiscal year"}
- utter_ask_geo_filter
* acquisition{"geo_filter": "America"}
- slot{"geo_filter": "America"}
- utter_ask_subject_filter
* acquisition{"subject_filter": "dogs"}
- slot{"subject_filter": "dogs"}
- action_acquisition

## Genereated Story 257661587723758
* acquisition{"time_filter": "since release", "geo_filter": "Germany"}
- slot{"time_filter": "since release"}
- slot{"geo_filter": "Germany"}
- utter_ask_subject_filter
* acquisition{"subject_filter": "owl"}
- slot{"subject_filter": "owl"}
- action_acquisition

## Genereated Story 877699493192194
* acquisition{"subject_filter": "parrots"}
- slot{"subject_filter": "parrots"}
- utter_ask_time_filter
* acquisition{"time_filter": "LTD"}
- slot{"time_filter": "LTD"}
- utter_ask_geo_filter
* acquisition{"geo_filter": "France"}
- slot{"geo_filter": "France"}
- action_acquisition
```
This file is then ready to use for training with Rasa Core.
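The story layout above is regular enough to sketch: one user turn carrying the initially provided entities, then an `utter_ask_<entity>` / user answer pair for each missing entity, then a final action. The helper below is hypothetical and only mirrors the samples; in particular, naming the final action `action_<intent>` is an assumption taken from `action_acquisition`.

```python
import json


def render_story(story_id, intent, provided, asked, values):
    """Render one story in the Rasa Core markdown format shown above (hypothetical).

    provided: entity names the user gives in the first message.
    asked:    remaining entity names, in the order the bot asks for them.
    values:   entity name -> value the user supplies.
    """

    def user_turn(names):
        # One user message carrying the entities, then one slot event per entity.
        payload = {name: values[name] for name in names}
        turn = [f"* {intent}" + json.dumps(payload)]
        turn += ["- slot" + json.dumps({name: values[name]}) for name in names]
        return turn

    lines = [f"## Generated Story {story_id}"]
    lines += user_turn(provided)
    for name in asked:
        lines.append(f"- utter_ask_{name}")
        lines += user_turn([name])
    # Assumption: the final action is named action_<intent>, as in the samples.
    lines.append(f"- action_{intent}")
    return "\n".join(lines)


story = render_story(
    257661587723758,
    "acquisition",
    provided=["time_filter", "geo_filter"],
    asked=["subject_filter"],
    values={"time_filter": "since release", "geo_filter": "Germany", "subject_filter": "owl"},
)
print(story)
```

Apart from the header line, this prints the second sample story above verbatim.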

## Generating files

`cross-words` mainly comes with two functions: `parse_input` and `generate`. All other functions are implementation details.

### generate(input_path, output_path="./xwords/outputs/", intent_string=None, output_prefix='', training_ratio=1.0, for_story=False, n_sub=None)
This is the main function of `cross-words`.

Given an input configuration file, it writes all combinations of intents × entities × aliases to a .md file ready for training.

A few arguments let you tune its behavior:

- **input_path:** path to the configuration file *(string)*
- **output_path:** path to the output folder where train/test files will be written *(string)*
- **intent_string:** string specifying the intent, written at the beginning of sentence files (for Rasa NLU) or inside generated stories (for Rasa Core) *(string)*
- **output_prefix:** string prepended to the names of the written files *(string)*
- **training_ratio:** fraction of the generated combinations kept for training. With .7, 30% of all generated combinations are reserved for a test file; with 1.0, no test file is created. *(float)*
- **for_story:** whether to generate sentences (for Rasa NLU) or stories (for Rasa Core) *(bool)*
- **n_sub:** number of sentences/stories (including test items) to draw as a subsample of all possible combinations of intents × entities × aliases *(int)*; required when generating stories for Rasa Core
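One way to read the interplay of `n_sub` and `training_ratio` described above is "subsample first, then split". The sketch below illustrates that reading only; `subsample_and_split` and its `seed` argument are hypothetical and do not reflect how `generate` is actually implemented.

```python
import random


def subsample_and_split(combinations, training_ratio=1.0, n_sub=None, seed=None):
    """Hypothetical sketch of how n_sub and training_ratio could interact.

    n_sub caps the total number of items (train + test); training_ratio is the
    fraction of those items kept for training. Returns (train, test).
    """
    rng = random.Random(seed)
    items = list(combinations)
    if n_sub is not None:
        # Draw a random subsample without replacement.
        items = rng.sample(items, min(n_sub, len(items)))
    else:
        rng.shuffle(items)
    n_train = round(len(items) * training_ratio)
    return items[:n_train], items[n_train:]


train, test = subsample_and_split(range(100), training_ratio=0.7, n_sub=50, seed=0)
print(len(train), len(test))  # 35 15
```

With the default `training_ratio=1.0`, the test list is empty, matching the "no test file" behavior described above.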

### parse_input(input_path)
This function is exposed to ease experimentation. It is the first function called by `generate`.

Given an input configuration file, it produces:

- a list of intents in the form
```
['intent_sentence_0', 'intent_sentence_1', ...]

e.g. from above:
['Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?']
```
- a dictionary of entities in the form
```
{'entity_0': ['alternative_00', 'alternative_01', ...],
'entity_1': ['alternative_10', 'alternative_11', ...], ...}

e.g. from above:
{'time_filter': ['this month', 'this year', ...],
'geo_filter': ['France', 'Germany', ...], ...}
```
- a dictionary of aliases (synonyms) in the form
```
{'alias_0': ['alternative_00', 'alternative_01', ...],
'alias_1': ['alternative_10', 'alternative_11', ...], ...}

e.g. from above:
{'owners': ['owners', 'possessors']}
```
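For experimenting without the package, the sketch below produces the three structures in the shapes documented above. It is a hypothetical re-implementation (`parse_config` is not the library's `parse_input`), and it assumes the layout of the sample file: a line that is exactly `@[name]` or `~[name]` opens a block, the following non-blank lines are its alternatives, a blank line closes the block, and any other line outside a block is an intent sentence.

```python
import re

# A block header is a line consisting only of @[name] or ~[name].
HEADER_RE = re.compile(r"^([@~])\[(\w+)\]$")


def parse_config(text):
    """Return (intents, entities, aliases) from configuration text (hypothetical sketch)."""
    intents, entities, aliases = [], {}, {}
    current = None  # list receiving the alternatives of the block being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # assumption: a blank line ends the current block
            continue
        header = HEADER_RE.match(line)
        if header:
            sigil, name = header.groups()
            target = entities if sigil == "@" else aliases
            current = target.setdefault(name, [])
        elif current is not None:
            current.append(line)
        else:
            intents.append(line)
    return intents, entities, aliases


config = """Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?

@[time_filter]
this month
this year

~[owners]
owners
possessors
"""
intents, entities, aliases = parse_config(config)
print(entities)  # {'time_filter': ['this month', 'this year']}
print(aliases)   # {'owners': ['owners', 'possessors']}
```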

## Combination logic

`cross-words` is designed to compute sentences by placing all entity and alias alternatives into all intents.

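As a rule of thumb, the overall maximum number of generated sentences is of the order of:

$$
N \approx n_{\text{intent}} \times \bar{a}_{\text{entity}}^{\,\bar{p}_{\text{entity}}} \times \bar{a}_{\text{alias}}^{\,\bar{p}_{\text{alias}}}
$$

where $n_{\text{intent}}$ is the number of intent sentences, $\bar{p}_{\text{entity}}$ and $\bar{p}_{\text{alias}}$ are the average numbers of entity and alias placeholders per intent sentence, and $\bar{a}_{\text{entity}}$ and $\bar{a}_{\text{alias}}$ are the average numbers of alternatives per entity and per alias. For the example above, the single intent sentence yields exactly 6 × 2 × 7 × 8 = 672 sentences.

As such, the size of the generated training files grows exponentially with the number of placeholders per sentence, hence the *n_sub* parameter available in **generate**.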
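The rule of thumb can be replaced by an exact count for a given configuration: for each intent sentence, multiply the numbers of alternatives of its placeholders, then sum over intent sentences. The helper below is a hypothetical sketch of that arithmetic, assuming every placeholder is filled independently; for the sample configuration it returns 672.

```python
import re

# Matches both placeholder kinds, capturing the sigil ("@" entity, "~" alias) and the name.
PLACEHOLDER_RE = re.compile(r"([@~])\[(\w+)\]")


def count_sentences(intents, entities, aliases):
    """Exact size of the full cross product (hypothetical helper)."""
    total = 0
    for intent in intents:
        n = 1
        for sigil, name in PLACEHOLDER_RE.findall(intent):
            # Each placeholder multiplies the count by its number of alternatives.
            n *= len(entities[name] if sigil == "@" else aliases[name])
        total += n
    return total


entities = {
    "time_filter": ["this month", "this year", "LTD", "life to date", "up to date",
                    "since release", "since launch", "since beginning of fiscal year"],
    "geo_filter": ["France", "Germany", "US", "United States", "America", "Canada", "Italy"],
    "subject_filter": ["birds", "parrots", "owl", "dogs", "cats", "persian"],
}
aliases = {"owners": ["owners", "possessors"]}
intents = ["Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?"]
print(count_sentences(intents, entities, aliases))  # 672
```

Adding one more `@[time_filter]` placeholder to the sentence would multiply the count by 8 again, which is where the exponential growth comes from.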

In the specific case of stories (Rasa Core), `cross-words` will also use *information availability* as an additional combination dimension.

For example, the two stories below differ in the information initially provided by the user:

```
## Genereated Story 257661587723758
* acquisition{"time_filter": "since release", "geo_filter": "Germany"}
- slot{"time_filter": "since release"}
- slot{"geo_filter": "Germany"}
- utter_ask_subject_filter
* acquisition{"subject_filter": "owl"}
- slot{"subject_filter": "owl"}
- action_acquisition

## Genereated Story 877699493192194
* acquisition{"time_filter": "since release"}
- slot{"time_filter": "since release"}
- utter_ask_subject_filter
* acquisition{"subject_filter": "owl"}
- slot{"subject_filter": "owl"}
- utter_ask_geo_filter
* acquisition{"geo_filter": "Germany"}
- slot{"geo_filter": "Germany"}
- action_acquisition
```
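A plausible way to enumerate these scenarios is sketched below: every subset of entities the user may give in the first message, combined with every order in which the bot could ask for the rest. This is an illustration, not necessarily `cross-words`' algorithm; varying the asking order is an assumption suggested by the sample stories (one asks `time_filter` before `geo_filter`, another `subject_filter` before `geo_filter`).

```python
import itertools


def information_scenarios(entity_names):
    """Yield (provided_up_front, asked_in_order) pairs (hypothetical sketch)."""
    for k in range(len(entity_names) + 1):
        for provided in itertools.combinations(entity_names, k):
            remaining = [name for name in entity_names if name not in provided]
            # permutations([]) yields a single empty tuple: nothing left to ask.
            for order in itertools.permutations(remaining):
                yield provided, order


scenarios = list(information_scenarios(["time_filter", "geo_filter", "subject_filter"]))
print(len(scenarios))  # 16 = 1*3! + 3*2! + 3*1! + 1*0!
```

Both stories above correspond to one of these 16 scenarios; multiplying by the entity value combinations shows why *n_sub* is required for stories.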


Release history

- 0.0.2 (this version): Jun 5, 2018
- 0.0.1: May 9, 2018

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

cross_words-0.0.2-py2.py3-none-any.whl (15.4 kB), uploaded Jun 5, 2018, Python 2, Python 3

File details

Details for the file cross_words-0.0.2-py2.py3-none-any.whl.

File metadata

Download URL: cross_words-0.0.2-py2.py3-none-any.whl
Upload date: Jun 5, 2018
Size: 15.4 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for cross_words-0.0.2-py2.py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f54cfa676f5f7d5fe9bfe9241918086b57c1677f534f491b510d6a919137f0f8` |
| MD5 | `5ce37ca41c80ef87e6598f4c896a0e76` |
| BLAKE2b-256 | `63fd0af5f56f0dd7f499c54b1b309cb0cfcb234bf9e15592afe36a5204d5b351` |
