airdialogue package
AirDialogue
AirDialogue is a benchmark dataset for goal-oriented dialogue generation research. This Python library contains a collection of toolkits that come with the dataset.
- AirDialogue paper
- AirDialogue dataset
- Reference implementation: AirDialogue Model
What's New
- Jul 13, 2020: Fixed a bug in BLEU evaluation; the current version gives higher BLEU scores. Added support for evaluating different roles and added a KL-divergence metric (see --infer_metrics).
- Jul 12, 2020: Updated the AirDialogue dataset to version v1.1, fixing typos and misalignments between the KB file and the dialogue file. Please download and use the new data.
Prerequisites
General
- python (verified on 3.7)
- wget
Python Packages
- tensorflow (tested on 1.15.0)
- tqdm
- nltk
- flask (for visualization)
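The prerequisites above can be checked before installing. The following is a minimal sketch, not part of the library: the helper name `missing_packages` is hypothetical, and it only tests whether each package can be found on the import path, not whether its version matches the ones listed.

```python
import importlib.util
import sys

# Package names taken from the "Python Packages" list above.
REQUIRED_PACKAGES = ("tensorflow", "tqdm", "nltk", "flask")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found on the current import path.

    Hypothetical helper for illustration; it does not check versions.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Missing packages:", missing_packages() or "none")
```

An empty result only means the packages are importable; the versions the library was tested with (for example tensorflow 1.15.0) still need to be confirmed separately.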
Install
To install the bleeding-edge version from a clone of the GitHub repository, use
python setup.py install
Quick Start
Scoring
The official scoring function evaluates the predictions of a trained model and compares them to the AirDialogue dataset.
airdialogue score --true_data PATH_TO_DATA_FILE --true_kb PATH_TO_KB_FILE \
--infer_metrics bleu
--infer_metrics can be one of (bleu:all|rouge:all|kl:all|bleu:brief|kl:brief). The brief mode gives a single-number metric. (bleu|kl) is equivalent to (bleu:brief|kl:brief).
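The shorthand rule above can be illustrated with a small sketch. This is not the library's own parser: `normalize_metric` and `KNOWN_SPECS` are hypothetical names, and the sketch assumes that only the five values listed above are valid.

```python
# Metric values copied from the option list above.
KNOWN_SPECS = {"bleu:all", "rouge:all", "kl:all", "bleu:brief", "kl:brief"}


def normalize_metric(spec):
    """Expand a bare metric name to its ':brief' form and validate it.

    Hypothetical helper: assumes only the values in KNOWN_SPECS are accepted.
    """
    full = spec if ":" in spec else spec + ":brief"
    if full not in KNOWN_SPECS:
        raise ValueError("unsupported --infer_metrics value: %r" % spec)
    name, mode = full.split(":")
    return name, mode
```

Under these assumptions, `normalize_metric("bleu")` and `normalize_metric("bleu:brief")` return the same pair, while a bare `rouge` is rejected because `rouge:brief` is not in the list.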
Context Generation
The context generator generates a valid context-action pair without conversation history.
airdialogue contextgen \
--output_data PATH_TO_OUTPUT_DATA_FILE \
--output_kb PATH_TO_OUTPUT_KB_FILE \
--num_samples 100
Preprocessing
The AirDialogue preprocessing toolkit tokenizes dialogues. Preprocessing the AirDialogue data requires 50GB of RAM.
The job_type parameter is a set of 5 bits separated by |, which represent train|eval|infer|sp-train|sp-eval.
The input_type parameter can be either context for context-only data or dialogue for dialogue data with full history.
airdialogue prepro \
--data_file PATH_TO_DATA_FILE \
--kb_file PATH_TO_KB_FILE \
--output_dir "./data/airdialogue/" \
--output_prefix 'train' --job_type '0|0|0|1|0' --input_type context
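To make the job_type encoding concrete, the sketch below decodes a value such as '0|0|0|1|0' into named flags. It is an illustration only: `decode_job_type` is a hypothetical helper, and the sketch assumes each bit is written as 0 or 1 in the order train|eval|infer|sp-train|sp-eval.

```python
# Bit order as described for the job_type parameter above.
JOB_NAMES = ("train", "eval", "infer", "sp-train", "sp-eval")


def decode_job_type(job_type):
    """Map a '|'-separated bit string to a dict of job name -> enabled.

    Hypothetical helper; assumes exactly five bits, each '0' or '1'.
    """
    bits = job_type.split("|")
    if len(bits) != len(JOB_NAMES) or any(b not in ("0", "1") for b in bits):
        raise ValueError("job_type must be 5 bits separated by '|': %r" % job_type)
    return {name: bit == "1" for name, bit in zip(JOB_NAMES, bits)}


if __name__ == "__main__":
    # The value used in the prepro command above enables only sp-train.
    print(decode_job_type("0|0|0|1|0"))
```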
Simulator
The simulator is built on top of the context generator. It provides not only a context-action pair but also a full conversation history generated by two templated chatbot agents.
airdialogue sim \
--output_data PATH_TO_OUTPUT_DATA_FILE \
--output_kb PATH_TO_OUTPUT_KB_FILE \
--num_samples 100
Visualization
The visualization tool displays the content of the raw JSON file.
airdialogue vis --data_path ./data/airdialogue/json/
Codalab simulator
To simulate the Codalab selfplay workflow, run the simulate_codalab.sh script shown at the end of this section, which replicates the bundle workflow for the competition. It requires a model/scripts/codalab_selfplay_step.sh that can be run as
sh scripts/codalab_selfplay_step.sh out.txt data.json [kb.json]
More details can be found in the AirDialogue competition tutorial worksheet on Codalab.
bash airdialogue/codalab/simulate_codalab.sh <path_to_data>/json/dev_data.json <path_to_data>/json/dev_kb.json <model_folder>
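The step script's calling convention, with its optional KB argument, can be sketched as an argument list suitable for `subprocess.run`. The function name `selfplay_step_command` is hypothetical, and the sketch assumes the script is invoked from the model folder exactly as in the usage line above.

```python
def selfplay_step_command(out_path, data_path, kb_path=None):
    """Build the argv for one selfplay step; kb_path is optional.

    Hypothetical helper mirroring the documented usage:
    sh scripts/codalab_selfplay_step.sh out.txt data.json [kb.json]
    """
    cmd = ["sh", "scripts/codalab_selfplay_step.sh", out_path, data_path]
    if kb_path is not None:
        cmd.append(kb_path)
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, since the script
    # only exists inside a model folder.
    print(selfplay_step_command("out.txt", "data.json"))
    print(selfplay_step_command("out.txt", "data.json", "kb.json"))
```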