AirDialogue

AirDialogue is a benchmark dataset for goal-oriented dialogue generation research. This Python library contains a collection of toolkits that come with the dataset.

What's New

  • Jul 13, 2020: Fixed a bug in BLEU evaluation; the current version gives higher BLEU scores. Added support for evaluating different roles and a KL-divergence metric (see --infer_metrics).
  • Jul 12, 2020: Updated the AirDialogue dataset to version v1.1, fixing typos and misalignment between the KB file and the dialogue file. Please download and use the new data.

Prerequisites

General

  • python (verified on 3.7)
  • wget

Python Packages

  • tensorflow (tested on 1.15.0)
  • tqdm
  • nltk
  • flask (for visualization)

Install

To install the bleeding-edge version from GitHub, run the following from the root of the cloned repository:

python setup.py install

Quick Start

Scoring

The official scoring function evaluates the predictions of a trained model by comparing them against the AirDialogue dataset.

airdialogue score --true_data PATH_TO_DATA_FILE --true_kb PATH_TO_KB_FILE \
    --infer_metrics bleu

--infer_metrics can be one of (bleu:all|rouge:all|kl:all|bleu:brief|kl:brief). The brief mode gives a single-number metric. (bleu|kl) is equivalent to (bleu:brief|kl:brief).
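
The rule that a bare metric name maps to its brief variant can be sketched in Python. This is only an illustration: normalize_metric and VALID_METRICS are hypothetical names and are not part of the airdialogue package. The accepted values are copied from the list above.

```python
# Hypothetical helper, not part of the airdialogue package.
# The accepted values are the ones listed in this README.
VALID_METRICS = {"bleu:all", "rouge:all", "kl:all", "bleu:brief", "kl:brief"}


def normalize_metric(spec):
    """Expand a bare metric name to its brief form and validate it."""
    spec = spec.strip().lower()
    if ":" not in spec:
        # A bare name such as "bleu" is treated as "bleu:brief".
        spec = spec + ":brief"
    if spec not in VALID_METRICS:
        raise ValueError("unsupported --infer_metrics value: " + spec)
    name, mode = spec.split(":")
    return name, mode


print(normalize_metric("bleu"))
print(normalize_metric("kl:all"))
```

Under this sketch, a bare rouge is rejected, because the list above only includes rouge:all.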

Context Generation

The context generator generates a valid context-action pair without conversation history.

airdialogue contextgen \
    --output_data PATH_TO_OUTPUT_DATA_FILE \
    --output_kb PATH_TO_OUTPUT_KB_FILE \
    --num_samples 100

Preprocessing

The AirDialogue preprocessing toolkit tokenizes dialogues. Preprocessing the AirDialogue data requires 50 GB of RAM. The job_type parameter is a set of 5 bits separated by |, which represent train|eval|infer|sp-train|sp-eval. The input_type parameter can be either context, for context-only data, or dialogue, for dialogue data with full history.

airdialogue prepro \
  --data_file PATH_TO_DATA_FILE \
  --kb_file PATH_TO_KB_FILE \
  --output_dir "./data/airdialogue/" \
  --output_prefix 'train' --job_type '0|0|0|1|0' --input_type context
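
To make the job_type encoding concrete, here is a minimal Python sketch that maps a bit string to the slot names in the order train|eval|infer|sp-train|sp-eval. It assumes that a 1 selects the corresponding slot; decode_job_type and JOB_SLOTS are hypothetical names and are not part of the airdialogue package.

```python
# Hypothetical helper, not part of the airdialogue package.
# Slot order as given in this README; assumes a "1" bit selects the slot.
JOB_SLOTS = ("train", "eval", "infer", "sp-train", "sp-eval")


def decode_job_type(job_type):
    """Return the names of the slots whose bit is set to 1."""
    bits = job_type.split("|")
    if len(bits) != len(JOB_SLOTS) or any(b not in ("0", "1") for b in bits):
        raise ValueError("job_type must be 5 bits separated by '|'")
    return [name for name, bit in zip(JOB_SLOTS, bits) if bit == "1"]


print(decode_job_type("0|0|0|1|0"))  # ['sp-train']
```

Under that assumption, the value '0|0|0|1|0' used in the command above selects only the sp-train slot.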

Simulator

The simulator is built on top of the context generator. It provides not only a context-action pair but also a full conversation history generated by two templated chatbot agents.

airdialogue sim \
    --output_data PATH_TO_OUTPUT_DATA_FILE \
    --output_kb PATH_TO_OUTPUT_KB_FILE \
    --num_samples 100

Visualization

The visualization tool displays the content of the raw JSON file.

airdialogue vis --data_path ./data/airdialogue/json/

Codalab simulator

To simulate the Codalab selfplay workflow, you can run the simulate_codalab.sh script shown at the end of this section, which replicates the bundle workflow for the competition. It requires a model/scripts/codalab_selfplay_step.sh that can be run as

sh scripts/codalab_selfplay_step.sh out.txt data.json [kb.json]
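
The calling convention above, with its optional KB argument, can be sketched as a small Python function that builds the argument list. selfplay_step_argv is a hypothetical name; only the script path and the argument order are taken from this README.

```python
# Hypothetical helper; only the script path and the argument order
# (output file, data file, optional KB file) come from this README.
def selfplay_step_argv(out_file, data_file, kb_file=None):
    """Build the argument list for one selfplay step."""
    argv = ["sh", "scripts/codalab_selfplay_step.sh", out_file, data_file]
    if kb_file is not None:
        # The KB file is optional, matching the [kb.json] notation.
        argv.append(kb_file)
    return argv


print(selfplay_step_argv("out.txt", "data.json"))
print(selfplay_step_argv("out.txt", "data.json", "kb.json"))
```

The resulting list could be passed to subprocess.run(argv, check=True) from inside the model folder, assuming the step script exists there.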

More details can be found in the AirDialogue competition tutorial worksheet on Codalab.

bash airdialogue/codalab/simulate_codalab.sh <path_to_data>/json/dev_data.json <path_to_data>/json/dev_kb.json <model_folder>
