Jack the Reader is a Python framework for Machine Reading

Jack the Reader

A Machine Reading Comprehension framework.
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!

Jack the Reader - or jack, for short - is a framework for building and using models on a variety of tasks that require reading comprehension. For more information about the overall architecture, see Jack the Reader – A Machine Reading Framework (ACL 2018).

Installation

To install Jack, install the requirements and TensorFlow. If you want to write models in PyTorch, install PyTorch as well.

Supported ML Backends

We currently support TensorFlow and PyTorch, and readers can be implemented in either. Input and output modules (i.e., pre- and post-processing) are independent of the ML backend and can thus be reused with model modules written for either backend. Although most models are implemented in TensorFlow, it is easy to build new readers in PyTorch quickly by reusing the existing, often cumbersome, pre- and post-processing code.
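
The separation described above can be pictured with a small, framework-free sketch. All names below (`input_module`, `output_module`, `make_reader`, and the two toy model functions) are hypothetical and are not Jack's actual API; they only illustrate how pre- and post-processing can stay fixed while the model in the middle is swapped.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins, NOT Jack's real classes. They only illustrate that
# pre- and post-processing work on plain Python data and know nothing about
# the ML backend used by the model in between.
Features = Dict[str, List[str]]
Model = Callable[[Features], Tuple[int, int]]


def input_module(question: str, support: str) -> Features:
    """Backend-independent pre-processing: whitespace tokenization."""
    return {"question": question.split(), "support": support.split()}


def output_module(tokens: List[str], start: int, end: int) -> str:
    """Backend-independent post-processing: token span -> answer text."""
    return " ".join(tokens[start:end + 1])


def first_token_model(features: Features) -> Tuple[int, int]:
    """Toy model standing in for an implementation in one backend."""
    return (0, 0)


def last_token_model(features: Features) -> Tuple[int, int]:
    """Toy model standing in for an implementation in another backend."""
    last = len(features["support"]) - 1
    return (last, last)


def make_reader(model: Model) -> Callable[[str, str], str]:
    """Wire the shared input/output modules around any model."""
    def reader(question: str, support: str) -> str:
        features = input_module(question, support)
        start, end = model(features)
        return output_module(features["support"], start, end)
    return reader


reader_a = make_reader(first_token_model)
reader_b = make_reader(last_token_model)
print(reader_a("Who appeared?", "Mary appeared in Lourdes"))
print(reader_b("Where?", "Mary appeared in Lourdes"))
```

Only the function passed to `make_reader` changes between the two readers; the tokenization and span-to-text steps are shared.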

Pre-trained Models

Find pre-trained models here.

Code Structure

  • jack.core - core abstractions used
  • jack.readers - implementations of models
  • jack.eval - task evaluation code
  • jack.util - utility code that is used throughout the framework, including shared ML code
  • jack.io - IO related code, including loading and dataset conversion scripts

Projects

Quickstart

Coding Tutorials - Notebooks & CLI

We provide IPython notebooks with tutorials on Jack. For the quickest start, you can begin here. If you're interested in training a model yourself from code, see this tutorial (we recommend the command line instead; see below). If you'd like to implement a new model yourself, this notebook explains the process in more detail.

There is documentation on our command-line interface for actually training and evaluating models. For a high-level explanation of the ideas and vision, see Understanding Jack the Reader.

Command-line Training and Usage of a QA System

To illustrate how jack works, we will show how to train a question answering model using our command-line interface; the procedure is analogous for other tasks (browse conf/ for existing task-dataset configurations). It is probably best to set up a virtual environment to avoid clashes with system-wide Python library versions.

First, install the framework:

$ python3 -m pip install -e .[tf]

Then, download the SQuAD dataset, and the GloVe word embeddings:

$ ./data/SQuAD/download.sh
$ ./data/GloVe/download.sh

Train a FastQA model:

$ python3 bin/jack-train.py with train='data/SQuAD/train-v1.1.json' dev='data/SQuAD/dev-v1.1.json' reader='fastqa_reader' \
> repr_dim=300 dropout=0.5 batch_size=64 seed=1337 loader='squad' save_dir='./fastqa_reader' epochs=20 \
> with_char_embeddings=True embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True

or shorter, using our prepared config:

$ python3 bin/jack-train.py with config='./conf/qa/squad/fastqa.yaml'
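
Both invocations above express the same thing: a set of named settings, given either as `key=value` pairs or through a config file. As a rough illustration of how such pairs can be layered over defaults, here is a minimal sketch. It is not how Jack's command-line interface is implemented; `parse_overrides` and the values in `base_config` are hypothetical and chosen only for the example.

```python
import ast
from typing import Any, Dict, List


def parse_overrides(args: List[str]) -> Dict[str, Any]:
    """Turn ["batch_size=64", "loader='squad'"] into a dict of typed values."""
    overrides: Dict[str, Any] = {}
    for arg in args:
        key, _, raw = arg.partition("=")
        try:
            # Interpret numbers, booleans and quoted strings as Python literals.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a valid literal is kept as a plain string.
            value = raw
        overrides[key] = value
    return overrides


# Illustrative defaults only; these are NOT the contents of any Jack config file.
base_config = {"reader": "fastqa_reader", "batch_size": 32, "dropout": 0.0}

overrides = parse_overrides(
    ["batch_size=64", "dropout=0.5", "loader='squad'", "with_char_embeddings=True"]
)

# Later entries win, so command-line values replace the defaults.
config = {**base_config, **overrides}
print(config)
```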

A copy of the model is written to the save_dir directory after each training epoch in which performance improves. Saved models can be loaded as shown in the Python example further below, or see e.g. the quickstart.
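
The save-on-improvement behaviour can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Jack's training loop: `train_with_checkpoints` and `save_marker` are hypothetical names, the dev scores are made up, and a small JSON file stands in for the saved model.

```python
import json
import os
import tempfile
from typing import Callable, List


def train_with_checkpoints(
    dev_scores: List[float],
    save_dir: str,
    save_fn: Callable[[str, int, float], None],
) -> List[int]:
    """Call save_fn after every epoch whose dev score beats the best so far.

    dev_scores stands in for the evaluation result after each epoch.
    Returns the (1-based) epochs at which a checkpoint was written.
    """
    best = float("-inf")
    saved_epochs = []
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best = score
            save_fn(save_dir, epoch, score)  # overwrites the previous copy
            saved_epochs.append(epoch)
    return saved_epochs


def save_marker(save_dir: str, epoch: int, score: float) -> None:
    """Stand-in for model serialization: write a small JSON marker file."""
    os.makedirs(save_dir, exist_ok=True)
    with open(os.path.join(save_dir, "checkpoint.json"), "w") as f:
        json.dump({"epoch": epoch, "dev_score": score}, f)


with tempfile.TemporaryDirectory() as tmp_dir:
    saved = train_with_checkpoints([0.61, 0.70, 0.68, 0.74], tmp_dir, save_marker)
    print(saved)
```

In this run the third epoch scores below the second, so no copy is written for it and the directory ends up holding the model from the last improving epoch.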

Want to train another model? No problem: we have a fairly modular QAModel implementation that lets you assemble your own model. There are examples in conf/qa/squad/ (e.g., bidaf.yaml or our own creation jack_qa.yaml). These models are defined solely in the configs, i.e., there is no implementation in code. This is possible through our ModularQAModel.
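
To give a feel for what "defined solely in the configs" can mean, here is a toy sketch of config-driven model assembly. It does not reproduce the ModularQAModel or the format of the YAML files in conf/qa/squad/; the registry, the layer names, and the config keys (`module`, `factor`, `offset`) are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Layer = Callable[[List[float]], List[float]]


# Hypothetical building blocks; a real reader would use neural layers instead.
def scale(factor: float) -> Layer:
    return lambda xs: [x * factor for x in xs]


def shift(offset: float) -> Layer:
    return lambda xs: [x + offset for x in xs]


def relu() -> Layer:
    return lambda xs: [max(0.0, x) for x in xs]


# Registry mapping the names used in a config to layer factories.
LAYERS: Dict[str, Callable[..., Layer]] = {"scale": scale, "shift": shift, "relu": relu}


def build_model(config: List[Dict[str, Any]]) -> Layer:
    """Assemble a model from a list of layer specs, with no model-specific code."""
    layers = []
    for spec in config:
        params = dict(spec)                  # copy so the config is not mutated
        factory = LAYERS[params.pop("module")]
        layers.append(factory(**params))

    def model(xs: List[float]) -> List[float]:
        for layer in layers:
            xs = layer(xs)
        return xs

    return model


# The "model definition" is pure data, as it would be in a config file.
config = [
    {"module": "scale", "factor": 2.0},
    {"module": "shift", "offset": -3.0},
    {"module": "relu"},
]
model = build_model(config)
print(model([1.0, 2.0, 3.0]))
```

Changing the architecture then means editing the list of specs rather than writing a new class.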

If all of that is too cumbersome for you and you just want to play, why not download a pretrained model:

$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ wget -O fastqa.zip https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1
$ unzip fastqa.zip && mv fastqa fastqa_reader

Then load the reader and ask it a question from Python:

from jack import readers
from jack.core import QASetting

fastqa_reader = readers.reader_from_file("./fastqa_reader")

support = """It is a replica of the grotto at Lourdes,
France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858.
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome),
is a simple, modern stone statue of Mary."""

answers = fastqa_reader([QASetting(
    question="To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    support=[support]
)])

print(answers[0][0].text)

Support

We are thankful for support from:

Developer guidelines

$ pwd
/home/pasquale/workspace/jack
$ python3 bin/jack-train.py [..]

Citing

@InProceedings{weissenborn2018jack,
author    = {Dirk Weissenborn and Pasquale Minervini and Tim Dettmers and Isabelle Augenstein and Johannes Welbl and Tim Rocktäschel and Matko Bošnjak and Jeff Mitchell and Thomas Demeester and Pontus Stenetorp and Sebastian Riedel},
title     = {{Jack the Reader – A Machine Reading Framework}},
booktitle = {{Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) System Demonstrations}},
month     = {July},
year      = {2018},
url       = {https://arxiv.org/abs/1806.08727}
}
