State-of-the-art Natural Language Processing toolkit for multi-task and transfer learning built on PyTorch.

jiant is an NLP toolkit

The multitask and transfer learning toolkit for natural language processing research

Why should I use jiant?

A few things you might want to know about jiant:

  • jiant is configuration file driven
  • jiant is built with PyTorch
  • jiant integrates with datasets to manage task data
  • jiant integrates with transformers to manage models and tokenizers

Getting Started

Installation

To install jiant from source (recommended for researchers):

git clone https://github.com/nyu-mll/jiant.git
cd jiant
pip install -r requirements.txt

# Add the following to your .bashrc or .bash_profile
export PYTHONPATH=/path/to/jiant:$PYTHONPATH

If you plan to contribute to jiant, install additional dependencies with pip install -r requirements-dev.txt.

We recommend that you install jiant in a virtual environment or a conda environment.

To check that jiant was installed correctly, run a simple example.
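
Before running a full example, a quick import check can confirm that the PYTHONPATH entry took effect. The following is a minimal sketch that uses only the Python standard library; the helper name is_importable is hypothetical and is not part of jiant.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "jiant" resolves only if the cloned repository is on PYTHONPATH
    # (or the package is otherwise installed in the active environment).
    if is_importable("jiant"):
        print("jiant is importable")
    else:
        print("jiant was not found; check PYTHONPATH")
```

If the check reports that jiant was not found, confirm that the exported path points at the root of the cloned repository and that the shell profile has been reloaded.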

Quick Introduction

The following example fine-tunes a RoBERTa model on the MRPC dataset.

Python version:

from jiant.proj.simple import runscript as run
import jiant.scripts.download_data.runscript as downloader

# Download the Data
downloader.download_data(["mrpc"], "/content/data")

# Set up the arguments for the Simple API
args = run.RunConfiguration(
    run_name="simple",
    exp_dir="/content/exp",
    data_dir="/content/data",
    model_type="roberta-base",
    tasks="mrpc",
    train_batch_size=16,
    num_train_epochs=3,
)

# Run!
run.run_simple(args)

Bash version:

python jiant/scripts/download_data/runscript.py \
    download \
    --tasks mrpc \
    --output_path /content/data
python jiant/proj/simple/runscript.py \
    run \
    --run_name simple \
    --exp_dir /content/exp \
    --data_dir /content/data \
    --model_type roberta-base \
    --tasks mrpc \
    --train_batch_size 16 \
    --num_train_epochs 3
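
The Python and Bash versions pass the same settings: each keyword argument to RunConfiguration corresponds to a --flag value pair on the command line. As an illustration of that correspondence only, the sketch below formats a command string from the values used in the Python version. The helper build_command is hypothetical and is not part of jiant; it only builds a string and does not run anything.

```python
import shlex


def build_command(script: str, subcommand: str, options: dict) -> str:
    """Format a shell command with one --key value pair per option."""
    parts = ["python", script, subcommand]
    for key, value in options.items():
        parts += [f"--{key}", str(value)]
    return shlex.join(parts)  # shlex.join requires Python 3.8+


# Same values as the RunConfiguration call in the Python version.
options = {
    "run_name": "simple",
    "exp_dir": "/content/exp",
    "data_dir": "/content/data",
    "model_type": "roberta-base",
    "tasks": "mrpc",
    "train_batch_size": 16,
    "num_train_epochs": 3,
}
command = build_command("jiant/proj/simple/runscript.py", "run", options)
print(command)
```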

Examples of more complex training workflows are found here.

Contributing

The jiant project's contributing guidelines can be found here.

Looking for jiant v1.3.2?

jiant v1.3.2 has been moved to jiant-v1-legacy to support ongoing research with the library. jiant v2.x.x is more modular and scalable than jiant v1.3.2 and has been designed to reflect the needs of the current NLP research community. We strongly recommend that any new projects use jiant v2.x.x.

jiant 1.x has been used in several papers. For instructions on how to reproduce papers by jiant authors that refer readers to this site for documentation (including Tenney et al., Wang et al., Bowman et al., Kim et al., Warstadt et al.), refer to the jiant-v1-legacy README.

Citation

If you use jiant ≥ v2.0.0 in academic work, please cite it directly:

@misc{phang2020jiant,
    author = {Jason Phang and Phil Yeres and Jesse Swanson and Haokun Liu and Ian F. Tenney and Phu Mon Htut and Clara Vania and Alex Wang and Samuel R. Bowman},
    title = {\texttt{jiant} 2.0: A software toolkit for research on general-purpose text understanding models},
    howpublished = {\url{http://jiant.info/}},
    year = {2020}
}

If you use jiant ≤ v1.3.2 in academic work, please use the citation found here.

Acknowledgments

  • This work was made possible in part by a donation to NYU from Eric and Wendy Schmidt made by recommendation of the Schmidt Futures program, and by support from Intuit Inc.
  • We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan V GPU used at NYU in this work.
  • Developer Jesse Swanson is supported by the Moore-Sloan Data Science Environment as part of the NYU Data Science Services initiative.

License

jiant is released under the MIT License.
