Skip to main content

NLP error analysis.

Project description

Polyjuice

This repository contains code for testing NLP Models as described in the following paper:

Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld Association for Computational Linguistics (ACL), 2021

Bibtex for citations:

@inproceedings{polyjuice:acl21,
    title = "{P}olyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models",
    author = "Tongshuang Wu and Marco Tulio Ribeiro and Jeffrey Heer and Daniel S. Weld",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics"
}

Installation

From Pypi:

pip install polyjuice_nlp

From source:

git clone git@github.com:tongshuangwu/polyjuice.git
cd polyjuice
pip install -e .

Polyjuice depends on SpaCy and Huggingface Transformers. To use most functions, please also install the following:

# install pytorch, as here: https://pytorch.org/get-started/locally/#start-locally
pip install torch
# The SpaCy language package
python -m spacy download en_core_web_sm

Perturbation

from polyjuice import Polyjuice
# initiate a wrapper.
# model path is defaulted to our portable model:
# https://huggingface.co/uw-hai/polyjuice
# No need to change this unless you are using customized model
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=True)

# the base sentence
text = "It is great for kids."

# perturb the sentence with one line:
# When running it for the first time, the wrapper will automatically
# load related models, e.g. the generator and the perplexity filter.
perturbations = pj.perturb(text)

More advanced APIs

Please see the documents in X for more explanations.

To perturb with more controls,

perturbations = pj.perturb(
    orig_sent=text,
    # can specify where to put the blank. Otherwise, it's automatically selected.
    # Can be a list or a single sentence.
    blanked_sent="It is [BLANK] for kids.",
    # can also specify the ctrl code (a list or a single code.)
    # The code should be from 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code="negation",
    # Customzie perplexity score. 
    perplex_thred=5,
    # number of perturbations to return
    num_perturbations=1,
    # the function also takes in additional arguments for huggingface generators.
    num_beams=3
)

# return: ['It is bad for kids too.',
# "It 's great for kids.",
# 'It is great even for kids.']

To get randomly placed blanks,

perturbations = pj.perturb(
    orig_sent=text,
    # can specify where to put the blank. Otherwise, it's automatically selected.
    # Can be a list or a single sentence.
    blanked_sent=["It is [BLANK] for kids.", "It is great for [BLANK]."],
    # can also specify the ctrl code (a list or a single code.)
    # The code should be from 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code="negation",
    # Customzie perplexity score. 
    perplex_thred=20,
    # number of perturbations to return
    num_perturbations=3,
    # the function also takes in additional arguments for huggingface generators.
    num_beams=3
)
# return: [
# 'It is not great for kids.', 
# 'It is great for kids but not for anyone.',
# 'It is great for kids but not for any adults.']

Selection

For selecting diverse and surprising perturbations (for augmentation and explanation experiments in our paper), please see the notebook demo.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polyjuice_nlp-0.1.1.tar.gz (3.9 kB view details)

Uploaded Source

Built Distribution

polyjuice_nlp-0.1.1-py3-none-any.whl (3.8 kB view details)

Uploaded Python 3

File details

Details for the file polyjuice_nlp-0.1.1.tar.gz.

File metadata

  • Download URL: polyjuice_nlp-0.1.1.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10

File hashes

Hashes for polyjuice_nlp-0.1.1.tar.gz
Algorithm Hash digest
SHA256 449d1bf6d65381ed7d6ead6a581fd818debd176848512e3eeb1b9777dee317fa
MD5 692be194a52a23b6f593f54496ec30b0
BLAKE2b-256 a3d3929e94679a7775303fd21df55e70a1602abfe06e5e56c301ce8ffa36b3d6

See more details on using hashes here.

File details

Details for the file polyjuice_nlp-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: polyjuice_nlp-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 3.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10

File hashes

Hashes for polyjuice_nlp-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e9da5e698bffa6c428132067981efb995308873b8cadd2f308833581470c8351
MD5 c683f85843bb576aed6a5a123ccc52b6
BLAKE2b-256 c13a8b492162c01a6fae6750ed97e97d20566d2638d4b6aca24fdddb05e3d617

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page