A tool for text counterfactual generation.

Polyjuice

This repository contains code for generating counterfactual sentences as described in the following paper:

Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld
Association for Computational Linguistics (ACL), 2021

Bibtex for citations:

@inproceedings{polyjuice:acl21,
    title = "{P}olyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models",
    author = "Tongshuang Wu and Marco Tulio Ribeiro and Jeffrey Heer and Daniel S. Weld",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics"
}

Installation

From PyPI:

pip install polyjuice_nlp

From source:

git clone git@github.com:tongshuangwu/polyjuice.git
cd polyjuice
pip install -e .

Polyjuice depends on spaCy and Hugging Face Transformers. Most functions also require PyTorch and the spaCy English model:

# Install PyTorch; see https://pytorch.org/get-started/locally/#start-locally
pip install torch
# Download the spaCy English language model
python -m spacy download en_core_web_sm

Perturbation

from polyjuice import Polyjuice
# Initialize the wrapper.
# model_path defaults to our portable model:
# https://huggingface.co/uw-hai/polyjuice
# There is no need to change it unless you are using a customized model.
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=True)

# the base sentence
text = "It is great for kids."

# Perturb the sentence with one line.
# On the first call, the wrapper automatically loads the related models,
# e.g. the generator and the perplexity filter.
perturbations = pj.perturb(text)

# return: ['It is bad for kids too.',
# "It 's great for kids.",
# 'It is great even for kids.']
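
The example above passes is_cuda=True, which presumes a GPU is present. A minimal sketch of choosing that flag at runtime follows. The helper name `cuda_available` is hypothetical and not part of Polyjuice; the only library call it relies on is PyTorch's `torch.cuda.is_available()`.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU can be used.
        return False
    return bool(torch.cuda.is_available())


use_gpu = cuda_available()
# With Polyjuice installed, the flag could then be passed to the wrapper:
# pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=use_gpu)
```

Checking the flag once up front should keep the same script usable on machines with and without a GPU.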

More advanced APIs

See the docstrings in the main Python file for more details.

To perturb with finer control:

perturbations = pj.perturb(
    orig_sent=text,
    # Optionally specify where to put the blank; otherwise it is selected automatically.
    # Can be a list of blanked sentences or a single one.
    blanked_sent="It is [BLANK] for kids.",
    # Optionally specify the ctrl code (a list or a single code).
    # Each code must be one of 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code="negation",
    # Customize the perplexity threshold.
    perplex_thred=5,
    # Number of perturbations to return.
    num_perturbations=1,
    # Additional keyword arguments are passed on to the Hugging Face generator.
    num_beams=3
)

# return: [
# 'It is not great for kids.', 
# 'It is great for kids but not for anyone.',
# 'It is great for kids but not for any adults.']
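
Because blanked_sent and ctrl_code are plain strings, it can help to build and check them before calling perturb. The sketch below is an illustration only: `check_ctrl_codes` and `blank_span` are hypothetical helpers, not Polyjuice functions, and `blank_span` assumes whitespace tokenization rather than the SpaCy tokenization Polyjuice uses.

```python
# The control codes listed in the perturb example above.
VALID_CTRL_CODES = {
    "resemantic", "restructure", "negation", "insert",
    "lexical", "shuffle", "quantifier", "delete",
}


def check_ctrl_codes(ctrl_code):
    """Normalize a single code or a list of codes to a list, rejecting unknown ones."""
    codes = [ctrl_code] if isinstance(ctrl_code, str) else list(ctrl_code)
    unknown = [c for c in codes if c not in VALID_CTRL_CODES]
    if unknown:
        raise ValueError(f"Unknown ctrl code(s): {unknown}")
    return codes


def blank_span(sentence, start, end):
    """Replace whitespace-separated tokens [start, end) with a single [BLANK]."""
    tokens = sentence.split()
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span out of range")
    return " ".join(tokens[:start] + ["[BLANK]"] + tokens[end:])


print(blank_span("It is great for kids.", 2, 3))  # It is [BLANK] for kids.
print(check_ctrl_codes("negation"))               # ['negation']
```

The resulting string and code list could then be passed as blanked_sent and ctrl_code.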

To detect the ctrl code for a given sentence pair:

pj.detect_ctrl_code(
    "it's great for kids.", 
    "It is great for kids but not for any adults.")
# return: negation
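
One possible use of the detected code is to group a batch of perturbations by type. The sketch below assumes only the call shape shown above, a callable taking the original and the perturbed sentence and returning a code. `group_by_ctrl_code` and `fake_detect` are hypothetical names, and `fake_detect` is a stand-in that does not reproduce Polyjuice's detection logic.

```python
from collections import defaultdict


def group_by_ctrl_code(orig, perturbations, detect):
    """Map each detected code to the perturbations that received it.

    `detect` is any callable taking (orig, perturbed) and returning a code,
    for example pj.detect_ctrl_code.
    """
    groups = defaultdict(list)
    for perturbed in perturbations:
        groups[detect(orig, perturbed)].append(perturbed)
    return dict(groups)


def fake_detect(orig, perturbed):
    """Stand-in detector so the sketch runs without loading any model."""
    return "negation" if " not " in f" {perturbed} " else "other"


groups = group_by_ctrl_code(
    "It is great for kids.",
    ["It is not great for kids.", "It is great even for kids."],
    fake_detect,
)
print(groups)
```

With Polyjuice loaded, passing pj.detect_ctrl_code in place of fake_detect would be the intended substitution.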

To get randomly placed blanks:

random_blanks = pj.get_random_blanked_sentences(
    sentence=text,
    # only allow selecting from a preset range of token indexes
    pre_selected_idxes=None,
    # only select from a subset of dep tags
    deps=None,
    # blank sub-spans or just single tokens
    is_token_only=False,
    # maximum number of returned index tuples
    max_blank_sent_count=3,
    # maximum number of blanks per returned sentence
    max_blank_block=1
)
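
To illustrate what a randomly placed blank looks like, here is a simplified sketch. It is not how Polyjuice selects blanks: the library works on SpaCy parses and can filter by dependency tag and blank sub-spans, whereas `random_single_blanks` (a hypothetical helper) splits on whitespace and blanks exactly one token per returned sentence. The `seed` parameter is likewise an assumption added for reproducibility.

```python
import random


def random_single_blanks(sentence, max_blank_sent_count=3, seed=0):
    """Return up to max_blank_sent_count variants, each with one token blanked."""
    tokens = sentence.split()
    rng = random.Random(seed)
    # Never ask for more distinct positions than there are tokens.
    count = min(max_blank_sent_count, len(tokens))
    idxes = sorted(rng.sample(range(len(tokens)), count))
    return [" ".join(tokens[:i] + ["[BLANK]"] + tokens[i + 1:]) for i in idxes]


for blanked in random_single_blanks("It is great for kids."):
    print(blanked)
```

Each returned string has the same form as the blanked_sent argument of perturb.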

Selection

For selecting diverse and surprising perturbations (for augmentation and explanation experiments in our paper), please see the notebook demo.
