A tool for text counterfactual generation.

Polyjuice

This repository contains code for generating counterfactual sentences as described in the following paper:

Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld.
Association for Computational Linguistics (ACL), 2021.

BibTeX for citations:

@inproceedings{polyjuice:acl21,
    title = "{P}olyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models",
    author = "Tongshuang Wu and Marco Tulio Ribeiro and Jeffrey Heer and Daniel S. Weld",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics"
}

Installation

From PyPI:

pip install polyjuice_nlp

From source:

git clone git@github.com:tongshuangwu/polyjuice.git
cd polyjuice
pip install -e .

Polyjuice depends on spaCy and Hugging Face Transformers. To use most functions, also install the following:

# Install PyTorch; see https://pytorch.org/get-started/locally/#start-locally for platform-specific commands
pip install torch
# Download the spaCy English language package
python -m spacy download en_core_web_sm
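
Before running the examples, it can help to confirm that the dependencies above are importable. The following is a minimal sketch and not part of Polyjuice: `missing_dependencies` is a hypothetical helper, and the module names are assumed to match the import names of the packages installed above.

```python
import importlib.util

# Import names assumed for the packages installed above (an assumption,
# not something the Polyjuice documentation lists).
REQUIRED_MODULES = ["polyjuice", "torch", "spacy", "transformers", "en_core_web_sm"]


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the top-level modules from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```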

Perturbation

from polyjuice import Polyjuice
# Initialize the wrapper.
# model_path defaults to our portable model:
# https://huggingface.co/uw-hai/polyjuice
# There is no need to change it unless you are using a customized model.
# Set is_cuda=False to run on CPU.
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=True)

# the base sentence
text = "It is great for kids."

# perturb the sentence with one line:
# When running it for the first time, the wrapper will automatically
# load related models, e.g. the generator and the perplexity filter.
perturbations = pj.perturb(text)

# return: ['It is bad for kids too.',
# "It 's great for kids.",
# 'It is great even for kids.']
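
The wrapper above is created with `is_cuda=True`, which assumes a GPU is present. A minimal sketch of choosing the flag at runtime follows; `cuda_available` and `load_polyjuice` are hypothetical helpers introduced here for illustration, and only the `Polyjuice(model_path=..., is_cuda=...)` call comes from the example above.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def load_polyjuice(model_path: str = "uw-hai/polyjuice"):
    """Build the wrapper, enabling CUDA only when it is actually available."""
    # Imported lazily so this module can be loaded without polyjuice installed.
    from polyjuice import Polyjuice
    return Polyjuice(model_path=model_path, is_cuda=cuda_available())
```

With this in place, `pj = load_polyjuice()` can replace the explicit constructor call on machines where GPU availability is not known in advance.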

More advanced APIs

See the documentation in the main Python file for further details.

To perturb with more control:

perturbations = pj.perturb(
    orig_sent=text,
    # Optionally specify where to put the blank; otherwise it is selected automatically.
    # Can be a single sentence or a list of sentences.
    blanked_sent="It is [BLANK] for kids.",
    # Optionally specify the ctrl code (a single code or a list of codes).
    # Each code must be one of 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code="negation",
    # Customize the perplexity threshold.
    perplex_thred=5,
    # Number of perturbations to return.
    num_perturbations=1,
    # The function also accepts additional arguments for the Hugging Face generator.
    num_beams=3
)

# return: [
# 'It is not great for kids.', 
# 'It is great for kids but not for anyone.',
# 'It is great for kids but not for any adults.']
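
When building `blanked_sent` and `ctrl_code` programmatically, two small checks can catch mistakes before the generator runs. The sketch below is illustrative only: `blank_span` and `check_ctrl_codes` are hypothetical helpers, and the whitespace tokenization is a simplifying assumption rather than the tokenization Polyjuice uses internally. The list of codes is copied from the comment above.

```python
# Ctrl codes listed in the perturb() example above.
CTRL_CODES = {
    "resemantic", "restructure", "negation", "insert",
    "lexical", "shuffle", "quantifier", "delete",
}


def blank_span(sentence: str, start: int, end: int) -> str:
    """Replace whitespace-separated tokens [start, end) with one [BLANK]."""
    tokens = sentence.split()
    if not 0 <= start < end <= len(tokens):
        raise ValueError(f"invalid span ({start}, {end}) for {len(tokens)} tokens")
    return " ".join(tokens[:start] + ["[BLANK]"] + tokens[end:])


def check_ctrl_codes(codes):
    """Accept a single code or a list of codes; return a list or raise."""
    codes = [codes] if isinstance(codes, str) else list(codes)
    unknown = [code for code in codes if code not in CTRL_CODES]
    if unknown:
        raise ValueError(f"unknown ctrl code(s): {unknown}")
    return codes


blanked = blank_span("It is great for kids.", 2, 3)
codes = check_ctrl_codes("negation")
```

The resulting values can then be passed as `blanked_sent=blanked` and `ctrl_code=codes` to `pj.perturb`.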

To detect the ctrl code for a given sentence pair:

pj.detect_ctrl_code(
    "it's great for kids.", 
    "It is great for kids but not for any adults.")
# return: negation
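
To see which words differ between the two sentences of a pair, a word-level diff can be a useful companion to the detected code. This is a sketch using Python's standard `difflib`, not the method Polyjuice uses to detect ctrl codes; `word_level_edits` is a hypothetical helper and splits on whitespace only.

```python
import difflib


def word_level_edits(original: str, perturbed: str):
    """Return (operation, removed_text, added_text) for each changed span."""
    a, b = original.split(), perturbed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


edits = word_level_edits("It is great for kids.", "It is not great for kids.")
print(edits)  # [('insert', '', 'not')]
```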

To get randomly placed blanks:

random_blanks = pj.get_random_blanked_sentences(
    sentence=text,
    # only allow selecting from a preset range of token indexes
    pre_selected_idxes=None,
    # only select from a subset of dep tags
    deps=None,
    # blank sub-spans or just single tokens
    is_token_only=False,
    # maximum number of returned index tuples
    max_blank_sent_count=3,
    # maximum number of blanks per returned sentence
    max_blank_block=1
)
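
To make the idea of randomly placed blanks concrete, here is a simplified, self-contained sketch. It is not the library's implementation: `sketch_random_blanks`, `max_span_len`, and `seed` are hypothetical names, the tokenization is plain whitespace splitting, and dependency tags are ignored. It places exactly one blank per sentence, which loosely corresponds to `max_blank_block=1` above.

```python
import random


def sketch_random_blanks(sentence, max_blank_sent_count=3, max_span_len=2, seed=None):
    """Return up to max_blank_sent_count copies of sentence, each with one
    randomly chosen token span replaced by [BLANK]."""
    rng = random.Random(seed)
    tokens = sentence.split()
    # Enumerate every contiguous span of 1..max_span_len tokens.
    spans = [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_span_len, len(tokens)) + 1)
    ]
    rng.shuffle(spans)
    return [
        " ".join(tokens[:start] + ["[BLANK]"] + tokens[end:])
        for start, end in spans[:max_blank_sent_count]
    ]


blanks = sketch_random_blanks("It is great for kids.", seed=0)
for blanked in blanks:
    print(blanked)
```

Each returned string has the same form as the `blanked_sent` argument shown earlier.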

Selection

For selecting diverse and surprising perturbations (for augmentation and explanation experiments in our paper), please see the notebook demo.
