NLP error analysis.
Polyjuice
This repository contains code for testing NLP Models as described in the following paper:
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld.
Association for Computational Linguistics (ACL), 2021
Bibtex for citations:
@inproceedings{polyjuice:acl21,
title = "{P}olyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models",
author = "Tongshuang Wu and Marco Tulio Ribeiro and Jeffrey Heer and Daniel S. Weld",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
year = "2021",
publisher = "Association for Computational Linguistics"
}
Installation
From PyPI:
pip install polyjuice_nlp
From source:
git clone git@github.com:tongshuangwu/polyjuice.git
cd polyjuice
pip install -e .
Polyjuice depends on spaCy and Hugging Face Transformers. To use most functions, please also install the following:
# install pytorch, as here: https://pytorch.org/get-started/locally/#start-locally
pip install torch
# the spaCy English language package
python -m spacy download en_core_web_sm
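After running the commands above, it can help to confirm that the pieces are importable before loading any model. The following is a minimal sketch, not part of Polyjuice: the helper name missing_modules and the module list are assumptions made for illustration (the list mirrors the dependencies named above, plus the spaCy language package, which installs as an importable module).

```python
import importlib.util

# Assumed list: the dependencies named in this README plus the spaCy
# language package, which is installed as a regular Python package.
REQUIRED_MODULES = ["torch", "spacy", "transformers", "en_core_web_sm"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means everything listed above was found.
print("missing:", missing_modules())
```

If the printed list is not empty, re-run the corresponding install command for each name it contains.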
Perturbation
from polyjuice import Polyjuice
# Initialize a wrapper.
# The model path defaults to our portable model:
# https://huggingface.co/uw-hai/polyjuice
# No need to change this unless you are using a custom model.
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=True)
# the base sentence
text = "It is great for kids."
# perturb the sentence with one line:
# When running it for the first time, the wrapper will automatically
# load related models, e.g. the generator and the perplexity filter.
perturbations = pj.perturb(text)
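The example above passes is_cuda=True, which presumably expects a GPU to be present. On a CPU-only machine you may want to decide the flag at runtime. The sketch below is an assumption about how one might do that: cuda_available is a hypothetical helper, not a Polyjuice function, and it only relies on torch.cuda.is_available() from PyTorch.

```python
def cuda_available():
    """Return True only if PyTorch is installed and reports a usable GPU."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU support to speak of.
        return False
    return torch.cuda.is_available()


use_gpu = cuda_available()
print("use GPU:", use_gpu)

# Hypothetical usage with the wrapper shown above:
# pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=use_gpu)
```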
More advanced APIs
Please see the documentation in X for more details.
To perturb with more control:
perturbations = pj.perturb(
    orig_sent=text,
    # Optionally specify where to put the blank; otherwise it is selected automatically.
    # Can be a list or a single sentence.
    blanked_sent="It is [BLANK] for kids.",
    # Optionally specify the ctrl code (a list or a single code).
    # The code should be one of 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code="negation",
    # Customize the perplexity threshold.
    perplex_thred=5,
    # Number of perturbations to return.
    num_perturbations=1,
    # The function also accepts additional arguments for Hugging Face generators.
    num_beams=3
)
# return: ['It is bad for kids too.',
# "It 's great for kids.",
# 'It is great even for kids.']
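The blanked_sent argument is an ordinary string in which the span to rewrite is replaced by [BLANK]. If you want to build such strings programmatically, a small helper can do it. This is a minimal sketch under stated assumptions: blank_span is a hypothetical name, not a Polyjuice function, and it splits on whitespace rather than using the spaCy tokenizer, so punctuation stays attached to the neighboring word.

```python
def blank_span(sentence, start, end, blank="[BLANK]"):
    """Replace whitespace tokens start..end-1 of `sentence` with one blank."""
    tokens = sentence.split()
    if not 0 <= start < end <= len(tokens):
        raise ValueError(f"invalid span ({start}, {end}) for {len(tokens)} tokens")
    return " ".join(tokens[:start] + [blank] + tokens[end:])


# Blank out the third whitespace token ("great").
print(blank_span("It is great for kids.", 2, 3))
# → It is [BLANK] for kids.
```

Because the split is on whitespace, blanking the last token also removes the trailing period; add it back by hand if you need "It is great for [BLANK]." exactly.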
To offer several candidate blank positions at once, pass a list to blanked_sent:
perturbations = pj.perturb(
    orig_sent=text,
    # Optionally specify where to put the blank; otherwise it is selected automatically.
    # Can be a list or a single sentence.
    blanked_sent=["It is [BLANK] for kids.", "It is great for [BLANK]."],
    # Optionally specify the ctrl code (a list or a single code).
    # The code should be one of 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code="negation",
    # Customize the perplexity threshold.
    perplex_thred=20,
    # Number of perturbations to return.
    num_perturbations=3,
    # The function also accepts additional arguments for Hugging Face generators.
    num_beams=3
)
# return: [
# 'It is not great for kids.',
# 'It is great for kids but not for anyone.',
# 'It is great for kids but not for any adults.']
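Since ctrl_code accepts either a single code or a list, and only the eight codes listed in the comments above are documented, it may be worth validating the argument before calling the generator. The following is a sketch, not Polyjuice's own validation: check_ctrl_codes and CTRL_CODES are hypothetical names, and the tuple simply copies the codes from this README.

```python
# Control codes copied from the comments in this README.
CTRL_CODES = (
    "resemantic", "restructure", "negation", "insert",
    "lexical", "shuffle", "quantifier", "delete",
)


def check_ctrl_codes(ctrl_code):
    """Normalize a single code or an iterable of codes to a list, rejecting unknown ones."""
    codes = [ctrl_code] if isinstance(ctrl_code, str) else list(ctrl_code)
    unknown = [code for code in codes if code not in CTRL_CODES]
    if unknown:
        raise ValueError(f"unknown ctrl code(s): {unknown}; expected one of {CTRL_CODES}")
    return codes


print(check_ctrl_codes("negation"))
# → ['negation']
print(check_ctrl_codes(["negation", "lexical"]))
# → ['negation', 'lexical']
```

A misspelled code such as "negate" raises a ValueError here, before any model is loaded.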
Selection
For selecting diverse and surprising perturbations (for augmentation and explanation experiments in our paper), please see the notebook demo.