Fibber is a benchmarking suite for adversarial attacks on text classification.
An open source project from Data to AI Lab at MIT.
Fibber
Fibber is a library to evaluate different strategies to paraphrase natural language, especially how these strategies can break text classifiers without changing the meaning of a sentence.
- Documentation: https://DAI-Lab.github.io/fibber
- GitHub: https://github.com/DAI-Lab/fibber
Overview
Fibber is a library to evaluate different strategies to paraphrase natural language. It ships with several built-in paraphrasing strategies and a benchmark framework to evaluate paraphrase quality. In particular, we use the GPT2 language model to measure how meaningful the paraphrased text is, a Universal Sentence Encoder to evaluate the semantic similarity between the original and paraphrased text, and a BERT classifier trained on the original dataset to check whether paraphrased sentences can break the text classifier.
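As an illustration of the semantic-similarity idea, the similarity between two sentence embeddings is typically a cosine similarity. This is a hedged sketch only: fibber itself uses the Universal Sentence Encoder to produce embeddings, and the vectors below are toy values, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for an original and a paraphrased sentence.
original = [0.1, 0.3, 0.5, 0.7]
paraphrase = [0.1, 0.35, 0.45, 0.7]
print(cosine_similarity(original, paraphrase))
```

A score close to 1.0 means the paraphrase is semantically close to the original; scores near 0 indicate unrelated sentences.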
Install
Requirements
fibber has been developed and tested on Python 3.6, 3.7 and 3.8.
Also, although it is not strictly required, the usage of conda is highly recommended to avoid interfering with other software installed in the system in which fibber is run.
These are the minimum commands needed to create a conda environment using python3.6 for fibber:
# First you should install conda.
conda create -n fibber_env python=3.6
Afterward, you have to execute this command to activate the environment:
conda activate fibber_env
Then you should install tensorflow and pytorch. Please follow the installation instructions for tensorflow and pytorch. Fibber requires tensorflow>=2.0.0 and pytorch>=1.5.0.
Remember to execute conda activate fibber_env every time you start a new console to work on fibber!
Install from PyPI
After creating the conda environment and activating it, we recommend using pip in order to install fibber:
pip install fibber
This will pull and install the latest stable release from PyPI.
Use without install
If you are using this project for research purposes and want to make changes to the code, you can install all requirements by running:
git clone git@github.com:DAI-Lab/fibber.git
cd fibber
pip install --requirement requirement.txt
Then you can use fibber by running:
python -m fibber.datasets.download_datasets
python -m fibber.benchmark.benchmark
In this case, any changes you make to the code will take effect immediately.
Install from source
With your conda environment activated, you can clone the repository and install it from source by running make install on the stable branch:
git clone git@github.com:DAI-Lab/fibber.git
cd fibber
git checkout stable
make install
Quickstart
In this short tutorial, we will guide you through a series of steps that will help you get started with fibber.
(1) Install Fibber
(2) Get a demo dataset.
from fibber.datasets import get_demo_dataset
trainset, testset = get_demo_dataset()
(3) Create a Fibber object.
from fibber.fibber import Fibber
arg_dict = {
"use_gpu_id": 0,
"gpt2_gpu_id": 0,
"strategy_gpu_id": 0,
}
fibber = Fibber(arg_dict, dataset_name="demo", strategy_name="RandomStrategy",
trainset=trainset, testset=testset)
(4) Randomly sample a sentence from the test set, and paraphrase it.
The following command randomly paraphrases the sentence in 5 different ways.
fibber.paraphrase_a_random_sentence(n=5)
The output is a tuple of (str, list, list).
# Original Text
'the movie slides downhill as soon as macho action conventions assert themselves .'
# 5 paraphrases
['conventions slides as as action assert macho downhill soon movie . the themselves',
'as . downhill action macho the themselves assert as slides conventions soon movie',
'movie as slides macho action . soon themselves the downhill as assert conventions',
'the soon assert as movie themselves macho conventions as downhill . action slides',
'downhill movie conventions slides the assert themselves action macho as as . soon'],
# Evaluation metrics of these 5 paraphrases.
[{'EditingDistance': 8,
'USESemanticSimilarity': 0.8859144449234009,
'GloVeSemanticSimilarity': 1.0000000321979126,
'GPT2GrammarQuality': 23.059619903564453},
{'EditingDistance': 9,
'USESemanticSimilarity': 0.8609699010848999,
'GloVeSemanticSimilarity': 1.0000000321979126,
'GPT2GrammarQuality': 39.824188232421875},
{'EditingDistance': 8,
'USESemanticSimilarity': 0.8530778288841248,
'GloVeSemanticSimilarity': 1.0000000321979126,
'GPT2GrammarQuality': 17.592607498168945},
{'EditingDistance': 9,
'USESemanticSimilarity': 0.8957847356796265,
'GloVeSemanticSimilarity': 1.0000000321979126,
'GPT2GrammarQuality': 24.76700210571289},
{'EditingDistance': 9,
'USESemanticSimilarity': 0.9004875421524048,
'GloVeSemanticSimilarity': 1.0000000321979126,
'GPT2GrammarQuality': 11.36586856842041}]
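The EditingDistance metric above can be illustrated with a word-level Levenshtein distance, i.e. the minimum number of word insertions, deletions, and substitutions needed to turn one sentence into the other. This is a minimal sketch of the idea, not fibber's actual implementation:

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance between two sentences."""
    a, b = a.split(), b.split()
    # prev[j] holds the edit distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute a word
        prev = curr
    return prev[-1]

print(word_edit_distance("the movie is great", "the film is great"))  # 1
```

Lower values mean the paraphrase stays closer to the original wording; a pure word shuffle (as in the output above) keeps every word but still accumulates distance from the reordering.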
(5) You can also ask fibber to paraphrase your sentence.
fibber.paraphrase({"text0": "This movie is fantastic"}, "text0", 5)
Supported strategies
In this version, we implement two strategies:
- IdentityStrategy:
- The identity strategy outputs the original text as its paraphrase.
- This strategy generates exactly 1 paraphrase for each original text regardless of the --num_paraphrases_per_text flag.
- RandomStrategy:
- The random strategy outputs the random shuffle of words in the original text.
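Conceptually, the random strategy just shuffles the word order of the input. The function below is an illustrative sketch of that behavior, not fibber's own implementation (which lives in the library and is driven through the Fibber object as shown in the Quickstart):

```python
import random

def random_shuffle_paraphrases(text, n=5, seed=0):
    """Return n paraphrases, each a random word-order shuffle of the input."""
    rng = random.Random(seed)  # seeded for reproducibility
    words = text.split()
    paraphrases = []
    for _ in range(n):
        shuffled = words[:]
        rng.shuffle(shuffled)
        paraphrases.append(" ".join(shuffled))
    return paraphrases

for p in random_shuffle_paraphrases("this movie is fantastic", n=3):
    print(p)
```

Because every word is kept and only the order changes, word-overlap metrics (such as GloVeSemanticSimilarity in the output above) stay high, while grammar-sensitive metrics like GPT2GrammarQuality degrade.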
What's next?
For more details about fibber and all its possibilities and features, please check the documentation site.
History
version 0.0.1
This is the first release of Fibber library. This release contains:
- Datasets: fibber contains 6 built-in datasets.
- Metrics: fibber contains 6 metrics to evaluate the quality of paraphrased sentences. All metrics have a unified interface.
- Benchmark framework: the benchmark framework can easily evaluate paraphrase strategies on built-in datasets and metrics.
- Strategies: this release contains 2 basic strategies, the identity strategy and random strategy.
- A unified Fibber interface: users can easily use fibber by creating a Fibber object.