A Python package for attacking Russian NLP models
Robustness Evaluation of Pre-trained Language Models in the Russian Language
This repository contains the experiments for Robustness Evaluation of Pre-trained Language Models in the Russian Language and ru_attacker, a tool for attacking Russian NLP models.
Installation
pip install ru_attacker
Usage example
Set model
>>> from ru_attacker.models import RobertaModel
>>> roberta_checkpoints = "Roberta_checkpoints"
>>> ruRoberta = RobertaModel(roberta_checkpoints)
Set dataset
>>> from ru_attacker.models.set_dataset import get_data
>>> data_dir = "TERRa/val.jsonl"
>>> data = get_data(data_dir)
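Before passing a file such as TERRa/val.jsonl to get_data, it can help to look at a few records directly. The sketch below assumes the usual JSON Lines layout (one JSON object per line) with premise, hypothesis and label fields, as in the TERRa task. read_jsonl is a hypothetical helper, not part of ru_attacker, and the two records are made up for illustration; the sketch makes no claim about what get_data itself returns.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts (hypothetical helper)."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Made-up sample records so the sketch runs without the real dataset.
sample = [
    {"premise": "Кошка спит на диване.", "hypothesis": "Кошка спит.",
     "label": "entailment", "idx": 0},
    {"premise": "Кошка спит на диване.", "hypothesis": "Кошка бегает.",
     "label": "not_entailment", "idx": 1},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "val.jsonl")
    with open(sample_path, "w", encoding="utf-8") as handle:
        for record in sample:
            # ensure_ascii=False keeps Cyrillic text readable in the file
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    records = read_jsonl(sample_path)

for record in records:
    print(record["label"], "|", record["hypothesis"])
```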
Set attack
You have to define transformation, goal_function and type_perturbation. constraints and search_method are optional.
>>> from ru_attacker.attacks.transformations import BackTranslation # transformation
>>> from ru_attacker.attacks.goal_function import LabelPreserving # goal function
>>> from ru_attacker.attacks.constraints import GrammarAcceptability, SemanticSimilarity # constraints
>>> from ru_attacker.attacks.search_method import GreedySearch # search method
>>> from ru_attacker.attacks import Attack # attack wrapper
>>> backtranslation = Attack(
...     transformation=BackTranslation(languages=["en", "fr", "de"]),  # you can set the languages manually or use the default ones
...     goal_function=LabelPreserving(),
...     type_perturbation="hypothesis",  # which part the perturbation is applied to: {"hypothesis", "premise"}
...     constraints=[GrammarAcceptability(), SemanticSimilarity()],
...     search_method=GreedySearch()
... )
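To make the roles of the pieces more concrete, here is a conceptual sketch of how a transformation, constraints and a label-preserving goal can fit together. It is not ru_attacker's implementation: toy_attack, predict, transform and same_length are all hypothetical stand-ins, the search simply takes the first candidate that flips the prediction, and the rule that already-misclassified examples are skipped is an assumption.

```python
def toy_attack(predict, transform, constraints, text, label):
    """Attack one example; returns (status, adversarial_text_or_None)."""
    if predict(text) != label:
        # Assumption: an example the model already gets wrong is skipped.
        return "skipped", None
    for candidate in transform(text):
        # Keep only candidates that pass every constraint.
        if not all(check(text, candidate) for check in constraints):
            continue
        # The transformation is meant to preserve the true label,
        # so a changed prediction counts as a successful attack.
        if predict(candidate) != label:
            return "succeeded", candidate
    return "failed", None


def predict(text):
    """Toy model that breaks on one particular word."""
    return "not_entailment" if "kitty" in text.split() else "entailment"


def transform(text):
    """Toy synonym substitution producing two candidates."""
    return [text.replace("cat", synonym) for synonym in ("feline", "kitty")]


def same_length(original, candidate):
    """Toy constraint: the candidate must have the same number of words."""
    return len(original.split()) == len(candidate.split())


for text, label in [
    ("the cat sleeps", "entailment"),
    ("the dog sleeps", "entailment"),
    ("the cat sleeps", "not_entailment"),
]:
    print(toy_attack(predict, transform, [same_length], text, label))
```

The three calls exercise the three outcomes that appear in the attack log: a flipped prediction, no flip, and an example the toy model misclassifies from the start.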
Attack model and view results
>>> results = backtranslation.attack(ruRoberta, data)
[Succeeded / Failed / Skipped / Total] 0 / 1 / 0 / 1:
entailment --> entailment
original premise: """Решение носит символический характер, так как взыскать компенсацию практически невозможно"", - отмечается в сообщении."
original hypothesis: Взыскать компенсацию не получится.
transformed: Компенсации не будет.
[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2:
entailment --> not_entailment
original premise: Об этом вечером во вторник, 17 января, сообщила пресс-служба Спасательного департамента, отметив, что немецкую противотанковую мину Tellermine 42 обнаружили в на улице Кеэвисе в ходе земляных работ. Спасатели эвакуировали жителей окрестных домов, офисов и складских помещений. Уничтожать мину на месте не стали, поскольку это угрожало повреждению трассы трубопровода.
original hypothesis: На улице Кеэвисе жителей эвакуировали из-за мины.
transformed: На улице Касери эвакуировали жителей из мин.
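The running counters in the log can be turned into a single number. The sketch below parses a status line of the form shown above and computes a success rate as succeeded / (succeeded + failed), leaving skipped examples out; that definition is an assumption (a common convention, not something this package documents), and parse_status and success_rate are hypothetical helpers.

```python
import re

# Matches lines such as "[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2:"
STATUS_RE = re.compile(
    r"\[Succeeded / Failed / Skipped / Total\]"
    r"\s*(\d+)\s*/\s*(\d+)\s*/\s*(\d+)\s*/\s*(\d+)"
)


def parse_status(line):
    """Extract the four counters from one status line (hypothetical helper)."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    succeeded, failed, skipped, total = (int(value) for value in match.groups())
    return {
        "succeeded": succeeded,
        "failed": failed,
        "skipped": skipped,
        "total": total,
    }


def success_rate(counts):
    """Share of attempted (non-skipped) examples whose prediction flipped."""
    attempted = counts["succeeded"] + counts["failed"]
    return counts["succeeded"] / attempted if attempted else 0.0


counts = parse_status("[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2:")
print(counts, success_rate(counts))
```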
Convert results to DataFrame
>>> import pandas as pd
>>> dataframe = pd.DataFrame(results)
A Tutorial is also available.
Experiments
All the data used in the experiments and the results are in the data folder (TERRa and results, respectively).
All experiments can be reproduced in Experiments.ipynb.
Model checkpoints are available via: