
This package is designed to simulate adversarial attacks on pre-trained language models (pre-LLM models).


IsoAdverse Documentation

Introduction

Welcome to IsoAdverse, a Python package designed to simulate adversarial attacks on pre-trained language models (pre-LLM models). It implements a range of attacks described in recent research to help you test and secure your AI agents and LLMs.

Installation

To install the IsoAdverse package, you can use pip:

pip install iso-adverse
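To confirm that the install succeeded, a small check like the following can help. It is a minimal sketch that uses only the standard library (Python 3.8+); the helper name installed_version is hypothetical and not part of IsoAdverse.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("iso-adverse")
print("iso-adverse version:", version if version else "not installed")
```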

Quickstart

Here's a quick example that loads a small labeled dataset and a pre-trained BERT model, the setup used by the attack examples below:

import torch
from isoadverse.utils.data_loader import get_data_loader
from isoadverse.utils.model_loader import get_model_and_tokenizer

# Load data and model
texts = ["This is a positive sentence.", "This is a negative sentence."]
labels = torch.tensor([1, 0])
train_loader = get_data_loader(texts, labels, batch_size=2)

model, tokenizer = get_model_and_tokenizer(model_name='bert-base-uncased')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

Attacks

IsoAdverse implements several adversarial attacks on text data. Below are the details of each attack.

Fast Gradient Sign Method (FGSM)

The FGSM attack perturbs the input in a single step: it computes the gradient of the loss with respect to the input and moves the input by epsilon in the direction of the sign of that gradient.

from isoadverse.attacks.text_fgsm import text_fgsm_attack
print("Running FGSM Attack...")
perturbed_text = text_fgsm_attack(model, tokenizer, texts[0], torch.tensor([labels[0]]), epsilon=0.3)
print("Original Text:", texts[0])
print("Perturbed Text:", tokenizer.decode(perturbed_text[0]))
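The single-step update behind FGSM can be illustrated on a toy model. The sketch below is not IsoAdverse's implementation: it assumes a hand-written logistic-regression classifier over a continuous input vector (for text models the continuous quantity is typically the token embeddings), and all names in it are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Probability of the positive class under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(weights, bias, x, label):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(weights, bias, x)
    # For this model, dL/dx_i = (p - y) * w_i.
    return [(p - label) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, label, epsilon):
    """One FGSM step: move each coordinate by epsilon along the gradient sign."""
    grad = input_gradient(weights, bias, x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


weights = [2.0, -1.0, 0.5]  # toy model parameters (assumed)
bias = 0.0
x = [0.5, -0.5, 1.0]        # toy input whose true label is 1
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.3)

print("clean probability:      ", predict(weights, bias, x))
print("adversarial probability:", predict(weights, bias, x_adv))
```

The adversarial input stays within epsilon of the original in every coordinate, yet the model's confidence in the true label drops.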

Projected Gradient Descent (PGD)

The PGD attack is an iterative method: it applies several small FGSM-style steps and, after each one, projects the result back into the epsilon-ball around the original input.

from isoadverse.attacks.text_pgd import text_pgd_attack

print("\nRunning PGD Attack...")
# Assumes text_pgd_attack takes the same arguments as text_fgsm_attack;
# any step-size or iteration parameters are not shown here.
perturbed_ids = text_pgd_attack(model, tokenizer, texts[0], torch.tensor([labels[0]]), epsilon=0.3)
print("Original Text:", texts[0])
print("Perturbed Text:", tokenizer.decode(perturbed_ids[0]))
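The iterate-and-project loop of PGD can be sketched on a toy model as follows. This is an illustration under stated assumptions, not the package's code: the classifier is a hand-written logistic regression, the parameters alpha (step size) and steps are hypothetical names, and the random start used in many PGD variants is omitted.

```python
import math


def predict(weights, bias, x):
    """Probability of the positive class under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(weights, bias, x, label):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(weights, bias, x)
    return [(p - label) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def pgd(weights, bias, x, label, epsilon, alpha, steps):
    """Repeat small signed-gradient steps, clipping into the L-infinity ball."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(weights, bias, x_adv, label)
        # FGSM-style step of size alpha.
        x_adv = [xi + alpha * sign(g) for xi, g in zip(x_adv, grad)]
        # Projection: keep every coordinate within epsilon of the original.
        x_adv = [min(max(xa, x0 - epsilon), x0 + epsilon)
                 for xa, x0 in zip(x_adv, x)]
    return x_adv


weights = [2.0, -1.0, 0.5]  # toy model parameters (assumed)
bias = 0.0
x = [0.5, -0.5, 1.0]        # toy input whose true label is 1
x_adv = pgd(weights, bias, x, label=1, epsilon=0.3, alpha=0.1, steps=10)

print("clean probability:      ", predict(weights, bias, x))
print("adversarial probability:", predict(weights, bias, x_adv))
```

On a linear model like this one the gradient sign never changes, so PGD ends at the same point a single FGSM step would reach; the extra iterations matter for non-linear models, where the gradient direction shifts as the input moves.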

TextBugger

TextBugger perturbs the text by introducing character-level changes.

from isoadverse.attacks.textbugger import textbugger_attack

print("\nRunning TextBugger Attack...")
perturbed_text = textbugger_attack(texts[0], num_bugs=5)
print("Original Text:", texts[0])
print("Perturbed Text:", perturbed_text)
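To give a feel for character-level changes, the sketch below applies four simple edits often associated with TextBugger-style attacks: inserting a space, deleting an inner character, swapping adjacent inner characters, and substituting a look-alike character. It is an illustration only. The function names, the LOOKALIKES table, and the random choice of words are assumptions, and the package's textbugger_attack may work differently.

```python
import random

# Hypothetical table of visually similar replacements.
LOOKALIKES = {"o": "0", "l": "1", "a": "@", "e": "3", "i": "1", "s": "$"}


def insert_space(word, rng):
    """Split the word in two by inserting a space at a random inner position."""
    if len(word) < 2:
        return word
    i = rng.randrange(1, len(word))
    return word[:i] + " " + word[i:]


def delete_char(word, rng):
    """Delete one inner character, keeping the first and last characters."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]


def swap_chars(word, rng):
    """Swap two adjacent inner characters."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def substitute_char(word, rng):
    """Replace one character with a look-alike, if the word contains any."""
    positions = [i for i, c in enumerate(word) if c.lower() in LOOKALIKES]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + LOOKALIKES[word[i].lower()] + word[i + 1:]


BUGS = [insert_space, delete_char, swap_chars, substitute_char]


def character_bugs(text, num_bugs, seed=0):
    """Apply num_bugs randomly chosen edits to randomly chosen words."""
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    for _ in range(num_bugs):
        i = rng.randrange(len(words))
        words[i] = rng.choice(BUGS)(words[i], rng)
    return " ".join(words)


print(character_bugs("This is a positive sentence.", num_bugs=5))
```

Very short words can pass through an edit unchanged, so the number of visible changes may be smaller than num_bugs.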

DeepWordBug

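DeepWordBug perturbs the text by modifying individual words. In the original method, these modifications are small character edits (swaps, substitutions, deletions, insertions) applied to the words that matter most to the model's prediction.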

from isoadverse.attacks.deepwordbug import deepwordbug_attack

print("\nRunning DeepWordBug Attack...")
perturbed_text = deepwordbug_attack(texts[0], num_bugs=5)
print("Original Text:", texts[0])
print("Perturbed Text:", perturbed_text)
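The idea of targeting the most influential words can be sketched as below. This is a simplified illustration, not the package's deepwordbug_attack: toy_score is a hypothetical stand-in for a black-box classifier, word importance is estimated by masking each word in turn, and only a single edit type (swapping two inner characters) is applied.

```python
def toy_score(text):
    """Hypothetical black-box classifier: a keyword-based sentiment score."""
    positive = {"good", "great", "positive"}
    negative = {"bad", "awful", "negative"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def word_importance(words, score_fn):
    """Score each word by how much masking it changes the classifier output."""
    base = score_fn(" ".join(words))
    scores = []
    for i in range(len(words)):
        masked = words[:i] + ["<unk>"] + words[i + 1:]
        scores.append(abs(base - score_fn(" ".join(masked))))
    return scores


def swap_middle(word):
    """Swap the second and third characters of the word."""
    if len(word) < 4:
        return word
    return word[0] + word[2] + word[1] + word[3:]


def perturb_important_words(text, score_fn, num_words):
    """Apply a character swap to the num_words most important words."""
    words = text.split()
    scores = word_importance(words, score_fn)
    ranked = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:num_words]:
        words[i] = swap_middle(words[i])
    return " ".join(words)


text = "This is a positive sentence."
adv = perturb_important_words(text, toy_score, num_words=1)
print(adv)  # This is a psoitive sentence.
print(toy_score(text), "->", toy_score(adv))  # 1 -> 0
```

A single misspelling of the one word the toy classifier relies on is enough to change its score, while the sentence remains readable to a person.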

Utilities

IsoAdverse includes utility functions for loading data and models, making it easier to integrate into your existing workflow.

Data Loader

The data loader utility helps load and prepare text datasets for training and evaluation.

from isoadverse.utils.data_loader import get_data_loader

train_loader = get_data_loader(texts, labels, batch_size=2)

Model Loader

The model loader utility provides pre-trained models and tokenizers.

from isoadverse.utils.model_loader import get_model_and_tokenizer

model, tokenizer = get_model_and_tokenizer(model_name='bert-base-uncased')


