Security and Privacy Risk Simulator for Machine Learning

AIJack: Security and Privacy Risk Simulator for Standard/Distributed Machine Learning

❤️ If you like AIJack, please consider becoming a GitHub Sponsor ❤️

What is AIJack?

AIJack allows you to assess the privacy and security risks of machine learning algorithms against attacks such as Model Inversion, Poisoning Attack, Evasion Attack, Free Rider, and Backdoor Attack. AIJack also provides various defense techniques, such as Differential Privacy, Homomorphic Encryption, and other heuristic approaches. In addition, AIJack provides APIs for many distributed learning schemes, such as Federated Learning and Split Learning. You can integrate many attack and defense methods into such collaborative learning with a few lines of code. AIJack currently implements more than 30 state-of-the-art methods. For more information, see the documentation.

Installation

You can install AIJack with pip. AIJack requires Boost and pybind11.

apt install -y libboost-all-dev
pip install -U pip
pip install "pybind11[global]"

pip install aijack

If you want to use the latest version, you can install directly from GitHub.

pip install git+https://github.com/Koukyosyumei/AIJack

You can also use our Dockerfile.

Quick Start

We briefly introduce some example usages. You can find more examples in the documentation.

Basic Interface

For standard machine learning algorithms, AIJack allows you to simulate attacks against machine learning models with Attacker APIs. AIJack mostly supports PyTorch and scikit-learn models.

abstract code

attacker = Attacker(target_model)
result = attacker.attack()

For distributed learning such as Federated Learning, AIJack offers four basic APIs: Client, Server, API, and Manager. Client and Server represent each client and server within a distributed learning scheme, and you register the clients and servers with an API. You can then execute training via the API's run method. Manager gives additional abilities, such as attack, defense, or parallel computing, to Client, Server, or API via its attach method.

abstract code

client = [Client(), Client()]
server = Server()
api = API(client, server)
api.run() # execute training

c_manager = ClientManager()
s_manager = ServerManager()
ExtendedClient = c_manager.attach(Client)
ExtendedServer = s_manager.attach(Server)

extended_client = [ExtendedClient(), ExtendedClient()]
extended_server = ExtendedServer()
api = API(extended_client, extended_server)
api.run() # execute training
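
The attach pattern described above can be illustrated without AIJack itself. The following is a minimal sketch, assuming a manager simply returns a subclass that overrides one method. The names Client, ScalingClientManager, and upload are hypothetical stand-ins for illustration and are not AIJack classes.

```python
class Client:
    """Hypothetical stand-in for a distributed-learning client."""

    def __init__(self, user_id):
        self.user_id = user_id

    def upload(self):
        # A real client would return gradients or parameters here.
        return [1.0, 2.0, 3.0]


class ScalingClientManager:
    """Hypothetical manager that adds behaviour to any client class."""

    def __init__(self, scale):
        self.scale = scale

    def attach(self, cls):
        scale = self.scale

        class Wrapped(cls):
            def upload(self):
                # Call the original method, then post-process its result.
                return [scale * v for v in super().upload()]

        Wrapped.__name__ = "Scaled" + cls.__name__
        return Wrapped


manager = ScalingClientManager(scale=0.5)
ScaledClient = manager.attach(Client)
client = ScaledClient(user_id=0)
print(client.upload())  # [0.5, 1.0, 1.5]
```

Because attach returns an ordinary subclass, the extended class can be used wherever the base class is expected, which is why extended clients and servers can be passed to the same API.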

Federated Learning

FedAVG

FedAVG is the most representative algorithm of Federated Learning, where multiple clients jointly train a single model without sharing their local datasets. You can integrate any PyTorch model.

import torch.optim as optim
from aijack.collaborative.fedavg import FedAVGAPI, FedAVGClient, FedAVGServer

clients = [FedAVGClient(local_model_1, user_id=0), FedAVGClient(local_model_2, user_id=1)]
optimizers = [optim.SGD(clients[0].parameters()), optim.SGD(clients[1].parameters())]

server = FedAVGServer(clients, global_model)

api = FedAVGAPI(
    server,
    clients,
    criterion,
    optimizers,
    dataloaders
)
api.run()
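
To make the aggregation step concrete, here is a minimal sketch of the weighted parameter averaging that FedAVG is usually described with, written in plain Python. It assumes each client reports a flat parameter vector and its local dataset size. The function name fedavg is illustrative and this is not AIJack's internal implementation.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors.

    client_params: list of equal-length lists of floats, one per client.
    client_sizes: number of local training samples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total
        for i, value in enumerate(params):
            averaged[i] += weight * value
    return averaged


global_params = fedavg([[1.0, 2.0], [3.0, 6.0]], client_sizes=[1, 3])
print(global_params)  # [2.5, 5.0]
```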

FedMD

Model-distillation-based Federated Learning (FedMD) does not need to communicate gradients, which might reduce information leakage.

from aijack.collaborative.fedmd import FedMDAPI, FedMDClient, FedMDServer

clients = [
    FedMDClient(Net().to(device), public_dataloader, output_dim=10, user_id=c)
    for c in range(client_size)
]
local_optimizers = [optim.SGD(client.parameters(), lr=lr) for client in clients]

server = FedMDServer(clients, Net().to(device))

api = FedMDAPI(
    server,
    clients,
    public_dataloader,
    local_dataloaders,
    F.nll_loss,
    local_optimizers,
    test_dataloader,
    num_communication=2,
)
api.run()
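
Instead of gradients, FedMD-style methods exchange predictions on a shared public dataset. The sketch below shows only the server-side consensus step, assuming the consensus is the plain average of the clients' class scores. The function name consensus and the nested-list layout are assumptions made for illustration.

```python
def consensus(client_scores):
    """Average per-sample class scores over clients.

    client_scores[c][n][k] is client c's score for class k on public sample n.
    """
    num_clients = len(client_scores)
    num_samples = len(client_scores[0])
    num_classes = len(client_scores[0][0])
    return [
        [
            sum(client_scores[c][n][k] for c in range(num_clients)) / num_clients
            for k in range(num_classes)
        ]
        for n in range(num_samples)
    ]


scores_a = [[2.0, 0.0], [0.0, 4.0]]
scores_b = [[0.0, 2.0], [2.0, 2.0]]
print(consensus([scores_a, scores_b]))  # [[1.0, 1.0], [1.0, 3.0]]
```

Each client would then train its local model to match the consensus scores on the public data before continuing with its private data.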

SecureBoost (Vertical Federated version of XGBoost)

AIJack supports not only neural networks but also tree-based Federated Learning.

from aijack.collaborative.tree import SecureBoostClassifierAPI, SecureBoostClient
from aijack.defense.paillier import PaillierKeyGenerator

keygenerator = PaillierKeyGenerator(512)
pk, sk = keygenerator.generate_keypair()

sclf = SecureBoostClassifierAPI(2, subsample_cols, min_child_weight, depth, min_leaf,
                  learning_rate, boosting_rounds, lam, gamma, eps, 0, 0, 1.0, 1, True)

sp1 = SecureBoostClient(x1, 2, [0], 0, min_leaf, subsample_cols, 256, False, 0)
sp2 = SecureBoostClient(x2, 2, [1], 1, min_leaf, subsample_cols, 256, False, 0)
sparties = [sp1, sp2]

sparties[0].set_publickey(pk)
sparties[0].set_secretkey(sk)
sparties[1].set_publickey(pk)

sclf.fit(sparties, y)
sclf.predict_proba(X)

MPI-backend

AIJack supports an MPI backend for some Federated Learning methods.

FedAVG

from mpi4py import MPI
from aijack.collaborative.fedavg import FedAVGClient, FedAVGServer
from aijack.collaborative.fedavg import MPIFedAVGAPI, MPIFedAVGClientManager, MPIFedAVGServerManager

comm = MPI.COMM_WORLD
myid = comm.Get_rank()

mpi_client_manager = MPIFedAVGClientManager()
mpi_server_manager = MPIFedAVGServerManager()
MPIFedAVGClient = mpi_client_manager.attach(FedAVGClient)
MPIFedAVGServer = mpi_server_manager.attach(FedAVGServer)

if myid == 0:
    server = MPIFedAVGServer(comm, FedAVGServer(client_ids, model))
    api = MPIFedAVGAPI(
        comm,
        server,
        True,
        F.nll_loss,
        None,
        None,
        num_rounds,
        1,
    )
else:
    client = MPIFedAVGClient(comm, FedAVGClient(model, user_id=myid))
    api = MPIFedAVGAPI(
        comm,
        client,
        False,
        F.nll_loss,
        optimizer,
        dataloader,
        num_rounds,
        1,
    )

api.run()

FedMD

from mpi4py import MPI
from aijack.collaborative.fedmd import FedMDClient, FedMDServer
from aijack.collaborative.fedmd import MPIFedMDAPI, MPIFedMDClient, MPIFedMDServer

comm = MPI.COMM_WORLD
myid = comm.Get_rank()

if myid == 0:
    server = MPIFedMDServer(comm, FedMDServer(client_ids, model))
    api = MPIFedMDAPI(
        comm,
        server,
        True,
        F.nll_loss,
        None,
        None,
    )
else:
    client = MPIFedMDClient(comm, FedMDClient(model, public_dataloader, output_dim=10, user_id=myid))
    api = MPIFedMDAPI(
        comm,
        client,
        False,
        F.nll_loss,
        optimizer,
        dataloader,
        public_dataloader,
    )

api.run()

Attack: Model Inversion

A Model Inversion Attack reconstructs local training data from shared information such as gradients or parameters.

from aijack.attack.inversion import GradientInversionAttackServerManager

manager = GradientInversionAttackServerManager(input_shape, distancename="l2")
GradientInversionAttackFedAVGServer = manager.attach(FedAVGServer)

server = GradientInversionAttackFedAVGServer(clients, global_model)

api = FedAVGAPI(
    server,
    clients,
    criterion,
    optimizers,
    dataloaders
)
api.run()

reconstructed_training_data = server.attack()
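
Why shared gradients can reveal training data is easiest to see for a single fully connected layer with a bias: for one sample, each row of the weight gradient equals the bias gradient times the input, so the input can be read off by division. The sketch below demonstrates this analytic special case in plain Python. It is an illustration only and is not the optimization-based procedure that GradientInversionAttackServerManager runs; all function names are hypothetical.

```python
def linear_layer_gradients(W, b, x, target):
    """Gradients of 0.5 * ||W x + b - target||^2 w.r.t. W and b for one sample."""
    out = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    delta = [o - t for o, t in zip(out, target)]
    grad_W = [[d * xj for xj in x] for d in delta]  # outer product delta x^T
    grad_b = delta
    return grad_W, grad_b


def reconstruct_input(grad_W, grad_b):
    """Recover x from any row whose bias gradient is non-zero."""
    for row, d in zip(grad_W, grad_b):
        if abs(d) > 1e-12:
            return [g / d for g in row]
    raise ValueError("all bias gradients are zero")


W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
b = [0.1, -0.2]
private_x = [1.0, 2.0, 3.0]
grad_W, grad_b = linear_layer_gradients(W, b, private_x, target=[0.0, 1.0])
recovered = reconstruct_input(grad_W, grad_b)
print(recovered)  # approximately equal to private_x
```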

Defense: Differential Privacy

One possible defense against Model Inversion Attack is differential privacy. AIJack supports DPSGD, an optimizer that makes the trained model satisfy differential privacy.

from aijack.defense.dp import DPSGDManager, GeneralMomentAccountant, DPSGDClientManager

accountant = GeneralMomentAccountant()
dp_manager = DPSGDManager(
    accountant,
    optim.SGD,
    dataset=trainset,
)

manager = DPSGDClientManager(dp_manager)
DPSGDFedAVGClient = manager.attach(FedAVGClient)

clients = [DPSGDFedAVGClient(local_model_1, user_id=0), DPSGDFedAVGClient(local_model_2, user_id=1)]
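
For intuition, the core of a DPSGD update can be sketched in a few lines: clip every per-sample gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled by the clipping norm, and average. The code below is a minimal plain-Python sketch of that recipe under these assumptions. The names clip_by_norm, private_gradient, max_norm, and noise_multiplier are illustrative and are not AIJack's API, and privacy accounting is omitted.

```python
import math
import random


def clip_by_norm(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]


def private_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    clipped = [clip_by_norm(g, max_norm) for g in per_sample_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
print(private_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single sample can move the update, and the noise hides what remains of each sample's contribution.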

Defense: Soteria

Another defense algorithm is Soteria, which theoretically guarantees a lower bound on the reconstruction error.

from aijack.defense.soteria import SoteriaClientManager

manager = SoteriaClientManager("conv", "lin", target_layer_name="lin.0.weight")
SoteriaFedAVGClient = manager.attach(FedAVGClient)

clients = [SoteriaFedAVGClient(local_model_1, user_id=0), SoteriaFedAVGClient(local_model_2, user_id=1)]

Defense: Homomorphic Encryption

Clients in Federated Learning can also encrypt their local gradients to prevent potential information leakage. For example, AIJack offers Paillier Encryption with a C++ backend, which is faster than other Python-based implementations.

from aijack.defense.paillier import PaillierGradientClientManager, PaillierKeyGenerator

keygenerator = PaillierKeyGenerator(key_length)
pk, sk = keygenerator.generate_keypair()

manager = PaillierGradientClientManager(pk, sk)
PaillierGradFedAVGClient = manager.attach(FedAVGClient)

clients = [
    PaillierGradFedAVGClient(local_model_1, user_id=0, server_side_update=False),
    PaillierGradFedAVGClient(local_model_2, user_id=1, server_side_update=False),
]

server = FedAVGServer(clients, global_model, lr=lr, server_side_update=False)
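
Paillier encryption is useful here because it is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, so a server can aggregate encrypted values without decrypting them. The toy sketch below shows this property with tiny, insecure primes and integer plaintexts. It is a textbook illustration written for this README, not AIJack's C++ backend, and the function names are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pk, m, rng):
    n, g = pk
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


rng = random.Random(0)
pk, sk = keygen(293, 433)
c1 = encrypt(pk, 1234, rng)
c2 = encrypt(pk, 4321, rng)
c_sum = (c1 * c2) % (pk[0] ** 2)  # homomorphic addition of the plaintexts
print(decrypt(pk, sk, c_sum))  # 5555
```

Real gradients are floating-point values, so a practical scheme also needs a fixed-point encoding and much larger keys, both of which this sketch leaves out.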

Attack: Poisoning

Poisoning Attack aims to deteriorate the performance of the trained model.

One famous approach is Label Flip Attack.

from aijack.attack.poison import LabelFlipAttackClientManager

manager = LabelFlipAttackClientManager(victim_label=0, target_label=1)
LabelFlipAttackFedAVGClient = manager.attach(FedAVGClient)

clients = [LabelFlipAttackFedAVGClient(local_model_1, user_id=0), FedAVGClient(local_model_2, user_id=1)]
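
The idea behind a label flip attack can be sketched on a plain list of labels. The snippet assumes that victim_label and target_label mean "relabel every sample of the victim class as the target class", which is the usual reading of these parameter names. The function flip_labels is a hypothetical helper, not part of AIJack.

```python
def flip_labels(labels, victim_label, target_label):
    """Return a copy of labels with the victim class relabelled as the target class."""
    return [target_label if y == victim_label else y for y in labels]


clean = [0, 1, 2, 0]
poisoned = flip_labels(clean, victim_label=0, target_label=1)
print(poisoned)  # [1, 1, 2, 1]
```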

Defense: FoolsGold

One of the standard methods to mitigate Poisoning Attack is FoolsGold, which calculates the similarity among clients and decreases the influence of malicious clients.

from aijack.defense.foolsgold import FoolsGoldServerManager

manager = FoolsGoldServerManager()
FoolsGoldFedAVGServer = manager.attach(FedAVGServer)
server = FoolsGoldFedAVGServer(clients, global_model)
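
The similarity-based reweighting can be illustrated with a deliberately simplified sketch: compute the cosine similarity between every pair of client updates and give each client the weight one minus its largest similarity to any other client. The full FoolsGold algorithm includes further steps (for example, working on accumulated historical updates and rescaling the weights) that are omitted here, so treat this as an assumption-laden illustration and not as AIJack's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors, or 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_weights(updates):
    """Down-weight clients whose update closely matches another client's update."""
    weights = []
    for i, u in enumerate(updates):
        max_sim = max(cosine(u, v) for j, v in enumerate(updates) if j != i)
        weights.append(min(1.0, max(0.0, 1.0 - max_sim)))
    return weights


# Two clients push the same direction (suspicious); one is different.
updates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(similarity_weights(updates))  # [0.0, 0.0, 1.0]
```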

Attack: FreeRider

In a real situation where the central server pays clients, it is important to detect free riders who do nothing but pretend to train their models locally.

from aijack.attack.freerider import FreeRiderClientManager

manager = FreeRiderClientManager(mu=0, sigma=1.0)
FreeRiderFedAVGClient = manager.attach(FedAVGClient)

clients = [FreeRiderFedAVGClient(local_model_1, user_id=0), FedAVGClient(local_model_2, user_id=1)]
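
One simple way a free rider could fake participation is to upload random noise in place of a real update. The sketch below assumes that mu and sigma parametrize Gaussian noise of this kind, which the source does not state explicitly. The helper fake_update and the layer_sizes argument are hypothetical.

```python
import random


def fake_update(layer_sizes, mu, sigma, rng):
    """Generate a noise-only 'update' with one flat list per layer."""
    return [[rng.gauss(mu, sigma) for _ in range(size)] for size in layer_sizes]


rng = random.Random(0)
update = fake_update([4, 2], mu=0.0, sigma=1.0, rng=rng)
print([len(layer) for layer in update])  # [4, 2]
```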

Split Learning

Split Learning is another collaborative learning scheme, where only one party owns the ground-truth labels.

SplitNN

from aijack.collaborative.splitnn import SplitNNAPI, SplitNNClient

clients = [SplitNNClient(model_1, user_id=0), SplitNNClient(model_2, user_id=1)]
optimizers = [optim.Adam(model_1.parameters()), optim.Adam(model_2.parameters())]

splitnn = SplitNNAPI(clients, optimizers, train_loader, criterion, num_epoch)
splitnn.run()
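
The message flow in Split Learning can be sketched with two scalar "models": the bottom party holds the input and sends an intermediate value forward, while the top party holds the label, computes the loss, and sends back only the gradient with respect to that intermediate value. The classes BottomParty and TopParty below are hypothetical and use a squared-error loss. They illustrate the protocol and are not the SplitNNAPI implementation.

```python
class BottomParty:
    """Holds the raw input and the first part of the model."""

    def __init__(self, w):
        self.w = w
        self.x = None

    def forward(self, x):
        self.x = x
        return self.w * x  # intermediate value sent to the top party

    def backward(self, grad_h, lr):
        # Chain rule: dL/dw = dL/dh * dh/dw = grad_h * x
        self.w -= lr * grad_h * self.x


class TopParty:
    """Holds the label and the second part of the model."""

    def __init__(self, w):
        self.w = w

    def step(self, h, y, lr):
        err = self.w * h - y
        grad_h = err * self.w  # gradient returned to the bottom party
        self.w -= lr * err * h
        return 0.5 * err * err, grad_h


bottom, top = BottomParty(1.0), TopParty(0.5)
data = [(1.0, 2.0), (0.5, 1.0)]  # targets follow y = 2 * x
losses = []
for _ in range(100):
    for x, y in data:
        h = bottom.forward(x)
        loss, grad_h = top.step(h, y, lr=0.1)
        bottom.backward(grad_h, lr=0.1)
        losses.append(loss)
print(losses[0], losses[-1])
```

Neither party sees the other's raw data: only the intermediate value and its gradient cross the boundary.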

Attack: Label Leakage

AIJack supports a norm-based label leakage attack against Split Learning.

from aijack.attack.labelleakage import NormAttackManager

manager = NormAttackManager(criterion, device="cpu")
NormAttackSplitNNAPI = manager.attach(SplitNNAPI)
normattacksplitnn = NormAttackSplitNNAPI(clients, optimizers)
leak_auc = normattacksplitnn.attack(target_dataloader)
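
The intuition usually given for the norm attack is that, in imbalanced binary classification, the gradient returned for a positive example tends to have a larger norm than the one for a negative example, so the norm itself can serve as a label score. The sketch below computes gradient norms and the resulting AUC on made-up numbers. The data values and helper names are illustrative assumptions, not output from AIJack.

```python
import math


def gradient_norms(grads):
    """L2 norm of each per-sample gradient."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in grads]


def auc(scores, labels):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up per-sample gradients seen by the party without labels.
grads = [[0.9, -1.2], [0.05, 0.02], [0.1, -0.1], [1.5, 0.3]]
labels = [1, 0, 0, 1]  # ground truth, used only to score the attack
leak_auc = auc(gradient_norms(grads), labels)
print(leak_auc)  # 1.0
```

An AUC close to 1.0 means the norms separate the two classes almost perfectly, while 0.5 means the norms reveal nothing about the labels.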

Supported Algorithms

Distributed Learning

Algorithm	Example	Paper
FedAVG	example	paper
FedProx	WIP	paper
FedKD	example	paper
FedMD	example	paper
FedGEMS	WIP	paper
DSFL	WIP	paper
SplitNN	example	paper
SecureBoost	example	paper

Attack

Attack	Attack Type	Example	Paper
MI-FACE	Model Inversion	example	paper
DLG	Model Inversion	example	paper
iDLG	Model Inversion	example	paper
GS	Model Inversion	example	paper
CPL	Model Inversion	example	paper
GradInversion	Model Inversion	example	paper
GAN Attack	Model Inversion	example	paper
Shadow Attack	Membership Inference	example	paper
Norm attack	Label Leakage	example	paper
Delta Weights	Free Rider Attack	WIP	paper
Gradient descent attacks	Evasion Attack	example	paper
DBA	Backdoor Attack	WIP	paper
Label Flip Attack	Poisoning Attack	example	paper
History Attack	Poisoning Attack	example	paper
MAPF	Poisoning Attack	example	paper
SVM Poisoning	Poisoning Attack	example	paper

Defense

Defense	Defense Type	Example	Paper
DPSGD	Differential Privacy	example	paper
Paillier	Homomorphic Encryption	example	paper
CKKS	Homomorphic Encryption	test	paper
Soteria	Others	example	paper
FoolsGold	Others	WIP	paper
Sparse Gradient	Others	example	paper
MID	Others	example	paper

Contact

welcome2aijack[@]gmail.com

Project details

Release history

0.0.1b2 pre-release

Jan 1, 2024

0.0.1b1 pre-release

Aug 28, 2023

0.0.1a2 pre-release

Feb 17, 2023

This version

0.0.1a1 pre-release

Jan 2, 2023

Download files

Source Distribution

aijack-0.0.1a1.tar.gz (127.5 kB)

Uploaded Jan 2, 2023 Source

File details

Details for the file aijack-0.0.1a1.tar.gz.

File metadata

Download URL: aijack-0.0.1a1.tar.gz
Upload date: Jan 2, 2023
Size: 127.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.11.1

File hashes

Hashes for aijack-0.0.1a1.tar.gz
Algorithm	Hash digest
SHA256	`b4525a4052f96e2f93a4d8e162e5efc88b84ed80f7e08b2c0fa232983c6abcdd`
MD5	`8d5a535e8bafcf7d616a1f240447ab58`
BLAKE2b-256	`bd3499aa20b53ff349db1f3626f2cc46bcfa53cbe643d76f4c8b4229c29b51f7`
