Tools for the statistical disclosure control of machine learning models
SACRO-ML: Disclosure Control Tools for ML Models
A growing body of work shows that machine learning (ML) models may expose confidential properties of the data on which they are trained. This has led to a wide range of proposed attack methods, with varying assumptions, that exploit the model structure and/or behaviour to infer sensitive information.
The sacroml package is a collection of tools and resources for managing the statistical disclosure control (SDC) of trained ML models. In particular, it provides:
- A safemodel package that extends commonly used ML models to provide ante-hoc SDC by assessing the theoretical risk posed by the training regime (such as hyperparameter, dataset, and architecture combinations) before potentially costly model fitting is performed. It also ensures that best practice is followed with respect to privacy, e.g., using differentially private optimisers where available. For large models and datasets, ante-hoc analysis can save significant time and cost by avoiding the training of models that intensive post-hoc analysis would likely find to be disclosive.
- An attacks package that provides post-hoc SDC by assessing the empirical disclosure risk of a classification model through a variety of simulated attacks after training. It provides an integrated suite of attacks with a common application programming interface (API) and is designed to support the inclusion of additional state-of-the-art attacks as they become available. In addition to membership inference attacks (MIAs), such as the likelihood ratio attack (LiRA), and attribute inference attacks, the package provides novel structural attacks that report cheap-to-compute metrics. These metrics can serve as indicators of model disclosiveness after model fitting, but before running more computationally expensive MIAs.
- Summaries of the results, written as a simple human-readable report.
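To make the ante-hoc and structural ideas above concrete, here is a minimal sketch that uses scikit-learn only. It is not the sacroml implementation: the helper names `ante_hoc_check` and `smallest_leaf_size` and the threshold `MIN_SAFE_LEAF_SIZE` are hypothetical, chosen purely for illustration. The first helper inspects a hyperparameter before any fitting takes place; the second reads a cheap structural metric (the smallest number of training samples in any leaf) from a fitted decision tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical threshold for illustration only; not a sacroml default.
MIN_SAFE_LEAF_SIZE = 5


def ante_hoc_check(params: dict, threshold: int = MIN_SAFE_LEAF_SIZE) -> bool:
    """Check a hyperparameter before fitting (assumes an integer min_samples_leaf)."""
    return params.get("min_samples_leaf", 1) >= threshold


def smallest_leaf_size(fitted_tree: DecisionTreeClassifier) -> int:
    """Return the smallest number of training samples that reach any leaf."""
    tree = fitted_tree.tree_
    is_leaf = tree.children_left == -1  # scikit-learn marks leaves with -1
    return int(tree.n_node_samples[is_leaf].min())


X, y = load_breast_cancer(return_X_y=True)

risky_params = {"min_samples_leaf": 1}
safer_params = {"min_samples_leaf": 10}

# Ante-hoc: no model has been trained at this point.
print("risky passes check:", ante_hoc_check(risky_params))
print("safer passes check:", ante_hoc_check(safer_params))

# Structural: a cheap metric read directly from the fitted model.
model = DecisionTreeClassifier(random_state=0, **safer_params).fit(X, y)
print("smallest leaf size:", smallest_leaf_size(model))
```

A leaf that holds only one or two training records behaves much like a small cell in a published table, which is why very small leaves are a plausible warning sign. The checks that sacroml actually applies may differ from this sketch.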
Classification models from scikit-learn (including those implementing sklearn.base.BaseEstimator) and PyTorch are broadly supported within the package. Some attacks can still be run if only CSV files of the model predicted probabilities are supplied, e.g., if the model was produced in another language. See the examples for further information.
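As a rough illustration of the CSV route mentioned above, the sketch below writes a model's predicted probabilities for the training and test sets to CSV files using NumPy. The directory name, file names, and file layout (one row per sample, one column per class, no header) are assumptions made for this example; check the package examples for the layout that sacroml expects.

```python
from pathlib import Path

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical output location and file names, for illustration only.
out_dir = Path("probability_export")
out_dir.mkdir(exist_ok=True)

# One row per sample, one column per class.
train_probs = model.predict_proba(X_train)
test_probs = model.predict_proba(X_test)

np.savetxt(out_dir / "train_probs.csv", train_probs, delimiter=",")
np.savetxt(out_dir / "test_probs.csv", test_probs, delimiter=",")
```

A model trained in another language could produce files of the same shape, as long as it can output per-class probabilities for the records it was trained on and for held-out records.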
Installation
Python Package Index
```shell
$ pip install sacroml
```
Note: macOS users may need to install libomp due to a dependency on XGBoost:
```shell
$ brew install libomp
```
Conda
```shell
$ conda install sacroml
```
Usage
Quick-start example:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sacroml.attacks.likelihood_attack import LIRAAttack
from sacroml.attacks.target import Target

# Load dataset
X, y = load_breast_cancer(return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit model
model = RandomForestClassifier(min_samples_split=2, min_samples_leaf=1)
model.fit(X_train, y_train)

# Wrap model and data
target = Target(
    model=model,
    dataset_name="breast cancer",
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
)

# Create an attack object and run the attack
attack = LIRAAttack(n_shadow_models=100, output_dir="output_example")
attack.attack(target)
```
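For intuition about what a membership inference attack measures, the sketch below runs a much simpler confidence-based baseline on the same dataset and model settings as the quick-start, using scikit-learn only. This is not LiRA and not part of sacroml; the helper `true_label_confidence` is a hypothetical name introduced here. The idea is that an overfitted model tends to be more confident on records it was trained on, so confidence alone can partly separate members from non-members.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = RandomForestClassifier(
    min_samples_split=2, min_samples_leaf=1, random_state=0
)
model.fit(X_train, y_train)


def true_label_confidence(model, X, y):
    """Return the probability the model assigns to each record's true label."""
    probs = model.predict_proba(X)
    cols = np.searchsorted(model.classes_, y)  # column of each true label
    return probs[np.arange(len(y)), cols]


# Attack score: higher confidence suggests "was in the training set".
scores = np.concatenate(
    [
        true_label_confidence(model, X_train, y_train),
        true_label_confidence(model, X_test, y_test),
    ]
)
# Ground truth: 1 for training records (members), 0 for test records.
membership = np.concatenate([np.ones(len(y_train)), np.zeros(len(y_test))])

auc = roc_auc_score(membership, scores)
print(f"Membership inference AUC (confidence baseline): {auc:.3f}")
```

An AUC close to 0.5 means this baseline cannot tell members from non-members, while values well above 0.5 indicate leakage. Stronger attacks such as LiRA calibrate these scores against shadow models, which is why they cost more to run.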
For more information, see the examples.
Documentation
See the API documentation.
Contributing
See our contributing guide.
Acknowledgement
This work was supported by UK Research and Innovation as part of the Data and Analytics Research Environments UK (DARE UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automated Checking of Research Outputs (SACRO; MC_PC_23006), Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER; MC_PC_21033), and TREvolution (MC_PC_24038). This project has also been supported by MRC and EPSRC (PICTURES; MR/S010351/1).
Download files
File details
Details for the file sacroml-1.4.3.tar.gz.
File metadata
- Download URL: sacroml-1.4.3.tar.gz
- Upload date:
- Size: 73.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7727ae4e605913b54a2971fae67ad6918a04420661ebaab0e5591a65521edc6c |
| MD5 | 9b1a69a12e96e70387cf8dbf5cfa27a4 |
| BLAKE2b-256 | f6d4dd8c0e9773dc50e592add305cf71b7885f7882694c2c25a9281582568f91 |
Provenance
The following attestation bundles were made for sacroml-1.4.3.tar.gz:
Publisher: pypi.yml on AI-SDC/SACRO-ML
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sacroml-1.4.3.tar.gz
- Subject digest: 7727ae4e605913b54a2971fae67ad6918a04420661ebaab0e5591a65521edc6c
- Sigstore transparency entry: 870071225
- Sigstore integration time:
- Permalink: AI-SDC/SACRO-ML@b7b6a29f15d5b9f04ac17e75b53b39ede3c47380
- Branch / Tag: refs/tags/v1.4.3
- Owner: https://github.com/AI-SDC
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@b7b6a29f15d5b9f04ac17e75b53b39ede3c47380
- Trigger Event: release
File details
Details for the file sacroml-1.4.3-py3-none-any.whl.
File metadata
- Download URL: sacroml-1.4.3-py3-none-any.whl
- Upload date:
- Size: 86.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8820bc73c45fce1966fedf2702d1e05b71f38ae2f3446524d9c96847cfe798a1 |
| MD5 | ee96e899d9c27e2d3a711b0e749838b3 |
| BLAKE2b-256 | df627edc736a9e6088d0f36c7b17f5eddca21b5fab21f8ec521ce748135ffefa |
Provenance
The following attestation bundles were made for sacroml-1.4.3-py3-none-any.whl:
Publisher: pypi.yml on AI-SDC/SACRO-ML
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sacroml-1.4.3-py3-none-any.whl
- Subject digest: 8820bc73c45fce1966fedf2702d1e05b71f38ae2f3446524d9c96847cfe798a1
- Sigstore transparency entry: 870071231
- Sigstore integration time:
- Permalink: AI-SDC/SACRO-ML@b7b6a29f15d5b9f04ac17e75b53b39ede3c47380
- Branch / Tag: refs/tags/v1.4.3
- Owner: https://github.com/AI-SDC
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@b7b6a29f15d5b9f04ac17e75b53b39ede3c47380
- Trigger Event: release