Machine learning using Secure Multi-Party Computation
Project description
TNO MPC Lab - MPyC - Secure Learning
The TNO MPC lab consists of generic software components, procedures, and functionalities developed and maintained on a regular basis to facilitate and aid in the development of MPC solutions. The lab is a cross-project initiative allowing us to integrate and reuse previously developed MPC functionalities to boost the development of new protocols and solutions.
The package tno.mpc.mpyc.secure_learning is part of the TNO Python Toolbox.
This library has been developed with funding from different projects.
In particular, the basic building blocks and an initial version of this library have been developed within the VP AI program (2018) and the ERP AI program (2019), including an SVM model and initial versions of other models.
The current secure logistic regression model has been developed within the TKI HTSM LANCELOT project, a research collaboration between TNO, IKNL and Janssen.
LANCELOT is partly funded by PPS-surcharge for Research and Innovation of the Dutch Ministry of Economic Affairs and Climate Policy.
The secure lasso regression model has been developed in the BigMedilytics project. This project has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 780495.
In collaboration with the MPC Lab, the BigMedilytics, LANCELOT, NLAIC and Appl.AI projects contributed to a restructuring of the codebase into a generic, reusable library that can be extended with additional models and functionalities.
Limitations in (end-)use: the content of this software package may solely be used for applications that comply with international export control laws.
This implementation of cryptographic software has not been audited. Use at your own risk.
Content Explanation
Implementation based on Secure Multi-Party Computation (MPC) for training and evaluating several machine learning models. It makes use of the MPyC framework.
Features
The library implements secure versions of popular machine learning methods in the form of MPC protocols. The underlying MPC functionalities are provided by the MPyC framework.
The library contains both regression and classification algorithms.
In particular, linear regression is implemented with an l1 penalty (Lasso), an l2 penalty (Ridge), or a combination of both (elastic net). For classification problems, Support Vector Machines (SVM) and logistic regression are implemented. For logistic regression, the user can choose between an accurate implementation of the logistic function and a faster approximation; l1 and/or l2 penalties can also be applied.
The library allows users to choose either the gradient descent or the SAG solver to train the implemented models.
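As a plaintext point of reference, the sketch below shows what a gradient-descent solver for the ridge objective computes. The scaling conventions (the 1/n factor, no intercept term) are illustrative assumptions, not the library's exact implementation; the secure solver performs comparable iterations on secret-shared data.

```python
# Plaintext sketch (illustrative only) of gradient descent for the
# ridge objective: (1/(2n)) * ||Xw - y||^2 + (alpha/2) * ||w||^2.
# Scaling and intercept handling here are assumptions for exposition.

def ridge_gd(X, y, alpha=0.1, lr=0.1, steps=500):
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # residuals r_i = <x_i, w> - y_i
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # gradient: (1/n) * X^T r + alpha * w
        grad = [
            sum(X[i][j] * r[i] for i in range(n)) / n + alpha * w[j]
            for j in range(d)
        ]
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w
```

With `alpha=0.0` this reduces to ordinary least-squares gradient descent.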
Limitations
Currently, no code is provided to securely apply the trained models.
Documentation
Documentation of the tno.mpc.mpyc.secure_learning package can be found here.
Install
Easily install the tno.mpc.mpyc.secure_learning package using pip:
$ python -m pip install tno.mpc.mpyc.secure_learning
Note:
A significant performance improvement can be achieved by installing the GMPY2 library.
$ python -m pip install 'tno.mpc.mpyc.secure_learning[gmpy]'
If you wish to run the tests you can use:
$ python -m pip install 'tno.mpc.mpyc.secure_learning[tests]'
Usage
Run these examples as python example.py --no-log to suppress the MPyC barrier logging. Append the argument -M 3 to simulate a three-party protocol.
Example Usage
Click here for an example of securely training a simple linear regression model with L2 penalty (Ridge).
example.py
```python
import numpy as np
from mpyc.runtime import mpc
from sklearn import datasets
from sklearn.linear_model import Ridge as RidgeSK

import tno.mpc.mpyc.secure_learning.test.plaintext_utils.plaintext_objective_functions as plain_obj
from tno.mpc.mpyc.secure_learning import PenaltyTypes, Ridge, SolverTypes

# Notice that we use the entire dataset to train the model
n_samples = 50
n_features = 5
# Fixed random state for reproducibility
random_state = 3
tolerance = 1e-4

secnum = mpc.SecFxp(l=64, f=32)


def get_mpc_data(X, y):
    X_mpc = [[secnum(x, integral=False) for x in row] for row in X.tolist()]
    y_mpc = [secnum(y, integral=False) for y in y.tolist()]
    return X_mpc, y_mpc


def distribute_data_over_players(X_mpc, y_mpc):
    X_shared = [mpc.input(row, senders=0) for row in X_mpc]
    y_shared = mpc.input(y_mpc, senders=0)
    return X_shared, y_shared


async def ridge_regression_example():
    print("Ridge regression with gradient descent method")
    alpha = 0.2

    # Create regression dataset
    X, y = datasets.make_regression(
        n_samples=n_samples,
        n_features=n_features,
        noise=25.0,
        random_state=random_state,
    )
    X = np.array(X)
    y = np.array(y)
    X_mpc, y_mpc = get_mpc_data(X, y)

    async with mpc:
        X_shared, y_shared = distribute_data_over_players(X_mpc, y_mpc)

    # Train secure model
    model = Ridge(solver_type=SolverTypes.GD, alpha=alpha)
    async with mpc:
        coef_ = await model.compute_coef_mpc(
            X_shared,
            y_shared,
            tolerance=tolerance,
        )

    # Results of secure model
    objective = plain_obj.objective(X, y, coef_, "linear", PenaltyTypes.L2, alpha)
    print("Securely obtained coefficients:", coef_)
    print("* objective:", objective)

    # Train plaintext model
    model_sk = RidgeSK(
        alpha=len(X) * alpha,
        solver="saga",
        random_state=random_state,
        fit_intercept=True,
    )
    model_sk.fit(X, y)

    # Results of plaintext model
    coef_sk = np.append([model_sk.intercept_], model_sk.coef_).tolist()
    objective_sk = plain_obj.objective(X, y, coef_sk, "linear", PenaltyTypes.L2, alpha)
    print("Sklearn obtained coefficients: ", coef_sk)
    print("* objective:", objective_sk)


if __name__ == "__main__":
    mpc.run(ridge_regression_example())
```
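The examples use secnum = mpc.SecFxp(l=64, f=32), i.e. 64-bit secure fixed-point numbers with 32 fractional bits. The standalone sketch below illustrates the plain fixed-point encoding idea (scaling by 2^f, and rescaling after multiplication); it is a simplification for intuition, not MPyC's internal representation.

```python
F = 32  # fractional bits, mirroring SecFxp(l=64, f=32)

def encode(x: float) -> int:
    # represent x as a scaled integer with F fractional bits
    return round(x * (1 << F))

def decode(v: int) -> float:
    return v / (1 << F)

def fxp_mul(a: int, b: int) -> int:
    # the raw product carries 2F fractional bits; shift back to F
    return (a * b) >> F
```

For example, decode(fxp_mul(encode(1.5), encode(2.0))) recovers 3.0.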
Click here for an example of securely training a logistic regression model with L1 penalty.
example.py
```python
import numpy as np
from mpyc.runtime import mpc
from sklearn import datasets
from sklearn.linear_model import LogisticRegression as LogisticRegressionSK

import tno.mpc.mpyc.secure_learning.test.plaintext_utils.plaintext_objective_functions as plain_obj
from tno.mpc.mpyc.secure_learning import (
    ClassWeightsTypes,
    ExponentiationTypes,
    Logistic,
    PenaltyTypes,
    SolverTypes,
)

# Notice that we use the entire dataset to train the model
n_samples = 50
n_features = 5
# Fixed random state for reproducibility
random_state = 3
tolerance = 1e-4

secnum = mpc.SecFxp(l=64, f=32)


def get_mpc_data(X, y):
    X_mpc = [[secnum(x, integral=False) for x in row] for row in X.tolist()]
    y_mpc = [secnum(y, integral=False) for y in y.tolist()]
    return X_mpc, y_mpc


def distribute_data_over_players(X_mpc, y_mpc):
    X_shared = [mpc.input(row, senders=0) for row in X_mpc]
    y_shared = mpc.input(y_mpc, senders=0)
    return X_shared, y_shared


def sklearn_class_weights_dict(y):
    n_class_1 = sum([((y_i + 1) / 2) for y_i in y])
    n_class_0 = len(y) - n_class_1
    w_0 = len(y) / (2 * n_class_0)
    w_1 = len(y) / (2 * n_class_1)
    return {-1: w_0, 1: w_1}


async def logistic_regression_example():
    print(
        "Classification (Logistic regression) with l1 penalty, with gradient descent method"
    )
    alpha = 0.1

    # Create classification dataset
    X, y = datasets.make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=1,
        n_redundant=0,
        n_classes=2,
        n_clusters_per_class=1,
        random_state=random_state,
        shift=0,
        weights=[0.25, 0.75],
    )
    # Transform labels from {0, 1} to {-1, +1}.
    y = [-1 if x == 0 else 1 for x in y]
    X = np.array(X)
    y = np.array(y)
    X_mpc, y_mpc = get_mpc_data(X, y)

    async with mpc:
        X_shared, y_shared = distribute_data_over_players(X_mpc, y_mpc)

    # Train secure model with approximation of logistic function (faster, less accurate)
    model = Logistic(
        solver_type=SolverTypes.GD,
        exponentiation=ExponentiationTypes.APPROX,
        penalty=PenaltyTypes.L1,
        alpha=alpha,
        class_weights_type=ClassWeightsTypes.BALANCED,
    )
    async with mpc:
        coef_approx = await model.compute_coef_mpc(
            X_shared, y_shared, tolerance=tolerance
        )
        class_weights_dict = model.reveal_class_weights(y_shared)

    # Results of secure model (approximated logistic function)
    objective_approx = plain_obj.objective(
        X, y, coef_approx, "logistic", PenaltyTypes.L1, alpha, class_weights_dict
    )
    print(
        "Securely obtained coefficients (approximated exponentiation):",
        coef_approx,
    )
    print("* objective:", objective_approx)
    print("Class weights dictionary:", class_weights_dict)

    # Train secure model with exact logistic function (slower, more accurate)
    model = Logistic(
        solver_type=SolverTypes.GD,
        exponentiation=ExponentiationTypes.EXACT,
        penalty=PenaltyTypes.L1,
        alpha=alpha,
        class_weights_type=ClassWeightsTypes.BALANCED,
    )
    async with mpc:
        coef_exact = await model.compute_coef_mpc(
            X_shared, y_shared, tolerance=tolerance
        )

    # Results of secure model (exact logistic function)
    objective_exact = plain_obj.objective(
        X,
        y,
        coef_exact,
        "logistic",
        PenaltyTypes.L1,
        alpha,
        class_weights_dict,
    )
    print(
        "Securely obtained coefficients (exact exponentiation): ",
        coef_exact,
    )
    print("* objective:", objective_exact)
    print("Class weights dictionary:", class_weights_dict)

    # Train plaintext model
    model_sk = LogisticRegressionSK(
        solver="saga",
        random_state=random_state,
        fit_intercept=True,
        penalty="l1",
        C=1 / (len(X) * alpha),
        class_weight="balanced",
    )
    class_weights_dict_sk = sklearn_class_weights_dict(y)
    model_sk.fit(X, y)
    coef_sk = np.append([model_sk.intercept_], model_sk.coef_).tolist()

    # Results of plaintext model
    objective_sk = plain_obj.objective(
        X, y, coef_sk, "logistic", PenaltyTypes.L1, alpha
    )
    print("Sklearn obtained coefficients: ", coef_sk)
    print("* objective:", objective_sk)


if __name__ == "__main__":
    mpc.run(logistic_regression_example())
```
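The balanced class weights revealed in the example follow the usual n_samples / (2 * n_in_class) convention for labels in {-1, +1}, as in the sklearn_class_weights_dict helper above. A minimal standalone version:

```python
# Balanced class weights for labels in {-1, +1}: each class is weighted
# by n_samples / (2 * n_in_class), so the minority class counts more.

def balanced_class_weights(y):
    n_pos = sum(1 for y_i in y if y_i == 1)
    n_neg = len(y) - n_pos
    return {-1: len(y) / (2 * n_neg), 1: len(y) / (2 * n_pos)}
```

For instance, for y = [-1, 1, 1, 1] the minority class -1 gets weight 2.0 and the majority class +1 gets weight 2/3.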
Project details
Download files
Hashes for tno.mpc.mpyc.secure_learning-1.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4b6fff14f1f0933ac56d9e2ce8cd9b40427b6010b12e58f9c7a4084f22eb24a7
MD5 | dd5769d4542835849144105bdef864c6
BLAKE2b-256 | 30c3334257b3f06694b2c2c0a9b2e1fff3b608a821c3da7962c0981f1a01f907