
Concrete ML is an open-source set of tools which aims to simplify the use of fully homomorphic encryption (FHE) for data scientists.

Concrete ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools built on top of Concrete by Zama. It aims to simplify the use of fully homomorphic encryption (FHE) for data scientists to help them automatically turn machine learning models into their homomorphic equivalent. Concrete ML was designed with ease-of-use in mind, so that data scientists can use it without knowledge of cryptography. Notably, the Concrete ML model classes are similar to those in scikit-learn and it is also possible to convert PyTorch models to FHE.

Main features.

Data scientists can use models with APIs which are close to the frameworks they use, with additional options to run inferences in FHE.

Concrete ML features:

  • built-in models, which are ready-to-use FHE-friendly models with a user interface equivalent to that of their scikit-learn and XGBoost counterparts
  • support for custom models that can use quantization-aware training. These are developed by the user with PyTorch or Keras/TensorFlow and are imported into Concrete ML through ONNX

Installation.

Depending on your OS, Concrete ML may be installed with Docker or with pip:

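OS / HW                                  | Available on Docker | Available on pip
---------------------------------------- | ------------------- | ----------------
Linux                                    | Yes                 | Yes
Windows                                  | Yes                 | Coming soon
Windows Subsystem for Linux              | Yes                 | Yes
macOS 11+ (Intel)                        | Yes                 | Yes
macOS 11+ (Apple Silicon: M1, M2, etc.)  | Yes                 | Yes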

Note: Concrete ML only supports Python 3.8, 3.9 and 3.10.
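A quick way to check an interpreter against the versions listed in the note above is sketched below. The helper name `is_supported_python` is hypothetical and not part of Concrete ML; the version set is copied from the note and may differ for other releases.

```python
import sys

# Versions listed as supported in the note above (may change between releases).
SUPPORTED_VERSIONS = {(3, 8), (3, 9), (3, 10)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) pair is in the supported set."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


major, minor = sys.version_info[:2]
status = "supported" if is_supported_python() else "not supported"
print(f"Python {major}.{minor} is {status} according to the note above")
```

Running this before `pip install concrete-ml` can help explain a failed installation on a newer or older interpreter.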

Concrete ML can also be installed on Kaggle (see the related question on the community forum for details) and on Google Colab.

Docker

To install with Docker, pull the concrete-ml image as follows:

docker pull zamafhe/concrete-ml:latest

Pip

To install Concrete ML from PyPI, run the following:

pip install -U pip wheel setuptools
pip install concrete-ml

More detailed installation instructions are available in the documentation.
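After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper, and it assumes the distribution is registered under the name `concrete-ml`, as in the pip command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("concrete-ml")
if version is None:
    print("concrete-ml is not installed in this environment")
else:
    print(f"concrete-ml {version} is installed")
```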

A simple Concrete ML example with scikit-learn.

The following logistic regression example stays very close to the scikit-learn API:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Let's create a synthetic data-set
x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)

# Split the data-set into a train and test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Now we train in the clear and quantize the weights
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)

# We can simulate the predictions in the clear
y_pred_clear = model.predict(X_test)

# We then compile on a representative set
model.compile(X_train)

# Finally, we run the inference on encrypted inputs
y_pred_fhe = model.predict(X_test, fhe="execute")

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean()*100)}%")

# Output:
# In clear  : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
# In FHE    : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
# Similarity: 100%
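The similarity figure is simply the fraction of test samples on which the clear and FHE predictions agree. A dependency-free sketch of that computation is shown below; `agreement_rate` is a hypothetical helper, not a Concrete ML function, and the two lists are copied from the sample output above.

```python
def agreement_rate(predictions_a, predictions_b):
    """Fraction of positions where two prediction sequences agree."""
    if len(predictions_a) != len(predictions_b):
        raise ValueError("prediction sequences must have the same length")
    if len(predictions_a) == 0:
        raise ValueError("prediction sequences must not be empty")
    matches = sum(1 for a, b in zip(predictions_a, predictions_b) if a == b)
    return matches / len(predictions_a)


# Values copied from the sample output above
clear = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0]
fhe = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0]

print(f"Similarity: {int(agreement_rate(clear, fhe) * 100)}%")
# prints "Similarity: 100%"
```

A rate below 1.0 would typically point to quantization effects rather than to encryption itself, since both predictions in the example come from the same quantized model.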

It is also possible to call encryption, model prediction, and decryption functions separately as follows. Executing these steps separately is equivalent to calling predict_proba on the model instance.

# Predict probability for a single example
y_proba_fhe = model.predict_proba(X_test[[0]], fhe="execute")

# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])

# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)

# Execute the linear product in FHE
q_y_enc = model.fhe_circuit.run(q_input_enc)

# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)

# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))

print("Probability with `predict_proba`: ", y_proba_fhe)
print("Probability with encrypt/run/decrypt calls: ", y0)
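To see why the quantize, run, and dequantize steps approximate the original float computation, the plain-Python sketch below mimics them for a single linear product, with encryption left out. It uses a generic uniform affine quantizer chosen for illustration; this is an assumption and not necessarily the scheme Concrete ML applies internally. All function names and input values are hypothetical.

```python
import math


def quantize(values, n_bits):
    """Uniform affine quantization: returns (integers, scale, zero_point)."""
    lo, hi = min(values), max(values)
    max_level = 2 ** n_bits - 1
    scale = (hi - lo) / max_level
    if scale == 0:
        scale = 1.0  # all values identical: any non-zero scale works
    zero_point = round(-lo / scale)
    # Clamp so that every integer fits in n_bits
    q_values = [min(max(round(v / scale) + zero_point, 0), max_level) for v in values]
    return q_values, scale, zero_point


def integer_linear(q_x, z_x, q_w, z_w):
    """Integer-only dot product, standing in for the step run on ciphertexts."""
    return sum((qx - z_x) * (qw - z_w) for qx, qw in zip(q_x, q_w))


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical input, weights and bias
x = [0.5, -1.2, 3.3, 0.0]
w = [0.8, 0.1, -0.4, 1.5]
bias = 0.2

q_x, s_x, z_x = quantize(x, n_bits=8)   # quantize the input
q_w, s_w, z_w = quantize(w, n_bits=8)   # quantize the weights

acc = integer_linear(q_x, z_x, q_w, z_w)        # integer computation
y_quantized = s_x * s_w * acc + bias            # de-quantize
y_float = sum(xi * wi for xi, wi in zip(x, w)) + bias

print("Float probability    :", sigmoid(y_float))
print("Quantized probability:", sigmoid(y_quantized))
```

With 8 bits the two probabilities should be close; lowering `n_bits` makes the gap between them grow, which is the accuracy cost of coarser quantization.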

This example is explained in more detail in the linear model documentation. Concrete ML built-in models have APIs that are almost identical to their scikit-learn counterparts. It is also possible to convert PyTorch networks to FHE with the Concrete ML conversion APIs. Please refer to the linear models, tree-based models and neural networks documentation for more examples, showing the scikit-learn-like API of the built-in models.

Documentation.

Full, comprehensive documentation is available here: https://docs.zama.ai/concrete-ml.

Online demos and tutorials.

Various tutorials are available for the built-in models and for deep learning. In addition, several complete use-cases are explored:

  • Encrypted Large Language Model: convert a user-defined part of a Large Language Model for encrypted text generation. Shows the trade-off between quantization and accuracy for text generation and shows how to run the model in FHE.

  • Credit Scoring: predict the chance of a given loan applicant defaulting on loan repayment while keeping the user's data private. Shows how Concrete ML models easily replace their scikit-learn equivalents.

  • Health diagnosis: based on a patient's symptoms, history and other health factors, give a diagnosis using FHE to preserve the privacy of the patient.

  • Titanic: solve the Kaggle Titanic competition. Implemented with XGBoost from Concrete ML, this example comes as a companion of the Kaggle notebook, and was the subject of a blogpost in KDnuggets.

  • Sentiment analysis with transformers: predict if an encrypted tweet / short message is positive, negative or neutral, using FHE. The live interactive demo is available on Hugging Face. This blog post explains how this demo works!

  • CIFAR10 FHE-friendly model with Brevitas: train a VGG9 FHE-compatible neural network using Brevitas, and a script to run the neural network in FHE. Execution in FHE takes ~4 minutes per image and shows an accuracy of 88.7%.

  • CIFAR10 / CIFAR100 FHE-friendly models with Transfer Learning approach: a series of three notebooks that convert a pre-trained FP32 VGG11 neural network into a quantized model using Brevitas. The model is fine-tuned on the CIFAR data-sets, converted for FHE execution with Concrete ML and evaluated using FHE simulation. Our simulations show an accuracy of 90.2% for CIFAR10 and 68.2% for CIFAR100.

  • FHE neural network splitting for client/server deployment: explains how to split a computationally-intensive neural network model in two parts. The first part is executed on the client side in the clear, and its output is encrypted. The second part of the model is then evaluated with FHE to complete the computation. This tutorial also shows the impact of the FHE speed/accuracy trade-off on CIFAR10: limiting programmable bootstrapping (PBS) to 8 bits yields 62% accuracy.

  • Encrypted image filtering: filter encrypted images by applying filters such as black-and-white, ridge detection, or your own filter.
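The client/server splitting idea from the list above can be sketched in a few lines of plain Python. This is a toy illustration only: the layers are hypothetical stand-ins for real network layers, `split_model` and `run_layers` are not Concrete ML functions, and the quantization and encryption that would happen between the two parts are only marked with a comment.

```python
def split_model(layers, split_index):
    """Split a list of layer functions into a client part and a server part."""
    if not 0 < split_index < len(layers):
        raise ValueError("split_index must leave at least one layer on each side")
    return layers[:split_index], layers[split_index:]


def run_layers(layers, value):
    """Apply each layer in order to the input value."""
    for layer in layers:
        value = layer(value)
    return value


# Toy layers standing in for a real network (hypothetical)
layers = [
    lambda v: [2 * x for x in v],       # first layer
    lambda v: [max(0, x) for x in v],   # ReLU-like activation
    lambda v: sum(v),                   # final layer
]

client_part, server_part = split_model(layers, split_index=2)

# Client side: the first part runs in the clear
intermediate = run_layers(client_part, [1, -3, 4])

# In a real deployment, `intermediate` would be quantized and encrypted here,
# and the server would evaluate its part in FHE instead of in the clear.
result = run_layers(server_part, intermediate)

print("Intermediate:", intermediate)
print("Result      :", result)
```

Moving the split point later keeps more computation on the client and reduces the work done in FHE, at the cost of exposing more of the model to the client.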

If you have built awesome projects using Concrete ML, feel free to let us know and we'll link to them!

Citing Concrete ML

To cite Concrete ML, notably in academic papers, please use the following entry, which lists authors by order of first commit:

@Misc{ConcreteML,
  title={Concrete {ML}: a Privacy-Preserving Machine Learning Library using Fully Homomorphic Encryption for Data Scientists},
  author={Zama},
  year={2022},
  note={\url{https://github.com/zama-ai/concrete-ml}},
}

License.

This software is distributed under the BSD-3-Clause-Clear license. If you have any questions, please contact us at hello@zama.ai.

Built Distribution

concrete_ml-1.4.0rc1-py3-none-any.whl (210.3 kB)
