Secretflow Secure Machine Learning

SML: Secure Machine Learning

SML is a Python module that implements machine learning algorithms with JAX and supports secure training and inference on top of SPU.

Our vision is to establish a general-purpose privacy-preserving machine learning (PPML) library that serves as a secure version of scikit-learn.

In general, the APIs of our algorithms are designed to be as consistent as possible with scikit-learn. However, due to security considerations and certain limitations of SPU, some APIs differ. Detailed explanations of any differences will be provided in the documentation.

Why not scikit-learn?

First, scikit-learn is built on top of NumPy and SciPy and runs in a centralized mode, so you must collect all data on one node, which cannot protect data privacy.

The implementations in scikit-learn are usually efficient and well validated, so why not simply "translate" them to MPC?

The short answer is accuracy and efficiency.

In PPML, we observe that most frameworks encode floating-point numbers as fixed-point numbers, parameterized by field (the bit width of the underlying integer) and fxp_fraction_bits (the bit width of the fractional part). This greatly restricts the effective range and precision compared with floating-point numbers. On the other hand, computational overhead is determined mainly by the MPC protocol, so ops that were originally CPU-friendly may perform poorly.
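To make the range and precision limits concrete, here is a minimal plain-Python sketch of fixed-point encoding. It is not SPU's actual encoder: the function names `encode` and `decode` are hypothetical, and the default values of `field_bits` and `fxp_fraction_bits` are assumptions chosen for illustration.

```python
def encode(x: float, field_bits: int = 64, fxp_fraction_bits: int = 18) -> int:
    """Map a real number to an element of the ring of integers mod 2**field_bits."""
    scaled = round(x * (1 << fxp_fraction_bits))  # keep fxp_fraction_bits fractional bits
    return scaled % (1 << field_bits)             # wrap into the ring (two's complement)


def decode(v: int, field_bits: int = 64, fxp_fraction_bits: int = 18) -> float:
    """Inverse of encode: the upper half of the ring represents negative values."""
    modulus = 1 << field_bits
    if v >= modulus // 2:
        v -= modulus
    return v / (1 << fxp_fraction_bits)


# Precision: values are rounded to the nearest multiple of 2**-18.
approx = decode(encode(0.1))
# Underflow: a magnitude far below 2**-18 collapses to zero.
tiny = decode(encode(1e-7))
# Range: with a 16-bit ring and 8 fractional bits, 200.0 does not fit and wraps around.
wrapped = decode(encode(200.0, 16, 8), 16, 8)
print(approx, tiny, wrapped)
```

In this toy model, a larger `fxp_fraction_bits` improves precision but leaves fewer bits for the integer part, which is the trade-off the paragraph above describes.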

Our Solution

So we built a new library, SML, that tries to bridge these gaps:

  1. accuracy: optimize and test the algorithms on fixed-point numbers, e.g. prefer high-precision ops (rsqrt rather than 1/sqrt) and re-transform inputs where necessary to fit the valid range of non-linear ops (see fxp pitfalls).
  2. efficiency: replace CPU-friendly ops with MPC-friendly ops, e.g. use numerical approximation tricks to avoid sophisticated computation, and prefer arithmetic ops to comparison ops.
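As one illustration of a numerical approximation that uses only arithmetic ops, the sketch below approximates 1/sqrt(x) with Newton's iteration, which needs multiplications and subtractions but no division, square root, or comparison. This is a generic textbook technique shown in plain Python, not the method SML or SPU actually uses; the function name `rsqrt_newton`, the initial guess, and the input range [0.25, 4] are assumptions made for the example.

```python
def rsqrt_newton(x: float, iterations: int = 10) -> float:
    """Approximate 1/sqrt(x) for x in [0.25, 4] using only multiply and subtract."""
    # Fixed initial guess y0; the iteration converges when x * y0**2 < 3,
    # which holds for the assumed input range.
    y = 0.5
    for _ in range(iterations):
        # Newton step for f(y) = 1/y**2 - x
        y = y * (1.5 - 0.5 * x * y * y)
    return y


for x in (0.25, 1.0, 2.0, 4.0):
    print(x, rsqrt_newton(x), x ** -0.5)
```

The fixed iteration count avoids a data-dependent stopping test, which would itself require a comparison on secret values. The assumed input range also shows why inputs may need rescaling before a non-linear op: outside that range, this initial guess converges slowly or not at all.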

We also provide an easy-to-use testing toolbox for advanced developers who want to develop their own MPC programs:

  1. Simulator: provides a fixed-point computation environment and runs at high speed. However, it does not reproduce a real SPU performance environment, so its test results do not reflect the actual performance of the algorithm.
  2. Emulator: emulates the real MPC protocol using multiple processes or Docker (coming soon), and provides meaningful performance results.

So the accuracy of an algorithm can be verified by passing the simulator tests, and its efficiency should be tested with the emulator.
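The idea behind an accuracy test can be sketched without any MPC machinery: compute the same quantity once in floating point and once in fixed-point arithmetic, then compare within a tolerance. The code below is a toy stand-in in plain Python, not the SML simulator API; `FRAC_BITS`, the helper names, and the tolerance are all hypothetical choices for illustration.

```python
FRAC_BITS = 18            # assumed fractional bit width, for illustration only
SCALE = 1 << FRAC_BITS


def to_fxp(x: float) -> int:
    return round(x * SCALE)


def from_fxp(v: int) -> float:
    return v / SCALE


def fxp_mul(a: int, b: int) -> int:
    # The raw product carries 2 * FRAC_BITS fractional bits; shift back to FRAC_BITS.
    return (a * b) >> FRAC_BITS


def variance_float(xs):
    """Plaintext floating-point reference."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_fxp(xs):
    """Same computation carried out on fixed-point integers."""
    enc = [to_fxp(x) for x in xs]
    inv_n = to_fxp(1.0 / len(xs))      # multiply by 1/n instead of dividing
    mean = fxp_mul(sum(enc), inv_n)
    sq_sum = sum(fxp_mul(e - mean, e - mean) for e in enc)
    return from_fxp(fxp_mul(sq_sum, inv_n))


data = [0.1, 0.2, 0.7]
ref = variance_float(data)
got = variance_fxp(data)
# The fixed-point result should match the reference up to a small tolerance.
assert abs(got - ref) < 1e-3
```

A check like this says nothing about running time under a real protocol, which is why efficiency is measured separately with the emulator.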

WARNING: SML is currently under rapid development, so it is not recommended for direct use in production environments.

Installation

First, clone the spu repo to your local disk:

git clone https://github.com/secretflow/spu.git

Some prerequisites are required, depending on your system. Once they are all installed, you can run any test, for example:

# run kmeans simulation
# simulation: run program in single process
# used for correctness test
pytest -n auto sml/sml/cluster/tests/kmeans_test.py

# run kmeans emulation
# emulation: run program with multiple processes(LAN setting)
# or multiple dockers(WAN setting, will come soon)
# used for efficiency test.
python3 sml/sml/cluster/emulations/kmeans_emul.py

Algorithm Support lists

See support lists for all the algorithms and features we currently support.

Development

See development if you would like to contribute to SML.

FAQ

We have collected some FAQs; please check them before submitting an issue.

