
Machine Learning and Federated Learning Library.


Federated learning and data analytics that just works





Using the Docker images

There are two docker images, one for running a Pod (ghcr.io/bitfount/pod:stable), and another for running a modelling task (ghcr.io/bitfount/modeller:stable).

Both images require a config.yaml file; by default they will try to load it from /mount/config/config.yaml inside the container. The easiest way to provide this file is to mount or bind a volume to the container. How you do this varies by platform and environment (Docker, docker-compose, ECS); if you have any problems, feel free to reach out to us.

Alternatively, you can copy a config file into a stopped container using docker cp.

If you're using a CSV data source, you'll also need to mount your data into the container at the path specified in your config. For simplicity, it's easiest to put your config and your CSV in the same directory and mount that directory to the container.

Once your container is running, check the logs and complete the login step so that your container can authenticate with Bitfount. The process is the same as when running locally (e.g. in the tutorials), except that we can't open the login page automatically for you.
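As a concrete sketch of the steps above (the host directory name pod-config and the container name bitfount-pod are assumptions; adjust them to your setup):

```shell
# Run the Pod image with the directory containing config.yaml (and any
# CSV data) bind-mounted at the default config location in the container.
docker run -d --name bitfount-pod \
  -v "$(pwd)/pod-config:/mount/config" \
  ghcr.io/bitfount/pod:stable

# Alternatively, copy a config file into a stopped container.
docker cp pod-config/config.yaml bitfount-pod:/mount/config/config.yaml

# Follow the logs to find the login URL and complete authentication.
docker logs -f bitfount-pod
```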

Running the Python code

Installation

Where to get it

Binary installers for the latest released version are available at the Python Package Index (PyPI).

pip install bitfount

For DICOM support, you will need to install the DICOM extras:

pip install 'bitfount[dicom]'

If you want to use differential privacy (DP), you will need to install the DP extras:

pip install 'bitfount[dp]'

Ensure you are using Python 3.8 or 3.9. The DP extra is not supported on Python 3.10.

If you are planning on using the bitfount package with Jupyter notebooks, we recommend you install the tutorial extras (bitfount[tutorials]), which will ensure you are running compatible Jupyter dependencies.

pip install 'bitfount[tutorials]'
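One quick way to check that an optional extra actually installed is to look for one of the modules it provides. The module names below (pydicom for the dicom extra, opacus for the dp extra) are assumptions about what the extras pull in, so adjust them if needed:

```python
import importlib.util

def extra_installed(module_name: str) -> bool:
    """Return True if the given module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Module names here are assumptions about what each extra depends on.
print("dicom extra available:", extra_installed("pydicom"))
print("dp extra available:", extra_installed("opacus"))
```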

Installation from sources

To install bitfount from source, first create a Python virtual environment.

In the bitfount directory (same one where you found this file after cloning the git repo), execute:

pip install -r requirements/requirements.in

These requirements are set to permissive ranges but are not guaranteed to work for all releases, especially the latest ones. For a pinned set of requirements that is guaranteed to work, run the following instead:

#!/bin/bash
PYTHON_VERSION=$(python -c "import platform; print(''.join(platform.python_version_tuple()[:2]))")
pip install -r requirements/${PYTHON_VERSION}/requirements.txt

To use differential privacy (DP), you will additionally need to install the DP requirements. Note that these are only compatible with Python 3.8 and 3.9, and are restricted to non-ARM architectures:

#!/bin/bash
PYTHON_VERSION=$(python -c "import platform; print(''.join(platform.python_version_tuple()[:2]))")
PLATFORM_PROCESSOR=$(python -c "import platform; print(platform.processor())")

if [[ ${PYTHON_VERSION} == "38" || ${PYTHON_VERSION} == "39" ]] && [[ ${PLATFORM_PROCESSOR} != "arm" ]]; then
    pip install -r requirements/${PYTHON_VERSION}/differential_privacy/requirements-dp.txt
fi

On macOS you also need to install libomp:

brew install libomp

Getting started (Tutorials)

To run the tutorials, you also need to install the tutorial requirements:

#!/bin/bash
PYTHON_VERSION=$(python -c "import platform; print(''.join(platform.python_version_tuple()[:2]))")
pip install -r requirements/${PYTHON_VERSION}/requirements-tutorial.txt

To get started using the Bitfount package in a federated setting, we recommend that you start with our tutorials. Run jupyter notebook and open the first tutorial in the "Connecting Data & Creating Pods" folder: running_a_pod.ipynb

Federated training scripts

Some simple scripts have been provided to run a Pod or Modelling job from a config file.

⚠️ If you are running from a source install (such as from a git clone), you will need to use python -m scripts.<script_name> rather than bitfount <script_name> directly.

To run a pod:

bitfount run_pod --path_to_config_yaml=<CONFIG_FILE>

To run a modelling job:

bitfount run_modeller --path_to_config_yaml=<CONFIG_FILE>

Basic Local Usage

As well as providing the ability to use data in remote pods, this package also enables local ML training. Some example code for this purpose is given below.

1. Import bitfount

import bitfount as bf

2. Create DataSource and load data

census_income = bf.CSVSource(
    path="https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials/census_income.csv",
    ignore_cols=["fnlwgt"],
)
census_income.load_data()

3. Create Schema

schema = bf.BitfountSchema(
    census_income,
    table_name="census_income",
    force_stypes={
        "census_income": {
            "categorical": [
                "TARGET",
                "workclass",
                "marital-status",
                "occupation",
                "relationship",
                "race",
                "native-country",
                "gender",
                "education"
            ]
        }
    }
)

4. Transform Data

clean_data = bf.CleanDataTransformation()
processor = bf.TransformationProcessor([clean_data], schema.get_table_schema("census_income"))
census_income.data = processor.transform(census_income.data)
schema.add_datasource_tables(census_income, table_name="census_income")

5. Create DataStructure

census_income_data_structure = bf.DataStructure(
    table="census_income",
    target="TARGET",
)

6. Create and Train Model

nn = bf.PyTorchTabularClassifier(
    datastructure=census_income_data_structure,
    schema=schema,
    epochs=2,
    batch_size=256,
    optimizer=bf.Optimizer("RAdam", {"lr": 0.001}),
)
nn.fit(census_income)
nn.serialize("demo_task_model.pt")

7. Evaluate

preds, target = nn.evaluate()
metrics = bf.MetricCollection.create_from_model(nn)
results = metrics.compute(target, preds)
print(results)

8. Assert results

import numpy as np
assert not np.isnan(float(nn._validation_results[-1]["validation_loss"]))
assert results["AUC"] > 0.7

License

The license for this software is available in the LICENSE file. This can be found in the GitHub repository, as well as inside the Docker image.


