Federated Learning for Health

FL4Health

Principally, this repository contains the federated learning (FL) engine aimed at facilitating FL research, experimentation, and exploration, specifically targeting health applications.

The library source code is housed in the fl4health folder. This library is built on the foundational components of Flower, an open-source FL library in its own right. The documentation is here. This library contains a number of unique components that extend Flower's functionality in several directions.

Summary of Approaches

The present set of FL approaches implemented in the library are:

More approaches are being implemented as they are prioritized. However, the library also provides significant flexibility to implement strategies of your own.

Privacy Capabilities

In addition to the FL strategies, we also support several differentially private FL training approaches. These include:

Components

Checkpointing

Contains modules associated with basic checkpointing. Currently, only checkpointing of PyTorch models is supported.

Client Managers

Houses modules associated with custom functionality on top of Flower's client managers. Client managers are responsible for, among other things, coordinating and sampling clients to participate in server rounds. We support several ways to sample clients in each round.
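As a rough illustration of what "sampling clients" can mean, the sketch below contrasts two common schemes: drawing a fixed fraction of clients without replacement, and including each client independently with some probability (often called Poisson sampling). The function names are hypothetical and this is not the library's or Flower's client manager API; whether these are the exact schemes the library supports is an assumption.

```python
import random
from typing import List


def sample_fixed_fraction(client_ids: List[str], fraction: float, rng: random.Random) -> List[str]:
    """Pick round(fraction * N) clients uniformly without replacement (at least one)."""
    n_to_sample = max(1, round(fraction * len(client_ids)))
    return rng.sample(client_ids, n_to_sample)


def sample_poisson(client_ids: List[str], probability: float, rng: random.Random) -> List[str]:
    """Include each client independently with the given probability.

    The number of selected clients varies from round to round and may be zero.
    """
    return [cid for cid in client_ids if rng.random() < probability]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    clients = [f"client_{i}" for i in range(10)]  # hypothetical client identifiers
    print(sample_fixed_fraction(clients, 0.3, rng))
    print(sample_poisson(clients, 0.3, rng))
```

Fixed-fraction sampling gives a predictable cohort size per round, while independent sampling is the form usually assumed in differential privacy analyses.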

Clients

This folder houses implementations of specific FL strategies that affect client-side training or enforce certain properties during training. For example, the FedProxClient adds the requisite proximal loss term to a provided standard loss prior to performing optimization. There is also a basic client that implements standard client-side optimization flows for convenience.
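To make the FedProx proximal term concrete, here is a minimal sketch in plain Python. It computes the standard FedProx penalty, (mu / 2) * ||w - w_global||^2, on flattened weight vectors and adds it to a given loss value. The function names and the list-of-floats representation are illustrative assumptions; this is not the FedProxClient implementation, which operates on PyTorch models.

```python
from typing import Sequence


def proximal_term(local_weights: Sequence[float], global_weights: Sequence[float], mu: float) -> float:
    """Return (mu / 2) * ||w - w_global||^2 for flattened weight vectors."""
    if len(local_weights) != len(global_weights):
        raise ValueError("Weight vectors must have the same length.")
    squared_distance = sum((w - g) ** 2 for w, g in zip(local_weights, global_weights))
    return 0.5 * mu * squared_distance


def fedprox_loss(
    standard_loss: float,
    local_weights: Sequence[float],
    global_weights: Sequence[float],
    mu: float,
) -> float:
    """Add the proximal penalty to a loss value computed elsewhere."""
    return standard_loss + proximal_term(local_weights, global_weights, mu)


if __name__ == "__main__":
    # Local weights have drifted from the global weights received at the start of the round.
    print(fedprox_loss(1.0, [1.0, 2.0], [0.0, 0.0], mu=0.5))
```

Larger values of mu keep the local model closer to the global weights received at the start of the round.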

Model Bases

Certain methods require special model architectures. For example, APFL has twin models and separate global and personal forward passes. It also has a special update function associated with the convex combination parameter $\alpha$. This folder houses special code to facilitate use of these customizations to the neural network architectures.
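The role of $\alpha$ can be sketched as follows, assuming the personal prediction is formed as a convex combination of the local and global model outputs. The function name and the use of plain lists are hypothetical; the library's model bases work with PyTorch modules, and the update rule for $\alpha$ itself is not shown.

```python
from typing import List, Sequence


def personalized_output(local_out: Sequence[float], global_out: Sequence[float], alpha: float) -> List[float]:
    """Mix two models' outputs as alpha * local + (1 - alpha) * global."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination.")
    if len(local_out) != len(global_out):
        raise ValueError("Outputs must have the same length.")
    return [alpha * l + (1.0 - alpha) * g for l, g in zip(local_out, global_out)]


if __name__ == "__main__":
    # alpha = 1 relies only on the local model; alpha = 0 relies only on the global model.
    print(personalized_output([2.0, 4.0], [0.0, 0.0], alpha=0.25))
```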

Parameter Exchange

In vanilla FL, all model weights are exchanged between the server and clients. However, in many cases, either more or less information needs to be exchanged. SCAFFOLD requires that both weights and associated "control variates" be exchanged between the two entities. On the other hand, APFL only exchanges a subset of the parameters. The classes in this folder facilitate the proper handling of both of these situations. More complicated adaptive parameter exchange techniques are also considered here. There is an example of this type of approach in the Examples folder under the partial_weight_exchange_example.
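The idea of exchanging only a subset of parameters can be sketched with a dictionary keyed by parameter name. Everything here (the `Weights` alias, the function names, and the layer names) is a hypothetical illustration, not the interface of the library's exchanger classes.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a model state: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def select_for_exchange(state: Weights, names_to_exchange: Iterable[str]) -> Weights:
    """Keep only the parameters that should be sent to the other party."""
    wanted = set(names_to_exchange)
    return {name: values for name, values in state.items() if name in wanted}


def merge_received(state: Weights, received: Weights) -> Weights:
    """Overwrite exchanged parameters and leave all other parameters untouched."""
    merged = dict(state)
    merged.update(received)
    return merged


if __name__ == "__main__":
    local_state = {"global_model.fc": [0.1, 0.2], "local_model.fc": [0.9, 0.8]}
    payload = select_for_exchange(local_state, ["global_model.fc"])
    print(payload)  # only the shared part leaves the client
    aggregated = {"global_model.fc": [0.3, 0.4]}
    print(merge_received(local_state, aggregated))
```

Parameters that are never selected stay private to the client, which is what allows a personal model to persist across rounds.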

Privacy

This folder holds the current differential privacy accountants for the instance and client-level DP methods that have been implemented. They are based on the established "Moments Accountants." However, we are working to move these to the new "PRV Accountants."

Reporting

Currently, this holds the reporting integrations with Weights and Biases for experiment logging. It is capable of capturing both Server and Client metrics. For an example of using this integration, see the fedprox_example.

Server

Certain FL methods, such as Client-Level DP and SCAFFOLD with Warm Up, require special server-side flows to ensure that everything is properly handled. This code also establishes initialization communication between the client and server. For example, one can poll each of the clients to obtain the size of each client's dataset before proceeding to FL training.

Strategies

This folder contains implementations of distinct strategies going beyond those implemented in the standard Flower library. Certain methods require distinct aggregation procedures, such as Client-level differential privacy with adaptive clipping where a noisy aggregation must take place and special considerations are required for the clipping bits. Implementation of new strategies here allows one to customize the way in which parameters and other information communicated between a server and the clients is aggregated.
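As a hedged sketch of what a noisy aggregation with clipping bits might look like, the code below clips each client update to a fixed L2 bound, records a bit indicating whether the update was already within the bound, sums the clipped updates, adds Gaussian noise, and averages. The function names are hypothetical, the noise scale (noise multiplier times clipping bound) is an assumption, and the bits are averaged without noise for brevity. It is not the library's strategy implementation and provides no privacy guarantee on its own.

```python
import math
import random
from typing import List, Sequence, Tuple


def clip_update(update: Sequence[float], clipping_bound: float) -> Tuple[List[float], int]:
    """Scale the update so its L2 norm is at most clipping_bound.

    Returns the (possibly scaled) update and a bit that is 1 when the original
    norm was already within the bound and 0 when clipping was applied.
    """
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clipping_bound:
        return list(update), 1
    scale = clipping_bound / norm
    return [x * scale for x in update], 0


def noisy_mean(
    updates: Sequence[Sequence[float]],
    clipping_bound: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Tuple[List[float], float]:
    """Average clipped updates with Gaussian noise added to the sum.

    Also returns the fraction of clients whose updates did not need clipping.
    """
    clipped_and_bits = [clip_update(u, clipping_bound) for u in updates]
    n_clients = len(updates)
    dim = len(updates[0])
    std = noise_multiplier * clipping_bound  # assumed noise scale for this sketch
    noisy_sums = [
        sum(clipped[i] for clipped, _ in clipped_and_bits) + rng.gauss(0.0, std)
        for i in range(dim)
    ]
    unclipped_fraction = sum(bit for _, bit in clipped_and_bits) / n_clients
    return [s / n_clients for s in noisy_sums], unclipped_fraction


if __name__ == "__main__":
    rng = random.Random(0)
    print(noisy_mean([[3.0, 4.0], [0.0, 1.0]], clipping_bound=1.0, noise_multiplier=0.5, rng=rng))
```

In an adaptive scheme, the fraction of unclipped updates is the signal a server could use to raise or lower the clipping bound between rounds.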

Examples

The examples folder contains an extensive set of examples showing how to use the various components of the library, set up the different strategies implemented in the library, and run federated learning in general. These examples are an accessible way to learn what is required to experiment with different FL capabilities. Each example has some documentation describing what is being implemented and how to run the code to see it in action. The examples range from basic FedAvg implementations to differentially private SCAFFOLD.

NOTE: The contents of the examples folder are not packaged with the FL4Health library on release to PyPI.

Research Code

The research folder houses code associated with various research being conducted by the team at Vector. It may be used to perform experiments on the Cluster or to reproduce experiments from our research. The current research is:

  • FENDA-FL FLamby Experiments. There is a README in that folder that provides details on how to run the hyper-parameter sweeps, evaluations, and other experiments.

NOTE: The contents of the research folder are not packaged with the FL4Health library on release to PyPI.

Tests

All tests for the library are housed in the tests folder. These are run using pytest (see Running Tests below). These tests are automatically run through GitHub integrations on PRs to the main branch of this repository. PRs that fail any of the tests will not be eligible to be merged until they are fixed.

If you use VSCode for development, you can set up the tests with the testing integration so that you can use debugging and other IDE features. Setup will vary depending on your VSCode environment, but in your .vscode folder your settings.json might look something like

{
    "python.testing.unittestArgs": [
        "-v",
        "-s",
        ".",
        "-p",
        "test_*.py"
    ],
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": [
        "."
    ]
}

NOTE: The contents of the tests folder are not packaged with the FL4Health library on release to PyPI.

Development Practices

We use the standard git development flow of branch and merge to main with PRs on GitHub. At least one member of the core team needs to approve a PR before it can be merged into main. As mentioned above, tests are run automatically on PRs with a merge target of main. Furthermore, a suite of static code checkers and formatters is also run on said PRs. These also need to pass for a PR to be eligible for merging into the main branch of the library. Currently, such checks run on Python 3.9.

Development Requirements

The library dependencies and those for development are listed in the pyproject.toml and requirements.txt files. You may use whichever virtual environment management tool you prefer, such as conda, poetry, or virtualenv. Poetry is used to produce our releases, which are managed and automated by GitHub.

The easiest way to create and activate a virtual environment is

virtualenv "ENV_PATH"
source "ENV_PATH/bin/activate"
pip install --upgrade pip
pip install -r requirements.txt

Coding Guidelines, Formatters, and Checks

For code style, we recommend the Google style guide.

Pre-commit hooks apply black code formatting.

We also use flake8 and pylint for further static code analysis. The pre-commit hooks show errors which you need to fix before submitting a PR.

Last but not least, we use type hints in our code, which are checked using mypy. The mypy checks are strictly enforced. That is, all mypy checks must pass or the associated PR will not be mergeable.

The settings for mypy are in the mypy.ini file, and the settings for flake8 are contained in the .flake8 file. Settings for black and isort come from the pyproject.toml file, and some standard checks are defined directly in the .pre-commit-config.yaml settings.

All of these checks and formatters are invoked by pre-commit hooks. These hooks are run remotely on GitHub. In order to ensure that your code conforms to these standards and therefore passes the remote checks, you can install the pre-commit hooks to be run locally. This is done by running (with your environment active)

pre-commit install

To run the checks, some of which will automatically re-format your code to fit the standards, you can run

pre-commit run --all-files

It can also be run on a subset of files by omitting the --all-files option and pointing to specific files or folders.

If you're using VSCode for development, pre-commit should also set up git hooks that execute the pre-commit checks each time you check code into your branch through the integrated source control. This ensures that each of your commits conforms to the desired format before the checks are run remotely, without you needing to remember to run them before pushing to a remote. If this isn't done automatically, you can find instructions for setting up these hooks manually online.

Code Documentation

For code documentation, we try to adhere to the Google docstring style (See Here, Section: Comments and Docstrings). The implementation of an extensive set of comments for the code in this repository is a work in progress. However, we are continuing to work towards a better commented state for the code. For development, as stated in the style guide, any non-trivial or non-obvious methods added to the library should have a docstring. For our library, this applies only to code added to the main library in fl4health. Examples, research code, and tests need not follow the strict rules of documentation, though clarifying and helpful comments in that code are strongly encouraged.

NOTE: As a matter of convention choice, classes are documented through their __init__ functions rather than at the "class" level.
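The convention can be illustrated with a small, hypothetical class (it is not part of fl4health). The class itself carries no docstring; the description and arguments live in the docstring of __init__, written in the Google style.

```python
from collections import deque


class RunningAverage:
    def __init__(self, window_size: int) -> None:
        """
        Tracks the mean of the most recent values passed to update.

        Args:
            window_size (int): Number of most recent values to average over. Must be positive.

        Raises:
            ValueError: If window_size is not positive.
        """
        if window_size <= 0:
            raise ValueError("window_size must be positive.")
        # A bounded deque silently discards the oldest value once it is full.
        self.values: deque = deque(maxlen=window_size)

    def update(self, value: float) -> float:
        """
        Adds a value to the window and returns the current windowed mean.

        Args:
            value (float): The newest value to include in the average.

        Returns:
            float: Mean of the values currently in the window.
        """
        self.values.append(value)
        return sum(self.values) / len(self.values)
```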

If you are using VS Code, a very helpful integration called autoDocstring (VS Code Page and Documentation) is available to facilitate the creation of properly formatted docstrings. This tool automatically generates a docstring template when you start a docstring with triple quotation marks ("""). To get the correct format, use the following settings:

{
    "autoDocstring.customTemplatePath": "",
    "autoDocstring.docstringFormat": "google",
    "autoDocstring.generateDocstringOnEnter": true,
    "autoDocstring.guessTypes": true,
    "autoDocstring.includeExtendedSummary": false,
    "autoDocstring.includeName": false,
    "autoDocstring.logLevel": "Info",
    "autoDocstring.quoteStyle": "\"\"\"",
    "autoDocstring.startOnNewLine": true
}

Running Tests

We use pytest for our unit and integration testing in the tests folder. These tests are automatically run on GitHub for PRs targeting the main branch. All tests need to pass before merging can happen. To run all tests in the tests folder, one need only run (with the venv active)

pytest .

To run a specific test with pytest, one runs

pytest tests/checkpointing/test_best_checkpointer.py

where the path is the relative one from the root directory. If you're using VSCode, you can use the integrated debugger from the test suite if you properly configure your project. The settings will depend on your specific environment, but a potential setup is shown above in the Tests Section.
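For readers new to pytest, a test module is a file whose name matches test_*.py and that contains functions whose names start with test_. The file name and the helper below are hypothetical and only show the shape of such a module; real tests would import the code under test from fl4health instead.

```python
# Hypothetical file: tests/utils/test_weighted_mean.py
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Helper defined inline so that this sketch is self-contained."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("Weights must not sum to zero.")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def test_weighted_mean_matches_hand_computation() -> None:
    # (1 * 1 + 3 * 3) / (1 + 3) = 2.5
    assert weighted_mean([1.0, 3.0], [1.0, 3.0]) == 2.5


def test_weighted_mean_rejects_zero_total_weight() -> None:
    try:
        weighted_mean([1.0, 2.0], [0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("Expected a ValueError for zero total weight.")
```

pytest collects both functions automatically when pointed at the file or at a folder containing it.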

Citation

Reference to cite when you use FL4Health in a project or a research paper:

D.B. Emerson, J. Jewell, F. Tavakoli, Y. Zhang, S. Ayromlou, and A. Krishnan (2023). FL4Health. https://github.com/vectorInstitute/FL4Health/. Computer Software, Vector Institute for Artificial Intelligence.
