Federated Learning for Health
FL4Health
Principally, this repository contains the federated learning (FL) engine aimed at facilitating FL research, experimentation, and exploration, specifically targeting health applications.
The library source code is housed in the fl4health folder. The library is built on the foundational components of Flower, an open-source FL library in its own right; Flower's documentation is here. FL4Health adds several unique components that extend Flower's functionality in a number of directions.
Summary of Approaches
The FL approaches currently implemented in the library are:
- FedAvg
    - Weighted
    - Unweighted
- FedOpt
    - FedAdam
    - FedAdaGrad
    - FedYogi
- FedProx
    - Adaptive
    - Uniform
- SCAFFOLD
    - Standard
    - With Warmup
    - DP-Scaffold
- Personal FL (Continued Local Training)
- APFL
- FENDA-FL
More approaches are being implemented as they are prioritized. However, the library also provides significant flexibility to implement strategies of your own.
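To make the difference between the weighted and unweighted FedAvg variants concrete, here is a minimal sketch in plain Python. It assumes each client's model is flattened into a list of floats and that the server knows each client's training sample count; the `fedavg` function is hypothetical and is not part of the FL4Health or Flower API.

```python
from typing import List, Sequence

# Hypothetical representation: each client's model is flattened into one list of floats.
Params = List[float]


def fedavg(client_params: Sequence[Params], sample_counts: Sequence[int], weighted: bool = True) -> Params:
    """Average client parameter vectors, optionally weighting by local dataset size.

    Illustrative helper only, not the library's strategy class.
    """
    if not client_params or len(client_params) != len(sample_counts):
        raise ValueError("Expected one sample count per client and at least one client.")
    if weighted:
        # Weighted FedAvg: clients with more training samples contribute more.
        total = sum(sample_counts)
        weights = [n / total for n in sample_counts]
    else:
        # Unweighted FedAvg: every participating client counts equally.
        weights = [1.0 / len(client_params)] * len(client_params)
    dim = len(client_params[0])
    return [sum(w * params[i] for w, params in zip(weights, client_params)) for i in range(dim)]


clients = [[1.0, 2.0], [3.0, 4.0]]
counts = [1, 3]
print(fedavg(clients, counts, weighted=True))   # [2.5, 3.5]
print(fedavg(clients, counts, weighted=False))  # [2.0, 3.0]
```

With the counts above, the second client holds three quarters of the data, so the weighted average lands closer to its parameters.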
Privacy Capabilities
In addition to the FL strategies, we also support several differentially private FL training approaches. These include:
- Instance-level FL privacy
- Client-level FL privacy with Adaptive Clipping
    - Weighted and Unweighted FedAvg
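Instance-level privacy is commonly achieved with a DP-SGD style update: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, and Gaussian noise scaled by the clipping bound is added before averaging. The sketch below shows a single such step in plain Python. The function names and parameters are hypothetical, and the sketch omits the privacy accounting that a real implementation needs to track the spent budget.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip(v: Sequence[float], max_norm: float) -> Vector:
    # Scale v down so its L2 norm is at most max_norm (no-op if it is already within the bound).
    norm = l2_norm(v)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def dp_sgd_gradient(
    per_example_grads: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Privatized mean gradient for one batch (hypothetical helper, DP-SGD style)."""
    rng = rng or random.Random()
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient has norm 5 and will be clipped
print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=random.Random(0)))
# close to [0.45, 0.6]: with zero noise only clipping is applied
```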
Components
Checkpointing
Contains modules for basic checkpointing. Currently, only PyTorch models are supported.
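The idea behind a "best model" checkpointer can be sketched as follows. This is a hypothetical, framework-free illustration that pickles an arbitrary state object whenever the validation loss improves; the library's own checkpointers work with PyTorch models and may differ in naming and behavior.

```python
import os
import pickle
import tempfile
from typing import Any, Optional


class BestMetricCheckpointer:
    """Hypothetical checkpointer: saves state whenever the validation loss improves (lower is better)."""

    def __init__(self, checkpoint_path: str) -> None:
        self.checkpoint_path = checkpoint_path
        self.best_loss: Optional[float] = None

    def maybe_checkpoint(self, model_state: Any, loss: float) -> bool:
        # Only write to disk for the first score or one that strictly beats the best so far.
        if self.best_loss is None or loss < self.best_loss:
            self.best_loss = loss
            with open(self.checkpoint_path, "wb") as f:
                pickle.dump(model_state, f)
            return True
        return False

    def load_best(self) -> Any:
        with open(self.checkpoint_path, "rb") as f:
            return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
ckpt = BestMetricCheckpointer(path)
print(ckpt.maybe_checkpoint({"w": [0.1]}, loss=0.9))  # True (first score)
print(ckpt.maybe_checkpoint({"w": [0.2]}, loss=1.2))  # False (worse)
print(ckpt.maybe_checkpoint({"w": [0.3]}, loss=0.5))  # True (improved)
print(ckpt.load_best())                                # {'w': [0.3]}
```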
Client Managers
Houses modules associated with custom functionality on top of Flower's client managers. Client managers are responsible for, among other things, coordinating and sampling clients to participate in server rounds. We support several ways to sample clients in each round.
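Two common ways of choosing participants each round are sketched below: drawing a fixed fraction of clients without replacement, and Poisson sampling, where each client joins independently with probability `q` (a scheme commonly assumed in the privacy analysis of client-level DP). These helper functions are hypothetical and are not the library's client manager classes.

```python
import random
from typing import List, Sequence


def sample_fixed_fraction(
    client_ids: Sequence[str], fraction: float, rng: random.Random, min_clients: int = 1
) -> List[str]:
    # Hypothetical helper: pick exactly k = max(min_clients, int(fraction * n)) clients without replacement.
    k = min(len(client_ids), max(min_clients, int(fraction * len(client_ids))))
    return rng.sample(list(client_ids), k)


def sample_poisson(client_ids: Sequence[str], q: float, rng: random.Random) -> List[str]:
    # Hypothetical helper: each client participates independently with probability q,
    # so the number of participants varies from round to round.
    return [cid for cid in client_ids if rng.random() < q]


ids = [f"client_{i}" for i in range(10)]
rng = random.Random(42)
print(sample_fixed_fraction(ids, fraction=0.3, rng=rng))  # always exactly 3 clients
print(sample_poisson(ids, q=0.3, rng=rng))                # 3 clients on average, varies per round
```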
Clients
This folder houses implementations of specific FL strategies that affect client-side training or enforce certain properties during training. There is also a basic client that implements standard client-side optimization flows for convenience. For example, the FedProxClient adds the requisite proximal loss term to a provided standard loss prior to performing optimization.
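For reference, the FedProx objective adds the proximal term $\frac{\mu}{2}\lVert w - w_{\text{global}}\rVert^2$ to the task loss, which pulls the local weights toward the global model received at the start of the round. A minimal sketch with plain Python lists follows; the function names are hypothetical, and a real client would compute this term on framework tensors so that autograd can differentiate through it.

```python
from typing import List, Sequence


def proximal_term(local_params: Sequence[float], global_params: Sequence[float], mu: float) -> float:
    # Hypothetical helper: (mu / 2) * ||w - w_global||^2 penalizes drift away from the global model.
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_params, global_params))


def proximal_grad(local_params: Sequence[float], global_params: Sequence[float], mu: float) -> List[float]:
    # Gradient of the proximal term with respect to the local weights: mu * (w - w_global).
    return [mu * (w - g) for w, g in zip(local_params, global_params)]


def fedprox_loss(task_loss: float, local_params: Sequence[float], global_params: Sequence[float], mu: float) -> float:
    return task_loss + proximal_term(local_params, global_params, mu)


print(fedprox_loss(1.0, [1.0, 2.0], [0.0, 0.0], mu=0.1))  # 1.25
print(proximal_grad([1.0, 2.0], [0.0, 0.0], mu=0.1))      # [0.1, 0.2]
```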
Model Bases
Certain methods require special model architectures. For example, APFL has twin models and separate global and personal forward passes. It also has a special update function associated with the convex combination parameter $\alpha$. This folder houses code that facilitates these customizations to the neural network architectures.
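A minimal sketch of the convex-combination idea, assuming (for illustration only) that $\alpha$ mixes the outputs of the personal and global models and that the loss is squared error on a scalar prediction. The $\alpha$ step follows from differentiating that loss with respect to $\alpha$ and is then projected back onto $[0, 1]$. The function names are hypothetical and are not the library's APFL module.

```python
def apfl_mix(local_pred: float, global_pred: float, alpha: float) -> float:
    # Personalized prediction: convex combination of the personal and global model outputs.
    return alpha * local_pred + (1.0 - alpha) * global_pred


def apfl_alpha_step(alpha: float, local_pred: float, global_pred: float, target: float, lr: float) -> float:
    """One projected gradient step on alpha for the loss 0.5 * (mixed - target) ** 2 (hypothetical helper)."""
    mixed = apfl_mix(local_pred, global_pred, alpha)
    # d/d(alpha) of 0.5 * (mixed - target)^2 is (mixed - target) * (local_pred - global_pred).
    grad = (mixed - target) * (local_pred - global_pred)
    # Clamp to [0, 1] so the combination stays convex.
    return min(1.0, max(0.0, alpha - lr * grad))


# The personal model predicts the target exactly, so alpha moves toward it.
print(apfl_alpha_step(alpha=0.5, local_pred=1.0, global_pred=0.0, target=1.0, lr=0.5))  # 0.75
```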
Parameter Exchange
In vanilla FL, all model weights are exchanged between the server and clients. However, in many cases, either more or less information needs to be exchanged. SCAFFOLD requires that both weights and associated "control variates" be exchanged between the two entities. On the other hand, APFL only exchanges a subset of the parameters. The classes in this folder facilitate the proper handling of both of these situations. More complicated adaptive parameter exchange techniques are also considered here. There is an example of this type of approach in the Examples folder under the partial_weight_exchange_example.
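To see why SCAFFOLD needs to exchange more than the weights, here is a minimal sketch of one client's local update following the original SCAFFOLD paper: each local step corrects the gradient using the client and server control variates, and afterwards the client refreshes its own variate (the paper's "option II"). Both the weight change and the variate change are then sent back to the server. The sketch uses plain lists and a caller-supplied gradient function; the names are hypothetical, not the library's API.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def scaffold_local_update(
    x: Sequence[float],    # global model weights received from the server
    c: Sequence[float],    # server control variate
    c_i: Sequence[float],  # this client's control variate
    grad_fn: Callable[[Vector], Vector],
    lr: float,
    local_steps: int,
) -> Tuple[Vector, Vector]:
    """Hypothetical helper returning (delta_y, delta_c), the two things the client sends back."""
    y = list(x)
    for _ in range(local_steps):
        g = grad_fn(y)
        # Corrected step: local gradient minus client variate plus server variate.
        y = [yj - lr * (gj - cij + cj) for yj, gj, cij, cj in zip(y, g, c_i, c)]
    # "Option II" control-variate refresh: c_i+ = c_i - c + (x - y) / (K * lr).
    c_i_new = [cij - cj + (xj - yj) / (local_steps * lr) for cij, cj, xj, yj in zip(c_i, c, x, y)]
    delta_y = [yj - xj for yj, xj in zip(y, x)]
    delta_c = [new - old for new, old in zip(c_i_new, c_i)]
    return delta_y, delta_c


# Toy loss 0.5 * (y - 1)^2, whose gradient is y - 1.
dy, dc = scaffold_local_update(x=[0.0], c=[0.0], c_i=[0.0], grad_fn=lambda y: [v - 1.0 for v in y], lr=0.5, local_steps=1)
print(dy, dc)  # [0.5] [-1.0]
```

After one step, the refreshed client variate equals the client's gradient at the global weights (here -1.0), which is what lets later rounds correct for client drift.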
Privacy
This folder holds the current differential privacy accountants for the instance- and client-level DP methods that have been implemented. They are based on the established "Moments Accountants." However, we are working to move these to the newer "PRV Accountants."
Reporting
Currently, this holds the reporting integrations with Weights and Biases for experiment logging. It is capable of capturing both Server and Client metrics. For an example of using this integration, see the fedprox_example.
Server
Certain FL methods, such as Client-Level DP and SCAFFOLD with Warm Up, require special server-side flows to ensure that everything is properly handled. This code also establishes initialization communication between the client and server. For example, one can poll each of the clients to obtain the size of each client's dataset before proceeding to FL training.
Strategies
This folder contains implementations of distinct strategies going beyond those implemented in the standard Flower library. Certain methods require distinct aggregation procedures, such as Client-level differential privacy with adaptive clipping where a noisy aggregation must take place and special considerations are required for the clipping bits. Implementation of new strategies here allows one to customize the way in which parameters and other information communicated between a server and the clients is aggregated.
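The adaptive clipping idea can be sketched as one unweighted aggregation round, loosely following the scheme in "Differentially Private Learning with Adaptive Clipping" (Andrew et al.): each client update is clipped to norm C and reports a clipping bit (1 if it was not clipped), the server adds Gaussian noise to both the summed updates and the summed bits, and C then moves geometrically toward a target quantile of the update norms. All names and parameters here are hypothetical, and calibrating the noise levels to a privacy budget is left to an accountant.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def clip_update(update: Sequence[float], clip_norm: float) -> Tuple[Vector, int]:
    # Returns the clipped update and its clipping bit (1 if it was already within clip_norm).
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= clip_norm:
        return list(update), 1
    return [u * clip_norm / norm for u in update], 0


def dp_adaptive_clip_round(
    client_updates: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    bit_noise_std: float,
    target_quantile: float,
    clip_lr: float,
    rng: random.Random,
) -> Tuple[Vector, float]:
    """Hypothetical helper: noisy mean update and the clipping bound for the next round."""
    m = len(client_updates)
    clipped, bits = zip(*(clip_update(u, clip_norm) for u in client_updates))
    dim = len(clipped[0])
    # Gaussian noise on the summed updates scales with the current clipping bound.
    summed = [sum(u[j] for u in clipped) for j in range(dim)]
    noisy_mean = [(s + rng.gauss(0.0, noise_multiplier * clip_norm)) / m for s in summed]
    # Noisy estimate of the fraction of clients whose update norm was within the bound.
    noisy_fraction = (sum(bits) + rng.gauss(0.0, bit_noise_std)) / m
    # Geometric update: shrink C if more than the target fraction fit under it, grow it otherwise.
    new_clip = clip_norm * math.exp(-clip_lr * (noisy_fraction - target_quantile))
    return noisy_mean, new_clip


updates = [[3.0, 4.0], [0.3, 0.4]]  # norms of about 5.0 and 0.5
mean, next_clip = dp_adaptive_clip_round(
    updates, clip_norm=1.0, noise_multiplier=0.0, bit_noise_std=0.0,
    target_quantile=0.8, clip_lr=0.2, rng=random.Random(0),
)
print(mean, next_clip)  # mean close to [0.45, 0.6]; only half the updates fit, so the bound grows above 1.0
```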
Examples
The examples folder contains an extensive set of examples showing how to use the various components of the library, set up the different strategies it implements, and run federated learning in general. These examples are an accessible way to learn what is required to experiment with different FL capabilities. Each example has some documentation describing what is being implemented and how to run the code to see it in action. The examples span basic FedAvg implementations to differentially private SCAFFOLD.
NOTE: The contents of the examples folder are not packaged with the FL4Health library on release to PyPI.
Research Code
The research folder houses code associated with various research projects being conducted by the team at Vector. It may be used to perform experiments on the Cluster or to reproduce experiments from our research. Current research includes:
- FENDA-FL FLamby Experiments. There is a README in that folder that provides details on how to run the hyper-parameter sweeps, evaluations, and other experiments.
NOTE: The contents of the research folder are not packaged with the FL4Health library on release to PyPI.
Tests
All tests for the library are housed in the tests folder. These are run using pytest; see Running Tests below. These tests are automatically run through GitHub integrations on PRs to the main branch of this repository. PRs that fail any of the tests will not be eligible to be merged until they are fixed.
If you use VSCode for development, you can set up the tests with the testing integration so that you can use debugging and other IDE features. Setup will vary depending on your VSCode environment, but the settings.json in your .vscode folder might look something like:
{
    "python.testing.unittestArgs": [
        "-v",
        "-s",
        ".",
        "-p",
        "test_*.py"
    ],
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": [
        "."
    ]
}
NOTE: The contents of the tests folder are not packaged with the FL4Health library on release to PyPI.
Development Practices
We use the standard git development flow of branch and merge to main with PRs on GitHub. At least one member of the core team needs to approve a PR before it can be merged into main. As mentioned above, tests are run automatically on PRs with a merge target of main. Furthermore, a suite of static code checkers and formatters is also run on these PRs. These also need to pass for a PR to be eligible for merging into the main branch of the library. Currently, these checks run on Python 3.9.
Development Requirements
For development and testing, we use Poetry for dependency management. The library dependencies and those for development and testing are listed in the pyproject.toml file. You may use whichever virtual environment management tool you like, such as conda, Poetry itself, or virtualenv. Poetry is also used to produce our releases, which are managed and automated by GitHub.
The easiest way to create and activate a virtual environment is by using the virtualenv package:
virtualenv "ENV_PATH"
source "ENV_PATH/bin/activate"
pip install --upgrade pip poetry
poetry install --with "dev, dev-local, test, codestyle"
Coding Guidelines, Formatters, and Checks
For code style, we recommend the Google style guide.
Pre-commit hooks apply black code formatting.
We also use flake8 and pylint for further static code analysis. The pre-commit hooks show errors which you need to fix before submitting a PR.
Last but not least, we use type hints in our code, which are checked using mypy. The mypy checks are strictly enforced. That is, all mypy checks must pass or the associated PR will not be mergeable.
The settings for mypy are in the mypy.ini file, and settings for flake8 are contained in the .flake8 file. Settings for black and isort come from the pyproject.toml file, and some standard checks are defined directly in the .pre-commit-config.yaml settings.
All of these checks and formatters are invoked by pre-commit hooks. These hooks are run remotely on GitHub. In order to ensure that your code conforms to these standards and therefore passes the remote checks, you can install the pre-commit hooks to be run locally. This is done by running (with your environment active)
pre-commit install
To run the checks, some of which will automatically re-format your code to fit the standards, you can run
pre-commit run --all-files
It can also be run on a subset of files by omitting the --all-files option and pointing to specific files or folders.
If you're using VSCode for development, pre-commit should also set up git hooks that execute the pre-commit checks each time you commit code to your branch through the integrated source control. This ensures that each of your commits conforms to the desired format before it is checked remotely, without you needing to remember to run the checks before pushing to a remote. If this isn't done automatically, you can find instructions for setting up these hooks manually online.
Code Documentation
For code documentation, we try to adhere to the Google docstring style (See Here, Section: Comments and Docstrings). Adding an extensive set of comments to the code in this repository is a work in progress, and we are continuing to work towards a better commented state. For development, as stated in the style guide, any non-trivial or non-obvious methods added to the library should have a docstring. For our library, this applies only to code added to the main library in fl4health. Examples, research code, and tests need not follow the strict documentation rules, though clarifying and helpful comments in that code are strongly encouraged.
NOTE: As a matter of convention, classes are documented through their __init__ functions rather than at the "class" level.
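For illustration, a class following this convention might look like the sketch below: the constructor arguments are documented in `__init__` using Google-style `Args:` and `Returns:` sections, and the class itself carries no docstring. `ExampleClient` is a hypothetical class, not part of the library.

```python
class ExampleClient:
    def __init__(self, learning_rate: float, local_epochs: int = 1) -> None:
        """
        Hypothetical client used only to illustrate the documentation convention: the class is
        documented through its __init__ rather than at the class level.

        Args:
            learning_rate (float): Step size used by the local optimizer.
            local_epochs (int, optional): Number of local passes over the client data per server
                round. Defaults to 1.
        """
        self.learning_rate = learning_rate
        self.local_epochs = local_epochs

    def scaled_lr(self, factor: float) -> float:
        """
        Scale the learning rate by a constant factor.

        Args:
            factor (float): Multiplier applied to the learning rate.

        Returns:
            float: The scaled learning rate.
        """
        return self.learning_rate * factor
```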
If you are using VS Code, a very helpful integration called autoDocstring (VS Code Page and Documentation) is available to facilitate the creation of properly formatted docstrings. This tool automatically generates a docstring template when you start a docstring with triple quotation marks ("""). To get the correct format, the following settings should be used:
{
    "autoDocstring.customTemplatePath": "",
    "autoDocstring.docstringFormat": "google",
    "autoDocstring.generateDocstringOnEnter": true,
    "autoDocstring.guessTypes": true,
    "autoDocstring.includeExtendedSummary": false,
    "autoDocstring.includeName": false,
    "autoDocstring.logLevel": "Info",
    "autoDocstring.quoteStyle": "\"\"\"",
    "autoDocstring.startOnNewLine": true
}
Running Tests
We use pytest for our unit and integration testing in the tests folder. These tests are automatically run on GitHub for PRs targeting the main branch. All tests need to pass before merging can happen. To run all tests in the tests folder, one only needs to run (with the venv active):
pytest .
To run a specific test with pytest, one runs
pytest tests/checkpointing/test_best_checkpointer.py
where the path is the relative one from the root directory. If you're using VSCode, you can use the integrated debugger from the test suite if you properly configure your project. The settings will depend on your specific environment, but a potential setup is shown above in the Tests Section.
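As a rough template, a new test module only needs a `test_*.py` file name and plain `assert` statements inside `test_*` functions for pytest to discover it. The module below is hypothetical (both its path and the helper it tests) and only illustrates the structure; in the repository, tests would import the code under test from fl4health rather than defining it inline.

```python
# Hypothetical file: tests/utils/test_weighted_average.py
import math
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    # Stand-in for library code under test; defined inline only for illustration.
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def test_weighted_average_matches_hand_computation() -> None:
    assert math.isclose(weighted_average([1.0, 3.0], [1, 3]), 2.5)


def test_equal_weights_give_plain_mean() -> None:
    assert math.isclose(weighted_average([2.0, 4.0], [5, 5]), 3.0)
```

Such a file could then be run on its own with pytest tests/utils/test_weighted_average.py, in the same way as the command above.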
Citation
Reference to cite when you use FL4Health in a project or a research paper:
D.B. Emerson, J. Jewell, F. Tavakoli, Y. Zhang, S. Ayromlou, and A. Krishnan (2023). FL4Health. https://github.com/vectorInstitute/FL4Health/. Computer Software, Vector Institute for Artificial Intelligence.
Project details
File details
Details for the file fl4health-0.1.10.tar.gz.
File metadata
- Download URL: fl4health-0.1.10.tar.gz
- Upload date:
- Size: 89.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 07506024d754886fdc11227316be41ad463740463c3bfbdcf740511e02c77590
MD5 | 872426010a33f289b9dd39eeb11de98d
BLAKE2b-256 | 63ca9c53c668a8a1bda111ff2c852d85c5f4221dcb04d607f28ddc51ecd51f90
File details
Details for the file fl4health-0.1.10-py3-none-any.whl.
File metadata
- Download URL: fl4health-0.1.10-py3-none-any.whl
- Upload date:
- Size: 123.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6cb7cfd2ff4cb91d24caf3ed1e0fec3820286f42c35880cf2b6e5d62c304c819
MD5 | 9d683c3a612598f32da9bac8a22c8c38
BLAKE2b-256 | 6584bbe6d11e7bc486b4860931885e2913b57d6e213da00c73a54d906d444770