FLIP - Federated Learning for Imaging Platform library built on NVIDIA FLARE

flip-fl-base

This repository contains the FLIP federated learning base application built on NVIDIA FLARE (NVFLARE). It includes the FL services (server, clients, admin API) and the base application code that users extend with their own training logic.

Quick Start

Prerequisites

  • Docker and Docker Compose
  • uv (Python package manager)
  • AWS CLI configured (for downloading test data)

1. Provision an FL Network

Before running anything, you need to provision a federated learning network. This generates the required certificates, keys, and configuration files:

make nvflare-provision NET_NUMBER=1

This creates:

  • Network-specific compose file: deploy/compose-net-1.yml
  • Service secrets in workspace/net-1/services/ (gitignored)

You can provision multiple networks with different ports:

make nvflare-provision NET_NUMBER=2 FL_PORT=8004 ADMIN_PORT=8005

Warning: Provisioned files contain cryptographic signatures. Any modification will cause errors. Always re-run provisioning if changes are needed.

2. Build Docker Images

make build NET_NUMBER=1

3. Start the FL Network

make up NET_NUMBER=1

This starts:

  • fl-server-net-1: Aggregation server
  • fl-client-1-net-1, fl-client-2-net-1: Training clients
  • flip-fl-api-net-1: FastAPI admin interface
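The file and service names above all follow a pattern keyed on NET_NUMBER. The sketch below spells that pattern out in Python. It assumes the net-1 naming shown in this README generalises to other network numbers and client counts; the function name network_layout is hypothetical and not part of the repository.

```python
def network_layout(net_number: int, num_clients: int = 2) -> dict:
    """Return the names this README lists for one provisioned network.

    Assumption: other networks follow the same naming pattern as the
    net-1 examples shown in this README.
    """
    suffix = f"net-{net_number}"
    return {
        "compose_file": f"deploy/compose-{suffix}.yml",
        "secrets_dir": f"workspace/{suffix}/services/",
        "containers": (
            [f"fl-server-{suffix}"]
            + [f"fl-client-{i}-{suffix}" for i in range(1, num_clients + 1)]
            + [f"flip-fl-api-{suffix}"]
        ),
    }


if __name__ == "__main__":
    for name in network_layout(1)["containers"]:
        print(name)
```

A helper like this can be handy when scripting checks against several networks at once, but the Makefile remains the source of truth for the actual names.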

To stop the network:

make down NET_NUMBER=1

To clean up images and containers:

make clean NET_NUMBER=1

Development Mode

DEV mode lets you test your FL applications locally before deploying to production.

Configure Environment

Edit .env.development:

LOCAL_DEV=true
DEV_IMAGES_DIR=../data/accession-resources    # Path to your images
DEV_DATAFRAME=../data/sample_get_dataframe.csv  # Path to your dataframe
JOB_TYPE=standard
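
To check what a script would see from this file, a minimal parser for KEY=VALUE lines can be sketched as follows. This is an illustration only: load_env_file is a hypothetical helper, not part of this repository, and it assumes values never contain a '#' character.

```python
from pathlib import Path


def load_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments.

    Simplification: everything after the first '#' on a line is treated
    as a comment, so values containing '#' are not supported.
    """
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    env_path = Path(".env.development")
    if env_path.is_file():
        print(load_env_file(str(env_path)))
```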

Add Your Application Files

Place your files in src/<JOB_TYPE>/app/custom/:

  • trainer.py - Training logic (FLIP_TRAINER executor)
  • validator.py - Validation logic (FLIP_VALIDATOR executor)
  • models.py - Model definitions (get_model function)
  • config.json - Hyperparameters (requires LOCAL_ROUNDS and LEARNING_RATE)
  • transforms.py - Data transforms (optional)
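
Because config.json must define LOCAL_ROUNDS and LEARNING_RATE, it can help to validate the file before starting a run. The sketch below shows one way to do that; load_config is a hypothetical helper, and it makes no assumption about how the FLIP trainer itself reads the file.

```python
import json
from pathlib import Path

# Keys this README states are required in config.json.
REQUIRED_KEYS = ("LOCAL_ROUNDS", "LEARNING_RATE")


def load_config(path: str) -> dict:
    """Load config.json and fail early if a required key is missing."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing required keys: {missing}")
    return config
```

For example, a file containing {"LOCAL_ROUNDS": 1, "LEARNING_RATE": 0.01} passes the check (the values are illustrative), while a file without LEARNING_RATE raises a KeyError.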

Run the Simulator

make run-container

This runs the NVFLARE simulator in Docker with 2 clients, mounting your app folder for live changes.

Testing

Download Test Data

Download x-ray classification test data (requires AWS S3 access):

make download-xrays-data

Download spleen segmentation test data (requires AWS S3 access):

make download-spleen-data

Download model checkpoints for evaluation tests:

make download-checkpoints

Run Integration Tests

Test different job types with the x-ray and spleen datasets:

# Standard federated training (classification task)
make test-xrays-standard

# Standard federated training (segmentation task)
make test-spleen-standard

# Model evaluation pipeline (requires model checkpoint file)
make test-spleen-evaluation

# Diffusion model training
make test-spleen-diffusion

# Run all integration tests
make test

Run Unit Tests

make unit-test

Manage Test Applications

Copy the spleen test app to your dev folder:

make copy-spleen-app

Save your changes back to the test folder:

make save-spleen-app

Project Structure

├── src/                    # FL application types
│   ├── standard/           # Standard FedAvg training
│   ├── evaluation/         # Distributed model evaluation
│   ├── diffusion_model/    # Two-stage VAE + diffusion training
│   └── fed_opt/            # Custom federated optimization
├── fl_services/            # NVFLARE service definitions
│   ├── fl-base/            # Base Docker image
│   ├── fl-api-base/        # FastAPI admin service
│   ├── fl-client/          # Base FL client service
│   └── fl-server/          # Base FL server service
├── deploy/                 # Docker compose files and templates
├── workspace/              # Provisioned secrets (gitignored)
├── tests/                  # Integration test applications
│   ├── examples/           # Example applications for integration testing
│   └── unit/               # Unit tests
└── .env.development        # Local environment configuration

Job Types

Set via JOB_TYPE environment variable:

Type             Description
standard         Federated training with FedAvg aggregation (default)
evaluation       Distributed model evaluation without training
diffusion_model  Two-stage training (VAE encoder + diffusion)
fed_opt          Custom federated optimization
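
As an illustration of how JOB_TYPE maps to an app folder, the sketch below resolves the variable, falls back to standard when it is unset, and rejects values outside the job types listed above. The function custom_dir_for_job is hypothetical; the real lookup lives in the repository's Makefile and scripts.

```python
import os

# Job types listed in this README; "standard" is the documented default.
JOB_TYPES = ("standard", "evaluation", "diffusion_model", "fed_opt")


def custom_dir_for_job(env=None) -> str:
    """Return the src/<JOB_TYPE>/app/custom/ folder for the active job type."""
    env = os.environ if env is None else env
    job_type = env.get("JOB_TYPE", "standard")
    if job_type not in JOB_TYPES:
        raise ValueError(
            f"Unknown JOB_TYPE {job_type!r}; expected one of {JOB_TYPES}"
        )
    return f"src/{job_type}/app/custom/"


if __name__ == "__main__":
    print(custom_dir_for_job())
```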

NVFLARE App Structure

An NVFLARE app requires this structure:

app/
├── config/
│   ├── config_fed_server.json
│   └── config_fed_client.json
└── custom/
    ├── trainer.py
    ├── validator.py
    ├── models.py
    └── config.json
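
A quick way to confirm that an app folder matches this layout is to check for each file explicitly. The sketch below is a hypothetical helper (missing_app_files is not part of this repository) that takes the list of expected files directly from the tree above.

```python
from pathlib import Path

# Files shown in the app structure above, relative to the app/ folder.
REQUIRED_FILES = (
    "config/config_fed_server.json",
    "config/config_fed_client.json",
    "custom/trainer.py",
    "custom/validator.py",
    "custom/models.py",
    "custom/config.json",
)


def missing_app_files(app_dir: str) -> list:
    """Return the required files that are absent from app_dir."""
    root = Path(app_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_app_files("app")
    print("App structure OK" if not missing else f"Missing: {missing}")
```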

To use different configurations per client or server, use multiple app folders with a meta.json containing a deploy_map. See the NVFLARE documentation.
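
In NVFLARE, a deploy_map maps each app folder name to the list of sites that should receive it. The sketch below builds such a mapping in Python and prints it as JSON. The folder names (app_server, app_client), the site names (site-1, site-2), and the job name are placeholders, and other meta.json fields are omitted; consult the NVFLARE documentation for the full schema.

```python
import json


def build_meta(job_name: str, server_app: str, client_apps: dict) -> dict:
    """Build a minimal meta.json body with a deploy_map.

    server_app:  app folder deployed to the server.
    client_apps: mapping of app folder name -> list of client site names.
    All names passed in here are placeholders chosen by the caller.
    """
    deploy_map = {server_app: ["server"]}
    for app_folder, sites in client_apps.items():
        deploy_map[app_folder] = list(sites)
    return {"name": job_name, "deploy_map": deploy_map}


if __name__ == "__main__":
    meta = build_meta(
        "example_job", "app_server", {"app_client": ["site-1", "site-2"]}
    )
    print(json.dumps(meta, indent=2))
```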

Application and tutorials

An application running on FLIP combines two sets of files: those from the chosen app (the custom and config folders described above) and those the user uploads through the UI. Users can customise the uploaded files, and the tutorials provide examples compatible with each app type.

Each app is compatible with the following tutorials:

App              Tutorial
standard         image_segmentation/3d_spleen_segmentation
diffusion_model  image_synthesis/latent_diffusion_model
fed_opt          image_segmentation/3d_spleen_segmentation
evaluation       image_evaluation/3d_spleen_segmentation
standard         image_classification/xray_classification

User Application Requirements

The standard application requires:

File          Description
trainer.py    Training logic with FLIP_TRAINER class inheriting from Executor
validator.py  Validation logic with FLIP_VALIDATOR class inheriting from Executor
models.py     Model definitions with get_model() function
config.json   Must include LOCAL_ROUNDS and LEARNING_RATE

Production Testing via GitHub Actions

Each pull request automatically pushes src/ to a dev S3 bucket for testing:

s3://flipdev/base-application-dev/pull-requests/<PR_NUMBER>/src/

To test on the FLIP platform, update FL_APP_BASE_BUCKET in the flip repo environment variables to point to your PR's bucket.

S3 Bucket Mounting (Optional)

For automatic sync between local development and S3:

  1. Install s3fs

  2. Configure credentials:

    echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ~/.passwd-s3fs
    chmod 600 ~/.passwd-s3fs
    
  3. Mount the bucket:

    s3fs flip:/base-application-dev/src/standard/app/ ./app/ -o passwd_file=${HOME}/.passwd-s3fs
    

For automatic mounting on boot, add to /etc/fstab:

flip <PATH_TO_APP>/app fuse.s3fs _netdev,allow_other 0 0

Test with mount -a before relying on it.

CI/CD

The GitHub Actions workflows authenticate to AWS with GitHub OIDC, so no long-lived AWS keys are required. They assume an IAM role whose policy allows the S3 operations below.

  • PR to any branch: Pushes to dev S3 bucket for testing on AWS dev account:
    • (dev) s3://flipdev/base-application-dev/pull-requests/<PR_NUMBER>/src/
  • Merge to develop: Syncs src/ to S3 buckets on AWS dev and staging accounts:
    • (dev) s3://flipdev/base-application-dev/src/
    • (staging) s3://flipstag/base-application/src/
  • Merge to main: Syncs src/ to S3 bucket in AWS prod account:
    • (prod) s3://flipprod/base-application/src/

Warning: Never manually sync to the production bucket.
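
The branch-to-bucket mapping above can be summarised in a few lines of Python. This is only a restatement of the rules listed in this section, not the workflow code itself; the event labels ("pull_request", "merge_develop", "merge_main") and the function name sync_destinations are hypothetical.

```python
def sync_destinations(event: str, pr_number=None) -> list:
    """Return the S3 destinations this README lists for a CI event."""
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pr_number is required for pull_request events")
        return [
            f"s3://flipdev/base-application-dev/pull-requests/{pr_number}/src/"
        ]
    if event == "merge_develop":
        return [
            "s3://flipdev/base-application-dev/src/",  # dev
            "s3://flipstag/base-application/src/",     # staging
        ]
    if event == "merge_main":
        return ["s3://flipprod/base-application/src/"]  # prod
    raise ValueError(f"Unknown event: {event!r}")
```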

Makefile Reference

Network Management

Command                              Description
make nvflare-provision NET_NUMBER=X  Provision FL network X
make build NET_NUMBER=X              Build Docker images for network X
make up NET_NUMBER=X                 Start FL network X
make down NET_NUMBER=X               Stop FL network X
make clean NET_NUMBER=X              Remove containers and images

Development

Command             Description
make run-container  Run NVFLARE simulator in Docker

Testing Commands

Command                      Description
make unit-test               Run pytest unit tests
make test-xrays-standard     Test standard job with x-ray data
make test-spleen-standard    Test standard job with spleen data
make test-spleen-evaluation  Test evaluation job with spleen data
make test-spleen-diffusion   Test diffusion model with spleen data
make test                    Run all integration tests

Data Management

Command                    Description
make download-xrays-data   Download x-ray test images from S3
make download-spleen-data  Download spleen test images from S3
make download-checkpoints  Download model checkpoints from S3
make copy-spleen-app       Copy test app to dev folder
make save-spleen-app       Save dev changes to test folder
make pull-spleen-app       Pull latest app from tutorials repo
