Federated FeatureCloud.ai workflows on remote machines

fedflow

The aim of this project is to automate federated workflows with FeatureCloud.ai by orchestrating remote machines. FeatureCloud.ai enables federated machine learning through a web interface and a locally executed controller instance. However, analyses require manual interaction with the graphical interface of the website by all participating clients. This is prohibitive for iterative analyses and results in analyses that are not easily reproducible.

This package introduces i) an API to interact headlessly with FeatureCloud.ai, ii) the orchestration of remote machines (e.g. VMs) to execute fully automated federated analyses, and iii) an example of a fully reproducible snakemake workflow using this package.

Concept

Fedflow is an orchestration tool to use FeatureCloud.ai in reproducible workflows without manual interaction. It is designed to either run federated analyses on locally-simulated VMs for testing purposes, or to execute the same workflow on remote machines (e.g. cloud instances or machines at participating research institutes).

[Figure: fedflow_overview]

[Figure: fedflow_remote]

Requirements & installation

fedflow can be installed with pip:

pip install fedflow-featurecloud

For simulations, vagrant, libvirt, and the vagrant-libvirt plugin are required.

User accounts on FeatureCloud.ai need to be created via the website.

Usage

There are 2 main components in this project:

  1. a program to run federated workflows via FeatureCloud.ai headlessly: fedflow
  2. an API to interact with FeatureCloud.ai: fcauto. fedflow uses this API to execute workflows, hence users are not required to interact directly with it.

fedflow

This program orchestrates all client machines and instructs them to interact with FeatureCloud.ai to automate federated workflows.

usage: fedflow [-h] (-c CONFIG | -t)

Federated FeatureCloud.ai workflows on remote machines

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to the config file
  -t, --template        Generate template config
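
The usage line above means that exactly one of -c or -t must be given. As a minimal sketch (not the project's actual source), an argparse parser of the following shape would produce that usage line. The function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose usage line has the shape shown above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="fedflow",
        description="Federated FeatureCloud.ai workflows on remote machines",
    )
    # A required mutually exclusive group renders as "(-c CONFIG | -t)":
    # exactly one of the two options has to be passed.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-c", "--config", help="Path to the config file")
    group.add_argument(
        "-t", "--template", action="store_true", help="Generate template config"
    )
    return parser


args = build_parser().parse_args(["-c", "config_mean.toml"])
print(args.config, args.template)
```

Calling the parser without either option, or with both, exits with a usage error.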

fcauto

This is an API for the FeatureCloud.ai website. Its primary purpose is to be used by fedflow, but it can also be used in a stand-alone manner. There are several subcommands to interact with FeatureCloud. Check fcauto SUBCOMMAND --help for arguments/options.

usage: fcauto [-h] {create,join,monitor,query,contribute,reset,list-apps} ...

FeatureCloud automation tool

positional arguments:
  {create,join,monitor,query,contribute,reset,list-apps}
    create              Create a new FeatureCloud project (as coordinator)
    join                Join an existing FeatureCloud project
    monitor             Monitor a running FeatureCloud project
    query               Query FeatureCloud project status
    contribute          Contribute data to a FeatureCloud project
    reset               Reset a FeatureCloud project to status 'ready'
    list-apps           List available apps on FeatureCloud

options:
  -h, --help            show this help message and exit

Configuration

Configuration of fedflow requires 2 files.

  1. An environment file that contains the credentials of the participating FeatureCloud.ai users.

This is in the format USER=PASSWORD in .env in the working directory.

FeatureCloud accounts are low-privilege and low-impact if compromised, since they contain very limited information and the platform itself holds no data or results of previous analyses. In a multi-user environment it is advised to set chmod 600 .env && chown <user> .env
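
To illustrate the USER=PASSWORD format and the permission advice, here is a minimal sketch of reading such a file and checking that only the owner can access it. It is not how fedflow itself loads credentials. The helper names load_credentials and is_owner_only, the example user names, and the handling of blank lines and # comments are assumptions made for this example.

```python
import os
import stat
import tempfile
from pathlib import Path


def load_credentials(path):
    """Parse USER=PASSWORD lines into a dict (blank lines and # comments are skipped)."""
    creds = {}
    for lineno, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so a password may itself contain '='.
        user, sep, password = line.partition("=")
        if not sep:
            # Report the line number, not the content, to avoid leaking secrets.
            raise ValueError(f"line {lineno}: expected USER=PASSWORD")
        creds[user.strip()] = password
    return creds


def is_owner_only(path):
    """Return True if group and others have no permissions, as after chmod 600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


with tempfile.TemporaryDirectory() as tmp:
    env = Path(tmp) / ".env"
    env.write_text("alice_fc=s3cret=1\nbob_fc=hunter2\n")  # hypothetical users
    os.chmod(env, 0o600)
    print(sorted(load_credentials(env)), is_owner_only(env))
```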

  2. A toml configuration of an automated FeatureCloud workflow.

A template config file can be generated with fedflow -t. The format is as follows:

project_id = 0
tool = ""
sim = false
outdir = "results/"

[[clients]]
fc_username = "FC_USER"
data = []
coordinator = false
username = "USER"
hostname = "HOSTNAME"
sshkey = ".ssh/id_rsa"

[[clients]]
fc_username = "FC_USER"
data = []
coordinator = false
username = "USER"
hostname = "HOSTNAME"
sshkey = ".ssh/id_rsa"

The config must contain either tool=<str>, naming one of the FeatureCloud apps (listed on their website or with fcauto list-apps), or project_id=<int>. If a tool is given, a new project template is created and used; if project_id is given, a previously created project template is used instead.

If sim = true is set, vagrant VMs are launched and used for the federated execution. In that case the parameters hostname, username, port, sshkey of all clients are ignored. To use other remote machines these connection details need to be provided. The field sshkey is the path to the ssh key used to authenticate the user on the remote. Exchange of connection credentials is only automated when using vagrant VMs.

The clients participating in the federated analysis are specified as an array of [[clients]]. One client is required to take the role of coordinator = true. This client's FeatureCloud user will create or initiate the project and monitor its execution.

Example usage

In example/ there are test data and configurations to run the 'mean' test app across 3 clients. FeatureCloud users need to be provided in the config example/config_mean.toml, and their credentials in a .env file.

fedflow -c config_mean.toml

Provisioning of client VMs/participating machines

System dependencies on remotes are installed automatically using a shell script shipped in fedsim/provision.py:

  • python3.12, python3.12-venv
  • docker

Limitations

The API in this project does not automate the creation of user accounts for FeatureCloud.ai or the registration of their participating sites.

Terms of acceptable use and responsible automation

This library provides programmatic access to functionality of featurecloud.ai that is normally available through manual interaction with the website. It does not bypass authentication mechanisms or access controls. To promote responsible use, the library enforces request limits and identifies itself explicitly via a User-Agent header at each request. These measures are intended to reduce the risk of excessive load or unintended disruption to the website. Users are responsible for ensuring that their use of this library complies with the website’s Terms of Service and applicable policies. High-volume scraping, aggressive polling, or other usage patterns that exceed normal interactive behavior are out of scope and may result in blocking by the website operator. If the website signals rate limiting or access restrictions (e.g. HTTP 429 or 403 responses), users need to reduce their request frequency or stop usage. The author does not operate or control the target website and makes no guarantees regarding access or compatibility.
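
For users who write their own scripts around the website, reacting to rate-limit signals might look like the sketch below. It is not the library's internal implementation. The function polite_call, its parameters, and the doubling delay are assumptions made for illustration. The request argument stands for any zero-argument callable returning a (status_code, body) pair.

```python
import time


def polite_call(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request(); wait longer after each HTTP 429 and stop on HTTP 403."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status == 403:
            # Access restriction: do not retry.
            raise PermissionError("access restricted (HTTP 403); stopping")
        if status != 429:
            return status, body
        if attempt < max_attempts:
            sleep(delay)  # reduce request frequency before trying again
            delay *= 2
    raise RuntimeError("still rate limited (HTTP 429) after retries; stopping")


# Simulated responses instead of real network traffic.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = polite_call(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Passing sleep as a parameter keeps the example testable without real waiting.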

pytests

Run the pytest suite with pytest -s --cov, or with pytest -s --cov -m 'not integration' to skip slow tests. Few tests fall outside the integration marker, however, since most of the code in this project interacts with VMs. Some tests require FeatureCloud credentials in the environment, which are not provided in this repo.

Example analysis

The directory analysis/workflow_comp contains an example analysis workflow that runs different FeatureCloud analyses with the automation described here. Details are available in a separate README.md.
