Generic Federated Learning Simulator with PyTorch
FedSim
FedSim is a comprehensive and flexible Federated Learning simulator. It aims to provide researchers with an easy-to-develop and easy-to-maintain simulator for Federated Learning. See the documentation for details.
Installation
pip install fedsim
Usage
As package
Here is a demo:
from functools import partial

from logall import TensorboardLogger
from fedsim.distributed.centralized.training import FedAvg
from fedsim.distributed.data_management import BasicDataManager
from fedsim.models.mcmahan_nets import cnn_cifar100
from fedsim.losses import CrossEntropyLoss
from fedsim.scores import Accuracy

n_clients = 1000

dm = BasicDataManager("./data", "cifar100", n_clients)
sw = TensorboardLogger(path=None)

alg = FedAvg(
    data_manager=dm,
    num_clients=n_clients,
    sample_scheme="uniform",
    sample_rate=0.01,
    model_def=cnn_cifar100,
    epochs=5,
    criterion_def=partial(CrossEntropyLoss, log_freq=100),
    batch_size=32,
    metric_logger=sw,
    device="cuda",
)
alg.hook_local_score(
    partial(Accuracy, log_freq=50),
    split_name="train",
    score_name="accuracy",
)
alg.hook_global_score(
    partial(Accuracy, log_freq=40),
    split_name="test",
    score_name="accuracy",
)
report_summary = alg.train(rounds=1)
fedsim-cli tool
For help with the CLI, run:
fedsim-cli --help
DataManager
Any custom DataManager class should inherit from fedsim.distributed.data_management.DataManager (or its children) and implement its abstract methods. For example:
from typing import Dict, Iterable, Sequence

from fedsim.distributed.data_management import DataManager


class CustomDataManager(DataManager):
    def __init__(self, root, seed, save_dir, other_arg, ...):
        self.other_arg = other_arg
        # note that super().__init__ should be called at the end of __init__
        # because the abstract methods below are called inside it
        super(CustomDataManager, self).__init__(root, seed, save_dir=save_dir)

    def make_datasets(self, root: str) -> Iterable[Dict[str, object]]:
        """Abstract method to be implemented by the child class.

        Args:
            root (str): directory to download and manipulate data.

        Raises:
            NotImplementedError: if not implemented by the child class

        Returns:
            Iterable[Dict[str, object]]: dict of local datasets [split: dataset]
                followed by global ones.
        """
        raise NotImplementedError

    def partition_local_data(
        self, datasets: Dict[str, object]
    ) -> Dict[str, Iterable[Iterable[int]]]:
        raise NotImplementedError

    def get_identifiers(self) -> Sequence[str]:
        """Returns identifiers to be used for saving the partition info.

        Raises:
            NotImplementedError: this abstract method should be
                implemented by child classes

        Returns:
            Sequence[str]: a sequence of str identifying the class instance
        """
        raise NotImplementedError
Integration with fedsim-cli (DataManager)
To automatically include your custom data manager with the provided cli tool, place your class in a python file and pass its path (without .py) to the -d or --data-manager option, followed by a colon and the name of the data-manager class. For example, if you have a data manager DataManager stored in foo/bar/my_custom_dm.py, you can pass --data-manager foo/bar/my_custom_dm:DataManager.
Included DataManager
Provided with the simulator is a basic data manager called BasicDataManager, which currently supports several common datasets out of the box. It also supports the popular partitioning schemes (iid, Dirichlet distribution, unbalanced, etc.).
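For example, a minimal sketch of constructing one (the positional arguments, a data root, the dataset name, and the number of partitions, follow the demo above; any further partitioning options are not shown):

from fedsim.distributed.data_management import BasicDataManager

# a minimal sketch following the demo above: data root, dataset name,
# and number of partitions; other partitioning options are omitted
dm = BasicDataManager("./data", "cifar100", 500)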
CentralFLAlgorithm
Any custom CentralFLAlgorithm class should inherit from fedsim.distributed.centralized.CentralFLAlgorithm (or its children) and implement its abstract methods. For example:
Architecture
Example
from copy import deepcopy
from typing import Any, Callable, Dict, Hashable, Iterable, Mapping, Optional, Union

import torch
from torch import nn
from torch.nn.utils import parameters_to_vector

import fedsim.scores
from fedsim.distributed.centralized import CentralFLAlgorithm


class CustomFLAlgorithm(CentralFLAlgorithm):
    def __init__(
        self, data_manager, metric_logger, num_clients, sample_scheme, sample_rate,
        model_def, epochs, criterion_def, optimizer_def, local_optimizer_def,
        lr_scheduler_def=None, local_lr_scheduler_def=None,
        r2r_local_lr_scheduler_def=None, batch_size=32, test_batch_size=64,
        device="cuda", other_arg=None, ...
    ):
        self.other_arg = other_arg
        ...
        super(CustomFLAlgorithm, self).__init__(
            data_manager, metric_logger, num_clients, sample_scheme, sample_rate,
            model_def, epochs, criterion_def, optimizer_def, local_optimizer_def,
            lr_scheduler_def, local_lr_scheduler_def, r2r_local_lr_scheduler_def,
            batch_size, test_batch_size, device,
        )
        # make model and optimizer
        model = self.get_model_def()().to(self.device)
        params = deepcopy(parameters_to_vector(model.parameters()).clone().detach())
        optimizer = optimizer_def(params=[params])
        lr_scheduler = None
        if lr_scheduler_def is not None:
            lr_scheduler = lr_scheduler_def(optimizer=optimizer)
        # write model and optimizer to server storage
        self.write_server("model", model)
        self.write_server("cloud_params", params)
        self.write_server("optimizer", optimizer)
        self.write_server("lr_scheduler", lr_scheduler)
        ...
    def send_to_client(self, client_id: int) -> Mapping[Hashable, Any]:
        """Returns the context to send to the client corresponding to client_id.

        .. warning::
            Do not send shared objects (e.g., the server model); deepcopy them
            before sending.

        Args:
            client_id (int): id of the receiving client

        Raises:
            NotImplementedError: abstract method to be implemented by the child class

        Returns:
            Mapping[Hashable, Any]: the context to be sent, in the form of a Mapping
        """
        ...
    def send_to_server(
        self, client_id: int, datasets: Dict[str, Iterable],
        round_scores: Dict[str, Dict[str, fedsim.scores.Score]], epochs: int,
        criterion: nn.Module, train_batch_size: int, inference_batch_size: int,
        optimizer_def: Callable, lr_scheduler_def: Optional[Callable] = None,
        device: Union[int, str] = "cuda",
        ctx: Optional[Dict[Hashable, Any]] = None,
    ) -> Mapping[str, Any]:
        """Client operation on the received information.

        Args:
            client_id (int): id of the client
            datasets (Dict[str, Iterable]): this comes from the data manager
            round_scores (Dict[str, Dict[str, fedsim.scores.Score]]): dictionary of
                form {'split_name': {'score_name': score_def}} for scores to
                evaluate at the current round.
            epochs (int): number of epochs to train
            criterion (nn.Module): loss criterion for local training
            train_batch_size (int): training batch_size
            inference_batch_size (int): inference batch_size
            optimizer_def (Callable): definition used to construct the local optimizer
            lr_scheduler_def (Optional[Callable]): definition used to construct the
                local lr scheduler
            device (Union[int, str], optional): Defaults to 'cuda'.
            ctx (Optional[Dict[Hashable, Any]], optional): context received from the server.

        Returns:
            Mapping[str, Any]: client context to be sent to the server
        """
        ...
    def receive_from_client(
        self, client_id: int, client_msg: Mapping[Hashable, Any], aggregator: Any
    ):
        """Receive and aggregate info from the selected clients.

        Args:
            client_id (int): id of the sender (client)
            client_msg (Mapping[Hashable, Any]): client context that is sent
            aggregator (Any): aggregator instance to collect info
        """
        raise NotImplementedError
    def optimize(self, aggregator: Any) -> Mapping[Hashable, Any]:
        """Optimize the server model(s) and return metrics to be reported.

        Args:
            aggregator (Any): Aggregator instance

        Returns:
            Mapping[Hashable, Any]: context to be reported
        """
        ...
    def deploy(self) -> Optional[Mapping[Hashable, Any]]:
        """Return a Mapping of name -> parameters_set to test the model."""
        raise NotImplementedError
    def report(
        self, dataloaders, round_scores: Dict[str, Dict[str, Any]], metric_logger: Any,
        device: str, optimize_reports: Mapping[Hashable, Any],
        deployment_points: Optional[Mapping[Hashable, torch.Tensor]] = None,
    ) -> None:
        """Test on global data and report info.

        Args:
            dataloaders (Any): dict of data loaders to test the global model(s)
            round_scores (Dict[str, Dict[str, Any]]): scores to evaluate at the
                current round
            metric_logger (Any): the logging object (e.g., SummaryWriter)
            device (str): 'cuda', 'cpu' or gpu number
            optimize_reports (Mapping[Hashable, Any]): dict returned by the optimize method
            deployment_points (Mapping[Hashable, torch.Tensor], optional): output of
                the deploy method
        """
        ...
Integration with fedsim-cli (CentralFLAlgorithm)
To automatically include your custom algorithm with the provided cli tool, place your class in a python file and pass its path (without .py) to the -a or --algorithm option, followed by a colon and the name of the algorithm class. For example, if you have an algorithm CustomFLAlgorithm stored in foo/bar/my_custom_alg.py, you can pass --algorithm foo/bar/my_custom_alg:CustomFLAlgorithm.
Other attributes and methods provided by CentralFLAlgorithm

| method | functionality |
|---|---|
| CentralFLAlgorithm.get_model_def() | returns the class object of the model architecture |
| CentralFLAlgorithm.write_server(key, obj) | stores obj in server memory, accessible with key |
| CentralFLAlgorithm.write_client(client_id, key, obj) | stores obj in client_id's memory, accessible with key |
| CentralFLAlgorithm.read_server(key) | returns the obj associated with key in server memory |
| CentralFLAlgorithm.read_client(client_id, key) | returns the obj associated with key in client_id's memory |
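For example, the server and client storage can be used inside a custom algorithm's methods as follows. This is a minimal sketch; the class name and the stored keys ("cloud_params", "optimizer", "last_msg") are only illustrative, and the remaining abstract methods are omitted.

from fedsim.distributed.centralized import CentralFLAlgorithm


class StorageDemoAlgorithm(CentralFLAlgorithm):
    # only the two methods relevant to the storage helpers are sketched here
    def receive_from_client(self, client_id, client_msg, aggregator):
        # keep the raw client message in that client's memory for later rounds
        self.write_client(client_id, "last_msg", client_msg)

    def optimize(self, aggregator):
        # read back what the constructor stored on the server (see the example above)
        params = self.read_server("cloud_params")
        optimizer = self.read_server("optimizer")
        # ... aggregate client updates into params here ...
        self.write_server("cloud_params", params)
        return dict()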
Included FL algorithms
The following algorithms are included:
FedAvg
FedNova
FedProx
FedDyn
AdaBest
Model Architectures
Included Architectures
The models used in the FedAvg paper are included:
McMahan's 2-layer MLP for MNIST
McMahan's CNN for CIFAR10 and CIFAR100
To use them, import from fedsim.models.mcmahan_nets.
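For example (a minimal sketch; as in the demo and custom-algorithm examples above, the model definition is constructed with no arguments):

from fedsim.models.mcmahan_nets import cnn_cifar100

# construct the CIFAR-100 CNN the same way the algorithms call model_def()
model = cnn_cifar100()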
Integration with fedsim-cli
To automatically include your custom model with the provided cli tool, place your class in a python file and pass its path (without .py) to the -m or --model option, followed by a colon and the name of the model class. For example, if you have a model CustomModel stored in foo/bar/my_custom_model.py, you can pass --model foo/bar/my_custom_model:CustomModel.
Learning Rate Schedulers
fedsim-cli fed-learn accepts three scheduler definitions:
lr-scheduler: learning rate scheduler for server optimizer.
local-lr-scheduler: learning rate scheduler for client optimizer.
r2r-local-lr-scheduler: schedules the initial learning rate that is delivered to the clients of each round.
These definitions are passed to the constructor of the centralized FL algorithm, as sketched below.
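A minimal sketch of defining such schedulers when using the package directly (the StepLR choice and its arguments are only illustrative; the parameter names match the custom-algorithm constructor shown earlier):

from functools import partial

from torch.optim.lr_scheduler import StepLR

# illustrative only: a pytorch lr scheduler wrapped in a partial can be passed
# as the lr_scheduler_def / local_lr_scheduler_def constructor arguments
lr_scheduler_def = partial(StepLR, step_size=1, gamma=0.99)        # server optimizer
local_lr_scheduler_def = partial(StepLR, step_size=1, gamma=0.99)  # client optimizers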
fedsim-cli examples
The following command splits CIFAR100 into 1000 iid partitions and then uses the AdaBest algorithm with \(\mu=0.02\) and \(\beta=0.96\) to train a model. Only the first 200 of the 1000 partitions are used (-n 200), and 1% of those clients are randomly drawn at each round (i.e., 2 clients per round). The local optimizer is SGD with lr=0.05 and weight_decay=0.001, and the local training batch size is 50.
fedsim-cli fed-learn -a AdaBest mu:0.02 beta:0.96 -m cnn_cifar100 -d BasicDataManager dataset:cifar100 num_partitions:1000 -r 1001 -n 200 --local-optimizer SGD lr:0.05 weight_decay:0.001 --batch-size 50 --client-sample-rate 0.01 --global-score Accuracy log_freq:100 split:test --global-score Accuracy score_name:high_freq_acc log_freq:10 split:test
The following command tunes \(\alpha\) for the FedDyn algorithm. It uses a Gaussian Process to maximize the average of the last 10 values (by default; change with the --n-point-summary option) of server.avg.test.cross_entropy_score. \(\alpha\) is tuned over float values (marked with Real, following the skopt convention) between 0 and 0.3. Additionally, the learning rate of the local optimizer is tuned between 0.04 and 0.14.
fedsim-cli fed-tune --maximize --eval-metric server.avg.test.cross_entropy_score --n-iters 20 --skopt-random-state 10 --criterion CrossEntropyLoss log_freq:10 -n 1000 -m cnn_cifar100 --batch-size 50 --test-batch-size 75 --global-score Accuracy log_freq:100 split:test --global-score CrossEntropyScore log_freq:100 split:test --epochs 5 --local-optimizer SGD weight_decay:0.001 lr:Real:0.04-0.14 -r 4000 -a FedDyn alpha:Real:0-0.3
Side Notes
Do not use double underscores (__) in argument names of your customized classes.
0.7.0 (2022-09-10)
algorithms got more secure with local storage
redefined model architectures
fixed bug in default step closure
made random seed more consistent
0.6.2 (2022-08-31)
fixed some errors in docstring of central FL algorithms
add sample balance param to identifiers of data manager
0.6.1 (2022-08-17)
fixed bug in partition_global_data of BasicDataManager
some changes in default values for better log storage and aggregation
0.6.0 (2022-08-16)
changed the name of cli directory
added cli tests
added support for pytorch original lr schedulers
improved docs
added version option to fedsim-cli
0.5.0 (2022-08-15)
completed lr schedulers and generalized them for all levels
changed some argument names and default values
0.4.1 (2022-08-12)
fixed bugs with mismatched loss_fn argument name in cli commands
changed all eval_freq arguments to unified log_freq
0.4.0 (2022-08-12)
changed the structure of scores and losses
made it possible to hook multiple local and global scores
0.3.1 (2022-08-09)
added advanced learning rate schedulers
properly tested r2r lr scheduler
0.3.0 (2022-08-09)
added fine-tuning to cli, fed-tune
cleaner cli
made optimizers and schedulers user definable
improved logging
0.2.0 (2022-08-01)
cleaned the API reference in docs
changed cli name to fedsim-cli
improved documentation
improved importing
changed the way custom objects are passed to cli
0.1.4 (2022-07-23)
changed FLAlgorithm to CentralFLAlgorithm for more clarity
set default device to cuda if available otherwise to cpu in fed-learn cli
fix wrong superclass names in demo
fix the confusion with save_dir and save_path in DataManager classes
0.1.3 (2022-07-08)
the documentation is redesigned and mostly automated.
documentation now is available at https://fesim.varnio.com
added code of conduct from github template
0.1.2 (2022-07-05)
changed ownership of repo from fedsim-dev to varnio
0.1.1 (2022-06-22)
added fedsim.scores which wraps torch loss functions and sklearn scores
moved reporting mechanism of distributed algorithm for supporting auto monitor
added AppendixAggregator which is used to hold metric scores and report final results
apply a patch for wrong pypi supported python versions
0.1.0 (2022-06-21)
First major pre-release.
The package is restructured
docs is updated and checked to pass through tox steps
0.0.4 (2022-06-14)
Fourth release on PyPI.