
fedimpute is a benchmarking tool for federated imputation


FedImpute: a benchmarking and evaluation tool for federated imputation across various missing data scenarios.

License: GPL v3

FedImpute is a benchmarking tool for the evaluation of federated imputation algorithms over various missing data scenarios under horizontally partitioned data.

Installation

First, install Python >= 3.10.0. There are two ways to install FedImpute.

Install from pip:

pip install fedimpute

Install from package repo:

git clone https://github.com/idsla/FedImpute
cd FedImpute

python -m venv ./venv

# Windows (Git Bash)
source ./venv/Scripts/activate

# Linux/Unix
source ./venv/bin/activate

# Install the required packages
pip install -r requirements.txt
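
After either installation route, a quick check like the one below can confirm that the interpreter meets the stated Python >= 3.10 requirement and that the `fedimpute` package is importable. This is only a convenience sketch; the helper name `check_environment` is hypothetical and not part of FedImpute.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 10)):
    """Return (python_ok, package_found) for the current interpreter."""
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level package cannot be located.
    package_found = importlib.util.find_spec("fedimpute") is not None
    return python_ok, package_found


python_ok, package_found = check_environment()
print(f"Python >= 3.10: {python_ok}; fedimpute importable: {package_found}")
```

If the second value is False inside the virtual environment, the environment is probably not activated or the install step did not complete.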

Basic Usage

Step 1. Prepare Data

import numpy as np
data = np.random.rand(10000, 10)
data_config = {
    'task_type': 'regression',
    'clf_type': None,
    'num_cols': 9,
}

Step 2. Simulate Federated Missing Data Scenario

from fedimpute.simulator import Simulator
simulator = Simulator()
simulation_results = simulator.simulate_scenario(
    data, data_config, num_clients = 10, dp_strategy='iid-even', ms_mech_type='mcar', verbose=1
)
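
To make the scenario above concrete, here is a minimal NumPy sketch of what an `iid-even` partition followed by MCAR missingness amounts to conceptually. It does not use or reproduce FedImpute's internals. The missing rate of 0.3 and the choice to keep the last column fully observed (treating it as the target) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 10))

# "iid-even": shuffle the rows, then split them into equally sized client datasets.
num_clients = 10
shuffled = rng.permutation(data.shape[0])
client_indices = np.array_split(shuffled, num_clients)
clients = [data[idx] for idx in client_indices]

# MCAR: every feature entry is dropped independently with the same probability,
# regardless of any observed or unobserved value.
missing_rate = 0.3  # hypothetical value chosen for this sketch
clients_missing = []
for X in clients:
    X_ms = X.copy()
    mask = rng.random(X.shape) < missing_rate
    mask[:, -1] = False  # assumption: the last column is the target and stays observed
    X_ms[mask] = np.nan
    clients_missing.append(X_ms)

print([c.shape[0] for c in clients_missing])
print(round(float(np.isnan(clients_missing[0][:, :-1]).mean()), 2))
```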

Step 3. Execute Federated Imputation Algorithms

Note: if you use the CUDA version of torch, set the following environment variable to get deterministic CUDA behavior.

# bash (linux)
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# powershell (windows)
$Env:CUBLAS_WORKSPACE_CONFIG = ":4096:8"

from fedimpute.execution_environment import FedImputeEnv
env = FedImputeEnv()
env.configuration(imputer = 'fed_ice', fed_strategy='fedavg', fit_mode = 'fed')
env.setup_from_simulator(simulator = simulator, verbose=1)

env.run_fed_imputation(run_type='sequential')
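
As an alternative to the shell commands above, the same variable can be set from Python. This sketch assumes the assignment runs at the very top of the script, before torch initializes CUDA; otherwise the setting may not take effect.

```python
import os

# Set before importing torch so that cuBLAS picks up the workspace configuration.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

print(os.environ["CUBLAS_WORKSPACE_CONFIG"])
```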

Step 4. Evaluate imputation outcomes

from fedimpute.evaluation import Evaluator

evaluator = Evaluator()
evaluator.evaluate(env, ['imp_quality', 'pred_downstream_local', 'pred_downstream_fed'])
evaluator.show_results()
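
The README does not spell out which metrics `imp_quality` reports. A common way to score imputation quality is the RMSE between true and imputed values, restricted to the entries that were missing. The sketch below shows that idea under this assumption; `masked_rmse` is a hypothetical helper, not a FedImpute function.

```python
import numpy as np


def masked_rmse(X_true, X_imputed, missing_mask):
    """RMSE computed only over entries that were missing before imputation."""
    diff = X_true[missing_mask] - X_imputed[missing_mask]
    return float(np.sqrt(np.mean(diff ** 2)))


X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
missing_mask = np.array([[False, True], [True, False]])
# The two masked entries were imputed as 3.0 (true 2.0) and 3.0 (true 3.0).
X_imputed = np.array([[1.0, 3.0], [3.0, 4.0]])

rmse = masked_rmse(X_true, X_imputed, missing_mask)
print(round(rmse, 4))  # sqrt((1^2 + 0^2) / 2) -> 0.7071
```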

Supported Data Partition Strategies

  • Natural Partition: done by reading a list of datasets; see the "Dataset and Preprocessing" section in the documentation
  • Artificial Partition
    • column: partition based on discrete values of a column in the dataset
    • iid-even: iid partition with even sample sizes
    • iid-dir: iid partition with sample sizes following a Dirichlet distribution
    • niid-dir: non-iid partition based on some columns with a Dirichlet distribution
    • niid-path: non-iid partition based on some columns with a pathological distribution (shard partition)
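
As an illustration of the `iid-dir` idea in the list above, the following sketch draws client sample sizes from a Dirichlet distribution and then assigns shuffled rows accordingly. It is a generic construction, not FedImpute's implementation; the function name and the concentration value `alpha=0.5` are assumptions.

```python
import numpy as np


def dirichlet_sample_sizes(n_samples, num_clients, alpha, rng):
    """Draw client sample sizes whose proportions follow Dirichlet(alpha)."""
    proportions = rng.dirichlet(alpha * np.ones(num_clients))
    sizes = np.floor(proportions * n_samples).astype(int)
    # Give the rounding remainder to the largest client so sizes sum to n_samples.
    sizes[np.argmax(sizes)] += n_samples - sizes.sum()
    return sizes


rng = np.random.default_rng(0)
n_samples, num_clients = 10000, 10
sizes = dirichlet_sample_sizes(n_samples, num_clients, alpha=0.5, rng=rng)

# Rows are shuffled first, so each client still receives an iid sample.
indices = rng.permutation(n_samples)
client_indices = np.split(indices, np.cumsum(sizes)[:-1])

print(sizes.tolist())
```

Smaller `alpha` values give more uneven sizes, and some clients can end up with very few rows.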

Supported Missing Data Mechanisms

  • mcar: MCAR missing mechanism
  • mar-homo: Homogeneous MAR missing mechanism
  • mar-heter: Heterogeneous MAR missing mechanism
  • mnar-homo: Homogeneous MNAR missing mechanism
  • mnar-heter: Heterogeneous MNAR missing mechanism
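
The difference between MAR and MNAR in the list above can be sketched on a toy two-column dataset: under MAR the missingness of a column depends on another, observed column, while under MNAR it depends on the missing value itself. The logistic form and the coefficient 2.0 are assumptions for illustration only and say nothing about how FedImpute parameterizes its mechanisms.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))  # two independent standard normal columns

# MAR: column 1 goes missing with a probability driven by the observed column 0.
mar_mask = rng.random(n) < sigmoid(2.0 * X[:, 0])

# MNAR: column 1 goes missing with a probability driven by its own value.
mnar_mask = rng.random(n) < sigmoid(2.0 * X[:, 1])

X_mar = X.copy()
X_mar[mar_mask, 1] = np.nan
X_mnar = X.copy()
X_mnar[mnar_mask, 1] = np.nan

# Under MNAR, large values vanish more often, so the observed mean is biased low.
mar_mean = float(np.nanmean(X_mar[:, 1]))
mnar_mean = float(np.nanmean(X_mnar[:, 1]))
print(round(mar_mean, 2), round(mnar_mean, 2))
```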

Supported Federated Imputation Algorithms

Federated Imputation Algorithms:

Method         | Type   | Fed Strategy                    | Imputer (code) | Reference
---------------|--------|---------------------------------|----------------|-----------------------------
Fed-Mean       | Non-NN | -                               | fed_mean       | -
Fed-EM         | Non-NN | -                               | fed_em         | EM, FedEM
Fed-ICE        | Non-NN | -                               | fed_ice        | FedICE
Fed-MissForest | Non-NN | -                               | fed_missforest | MissForest, Fed Randomforest
MIWAE          | NN     | fedavg, fedprox, fedavg_ft, ... | miwae          | MIWAE
GAIN           | NN     | fedavg, fedprox, fedavg_ft, ... | gain           | GAIN
Not-MIWAE      | NN     | fedavg, fedprox, fedavg_ft, ... | notmiwae       | Not-MIWAE
GNR            | NN     | fedavg, fedprox, fedavg_ft, ... | gnr            | GNR
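
To give a feel for the simplest entry in the table, the sketch below shows one plausible way federated mean imputation can work: each client shares only per-column sums and counts of its observed values, and the server combines them into global means. This is an assumed illustration of the general idea, not the code behind `fed_mean`; all function names are hypothetical.

```python
import numpy as np


def local_statistics(X):
    """Per-column sum and count of observed (non-NaN) values on one client."""
    observed = ~np.isnan(X)
    return np.nansum(X, axis=0), observed.sum(axis=0)


def global_means(stats):
    """Combine client statistics into global column means on the server."""
    total_sum = sum(s for s, _ in stats)
    total_count = sum(c for _, c in stats)
    return total_sum / np.maximum(total_count, 1)  # guard against empty columns


def impute_with_means(X, means):
    """Fill each missing entry with the global mean of its column."""
    X_imp = X.copy()
    rows, cols = np.where(np.isnan(X_imp))
    X_imp[rows, cols] = means[cols]
    return X_imp


client_a = np.array([[1.0, np.nan], [3.0, 4.0]])
client_b = np.array([[np.nan, 8.0], [5.0, np.nan]])

means = global_means([local_statistics(client_a), local_statistics(client_b)])
print(means)  # column 0: (1 + 3 + 5) / 3 = 3.0, column 1: (4 + 8) / 2 = 6.0
print(impute_with_means(client_a, means))
```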

Federated Strategies:

Method     | Type            | Fed_strategy (code) | Reference
-----------|-----------------|---------------------|-----------
FedAvg     | global FL       | fedavg              | FedAvg
FedProx    | global FL       | fedprox             | FedProx
Scaffold   | global FL       | scaffold            | Scaffold
FedAdam    | global FL       | fedadam             | FedAdam
FedAdagrad | global FL       | fedadagrad          | FedAdaGrad
FedYogi    | global FL       | fedyogi             | FedYogi
FedAvg-FT  | personalized FL | fedavg_ft           | FedAvg-FT
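
For readers new to these strategies, the core aggregation step of FedAvg is a weighted average of client model parameters, with weights proportional to local sample sizes. The sketch below shows only that step on toy parameters. It is not FedImpute's implementation, and the parameter names "w" and "b" are placeholders.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Average parameter dicts, weighting each client by its sample count."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    keys = client_params[0].keys()
    return {
        k: sum(w * p[k] for w, p in zip(weights, client_params))
        for k in keys
    }


client_params = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 6.0]), "b": np.array([4.0])},
]
# Weights are 0.25 and 0.75, so w = [2.5, 5.0] and b = [3.0].
global_params = fedavg(client_params, client_sizes=[100, 300])
print(global_params)
```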

FedImputeBench - Benchmarking Analysis Using FedImpute

We use FedImpute to carry out a benchmarking analysis of federated imputation algorithms. The code for this analysis is in the FedImputeBench repo.

Contact

For any questions, please contact Sitao Min

