A federated Random Forest implementation

fed_rf_mk

A Python package for implementing federated learning with Random Forests using PySyft.

Description

fed_rf_mk is a federated learning implementation that allows multiple parties to collaboratively train Random Forest models without sharing their raw data. This package leverages PySyft's secure federated learning framework to protect data privacy while enabling distributed model training.

Key features:

  • Secure federated training of Random Forest classifiers
  • Weighted model averaging based on client importance
  • Incremental learning approach for multi-round training
  • Evaluation of global models on local test data
  • Support for both training and evaluation clients

Installation

Prerequisites

  • Python 3.10.12 or higher

Installing from PyPI

pip install fed_rf_mk

Installing from Source

git clone https://github.com/AlexandreCotorobai/fed_rf.git
cd fed_rf
pip install -e .

Setting Up a Federated Learning Environment

1. Launch Data Silos (Servers)

There are two ways to launch the PySyft servers; each server represents a data silo holding its local dataset:

Option 1: Create a custom launcher script

Create a main.py file:

import argparse
from fed_rf_mk.server import launch_datasite

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Launch a single DataSite server independently.")

    parser.add_argument("--name", type=str, required=True, help="The name of the DataSite (e.g., silo1, silo2, etc.)")
    parser.add_argument("--port", type=int, required=True, help="The port number for the DataSite")
    parser.add_argument("--data_path", type=str, required=True, help="Path to the dataset")
    parser.add_argument("--mock_path", type=str, help="Path to mock dataset")

    args = parser.parse_args()

    launch_datasite(name=args.name, port=args.port, data_path=args.data_path, mock_path=args.mock_path)

Then run it:

python main.py --name silo1 --port 8080 --data_path path/to/data1.csv

Option 2: Launch programmatically in your code

from fed_rf_mk.server import launch_datasite

# Launch a server directly from your code
launch_datasite(name="silo1", port=8080, data_path="path/to/data1.csv", mock_path="path/to/mock.csv")

2. Set Up the Federated Learning Client

With the servers running, you can now set up a federated learning client:

from fed_rf_mk.client import FLClient

# Initialize the client
fl_client = FLClient()

# Add training clients with weights (weight parameter is optional)
fl_client.add_train_client(
    name="silo1", 
    url="http://localhost:8080", 
    email="fedlearning@rf.com", 
    password="****", 
    weight=0.4
)
fl_client.add_train_client(
    name="silo2", 
    url="http://localhost:8081", 
    email="fedlearning@rf.com", 
    password="****"
)

# Add evaluation client (doesn't contribute to training but evaluates the model)
fl_client.add_eval_client(
    name="silo3", 
    url="http://localhost:8082", 
    email="fedlearning@rf.com", 
    password="****"
)

3. Configure Data and Model Parameters

Define the parameters for your data preprocessing and Random Forest model:

# Define data parameters
data_params = {
    "target": "target_column",              # Target column name
    "ignored_columns": ["id", "timestamp"]  # Columns to exclude from training
}

# Define model parameters
model_params = {
    "model": None,                  # Initial model (None for first round)
    "n_base_estimators": 100,       # Number of trees for the initial model
    "n_incremental_estimators": 10, # Number of trees to add in each subsequent round
    "train_size": 0.8,              # Proportion of data to use for training
    "test_size": 0.2,               # Proportion of data to use for testing
    "sample_size": None,            # Sample size for training (None uses all available data)
    "fl_epochs": 3                  # Number of federated learning rounds
}

# Set parameters in the client
fl_client.set_data_params(data_params)
fl_client.set_model_params(model_params)
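
The README does not show how `n_base_estimators` and `n_incremental_estimators` are applied internally, but scikit-learn's `warm_start` flag is the standard way to grow a Random Forest across rounds, and it illustrates the pattern these two parameters suggest. A minimal sketch (the dataset and tree counts here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Round 1: fit the base forest (n_base_estimators trees).
clf = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
clf.fit(X, y)

# Each later round: grow the forest by n_incremental_estimators trees.
# With warm_start=True, fit() keeps the existing trees and only adds new ones.
clf.n_estimators += 10
clf.fit(X, y)

print(len(clf.estimators_))  # 110
```
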

4. Send Requests to Clients

Send the code execution requests to all clients:

fl_client.send_request()

# Check the status of the requests
fl_client.check_status_last_code_requests()

5. Run the Federated Training

Execute the federated learning process:

fl_client.run_model()

This will:

  1. Train local models on each client
  2. Collect and aggregate the models based on weights
  3. Run multiple federated rounds (controlled by the fl_epochs parameter in model_params)
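
The package's actual aggregation code is not shown in this README. As a hypothetical sketch of what weight-proportional merging of scikit-learn forests can look like, the `aggregate_forests` helper below (an invented name, not part of the fed_rf_mk API) takes a number of trees from each client's forest proportional to that client's weight:

```python
import copy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def aggregate_forests(forests, weights, total_trees=100):
    # Merge client forests by taking a number of trees from each client
    # proportional to its weight (illustrative sketch only).
    merged = copy.deepcopy(forests[0])
    merged.estimators_ = []
    for forest, weight in zip(forests, weights):
        take = round(weight * total_trees)
        merged.estimators_.extend(forest.estimators_[:take])
    merged.n_estimators = len(merged.estimators_)
    return merged

X, y = make_classification(n_samples=200, random_state=0)

# Two "clients" train locally on disjoint shards of the data.
client_a = RandomForestClassifier(n_estimators=100, random_state=1).fit(X[:100], y[:100])
client_b = RandomForestClassifier(n_estimators=100, random_state=2).fit(X[100:], y[100:])

global_model = aggregate_forests([client_a, client_b], weights=[0.4, 0.6])
print(global_model.n_estimators)  # 100
```
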

6. Evaluate the Federated Model

Finally, evaluate the federated model on the evaluation clients:

evaluation_results = fl_client.run_evaluate()
print(evaluation_results)

Complete Example

Below is a complete example workflow (adapted from the repository's main.ipynb):

from fed_rf_mk.client import FLClient

# Initialize the client
rf_client = FLClient()

# Connect to data silos
rf_client.add_train_client(name="silo1", url="http://localhost:8080", email="fedlearning@rf.com", password="****", weight=0.4)
rf_client.add_train_client(name="silo2", url="http://localhost:8081", email="fedlearning@rf.com", password="****")
rf_client.add_eval_client(name="silo3", url="http://localhost:8082", email="fedlearning@rf.com", password="****")

# Define parameters
questions = ['Q' + str(i) for i in range(1, 13)]
data_params = {
    "target": "Q1",
    "ignored_columns": ["patient_id", "source"] + questions
}

model_params = {
    "model": None,
    "n_base_estimators": 100,
    "n_incremental_estimators": 1,
    "train_size": 0.2,
    "test_size": 0.5,
    "sample_size": None,
    "fl_epochs": 2
}

rf_client.set_data_params(data_params)
rf_client.set_model_params(model_params)

# Send requests
rf_client.send_request()
rf_client.check_status_last_code_requests()

# Run federated training
rf_client.run_model()

# Evaluate model
evaluation_results = rf_client.run_evaluate()
print(evaluation_results)

Client Weighting

The package supports weighted aggregation of models based on client importance. You can:

  1. Explicitly assign weights: Provide a weight for each client when adding them:

    fl_client.add_train_client(name="silo1", url="url", email="email", password="pwd", weight=0.6)
    fl_client.add_train_client(name="silo2", url="url", email="email", password="pwd", weight=0.4)
    
  2. Mixed weighting: Assign weights to some clients and let others be calculated automatically:

    fl_client.add_train_client(name="silo1", url="url", email="email", password="pwd", weight=0.6)
    fl_client.add_train_client(name="silo2", url="url", email="email", password="pwd") # Weight will be calculated
    
  3. Equal weighting: Don't specify any weights, and all clients will receive equal weight.
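
The three weighting modes above follow one rule: explicit weights are kept, and whatever remains of the total is split equally among the unweighted clients. The `resolve_weights` helper below is a hypothetical sketch of that behaviour, not the package's actual implementation:

```python
def resolve_weights(weights):
    # Fill in missing (None) client weights so the total sums to 1.0.
    # Explicit weights are kept; the remainder is split equally.
    explicit = [w for w in weights if w is not None]
    missing = weights.count(None)
    remainder = 1.0 - sum(explicit)
    share = remainder / missing if missing else 0.0
    return [share if w is None else w for w in weights]

print(resolve_weights([0.6, None, None]))  # [0.6, 0.2, 0.2] (mixed weighting)
print(resolve_weights([None, None]))       # [0.5, 0.5]      (equal weighting)
```
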

Understanding the Code Architecture

The package is organized as follows:

  • client.py: Contains the main FLClient class for orchestrating federated learning
  • server.py: Provides functions for launching and managing PySyft servers
  • datasites.py: Handles dataset creation and server setup
  • datasets.py: Contains utilities for data processing
  • utils.py: Provides helper functions for visualization and communication

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

  • PySyft for the secure federated learning framework
  • scikit-learn for the Random Forest implementation

