Flower Datasets
Flower Datasets (flwr-datasets) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team that also created Flower: A Friendly Federated Learning Framework.
Tip: For complete documentation, including API docs, how-to guides, and tutorials, visit the Flower Datasets Documentation. For full FL examples, see the Flower Examples page.
Installation
For a complete installation guide, visit the Flower Datasets Documentation. To install the library together with its vision extra:
pip install flwr-datasets[vision]
Overview
The Flower Datasets library supports:
- downloading datasets - choose the dataset from Hugging Face's datasets,
- partitioning datasets - customize the partitioning scheme,
- creating centralized datasets - leave parts of the dataset unpartitioned (e.g. for centralized evaluation).
Because it uses Hugging Face's datasets under the hood, Flower Datasets integrates with the following popular formats/frameworks:
- Hugging Face,
- PyTorch,
- TensorFlow,
- NumPy,
- Pandas,
- JAX,
- Arrow.
Create custom partitioning schemes or choose from the implemented partitioning schemes:
- Partitioner (the abstract base class): Partitioner
- IID partitioning: IidPartitioner(num_partitions)
- Dirichlet partitioning: DirichletPartitioner(num_partitions, partition_by, alpha)
- Distribution partitioning: DistributionPartitioner(distribution_array, num_partitions, num_unique_labels_per_partition, partition_by, preassigned_num_samples_per_label, rescale)
- InnerDirichlet partitioning: InnerDirichletPartitioner(partition_sizes, partition_by, alpha)
- Pathological partitioning: PathologicalPartitioner(num_partitions, partition_by, num_classes_per_partition, class_assignment_mode)
- Natural ID partitioning: NaturalIdPartitioner(partition_by)
- Size-based partitioning (the abstract base class for the partitioners that divide the dataset based on the number of samples): SizePartitioner
- Linear partitioning: LinearPartitioner(num_partitions)
- Square partitioning: SquarePartitioner(num_partitions)
- Exponential partitioning: ExponentialPartitioner(num_partitions)
- more to come in future releases (contributions are welcome).
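To give some intuition for what a Dirichlet-style scheme does with its num_partitions and alpha parameters, here is a simplified, standard-library-only sketch. It is not the library's implementation: the function `dirichlet_partition` and its `seed` argument are hypothetical, and the approach (one Dirichlet draw per label, used to split that label's samples across partitions) is an assumption chosen for clarity.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_partitions, alpha, seed=0):
    """Return one list of sample indices per partition (simplified sketch)."""
    rng = random.Random(seed)

    # Group sample indices by their label.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    partitions = [[] for _ in range(num_partitions)]
    for label in sorted(indices_by_label):
        indices = indices_by_label[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a set of independent Gamma(alpha, 1) draws,
        # normalized to sum to one.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_partitions)]
        total = sum(draws)
        # Turn cumulative proportions into cut points over this label's indices.
        cuts, running = [0], 0.0
        for draw in draws:
            running += draw / total
            cuts.append(round(running * len(indices)))
        cuts[-1] = len(indices)  # guard against floating-point drift
        for pid in range(num_partitions):
            partitions[pid].extend(indices[cuts[pid]:cuts[pid + 1]])
    return partitions


labels = [i % 10 for i in range(1000)]  # toy data: 10 balanced classes
parts = dirichlet_partition(labels, num_partitions=5, alpha=0.3)
print([len(p) for p in parts])
```

In this sketch, a small alpha concentrates each label in a few partitions (strong label skew), while a large alpha spreads every label almost evenly, which approaches the IID case.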
Figure: Comparison of Partitioning Schemes on CIFAR10
Note: This plot was generated using a library function (see the flwr_datasets.visualization package for more).
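A comparison like this boils down to counting how many samples of each label land in each partition. The sketch below computes such counts for two toy schemes using only the standard library; it does not use flwr_datasets.visualization, and the helpers `chunk` and `label_counts` are hypothetical names introduced for illustration.

```python
import random
from collections import Counter


def chunk(indices, num_partitions):
    """Split a list of indices into equally sized contiguous chunks."""
    size = len(indices) // num_partitions
    return [indices[p * size:(p + 1) * size] for p in range(num_partitions)]


def label_counts(partitions, labels):
    """Count the labels present in each partition."""
    return [Counter(labels[i] for i in part) for part in partitions]


labels = [i % 4 for i in range(400)]  # toy data: 4 balanced classes
num_partitions = 4

# Scheme 1: shuffle, then chunk (an IID-like split).
shuffled = list(range(len(labels)))
random.Random(0).shuffle(shuffled)
iid_parts = chunk(shuffled, num_partitions)

# Scheme 2: sort by label, then chunk (an extreme label-skewed split).
sorted_idx = sorted(range(len(labels)), key=lambda i: labels[i])
skewed_parts = chunk(sorted_idx, num_partitions)

for name, parts in [("shuffled", iid_parts), ("sorted by label", skewed_parts)]:
    print(name)
    for pid, counts in enumerate(label_counts(parts, labels)):
        print(f"  partition {pid}: {[counts.get(c, 0) for c in range(4)]}")
```

Plotting these per-partition counts as stacked bars gives the kind of picture a partitioning comparison typically shows.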
Usage
Flower Datasets exposes the FederatedDataset abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two methods that handle the dataset preprocessing for you: load_partition(partition_id, split), which returns a single partition of a split, and load_split(split), which returns a whole split.
Here's a basic quickstart example of how to partition the MNIST dataset:
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioners import IidPartitioner

# The train split of the MNIST dataset will be partitioned into 100 partitions
partitioner = IidPartitioner(num_partitions=100)
fds = FederatedDataset("ylecun/mnist", partitioners={"train": partitioner})

# Load the first of the 100 train partitions (e.g. the data of one client)
partition = fds.load_partition(0)

# Load the whole, unpartitioned test split (e.g. for centralized evaluation)
centralized_data = fds.load_split("test")
For more details, please refer to the specific how-to guides or tutorial. They showcase customization and more advanced features.
Future release
Here are a few of the things that we will work on in future releases:
- ✅ Support for more datasets (especially the ones that have user id present).
- ✅ Creation of custom Partitioners.
- ✅ More out-of-the-box Partitioners.
- ✅ Passing Partitioners via FederatedDataset's partitioners argument.
- ✅ Customization of the dataset splitting before the partitioning.
- ✅ Simplification of the dataset transformation to the popular frameworks/types.
- Creation of synthetic data.
- Support for Vertical FL.
Project details
Download files
File details
Details for the file flwr_datasets-0.4.0.tar.gz.
File metadata
- Download URL: flwr_datasets-0.4.0.tar.gz
- Upload date:
- Size: 49.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.10.13 Darwin/24.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 53f4d955c394d1731abb97bd47eaf5d50048f5bd0310548f18082ec2f7004295
MD5 | 99b5d40a5bd7c00f7b4b32ad20f296a8
BLAKE2b-256 | 01aefd1148ac7d79d400cb7a15a0cc06cf967f627790bf868c686ebfb05cb546
File details
Details for the file flwr_datasets-0.4.0-py3-none-any.whl.
File metadata
- Download URL: flwr_datasets-0.4.0-py3-none-any.whl
- Upload date:
- Size: 78.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.10.13 Darwin/24.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8accc0c88520914d79e826655bee5ce014b124e5e32db1778525202f58d12b8b
MD5 | 6fdacbf2a8d4ebc4976ff23b5e10bb40
BLAKE2b-256 | 8731d255d03053fd29d0eb1fff99bc983026eec68ce55e7f08853036b9ea9195