
Flower Datasets

Flower Datasets (flwr-datasets) is a library for quickly and easily creating datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team, which also created Flower: A Friendly Federated Learning Framework. The Flower Datasets library supports:

  • downloading datasets - choose the dataset from Hugging Face's datasets,
  • partitioning datasets - customize the partitioning scheme,
  • creating centralized datasets - leave parts of the dataset unpartitioned (e.g. for centralized evaluation).

Because Hugging Face's datasets is used under the hood, Flower Datasets integrates with the following popular formats/frameworks:

  • Hugging Face,
  • PyTorch,
  • TensorFlow,
  • NumPy,
  • Pandas,
  • JAX,
  • Arrow.

Create custom partitioning schemes or choose from the implemented partitioning schemes:

  • Partitioner (the abstract base class) Partitioner
  • IID partitioning IidPartitioner(num_partitions)
  • Natural ID partitioner NaturalIdPartitioner
  • Size partitioner (the abstract base class for partitioners that divide the data based on the number of samples) SizePartitioner
  • Linear partitioner LinearPartitioner
  • Square partitioner SquarePartitioner
  • Exponential partitioner ExponentialPartitioner
  • more to come in future releases.
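To illustrate what the size-based schemes above do, here is a small stdlib-only sketch (not the library's implementation) that computes partition sizes proportional to a weight of the partition id: i + 1 for the linear scheme, (i + 1)² for the square scheme, and a geometric weight such as 2^i for the exponential one. The exact weights and rounding in flwr-datasets may differ.

```python
# Illustration only: compute how many samples each partition would receive
# when partition sizes are proportional to a weight of the partition id.
# This mirrors the idea behind LinearPartitioner, SquarePartitioner, and
# ExponentialPartitioner; the library's exact weights/rounding may differ.

def partition_sizes(num_samples, num_partitions, weight):
    weights = [weight(i) for i in range(num_partitions)]
    total = sum(weights)
    # Floor each share, then give the rounding remainder to the last
    # partition so the sizes always sum to num_samples.
    sizes = [num_samples * w // total for w in weights]
    sizes[-1] += num_samples - sum(sizes)
    return sizes

linear = partition_sizes(1000, 4, lambda i: i + 1)           # sizes ∝ i + 1
square = partition_sizes(1000, 4, lambda i: (i + 1) ** 2)    # sizes ∝ (i + 1)^2
exponential = partition_sizes(1000, 4, lambda i: 2 ** i)     # sizes ∝ 2^i
```

With 1000 samples and 4 partitions, the linear scheme yields sizes 100/200/300/400, while the square and exponential schemes skew the distribution further toward the later partitions.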

Installation

With pip

Flower Datasets can be installed from PyPI:

pip install flwr-datasets

Install with an extension:

  • for image datasets:
pip install flwr-datasets[vision]
  • for audio datasets:
pip install flwr-datasets[audio]

If you plan to convert the dataset to the format of your ML framework, make sure that framework is installed too.

Usage

Flower Datasets exposes the FederatedDataset abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two powerful methods for accessing the data: load_partition(partition_id, split) and load_split(split).

Here's a basic quickstart example of how to partition the MNIST dataset:

from flwr_datasets import FederatedDataset

# The train split of the MNIST dataset will be partitioned into 100 partitions
mnist_fds = FederatedDataset(dataset="mnist", partitioners={"train": 100})

# Load a single partition (here: partition 0) of the train split
mnist_partition_0 = mnist_fds.load_partition(0, "train")

# Load the whole (unpartitioned) test split, e.g. for centralized evaluation
centralized_data = mnist_fds.load_split("test")

For more details, please refer to the specific how-to guides or tutorials, which showcase customization and more advanced features.
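Conceptually, partitioning the train split into 100 IID partitions means shuffling the example indices and cutting them into equally sized chunks. A minimal stdlib-only sketch of that idea (an illustration only; flwr-datasets' IidPartitioner operates on Hugging Face Datasets objects, not raw index lists):

```python
import random

def iid_partition(num_samples, num_partitions, seed=42):
    """Shuffle sample indices and split them into equal-sized partitions.

    Sketch of the IID idea only, not the library's implementation. The
    remainder (if num_samples is not divisible) is dropped for simplicity.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    size = num_samples // num_partitions
    return [indices[i * size:(i + 1) * size] for i in range(num_partitions)]

# MNIST-sized example: 60,000 train samples split into 100 partitions of 600
partitions = iid_partition(num_samples=60_000, num_partitions=100)
```

Because every partition is an equal-sized random sample of the whole split, each one approximately follows the global label distribution, which is exactly the IID assumption.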

Future releases

Here are a few of the things that we will work on in future releases:

  • ✅ Support for more datasets (especially ones with a user id present).
  • ✅ Creation of custom Partitioners.
  • ✅ More out-of-the-box Partitioners.
  • ✅ Passing Partitioners via FederatedDataset's partitioners argument.
  • ✅ Customization of the dataset splitting before the partitioning.
  • Simplification of the dataset transformation to the popular frameworks/types.
  • Creation of synthetic data.
  • Support for Vertical FL.

Project details


Download files

Download the file for your platform.

Source Distribution

flwr_datasets-0.1.0.tar.gz (24.2 kB view details)


Built Distribution


flwr_datasets-0.1.0-py3-none-any.whl (39.1 kB view details)


File details

Details for the file flwr_datasets-0.1.0.tar.gz.

File metadata

  • Download URL: flwr_datasets-0.1.0.tar.gz
  • Upload date:
  • Size: 24.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.13 Darwin/23.3.0

File hashes

Hashes for flwr_datasets-0.1.0.tar.gz:

  • SHA256: 73e522bca15fb4df5a3c63890733bfef930666ebed6be7c6fb9f149b2b78f350
  • MD5: d1f29fa2c9bf718bc4fb6dc9db012adb
  • BLAKE2b-256: 6c7235d75338fb6faad9525748ed255c02acfd2df60228395458b75e8020c8cf


File details

Details for the file flwr_datasets-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: flwr_datasets-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 39.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.13 Darwin/23.3.0

File hashes

Hashes for flwr_datasets-0.1.0-py3-none-any.whl:

  • SHA256: 54e2726b861eb4dc738353a61e1c2433746523e6da5195e84c34c538a2e7657e
  • MD5: 02092e4f06b8e286d056261b78acd789
  • BLAKE2b-256: c57435e33e080d19a3bdb0d646c49e0e37a4cfd45a66c6d2880e3c5f57a53f56

