
fedops-dataset

fedops-dataset is a local-first dataset toolkit for multimodal federated learning (FL) experiments (FedMS2-v8 style).

It supports:

  • raw data bootstrap (fetch-raw)
  • partition/feature/simulation generation (create-v8)
  • Python API for runtime-driven usage (FedOpsLocalDataset)

Python requirement: >=3.8

Install

pip install fedops-dataset

Local-First Quickstart

1) Fetch or set up raw data

# CREMA-D
fedops-dataset fetch-raw --dataset crema_d --data-root /path/to/fed_multimodal/data

# PTB-XL
fedops-dataset fetch-raw --dataset ptb-xl --data-root /path/to/fed_multimodal/data

# Hateful Memes: provide a prepared source folder, then symlink/copy into data-root
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-source-dir /path/to/hateful_memes_source \
  --hateful-memes-mode symlink

2) Validate raw roots

fedops-dataset check-raw-datasets \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes

3) Generate v8 artifacts (alpha, ps = sample missing rate, pm = modality missing rate)

# Dry run first
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes \
  --dry-run

# Real run
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes

Python API (Runtime-Driven)

Direct local usage

from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="hateful_memes",
    alpha=0.1,
    sample_missing_rate=0.2,   # ps
    modality_missing_rate=0.8, # pm
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed_multimodal/data",
    hateful_memes_root="/path/to/fed_multimodal/data/hateful_memes",
)

ds.prepare(dry_run=False)
partition = ds.load_partition()
simulation = ds.load_simulation()
client0_records = ds.client_records(0, use_simulation=True)

Flower-style runtime config usage

from fedops_dataset import FedOpsLocalDataset

run_config = {
    "repo-root": "/path/to/fed-multimodal",
    "data-root": "/path/to/fed_multimodal/data",
    "hateful-memes-root": "/path/to/fed_multimodal/data/hateful_memes",
}

# Simulation mode example (Flower simulation engine)
node_config = {"partition-id": 0, "num-partitions": 10}

ds = FedOpsLocalDataset.from_runtime_config(
    dataset="crema_d",
    alpha=0.1,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    run_config=run_config,
    node_config=node_config,
)

mode = ds.node_mode(node_config)  # "simulation"
records = ds.client_records_from_node_config(node_config, use_simulation=True)

Path Semantics

  • Simulation mode:
    • detected when node_config has partition-id and num-partitions
    • client records can be resolved from partition-id
  • Deployment mode:
    • if node_config has data-path, it is used as runtime data root
    • each node can point to its own local data path
  • No hardcoded path is required:
    • pass run_config/node_config, CLI args, or env vars
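The detection rules above can be sketched as a small helper. This is an illustrative reimplementation of the documented logic, not the library's actual code; `resolve_node_mode` and `resolve_data_root` are hypothetical names, and only the documented node_config keys are used.

```python
def resolve_node_mode(node_config: dict) -> str:
    """Simulation mode is detected when both partition keys are present."""
    if "partition-id" in node_config and "num-partitions" in node_config:
        return "simulation"
    return "deployment"


def resolve_data_root(node_config: dict, default_root: str) -> str:
    """In deployment mode, a node-level "data-path" overrides the default root."""
    return node_config.get("data-path", default_root)


resolve_node_mode({"partition-id": 0, "num-partitions": 10})  # "simulation"
resolve_node_mode({"data-path": "/mnt/node0/data"})           # "deployment"
```

This mirrors why no hardcoded path is needed: the same node_config dict both selects the mode and, for deployment nodes, supplies the per-node data root.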

Environment Variables

export FEDOPS_REPO_ROOT=/path/to/fed-multimodal
export FEDOPS_OUTPUT_DIR=/path/to/fed-multimodal/fed_multimodal/output
export FEDOPS_DATA_ROOT=/path/to/fed_multimodal/data
export HATEFUL_MEMES_ROOT=/path/to/fed_multimodal/data/hateful_memes

Optional HF Artifact Client

FedOpsDatasetClient remains available if you also host artifacts in a Hugging Face dataset repo. It is optional for local/original-data mode.

Maintainer Release

cd fedops_dataset
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=<pypi-token>
./scripts/publish_pypi.sh
