Local-first dataset toolkit for multimodal federated learning artifacts (partition/feature/simulation)

fedops-dataset

fedops-dataset is a local-first dataset toolkit for multimodal federated learning (FL) experiments (FedMS2-v8 style).

It supports:

  • raw data bootstrap (fetch-raw)
  • partition/feature/simulation generation (create-v8)
  • Python API for runtime-driven usage (FedOpsLocalDataset)

Python requirement: >=3.8

Install

pip install fedops-dataset

Local-First Quickstart

1) Fetch or setup raw data

# CREMA-D
fedops-dataset fetch-raw --dataset crema_d --data-root /path/to/fed_multimodal/data

# PTB-XL
fedops-dataset fetch-raw --dataset ptb-xl --data-root /path/to/fed_multimodal/data

# Hateful Memes (auto-download from HF repo)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-repo-id neuralcatcher/hateful_memes \
  --hateful-memes-revision main

# Hateful Memes (manual prepared source folder)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-source-dir /path/to/hateful_memes_source \
  --hateful-memes-mode symlink

2) Validate raw roots

fedops-dataset check-raw-datasets \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes

3) Generate v8 artifacts (alpha, ps = sample missing rate, pm = modality missing rate)

# Dry run first
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes \
  --dry-run

# Real run
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 50 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes

Note on alpha:

  • Both --alpha 5.0 and --alpha 50 resolve to the artifact token alpha50.
  • --alpha 0.1 resolves to alpha01.
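
The mapping above can be illustrated with a small sketch. The rule shown here (drop the decimal point from the value's string form) is an assumption inferred from the two examples, not the package's actual implementation; `alpha_token` is a hypothetical helper:

```python
def alpha_token(alpha):
    """Hypothetical helper: derive the artifact token by stripping
    the decimal point from the alpha value's string form."""
    return "alpha" + str(alpha).replace(".", "")

print(alpha_token(5.0))  # alpha50
print(alpha_token(50))   # alpha50
print(alpha_token(0.1))  # alpha01
```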

Python API (Runtime-Driven)

Direct local usage

from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="hateful_memes",
    alpha=0.1,
    sample_missing_rate=0.2,   # ps
    modality_missing_rate=0.8, # pm
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed_multimodal/data",
    hateful_memes_root="/path/to/fed_multimodal/data/hateful_memes",
)

ds.prepare(dry_run=False)
partition = ds.load_partition()
simulation = ds.load_simulation()
client0_records = ds.client_records(0, use_simulation=True)

Flower-style runtime config usage

from fedops_dataset import FedOpsLocalDataset

run_config = {
    "repo-root": "/path/to/fed-multimodal",
    "data-root": "/path/to/fed_multimodal/data",
    "hateful-memes-root": "/path/to/fed_multimodal/data/hateful_memes",
}

# Simulation mode example (Flower simulation engine)
node_config = {"partition-id": 0, "num-partitions": 10}

ds = FedOpsLocalDataset.from_runtime_config(
    dataset="crema_d",
    alpha=0.1,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    run_config=run_config,
    node_config=node_config,
)

mode = ds.node_mode(node_config)  # "simulation"
records = ds.client_records_from_node_config(node_config, use_simulation=True)

Path Semantics

  • Simulation mode:
    • Detected when node_config contains both partition-id and num-partitions.
    • Client records are resolved from partition-id.
  • Deployment mode:
    • If node_config contains data-path, it is used as the runtime data root.
    • Each node can point to its own local data path.
  • No hardcoded paths are required:
    • Pass run_config/node_config, CLI args, or environment variables.
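
The detection rule above can be sketched as follows. `detect_node_mode` is a hypothetical stand-in for the library's internal logic, shown only to make the rule concrete:

```python
def detect_node_mode(node_config):
    """Hypothetical sketch of the mode-detection rule described above."""
    # Simulation: Flower-style node_config carrying partition info.
    if "partition-id" in node_config and "num-partitions" in node_config:
        return "simulation"
    # Deployment: no partition info; data-path (if present) supplies
    # the runtime data root for this node.
    return "deployment"

print(detect_node_mode({"partition-id": 0, "num-partitions": 10}))  # simulation
print(detect_node_mode({"data-path": "/mnt/node0/data"}))           # deployment
```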

Environment Variables

export FEDOPS_REPO_ROOT=/path/to/fed-multimodal
export FEDOPS_OUTPUT_DIR=/path/to/fed-multimodal/fed_multimodal/output
export FEDOPS_DATA_ROOT=/path/to/fed_multimodal/data
export HATEFUL_MEMES_ROOT=/path/to/fed_multimodal/data/hateful_memes
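
A minimal sketch of how a launcher script might consume these variables, falling back to explicit defaults when one is unset. `resolve_roots` and its fallback paths are hypothetical illustrations, not part of the package API:

```python
import os

def resolve_roots(defaults=None):
    """Hypothetical helper: collect FedOps paths from the environment,
    falling back to explicitly supplied defaults."""
    defaults = defaults or {}
    env_keys = {
        "repo_root": "FEDOPS_REPO_ROOT",
        "output_dir": "FEDOPS_OUTPUT_DIR",
        "data_root": "FEDOPS_DATA_ROOT",
        "hateful_memes_root": "HATEFUL_MEMES_ROOT",
    }
    return {
        name: os.environ.get(env, defaults.get(name))
        for name, env in env_keys.items()
    }

roots = resolve_roots({"data_root": "/path/to/fed_multimodal/data"})
```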

Optional HF Artifact Client

FedOpsDatasetClient remains available if you also host artifacts in a Hugging Face dataset repo. It is optional for local/original-data mode.

Maintainer Release

cd fedops_dataset
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=<pypi-token>
./scripts/publish_pypi.sh
