Local-first dataset toolkit for multimodal federated learning artifacts (partition/feature/simulation)

fedops-dataset

fedops-dataset is a local-first dataset toolkit for multimodal federated learning (FL) experiments in the FedMS2-v8 style.

It supports:

  • raw data bootstrap (fetch-raw)
  • partition/feature/simulation generation (create-v8)
  • Python API for runtime-driven usage (FedOpsLocalDataset)

Python requirement: >=3.8

Install

pip install fedops-dataset
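
To confirm the install from Python, the following is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8, which matches the stated requirement). The helper name installed_version is hypothetical and is not part of fedops-dataset.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "fedops-dataset") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "fedops-dataset is not installed in this environment")
```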

Local-First Quickstart

1) Fetch or set up raw data

# CREMA-D
fedops-dataset fetch-raw --dataset crema_d --data-root /path/to/fed_multimodal/data

# PTB-XL
fedops-dataset fetch-raw --dataset ptb-xl --data-root /path/to/fed_multimodal/data

# Hateful Memes (direct public git download; default)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-repo-id neuralcatcher/hateful_memes \
  --hateful-memes-revision main \
  --hateful-memes-fetch-method git

# Hateful Memes (direct archive URL)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-fetch-method archive \
  --hateful-memes-archive-url https://<host>/hateful_memes.zip

# Hateful Memes (manual prepared source folder)
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-source-dir /path/to/hateful_memes_source \
  --hateful-memes-mode symlink

2) Validate raw roots

fedops-dataset check-raw-datasets \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes

3) Generate v8 artifacts (alpha, ps = sample missing rate, pm = modality missing rate)

# Dry run first
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes \
  --dry-run

# Real run: drop --dry-run (this example also switches to --alpha 50)
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 50 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes
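
When the dry run and the real run are driven from a script, building the argument list once keeps the two invocations identical apart from --dry-run. The sketch below only assembles and prints the commands; it uses the flags shown above and nothing else. The helper build_create_v8_cmd is hypothetical, and treating --hateful-memes-root as optional for other datasets is an assumption.

```python
import shlex
from typing import List, Optional


def build_create_v8_cmd(
    dataset: str,
    alpha: float,
    sample_missing_rate: float,
    modality_missing_rate: float,
    repo_root: str,
    data_root: str,
    hateful_memes_root: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a `fedops-dataset create-v8` argument list (hypothetical helper)."""
    cmd = [
        "fedops-dataset", "create-v8",
        "--dataset", dataset,
        "--alpha", str(alpha),
        "--sample-missing-rate", str(sample_missing_rate),
        "--modality-missing-rate", str(modality_missing_rate),
        "--repo-root", repo_root,
        "--data-root", data_root,
    ]
    # Assumption: this flag is only needed for the hateful_memes dataset.
    if hateful_memes_root is not None:
        cmd += ["--hateful-memes-root", hateful_memes_root]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    for dry_run in (True, False):
        cmd = build_create_v8_cmd(
            "hateful_memes", 0.1, 0.2, 0.8,
            "/path/to/fed-multimodal",
            "/path/to/fed_multimodal/data",
            "/path/to/fed_multimodal/data/hateful_memes",
            dry_run=dry_run,
        )
        # Print only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```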

Note on alpha:

  • both --alpha 5.0 and --alpha 50 resolve to artifact token alpha50
  • --alpha 0.1 resolves to alpha01
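
One simple rule that reproduces these three cases is to drop the decimal point from the value as typed on the command line. The sketch below illustrates that rule only; it is a guess and not the package's actual implementation. The function name alpha_token is hypothetical, and values outside the three documented cases may resolve differently in the real tool.

```python
def alpha_token(alpha_text: str) -> str:
    """Map an alpha value, as typed, to an artifact token (assumed rule).

    Assumption: the token is "alpha" followed by the typed digits with the
    decimal point removed. This matches the three documented cases only.
    """
    return "alpha" + alpha_text.strip().replace(".", "")


if __name__ == "__main__":
    for typed in ("5.0", "50", "0.1"):
        print(typed, "->", alpha_token(typed))
```

Because two different alpha values can share a token under such a rule, it may be worth checking which value produced an existing alpha50 artifact before reusing it.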

Python API (Runtime-Driven)

Direct local usage

from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="hateful_memes",
    alpha=0.1,
    sample_missing_rate=0.2,   # ps
    modality_missing_rate=0.8, # pm
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed_multimodal/data",
    hateful_memes_root="/path/to/fed_multimodal/data/hateful_memes",
)

ds.prepare(dry_run=False)
partition = ds.load_partition()
simulation = ds.load_simulation()
client0_records = ds.client_records(0, use_simulation=True)

Flower-style runtime config usage

from fedops_dataset import FedOpsLocalDataset

run_config = {
    "repo-root": "/path/to/fed-multimodal",
    "data-root": "/path/to/fed_multimodal/data",
    "hateful-memes-root": "/path/to/fed_multimodal/data/hateful_memes",
}

# Simulation mode example (Flower simulation engine)
node_config = {"partition-id": 0, "num-partitions": 10}

ds = FedOpsLocalDataset.from_runtime_config(
    dataset="crema_d",
    alpha=0.1,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    run_config=run_config,
    node_config=node_config,
)

mode = ds.node_mode(node_config)  # "simulation"
records = ds.client_records_from_node_config(node_config, use_simulation=True)

Path Semantics

  • Simulation mode:
    • detected when node_config has partition-id and num-partitions
    • client records can be resolved from partition-id
  • Deployment mode:
    • if node_config has data-path, it is used as runtime data root
    • each node can point to its own local data path
  • No hardcoded path is required:
    • pass run_config/node_config, CLI args, or env vars
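
The detection rule above can be sketched as a small standalone function. This is an illustration rather than the library's code: detect_node_mode is a hypothetical name, the "deployment" and "unknown" return strings are assumptions (only "simulation" appears in the example output of ds.node_mode), and checking for simulation keys first is also an assumption.

```python
from typing import Any, Mapping


def detect_node_mode(node_config: Mapping[str, Any]) -> str:
    """Classify a node_config by the keys it carries (hypothetical helper)."""
    # Simulation mode: both partition keys are present.
    if "partition-id" in node_config and "num-partitions" in node_config:
        return "simulation"
    # Deployment mode: the node points at its own local data path.
    if "data-path" in node_config:
        return "deployment"
    # Assumption: anything else is reported as unknown rather than raising.
    return "unknown"


if __name__ == "__main__":
    print(detect_node_mode({"partition-id": 0, "num-partitions": 10}))
    print(detect_node_mode({"data-path": "/path/to/node/data"}))
```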

Environment Variables

export FEDOPS_REPO_ROOT=/path/to/fed-multimodal
export FEDOPS_OUTPUT_DIR=/path/to/fed-multimodal/fed_multimodal/output
export FEDOPS_DATA_ROOT=/path/to/fed_multimodal/data
export HATEFUL_MEMES_ROOT=/path/to/fed_multimodal/data/hateful_memes
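
For scripts that want the same fallback behaviour outside the toolkit, the sketch below resolves a data root from a node_config, a run_config, or FEDOPS_DATA_ROOT. The precedence shown (node data-path, then run data-root, then the environment variable) is an assumption and is not stated by the package; resolve_data_root is a hypothetical helper.

```python
import os
from typing import Any, Mapping, Optional


def resolve_data_root(
    run_config: Optional[Mapping[str, Any]] = None,
    node_config: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a data root from the available sources (assumed precedence)."""
    run_config = run_config or {}
    node_config = node_config or {}
    env = os.environ if env is None else env
    # Assumed order: per-node path, then run-level path, then environment.
    return (
        node_config.get("data-path")
        or run_config.get("data-root")
        or env.get("FEDOPS_DATA_ROOT")
    )


if __name__ == "__main__":
    print(resolve_data_root(run_config={"data-root": "/path/to/fed_multimodal/data"}))
```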

Optional HF Artifact Client

FedOpsDatasetClient remains available if you also host artifacts in a Hugging Face dataset repo. It is optional for local/original-data mode.

Maintainer Release

cd fedops_dataset
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=<pypi-token>
./scripts/publish_pypi.sh

Release Files (0.3.5)

  • Source distribution: fedops_dataset-0.3.5.tar.gz (28.7 kB)
  • Built distribution: fedops_dataset-0.3.5-py3-none-any.whl (29.8 kB, Python 3)
