Local-first dataset toolkit for multimodal federated learning artifacts (partition/feature/simulation)

fedops-dataset

fedops-dataset is a local-first dataset toolkit for multimodal federated learning (FedMS2-v8 style).

It helps you:

  • fetch raw multimodal datasets
  • validate dataset roots and expected files
  • generate FL artifacts (partition, feature, simulation)
  • load per-client records in Python for Simulation and Deployment workflows

Python requirement: >=3.8

Who This Is For

  • FL researchers working with multimodal datasets
  • engineers running FedMS2-style experiments repeatedly
  • teams that want reproducible alpha / ps / pm artifact generation

What This Package Covers

  1. Raw data bootstrap:
  • fedops-dataset fetch-raw
  2. Raw path validation:
  • fedops-dataset check-raw-datasets
  3. Artifact generation:
  • fedops-dataset create-v8
  4. Runtime loading API:
  • FedOpsLocalDataset

Supported Datasets

  • crema_d
  • hateful_memes
  • ptb-xl

Default clients:

  • crema_d: 40
  • hateful_memes: 40
  • ptb-xl: 20

Install

pip install fedops-dataset

5-Minute Quickstart

1) Define paths

export REPO_ROOT=/path/to/fed-multimodal
export DATA_ROOT=$REPO_ROOT/fed_multimodal/data
export OUTPUT_DIR=$REPO_ROOT/fed_multimodal/output

2) Fetch raw data

# all supported datasets
fedops-dataset fetch-raw --dataset all --data-root "$DATA_ROOT"

Notes:

  • the default fetch method for hateful_memes is a direct public git clone from:
    • https://huggingface.co/datasets/neuralcatcher/hateful_memes

3) Validate raw roots

fedops-dataset check-raw-datasets --data-root "$DATA_ROOT"

4) Generate artifacts (example: hateful_memes)

# dry run first
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 50 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.2 \
  --repo-root "$REPO_ROOT" \
  --data-root "$DATA_ROOT" \
  --dry-run

# real run
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 50 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.2 \
  --repo-root "$REPO_ROOT" \
  --data-root "$DATA_ROOT"

5) Load client records in Python

from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="hateful_memes",
    alpha=50,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed-multimodal/fed_multimodal/data",
)

print(ds.is_prepared())
client0 = ds.client_records(0, use_simulation=True)
print(len(client0))

Parameter Semantics

  • alpha: partition heterogeneity control
  • sample_missing_rate (ps): sample-level missingness
  • modality_missing_rate (pm): modality-level missingness

Token naming examples used in artifact filenames:

  • alpha=0.1 -> alpha01
  • alpha=5.0 -> alpha50
  • alpha=50 -> alpha50

So 5.0 and 50 intentionally resolve to the same alpha token.
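One way to reproduce this mapping is to render the value as given and drop the decimal point. This rule is inferred from the examples above, not documented behavior, and the helper name is hypothetical:

```python
def alpha_token(alpha):
    """Derive the alpha token used in artifact filenames.

    Assumed rule (inferred from the examples above): stringify the
    value and strip the decimal point, so 0.1 -> "01", 5.0 -> "50",
    and the integer 50 -> "50".
    """
    return "alpha" + str(alpha).replace(".", "")

print(alpha_token(0.1))  # alpha01
print(alpha_token(5.0))  # alpha50
print(alpha_token(50))   # alpha50
```

This explains why 5.0 and 50 collide on alpha50 while 0.1 stays distinct as alpha01.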

CLI Guide

fetch-raw

Use this to prepare raw datasets under your data root.

fedops-dataset fetch-raw --dataset all --data-root "$DATA_ROOT"

Hateful Memes fetch modes

  1. Default public git mode:
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root "$DATA_ROOT" \
  --hateful-memes-fetch-method git \
  --hateful-memes-repo-id neuralcatcher/hateful_memes
  2. HF snapshot mode (API-based):
export HF_TOKEN=<optional_token>
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root "$DATA_ROOT" \
  --hateful-memes-fetch-method hf-snapshot
  3. Archive URL mode:
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root "$DATA_ROOT" \
  --hateful-memes-fetch-method archive \
  --hateful-memes-archive-url https://<host>/hateful_memes.zip
  4. Manual prepared folder mode:
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root "$DATA_ROOT" \
  --hateful-memes-source-dir /path/to/hateful_memes_source \
  --hateful-memes-mode symlink

check-raw-datasets

fedops-dataset check-raw-datasets --data-root "$DATA_ROOT"

Use this before create-v8 to catch path/file issues early.

create-v8

Generates:

  • partition JSON
  • feature directories
  • simulation JSON

fedops-dataset create-v8 \
  --dataset crema_d \
  --alpha 50 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.2 \
  --repo-root "$REPO_ROOT" \
  --data-root "$DATA_ROOT"

Optional controls:

  • --no-partition
  • --no-features
  • --no-simulation
  • --num-clients <N>
  • --force
  • --dry-run

Python API Guide

FedOpsLocalDataset

Direct usage

from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="crema_d",
    alpha=50,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed-multimodal/fed_multimodal/data",
)

if not ds.is_prepared():
    ds.prepare(dry_run=False)

partition = ds.load_partition()
simulation = ds.load_simulation()
records = ds.client_records(0, use_simulation=True)

Runtime config usage (Flower style)

from fedops_dataset import FedOpsLocalDataset

run_config = {
    "repo-root": "/path/to/fed-multimodal",
    "data-root": "/path/to/fed-multimodal/fed_multimodal/data",
}

# Simulation mode
node_config = {"partition-id": 0, "num-partitions": 40}

ds = FedOpsLocalDataset.from_runtime_config(
    dataset="hateful_memes",
    alpha=50,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    run_config=run_config,
    node_config=node_config,
)

mode = ds.node_mode(node_config)  # simulation
records = ds.client_records_from_node_config(node_config, use_simulation=True)

Simulation vs Deployment

Simulation mode:

  • detected when node_config contains partition-id and num-partitions
  • the partition-id is used to resolve that client's records

Deployment mode:

  • if node_config includes data-path, that path is used as the data root
  • each node can point to different local storage
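The detection rule above can be sketched as a small helper. This is a hypothetical re-implementation for illustration; the package's actual node_mode method may differ:

```python
def node_mode(node_config):
    """Classify a Flower-style node_config as simulation or deployment.

    Sketch of the rule described above: simulation nodes carry both
    partition-id and num-partitions; anything else is treated as a
    deployment node, which may override the data root via data-path.
    """
    if "partition-id" in node_config and "num-partitions" in node_config:
        return "simulation"
    return "deployment"

print(node_mode({"partition-id": 0, "num-partitions": 40}))  # simulation
print(node_mode({"data-path": "/mnt/node_a/data"}))          # deployment
```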

Environment Variables (Optional)

export FEDOPS_REPO_ROOT=/path/to/fed-multimodal
export FEDOPS_OUTPUT_DIR=/path/to/fed-multimodal/fed_multimodal/output
export FEDOPS_DATA_ROOT=/path/to/fed-multimodal/fed_multimodal/data
export HATEFUL_MEMES_ROOT=/path/to/fed-multimodal/fed_multimodal/data/hateful_memes

You can use env vars, CLI args, or runtime config keys. No hardcoded path is required.
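One plausible resolution order is: explicit argument first, then a runtime config key, then the environment variable. The package's actual precedence is not specified here, so treat this helper as a sketch under that assumption:

```python
import os

def resolve_data_root(explicit=None, run_config=None):
    """Resolve the data root from the available sources.

    Assumed order for this sketch: an explicit argument wins, then a
    Flower-style run_config "data-root" key, then the FEDOPS_DATA_ROOT
    environment variable. The package's actual precedence may differ.
    """
    if explicit:
        return explicit
    if run_config and run_config.get("data-root"):
        return run_config["data-root"]
    return os.environ.get("FEDOPS_DATA_ROOT")

os.environ["FEDOPS_DATA_ROOT"] = "/env/data"
print(resolve_data_root())                                      # /env/data
print(resolve_data_root(run_config={"data-root": "/rc/data"}))  # /rc/data
print(resolve_data_root(explicit="/cli/data"))                  # /cli/data
```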

Troubleshooting

  1. Partition file not found:
  • run create-v8 first
  • verify alpha/ps/pm values match existing artifact names
  2. hateful_memes fetch fails in git mode:
  • ensure git and git-lfs are installed
  • use hf-snapshot mode as a fallback
  3. Raw dataset validation errors:
  • run check-raw-datasets and follow the printed hints
  4. Alpha confusion (5.0 vs 50):
  • both map to the token alpha50
  • this is intentional for compatibility with existing FedMS2 artifacts

FAQ

  1. Do I always need to pass --hateful-memes-root?
  • No. By default it resolves to <data-root>/hateful_memes.
  2. Can I use this package without Hugging Face uploads?
  • Yes. The local-first workflow is the primary mode.
  3. Is FedOpsDatasetClient still available?
  • Yes. Use it if you also host artifacts in an HF dataset repo.

Maintainer Release

cd fedops_dataset
python -m build
python -m twine check dist/*
python -m twine upload dist/*
