# fedops-dataset

Local-first dataset toolkit for multimodal federated learning artifacts (partition/feature/simulation).

fedops-dataset is a local-first dataset toolkit for multimodal FL experiments (FedMS2-v8 style). It supports:

- raw data bootstrap (`fetch-raw`)
- partition/feature/simulation generation (`create-v8`)
- a Python API for runtime-driven usage (`FedOpsLocalDataset`)
Python requirement: >=3.8
## Install

```shell
pip install fedops-dataset
```
## Local-First Quickstart

### 1) Fetch or set up raw data

```shell
# CREMA-D
fedops-dataset fetch-raw --dataset crema_d --data-root /path/to/fed_multimodal/data

# PTB-XL
fedops-dataset fetch-raw --dataset ptb-xl --data-root /path/to/fed_multimodal/data

# Hateful Memes: provide a prepared source folder, then symlink/copy into data-root
fedops-dataset fetch-raw \
  --dataset hateful_memes \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-source-dir /path/to/hateful_memes_source \
  --hateful-memes-mode symlink
```
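The difference between the two `--hateful-memes-mode` values can be illustrated with a small standalone sketch (this is an illustration of the general symlink-vs-copy trade-off, not the toolkit's actual implementation; `link_or_copy` is a hypothetical helper):

```python
import shutil
import tempfile
from pathlib import Path


def link_or_copy(source_dir, data_root, mode="symlink"):
    """Place a prepared source folder under data_root, by symlink or by copy."""
    target = Path(data_root) / Path(source_dir).name
    if mode == "symlink":
        # Cheap: no data duplication, but the source folder must stay in place.
        target.symlink_to(source_dir, target_is_directory=True)
    elif mode == "copy":
        # Independent copy: uses disk space, but survives moving the source.
        shutil.copytree(source_dir, target)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return target


# Demo with temporary directories standing in for the real source and data root
root = Path(tempfile.mkdtemp())
src = root / "hateful_memes_source"
src.mkdir()
(src / "train.jsonl").write_text("{}\n")

data_root = root / "data"
data_root.mkdir()
linked = link_or_copy(src, data_root, mode="symlink")
print(linked.is_symlink())  # True: the file is reachable through the link
```

Symlinking is usually preferable for large raw datasets; copying is safer when the source directory is on removable or temporary storage.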
### 2) Validate raw roots

```shell
fedops-dataset check-raw-datasets \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes
```
### 3) Generate v8 artifacts (alpha, ps, pm)

```shell
# Dry run first
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes \
  --dry-run

# Real run
fedops-dataset create-v8 \
  --dataset hateful_memes \
  --alpha 0.1 \
  --sample-missing-rate 0.2 \
  --modality-missing-rate 0.8 \
  --repo-root /path/to/fed-multimodal \
  --data-root /path/to/fed_multimodal/data \
  --hateful-memes-root /path/to/fed_multimodal/data/hateful_memes
```
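In FL partitioning toolkits, `--alpha` is conventionally the concentration parameter of a symmetric Dirichlet distribution over clients: small values produce heavily skewed (non-IID) label splits, large values approach an even split. Assuming fedops-dataset follows this common convention, the effect can be sketched with the standard library alone (`dirichlet_label_shares` is a hypothetical helper, not part of the package):

```python
import random


def dirichlet_label_shares(alpha, num_clients, seed=0):
    """Sample per-client shares of one label from a symmetric Dirichlet(alpha).

    A Dirichlet draw is equivalent to normalizing independent Gamma(alpha, 1)
    draws, which lets us avoid a numpy dependency.
    """
    rng = random.Random(seed)
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(draws)
    return [d / total for d in draws]


# Small alpha: typically a few clients hold most of the label's samples.
skewed = dirichlet_label_shares(alpha=0.1, num_clients=10)
# Large alpha: shares concentrate near the uniform 1/10 each.
uniform = dirichlet_label_shares(alpha=100.0, num_clients=10)

print(max(skewed), max(uniform))
```

Running `create-v8` with `--alpha 0.1`, as above, therefore requests a strongly non-IID partition.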
## Python API (Runtime-Driven)

### Direct local usage

```python
from fedops_dataset import FedOpsLocalDataset

ds = FedOpsLocalDataset(
    dataset="hateful_memes",
    alpha=0.1,
    sample_missing_rate=0.2,   # ps
    modality_missing_rate=0.8, # pm
    repo_root="/path/to/fed-multimodal",
    data_root="/path/to/fed_multimodal/data",
    hateful_memes_root="/path/to/fed_multimodal/data/hateful_memes",
)
ds.prepare(dry_run=False)

partition = ds.load_partition()
simulation = ds.load_simulation()
client0_records = ds.client_records(0, use_simulation=True)
```
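The `ps`/`pm` knobs control the missing-modality simulation. The exact scheme is defined by the underlying fed-multimodal artifacts; as a rough standalone illustration of one common interpretation (an assumption on my part, not the toolkit's guaranteed behavior): with probability `ps` a sample is affected, and each modality of an affected sample is independently dropped with probability `pm`.

```python
import random

MODALITIES = ("image", "text")  # a hateful_memes-style modality pair


def simulate_missing(num_samples, ps, pm, seed=0):
    """Build a per-sample presence mask under the (ps, pm) scheme above."""
    rng = random.Random(seed)
    mask = []
    for _ in range(num_samples):
        if rng.random() < ps:
            # Affected sample: each modality survives with probability 1 - pm.
            present = tuple(rng.random() >= pm for _ in MODALITIES)
        else:
            present = (True, True)
        mask.append(dict(zip(MODALITIES, present)))
    return mask


mask = simulate_missing(1000, ps=0.2, pm=0.8)
complete = sum(all(m.values()) for m in mask)
print(f"{complete}/1000 samples keep both modalities")
```

Under this reading, `ps=0.2, pm=0.8` leaves roughly `1 - 0.2 + 0.2 * (1 - 0.8)**2` of samples fully intact; for the authoritative semantics, inspect the artifacts returned by `ds.load_simulation()`.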
### Flower-style runtime config usage

```python
from fedops_dataset import FedOpsLocalDataset

run_config = {
    "repo-root": "/path/to/fed-multimodal",
    "data-root": "/path/to/fed_multimodal/data",
    "hateful-memes-root": "/path/to/fed_multimodal/data/hateful_memes",
}

# Simulation mode example (Flower simulation engine)
node_config = {"partition-id": 0, "num-partitions": 10}

ds = FedOpsLocalDataset.from_runtime_config(
    dataset="crema_d",
    alpha=0.1,
    sample_missing_rate=0.2,
    modality_missing_rate=0.2,
    run_config=run_config,
    node_config=node_config,
)
mode = ds.node_mode(node_config)  # "simulation"
records = ds.client_records_from_node_config(node_config, use_simulation=True)
```
## Path Semantics

- Simulation mode:
  - detected when `node_config` has `partition-id` and `num-partitions`
  - client records can be resolved from `partition-id`
- Deployment mode:
  - if `node_config` has `data-path`, it is used as the runtime data root
  - each node can point to its own local data path
- No hardcoded path is required:
  - pass `run_config`/`node_config`, CLI args, or env vars
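The detection rules above can be sketched as plain functions (an illustrative mirror of the described semantics, not the package's actual source):

```python
def node_mode(node_config):
    """Simulation if the Flower simulation engine's keys are present."""
    if "partition-id" in node_config and "num-partitions" in node_config:
        return "simulation"
    return "deployment"


def runtime_data_root(node_config, run_config):
    """Deployment nodes may override the data root with their own data-path."""
    return node_config.get("data-path") or run_config["data-root"]


sim = {"partition-id": 0, "num-partitions": 10}
dep = {"data-path": "/mnt/node-local/data"}
run_config = {"data-root": "/path/to/fed_multimodal/data"}

print(node_mode(sim))                    # simulation
print(node_mode(dep))                    # deployment
print(runtime_data_root(dep, run_config))  # /mnt/node-local/data
```

This is why no path needs to be hardcoded: a simulation node is fully identified by `partition-id`/`num-partitions`, while a deployment node either supplies `data-path` or falls back to the shared `run_config` root.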
## Environment Variables

```shell
export FEDOPS_REPO_ROOT=/path/to/fed-multimodal
export FEDOPS_OUTPUT_DIR=/path/to/fed-multimodal/fed_multimodal/output
export FEDOPS_DATA_ROOT=/path/to/fed_multimodal/data
export HATEFUL_MEMES_ROOT=/path/to/fed_multimodal/data/hateful_memes
```
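A common resolution order for such settings is explicit argument first, then the environment variable, then an error (I am assuming fedops-dataset follows this convention; `resolve_path` is a hypothetical helper showing the pattern):

```python
import os


def resolve_path(explicit, env_var, default=None):
    """Resolve a root path: explicit argument wins, then the env var, then default."""
    value = explicit or os.environ.get(env_var) or default
    if value is None:
        raise ValueError(f"set {env_var} or pass the path explicitly")
    return value


os.environ["FEDOPS_DATA_ROOT"] = "/path/to/fed_multimodal/data"

print(resolve_path(None, "FEDOPS_DATA_ROOT"))         # /path/to/fed_multimodal/data
print(resolve_path("/override", "FEDOPS_DATA_ROOT"))  # /override
```

With this precedence, the exports above act as machine-wide defaults that any CLI flag or `run_config` entry can still override per run.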
## Optional HF Artifact Client

`FedOpsDatasetClient` remains available if you also host artifacts in a Hugging Face dataset repo. It is optional for local/original-data mode.
## Maintainer Release

```shell
cd fedops_dataset
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=<pypi-token>
./scripts/publish_pypi.sh
```
## File details

Details for the file `fedops_dataset-0.3.3.tar.gz`.

### File metadata

- Download URL: fedops_dataset-0.3.3.tar.gz
- Upload date:
- Size: 25.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.20

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `56e878344f4feba9026260305aa4a3b60d65d6eefe96e13415a297a167df63b0` |
| MD5 | `6e30dd080d1f22b889da3b45f8e619ee` |
| BLAKE2b-256 | `73b8c91e4e7bfed813e19fa5a70aad1d3586c94e95184377c9de2e2a45957a10` |
## File details

Details for the file `fedops_dataset-0.3.3-py3-none-any.whl`.

### File metadata

- Download URL: fedops_dataset-0.3.3-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.20

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9f197b4cd55a5dd35e7b55ce979806d020b8e69c23d7bb5e06f9f9f04e4f6aa6` |
| MD5 | `93efc32b49caede72610693bd369342b` |
| BLAKE2b-256 | `c029fb5cd1568b19c7408ea914a51f22c76a32d62da9e9c5894ece6057d44a20` |