A pure-Rust multimodal dataset format + dataloader for PyTorch.

ferroload

Ferroload stores images, video, audio, tensors, and rich metadata in a self-contained, shardable on-disk format. It serves them to training loops with parallel decoding in Rust (with the GIL released), SQL-style subsetting, and a one-call loader that drops straight into PyTorch.

Install

pip install ferroload

Optional extras:

pip install "ferroload[hf]"      # HuggingFace import tooling (datasets, pillow, hub)
pip install "ferroload[torch]"   # torch + numpy, for the DataLoader glue
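
To check which of the optional dependencies are already importable in the current environment, a small sketch like the following may help. The mapping from extras to import names is an assumption based on the comments above (in particular, `huggingface_hub` is a guess at what "hub" refers to), and `missing_modules` is a hypothetical helper, not part of ferroload.

```python
import importlib.util

# Assumed import names for each extra; adjust if the real
# dependency list differs ("huggingface_hub" is a guess for "hub").
EXTRAS = {
    "hf": ["datasets", "PIL", "huggingface_hub"],
    "torch": ["torch", "numpy"],
}

def missing_modules(extra):
    """Return the assumed modules for `extra` that cannot be found."""
    return [m for m in EXTRAS[extra] if importlib.util.find_spec(m) is None]

for extra in EXTRAS:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"ferroload[{extra}]: {status}")
```

This only looks up whether each module can be located; it does not import it or check its version.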

In-Rust video decoding is feature-gated because it needs a system ffmpeg, so it is not included in the published wheels. To enable it, build from source:

maturin develop --release --features video

Quickstart

from ferroload import make_loader

epochs = 10                                              # placeholder: number of passes over the data

def train_step(batch):                                   # placeholder: replace with your training step
    ...

dl = make_loader("/data/ds", batch_size=64,
                 columns=["image", "video", "label"],   # kinds resolved from the manifest
                 resize=(224, 224), out="torch")
for epoch in range(epochs):
    dl.set_epoch(epoch)                                  # reshuffle (DDP-aware)
    for batch in dl:
        train_step(batch)                                # batch["image"], batch["video"], batch["label"]
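
The `set_epoch` comment above mentions a DDP-aware reshuffle. How ferroload implements this is not described here; the sketch below shows one common way such a scheme can work, in the spirit of a distributed sampler: seed the permutation with the epoch so every rank agrees on the order, then give each rank a strided slice. The function `epoch_indices` and its parameters are hypothetical and exist only for illustration.

```python
import numpy as np

def epoch_indices(num_samples, epoch, rank=0, world_size=1, seed=0):
    """Illustrative epoch-seeded, rank-sharded shuffle (not ferroload's code)."""
    # Same seed + epoch on every rank gives the same permutation everywhere.
    rng = np.random.default_rng(seed + epoch)
    perm = rng.permutation(num_samples)
    # Pad by cycling through the permutation so each rank gets equal work.
    per_rank = -(-num_samples // world_size)  # ceiling division
    padded = np.resize(perm, per_rank * world_size)
    # Rank r takes every world_size-th index starting at r.
    return padded[rank::world_size]

# Example: 10 samples split across 4 ranks for epoch 0.
shards = [epoch_indices(10, epoch=0, rank=r, world_size=4) for r in range(4)]
for r, shard in enumerate(shards):
    print(r, shard.tolist())
```

Calling the function again with the same epoch reproduces the same shards, while a different epoch reseeds the permutation; that is the role a `set_epoch`-style call typically plays in a training loop.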

Enrich a dataset with an additive, resumable layer. Functions bind to their inputs positionally and run once per sample by default, so they stay generic:

import ferroload

def mean_color(img):                                     # img <- inputs=["image"]
    return img.mean(axis=(0, 1)).astype("float32")

ds = ferroload.Dataset.open("/data/ds")
ds = ds.map(mean_color, inputs=["image"],
            outputs={"emb": ferroload.Modality("npy")}, name="emb")
ds.read_array(0, "emb")
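
To see what `mean_color` computes without opening a dataset, it can be run directly on a synthetic array. This sketch assumes the image reaches the function as a NumPy array of shape (height, width, channels); that layout is an assumption for illustration, not something the text above confirms.

```python
import numpy as np

def mean_color(img):
    # Average over the height and width axes, leaving one value per channel.
    return img.mean(axis=(0, 1)).astype("float32")

# A 4x4 RGB test image with constant channels: R=255, G=0, B=128.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255
img[..., 2] = 128

emb = mean_color(img)
print(emb.shape, emb.dtype)   # (3,) float32
print(emb.tolist())           # [255.0, 0.0, 128.0]
```

Because the function only sees a plain array and returns a plain array, the same code can be tested in isolation and then passed to `ds.map` unchanged.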

Documentation

Full docs (Python API, quickstart, Rust core, benchmarks) are built with MkDocs and published via GitHub Pages. See the project repository for links.

License

Apache-2.0.

Download files

Source Distribution

  • ferroload-0.1.0.tar.gz (67.2 kB): source

Built Distributions

  • ferroload-0.1.0-cp39-abi3-win_amd64.whl (727.0 kB): CPython 3.9+, Windows x86-64
  • ferroload-0.1.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (960.7 kB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • ferroload-0.1.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (908.2 kB): CPython 3.9+, manylinux (glibc 2.17+), ARM64
  • ferroload-0.1.0-cp39-abi3-macosx_11_0_arm64.whl (820.1 kB): CPython 3.9+, macOS 11.0+, ARM64

File details

Details for the file ferroload-0.1.0.tar.gz.

File metadata

  • Download URL: ferroload-0.1.0.tar.gz
  • Size: 67.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ferroload-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d13c536f0f89aa5b36e25327577ca9286d12d6551c9319a07398d441ae8d0338
MD5 dc62566dbe154568220ca4d2849a7e68
BLAKE2b-256 257eda3a85650a815cae0f743a1fef0774a0341fb7a603efedef8d29a61fd053
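
A downloaded file can be checked against the digests listed above with Python's standard `hashlib`. The sketch below is a minimal example; the helper names `file_digest_hex` and `matches_published` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path, expected_hex, algorithm="sha256"):
    """Compare the file's digest with a published hex digest."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()

# SHA256 digest listed above for ferroload-0.1.0.tar.gz.
EXPECTED_SHA256 = "d13c536f0f89aa5b36e25327577ca9286d12d6551c9319a07398d441ae8d0338"
```

After downloading the source distribution, `matches_published("ferroload-0.1.0.tar.gz", EXPECTED_SHA256)` should return `True` for an intact file. Alternatively, pip's hash-checking mode (`--require-hashes` with a requirements file) performs the same check at install time.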

Provenance

The following attestation bundles were made for ferroload-0.1.0.tar.gz:

Publisher: release.yml on midhunharikumar/ferroload

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ferroload-0.1.0-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: ferroload-0.1.0-cp39-abi3-win_amd64.whl
  • Size: 727.0 kB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ferroload-0.1.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 47a968b6c9376d91b22ebba68dbc81c0c9a0297cff590cf6dcfc957061d283b5
MD5 4de306cff8d4824711d05c95289fd3c7
BLAKE2b-256 9dd85fd144bbeacafb17fa92a062058341fb22d0397f0e43eb0f3fa27baa1a37

File details

Details for the file ferroload-0.1.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for ferroload-0.1.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4a56714bccb0a197049bf66783f11150e8bd95f885e670c301bbe802eff3cc2a
MD5 3f86ac1c050a85c5ec2bee86928407b6
BLAKE2b-256 8abfa8d81461541a120ec97b23d1a8df40cc5ce0e69dc5468d74ea0f222f7a04

File details

Details for the file ferroload-0.1.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for ferroload-0.1.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 fdaf1f6e3bcf54b461edaf3274b6b49a4c76bd8acf4658f582a5a868aaf81759
MD5 260e836943179a9cd8e817a1aa916a1e
BLAKE2b-256 19cd1d784484f9fe444b7c87105c4d39a715e9c54a99d1a5591d441ae816d4f3

File details

Details for the file ferroload-0.1.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for ferroload-0.1.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 54ca9a831e9943f8c787d57ca1cd7053167955cdbea6b5675b9e0239a34a9fef
MD5 e9835515b76d6ed6bfb2bbb0e90470c9
BLAKE2b-256 2dbc7733aa06553a89723e8e223a80b1943a653753623a12bd8f500d425e2aef
