Minimal data loader for Flax

Project description

loaderx

A compact and high-performance single-machine data loader designed for JAX/Flax.

Why Create loaderx?

JAX/Flax can work with several data-loading backends, including PyTorch, TensorFlow, Grain, and jax_dataloader, but each comes with notable drawbacks:

  1. Installing large frameworks like PyTorch or TensorFlow just for data loading is often undesirable.
  2. Grain provides a clean API, but its real-world performance can be suboptimal.
  3. jax_dataloader defaults to using GPU memory, which may lead to inefficient memory utilization in some workflows.

Design Philosophy

loaderx is built around several core principles:

  1. A pragmatic approach that prioritizes minimal memory overhead and minimal dependencies.
  2. A strong focus on single-machine training workflows.
  3. Built on NumPy semantics, with two storage backends: NumPy (for small to medium datasets) and ArrayRecord (for large-scale datasets). Note that when writing ArrayRecord files, group_size must be set to 1.
  4. An endless ("immortal") step-based data loader instead of the traditional epoch-based design, which aligns better with modern step-based training practice.
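To make the step-based design concrete, here is a minimal sketch of an endless index sampler in plain Python. It is not loaderx's implementation: the function name `endless_indices` and the choice to reshuffle on every pass are assumptions for illustration. The training loop asks for a fixed number of steps, and the sampler simply keeps producing indices, so there is no epoch boundary to handle.

```python
import itertools
import random
from typing import Iterator


def endless_indices(num_records: int, seed: int = 0) -> Iterator[int]:
    """Yield dataset indices forever (hypothetical helper, not part of loaderx).

    Each pass over the data is a fresh random permutation, so the caller
    never sees an explicit epoch boundary.
    """
    if num_records <= 0:
        # Raised lazily, on the first next() call, because this is a generator.
        raise ValueError("num_records must be positive")
    rng = random.Random(seed)
    order = list(range(num_records))
    while True:
        rng.shuffle(order)  # reshuffle at the start of every pass
        yield from order


# Step-based consumption: take exactly 10 indices, regardless of dataset size.
first_steps = list(itertools.islice(endless_indices(4, seed=42), 10))
print(first_steps)
```

Because the sampler never terminates, the training loop decides when to stop (for example with `itertools.islice` or a step counter), which is the same pattern the Quick Start uses with `enumerate` and `break`.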

Current Limitations

Currently, loaderx supports only single-host environments; multi-host training is not yet supported.

Convert a NumPy array to ArrayRecord

This will create a directory containing file shards, which helps improve I/O performance.

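import numpy as np
from loaderx import converter

# Memory-map the array so it is not read into RAM all at once
train_data = np.load('train_data.npy', mmap_mode='r')
# Write the array as sharded ArrayRecord files into the 'train_data' directory
converter(train_data, 'train_data')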
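The converter's exact sharding scheme is not documented here. As an illustration only, the sketch below shows one common way to split N records into contiguous shards of near-equal size; the helper name `shard_ranges` and the idea of choosing a shard count up front are hypothetical, not loaderx parameters.

```python
from typing import List, Tuple


def shard_ranges(num_records: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split [0, num_records) into contiguous half-open ranges (hypothetical helper).

    Shard sizes differ by at most one record: the first `extra` shards
    each receive one additional record.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(num_records, num_shards)
    ranges = []
    start = 0
    for shard in range(num_shards):
        size = base + (1 if shard < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(shard_ranges(10, 3))  # → [(0, 4), (4, 7), (7, 10)]
```

With a memory-mapped array, each range can be sliced as `train_data[start:stop]` and written to its own file without loading the whole array into memory.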

Quick Start

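import numpy as np
from loaderx import NPDataset, ARDataset, DataLoader

# Samples: the ArrayRecord directory written by converter above
dataset = ARDataset('train_data')
# Labels: a plain .npy file read through the NumPy backend
labelset = NPDataset('xsub/train_label.npy')

# Datasets support random access by index
print(dataset[0])

loader = DataLoader(dataset, labelset)

# The loader is endless (step-based), so stop explicitly after 256 steps
for i, batch in enumerate(loader):
    if i >= 256:
        break

# Each batch is a dict with 'data' and 'label' entries
print(batch['data'].shape)
print(batch['label'].shape)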
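Conceptually, DataLoader(dataset, labelset) pairs the i-th sample with the i-th label and groups them into batches keyed by 'data' and 'label'. The stand-in below illustrates that idea with plain Python lists so it runs without loaderx; the function `toy_batches`, its `batch_size` argument, and the sequential (unshuffled) order are assumptions, not loaderx's API. It also shows `itertools.islice` as an alternative to the enumerate/break pattern for taking a fixed number of steps from an endless iterator.

```python
import itertools
from typing import Dict, Iterator, List, Sequence


def toy_batches(data: Sequence, labels: Sequence, batch_size: int) -> Iterator[Dict[str, List]]:
    """Endlessly yield {'data': [...], 'label': [...]} batches (hypothetical stand-in).

    Indices wrap around the dataset, so a batch may span the end and the
    start of the data; there is no epoch boundary.
    """
    # Checks run lazily, on the first next() call, because this is a generator.
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    if len(data) == 0 or batch_size <= 0:
        raise ValueError("need non-empty data and a positive batch_size")
    index_stream = itertools.cycle(range(len(data)))
    while True:
        idx = list(itertools.islice(index_stream, batch_size))
        yield {
            "data": [data[i] for i in idx],
            "label": [labels[i] for i in idx],
        }


data = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
labels = [0, 1, 2]

# Take exactly 4 steps from the endless iterator.
for step, batch in enumerate(itertools.islice(toy_batches(data, labels, batch_size=2), 4)):
    print(step, batch["label"])
```

In a real loader the per-batch lists would be stacked into arrays, which is why the Quick Start can print `batch['data'].shape`.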

Integrating with JAX/Flax

For practical integration examples, please refer to the Data2Latent repository.


Download files

Source Distribution

loaderx-0.1.7.tar.gz (6.3 kB)

Built Distribution

loaderx-0.1.7-py3-none-any.whl (7.6 kB)

File details

Details for the file loaderx-0.1.7.tar.gz.

File metadata

  • Download URL: loaderx-0.1.7.tar.gz
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for loaderx-0.1.7.tar.gz

  Algorithm     Hash digest
  SHA256        f30795be8210b396ea445c8ec7b65177ad67bd5d5bc58327b1d38cd67e17626c
  MD5           65520cff9e4244215dc2cecf5ad23703
  BLAKE2b-256   300cf2ee4aff4eb20fff2cfcd3a2f4d5d6c60c3052439421a6d603bd1854d49b

File details

Details for the file loaderx-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: loaderx-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for loaderx-0.1.7-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        f83f2ec8d00dd05e8c58e607005165174254f51ccc3c446bd4b0f5f442b7dfe2
  MD5           3fe8153ec962633a9d2d8e6b26608889
  BLAKE2b-256   337b7dae63e209ef8fb5783b212bd44a13381cb516d4dadbbd247581c710bd7f
