Jittable data loading utilities for JAX.
Cyreal - Another JAX DataLoader
grain for the corporations, cyreal for the people
Pure JAX utilities for iterating over finite and infinite datasets without ever touching torch or tensorflow. Dataloaders are fast and support jax.jit, jax.grad, jax.lax.scan, and other function transformations.
Installation
pip install cyreal
The only dependency is jax.
Quick Start
Write fast dataloaders without torch or tensorflow
import jax
import jax.numpy as jnp
from cyreal.transforms import BatchTransform, DevicePutTransform
from cyreal.loader import DataLoader
from cyreal.sources import ArraySource
from cyreal.datasets import MNISTDataset

train_data = MNISTDataset(split="train").as_array_dict()
pipeline = [
    # Load dataset into memory-backed array
    ArraySource(train_data, ordering="shuffle"),
    # Batch it
    BatchTransform(batch_size=128),
    # Move the batch to the GPU
    DevicePutTransform(),
]
loader = DataLoader(pipeline)
state = loader.init_state(jax.random.key(0))

for epoch in range(2):
    for _ in range(loader.steps_per_epoch):
        batch, state, mask = jax.jit(loader.next)(state)
        ...  # Train your network
Use scan_epoch to jit and avoid boilerplate
model_state = {"params": jnp.array(0)}

def update(model_state, batch, mask):
    model_state = {"params": model_state['params'] + 1}
    return model_state, None

for epoch in range(2):
    state, model_state, _ = loader.scan_epoch(state, model_state, update)
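For intuition about what scanning over an epoch involves, here is a minimal sketch in plain JAX. It does not use cyreal and is not a description of how scan_epoch is implemented. The helper `make_epoch_fn` and the callback `count_and_sum` are hypothetical names, and the sketch drops the final partial batch, so its update callback takes no mask.

```python
import jax
import jax.numpy as jnp


def make_epoch_fn(data, batch_size, update):
    """Build a jitted function that runs `update` over one shuffled epoch.

    Hypothetical helper for illustration only; not part of cyreal.
    """
    num_examples = data.shape[0]
    steps = num_examples // batch_size  # the final partial batch is dropped

    @jax.jit
    def epoch(key, model_state):
        key, perm_key = jax.random.split(key)
        perm = jax.random.permutation(perm_key, num_examples)
        # One row of example indices per training step.
        batch_indices = perm[: steps * batch_size].reshape(steps, batch_size)

        def step(carry, idx):
            batch = data[idx]  # gather one batch
            return update(carry, batch)

        # lax.scan threads model_state through every step inside one XLA program.
        model_state, outputs = jax.lax.scan(step, model_state, batch_indices)
        return key, model_state, outputs

    return epoch


def count_and_sum(model_state, batch):
    # Toy update: count the steps and report the sum of each batch.
    new_state = {"params": model_state["params"] + 1}
    return new_state, batch.sum()


data = jnp.arange(10.0)
epoch = make_epoch_fn(data, batch_size=2, update=count_and_sum)
key, model_state, batch_sums = epoch(jax.random.key(0), {"params": jnp.array(0)})
```

The returned key can be passed to the next call so that each epoch uses a fresh permutation.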
Examples and Documentation
See our documentation for more examples.
- Do you enjoy premature optimization? Why not jit the entire training epoch?
- For the dirty and impure, we support logging metrics from within a jitted loader.
- Got yourself a huge dataset? Stream from a disk-backed source.
- Afraid of finite datasets? We provide gymnax-backed data sources for online reinforcement learning.
- Starving researcher/temporarily embarrassed hyperscaler? We support continual learning via reservoir sampling and replay buffers.
We also provide full end-to-end training examples.
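If reservoir sampling is unfamiliar, the idea fits in a few lines. The sketch below is the textbook algorithm (Algorithm R) in plain, non-jittable Python. It is not cyreal's implementation or API, and `reservoir_sample` is a hypothetical name used only for illustration.

```python
import random


def reservoir_sample(stream, capacity, seed=0):
    """Keep a uniform random sample of `capacity` items from a stream.

    Hypothetical helper for illustration only; not part of cyreal.
    """
    rng = random.Random(seed)
    reservoir = []
    for count, item in enumerate(stream, start=1):
        if len(reservoir) < capacity:
            # Fill phase: keep everything until the buffer is full.
            reservoir.append(item)
        else:
            # Item number `count` replaces a random slot with
            # probability capacity / count.
            slot = rng.randrange(count)
            if slot < capacity:
                reservoir[slot] = item
    return reservoir


buffer = reservoir_sample(range(10_000), capacity=8)
short = reservoir_sample(range(3), capacity=8)
```

The buffer never grows past `capacity`, and the stream length does not need to be known in advance, which is why the technique suits continual learning on limited memory.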
Speed Test
You can compare the speed to the grain dataloader using this script. The tables below show how long it takes to iterate through one epoch of MNIST.
MacBook M4 Pro
| Library | Dataset Device | Batch Device | Method | Time (s) |
|---|---|---|---|---|
| grain | CPU | CPU | Iterator | 1.33 |
| cyreal | CPU | CPU | jit(loader.next) | 0.04 |
| cyreal | CPU | CPU | scan_epoch | 0.09 |
A40 with Wimpy CPU
| Library | Dataset Device | Batch Device | Method | Time (s) |
|---|---|---|---|---|
| grain | CPU | CPU | Iterator | 10.34 |
| grain | CPU | GPU | Iterator | 11.65 |
| cyreal | CPU | CPU | jit(loader.next) | 0.66 |
| cyreal | CPU | GPU | jit(loader.next) | 0.68 |
| cyreal | GPU | GPU | jit(loader.next) | 0.66 |
| cyreal | CPU | CPU | scan_epoch | 3.78 |
| cyreal | CPU | GPU | scan_epoch | 4.00 |
| cyreal | GPU | GPU | scan_epoch | 4.35 |
RTX 5090
| Library | Dataset Device | Batch Device | Method | Time (s) |
|---|---|---|---|---|
| grain | CPU | CPU | Iterator | 3.80 |
| grain | CPU | GPU | Iterator | 4.04 |
| cyreal | CPU | CPU | jit(loader.next) | 0.50 |
| cyreal | CPU | GPU | jit(loader.next) | 0.50 |
| cyreal | GPU | GPU | jit(loader.next) | 0.50 |
| cyreal | CPU | CPU | scan_epoch | 2.71 |
| cyreal | CPU | GPU | scan_epoch | 2.72 |
| cyreal | GPU | GPU | scan_epoch | 2.68 |
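If you want to time your own jitted functions, the sketch below shows one common pattern. It is not the benchmark script referenced above, and it says nothing about how the numbers in these tables were measured. `time_call` and `double_sum` are hypothetical names. JAX dispatches work asynchronously, so the sketch calls jax.block_until_ready before stopping the clock, and it runs the function once beforehand so that compilation is excluded from the reported time.

```python
import time

import jax
import jax.numpy as jnp


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds of fn(*args) over `repeats` runs.

    Hypothetical helper for illustration only.
    """
    # Warm-up run: triggers jit compilation so it is not counted below.
    jax.block_until_ready(fn(*args))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Wait for the asynchronous computation to finish before reading the clock.
        jax.block_until_ready(fn(*args))
        best = min(best, time.perf_counter() - start)
    return best


@jax.jit
def double_sum(x):
    return (2 * x).sum()


elapsed = time_call(double_sum, jnp.ones(1000))
```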