Merlin Dataloader


The merlin-dataloader lets you quickly train recommender models for TensorFlow, PyTorch and JAX. It removes the biggest bottleneck in training recommender models by providing GPU-optimized dataloaders that read data directly into GPU memory and then hand it to TensorFlow and PyTorch through a zero-copy transfer using DLPack.

The benefits of the Merlin Dataloader include:

More than 10x faster than native framework dataloaders
Handles larger-than-memory datasets
Per-epoch shuffling
Distributed training
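
To make the per-epoch shuffling and distributed training items concrete, here is a minimal pure-Python sketch of the idea. It is not Merlin's implementation: the function names `epoch_batches` and `worker_batches`, the seeding scheme, and the round-robin sharding are assumptions chosen for illustration only.

```python
import random
from typing import Iterator, List


def epoch_batches(
    num_rows: int, batch_size: int, epoch: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield batches of row indices, reshuffled for every epoch (hypothetical helper)."""
    # A per-epoch RNG gives each epoch a different but reproducible row order.
    rng = random.Random(seed + epoch)
    order = list(range(num_rows))
    rng.shuffle(order)
    # Slice the shuffled order into batches; the last batch may be smaller.
    for start in range(0, num_rows, batch_size):
        yield order[start:start + batch_size]


def worker_batches(
    batches: List[List[int]], rank: int, world_size: int
) -> List[List[int]]:
    """Give each worker every world_size-th batch (hypothetical round-robin sharding)."""
    return batches[rank::world_size]
```

Calling `list(epoch_batches(100, 32, epoch=0))` yields four batches that together cover every row exactly once, and passing that list to `worker_batches(..., rank, world_size)` splits the batches across workers without overlap.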

Installation

Merlin-dataloader requires Python version 3.7+. Additionally, GPU support requires CUDA 11.0+.

To install using Conda:

conda install -c nvidia -c rapidsai -c numba -c conda-forge merlin-dataloader python=3.7 cudatoolkit=11.2

To install from PyPI:

pip install merlin-dataloader

Docker containers with merlin-dataloader and its dependencies preinstalled are also available on NGC.
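
After installing, it can be useful to confirm that the interpreter meets the Python 3.7+ requirement and that the distribution is visible to it. The sketch below uses only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative, and `importlib.metadata` itself needs Python 3.8 or newer, so this check assumes a slightly newer interpreter than the stated minimum.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence

# Minimum interpreter version stated in the installation notes.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version is at least 3.7."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(distribution: str = "merlin-dataloader") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

`installed_version()` returns the version string reported by the package metadata when merlin-dataloader is installed, and `None` otherwise.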

Basic Usage

# Get a Merlin dataset from a set of Parquet files.
# PARQUET_FILE_PATHS is a placeholder for your own file path(s).
import merlin.io
import tensorflow as tf

dataset = merlin.io.Dataset(PARQUET_FILE_PATHS, engine="parquet")

# Create a TensorFlow dataloader from the dataset, loading 65,536 rows
# per batch
from merlin.dataloader.tensorflow import Loader
loader = Loader(dataset, batch_size=65536)

# Get a single batch of data. Inputs will be a dictionary mapping
# column names to TensorFlow tensors
inputs, target = next(loader)

# Train a Keras model with the dataloader
model = tf.keras.Model( ... )
model.fit(loader, epochs=5)


Release history

Version	Release date
23.8.0 (this version)	Aug 29, 2023
23.6.0	Jun 22, 2023
23.5.0	May 31, 2023
23.4.0	Apr 26, 2023
23.2.1	Mar 13, 2023
0.0.4	Dec 30, 2022
0.0.3	Nov 23, 2022
0.0.2	Oct 25, 2022
0.0.1	Oct 25, 2022

Download files


Source Distribution

merlin-dataloader-23.8.0.tar.gz (46.9 kB)

Uploaded Aug 29, 2023 Source

File details

Details for the file merlin-dataloader-23.8.0.tar.gz.

File metadata

Download URL: merlin-dataloader-23.8.0.tar.gz
Upload date: Aug 29, 2023
Size: 46.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for merlin-dataloader-23.8.0.tar.gz
Algorithm	Hash digest
SHA256	`5b2199ab82f9aeaf6cbf728cffe03827547c6af6a780e13e42e81a617f73507b`
MD5	`4326030cf02146e3a4aec433215c4631`
BLAKE2b-256	`b5895a97dceddec86fa1b4510a1e77b41674007d6dfd9dba928bd1f7cc511073`
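
A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal standard-library sketch; the helper names `sha256_of_file` and `matches_published_hash` and the local file path in the closing comment are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for merlin-dataloader-23.8.0.tar.gz.
EXPECTED_SHA256 = "5b2199ab82f9aeaf6cbf728cffe03827547c6af6a780e13e42e81a617f73507b"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the archive was saved to the current directory:
# matches_published_hash(Path("merlin-dataloader-23.8.0.tar.gz"))
```

If the function returns `False`, the file differs from the one that was uploaded and should not be installed.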
