
A faster dataloader for tensor data.


Tensor Loader


TensorLoader is similar to the combination of PyTorch's TensorDataset and DataLoader. It is faster and has better type hints.

Installation

Install from PyPI:

pip install tensorloader

Install from source:

git clone https://github.com/zhb2000/tensorloader.git
cd tensorloader
pip install .

Usage

This package contains only one class, TensorLoader.

import torch
from tensorloader import TensorLoader

Use a single tensor as data:

X = torch.tensor(...)
dataloader = TensorLoader(X)
for x in dataloader:
    ...

Use a tuple of tensors as data:

X = torch.tensor(...)
Y = torch.tensor(...)
dataloader = TensorLoader((X, Y))
for x, y in dataloader:  # unpack the batch tuple as x, y
    ...

Use a namedtuple of tensors as data:

from collections import namedtuple

Batch = namedtuple('Batch', ['x', 'y'])
X = torch.tensor(...)
Y = torch.tensor(...)
# set unpack_args=True when using a namedtuple as data
dataloader = TensorLoader(Batch(X, Y), unpack_args=True)
for batch in dataloader:
    assert isinstance(batch, Batch)
    assert isinstance(batch.x, torch.Tensor)
    assert isinstance(batch.y, torch.Tensor)
    x, y = batch
    ...

PS: Namedtuples behave like plain tuples but additionally allow field access by name, which makes code more readable. For more information, see the documentation of collections.namedtuple.
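For instance, a namedtuple's fields can be read by name or by position, and the instance still unpacks like a plain tuple (a stdlib-only illustration, independent of this package):

```python
from collections import namedtuple

# A two-field batch type, mirroring the Batch example above.
Batch = namedtuple('Batch', ['x', 'y'])
batch = Batch(x=[1.0, 2.0], y=[0, 1])

print(batch.x)   # access by name
print(batch[1])  # access by position
x, y = batch     # unpacks like a plain tuple
```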

Speed Test

TensorLoader is much faster than TensorDataset + DataLoader because it assembles each batch with vectorized tensor operations instead of collecting individual samples into costly Python lists.

import timeit
import torch
from torch.utils.data import TensorDataset, DataLoader
from tensorloader import TensorLoader

def speed_test(epoch_num: int, **kwargs):
    sample_num = int(1e6)
    X = torch.zeros(sample_num, 10)
    Y = torch.zeros(sample_num)
    tensorloader = TensorLoader((X, Y), **kwargs)
    torchloader = DataLoader(TensorDataset(X, Y), **kwargs)

    def loop(loader):
        for _ in loader:
            pass

    t1 = timeit.timeit(lambda: loop(tensorloader), number=epoch_num)
    t2 = timeit.timeit(lambda: loop(torchloader), number=epoch_num)
    print(f'TensorLoader: {t1:.4g}s, TensorDataset + DataLoader: {t2:.4g}s.')

>>> speed_test(epoch_num=10, batch_size=128, shuffle=False)
TensorLoader: 0.363s, TensorDataset + DataLoader: 54.39s.
>>> speed_test(epoch_num=10, batch_size=128, shuffle=True)
TensorLoader: 0.9296s, TensorDataset + DataLoader: 56.54s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=False)
TensorLoader: 0.005262s, TensorDataset + DataLoader: 55.57s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=True)
TensorLoader: 0.5682s, TensorDataset + DataLoader: 57.71s.
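The "vectorized operations" above boil down to slicing whole batches out of a tensor instead of indexing samples one at a time and re-stacking them. A minimal sketch of the two strategies (illustrative only, not TensorLoader's actual implementation):

```python
import torch

data = torch.arange(12).reshape(6, 2)
batch_size = 4

# Vectorized batching: one slice per batch, no per-sample Python loop.
fast_batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# Per-sample batching (roughly what Dataset indexing + default collation does):
# one Python-level indexing call per sample, then a stack per batch.
slow_batches = [
    torch.stack([data[j] for j in range(i, min(i + batch_size, len(data)))])
    for i in range(0, len(data), batch_size)
]

# Both strategies produce identical batches; only the amount of
# Python-level work per sample differs.
for fast, slow in zip(fast_batches, slow_batches):
    assert torch.equal(fast, slow)
```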
