A faster dataloader for tensor data.
Project description
Tensor Loader
TensorLoader is similar to the combination of PyTorch's TensorDataset and DataLoader. It is faster and has better type hints.
Installation
Install from PyPI:
pip install tensorloader
Install from source:
git clone https://github.com/zhb2000/tensorloader.git
cd tensorloader
pip install .
Usage
This package contains only a TensorLoader class.
from tensorloader import TensorLoader
Use a single tensor as data:
X = torch.tensor(...)
dataloader = TensorLoader(X)
for x in dataloader:
    ...
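As a concrete, runnable variant of the sketch above (the tensor shape is arbitrary; batch_size and shuffle are the same keyword arguments passed to TensorLoader in the speed test below):

import torch
from tensorloader import TensorLoader

X = torch.randn(96, 8)  # 96 samples with 8 features each
dataloader = TensorLoader(X, batch_size=16, shuffle=True)
for x in dataloader:
    print(x.shape)  # each batch is a tensor of shape (16, 8)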
Use a tuple of tensors as data:
X = torch.tensor(...)
Y = torch.tensor(...)
dataloader = TensorLoader((X, Y))
for x, y in dataloader:  # unpack the batch tuple as x, y
    ...
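For instance, a typical training loop over a tuple of tensors might look like the following (the linear model, loss, and optimizer are placeholders for illustration, not part of tensorloader; batch_size and shuffle are the keyword arguments used in the speed test below):

import torch
from torch import nn
from tensorloader import TensorLoader

X = torch.randn(256, 4)  # 256 samples, 4 features
Y = torch.randn(256, 1)  # 256 regression targets

model = nn.Linear(4, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

dataloader = TensorLoader((X, Y), batch_size=32, shuffle=True)
for x, y in dataloader:  # each batch unpacks into (x, y)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()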
Use a namedtuple of tensors as data:
from collections import namedtuple
Batch = namedtuple('Batch', ['x', 'y'])
X = torch.tensor(...)
Y = torch.tensor(...)
# set unpack_args=True when using a namedtuple as data
dataloader = TensorLoader(Batch(X, Y), unpack_args=True)
for batch in dataloader:
    assert isinstance(batch, Batch)
    assert isinstance(batch.x, torch.Tensor)
    assert isinstance(batch.y, torch.Tensor)
    x, y = batch
    ...
PS: Namedtuples are similar to plain tuples, but they also allow field access by name, which makes code more readable. For more information, see the documentation of namedtuple.
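For example, a namedtuple instance supports both access styles (a plain-Python illustration, independent of tensorloader):

from collections import namedtuple

Batch = namedtuple('Batch', ['x', 'y'])
batch = Batch(x=1, y=2)

assert batch.x == 1 and batch.y == 2    # access fields by name
assert batch[0] == 1 and batch[1] == 2  # still indexes like a plain tuple
x, y = batch                            # and unpacks like one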
Speed Test
TensorLoader is much faster than TensorDataset + DataLoader because it slices tensors with vectorized operations instead of building costly per-sample Python lists.
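The difference can be pictured with plain PyTorch operations. This is a simplified sketch of the two batching strategies, not the actual implementation of either library:

import torch

X = torch.zeros(int(1e6), 10)
batch_size = 128

# Vectorized: one slice per batch, which is a cheap view into the tensor.
def batches_by_slicing(X, batch_size):
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]

# Per sample: one Python-level indexing call per item, then a stack per
# batch (roughly what a map-style Dataset plus default collation does).
def batches_by_collecting(X, batch_size):
    for start in range(0, len(X), batch_size):
        samples = [X[i] for i in range(start, min(start + batch_size, len(X)))]
        yield torch.stack(samples)

first = next(batches_by_slicing(X, batch_size))
assert first.shape == torch.Size([128, 10])

The script below measures the gap between the two loaders: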
import timeit
import torch
from torch.utils.data import TensorDataset, DataLoader
from tensorloader import TensorLoader
def speed_test(epoch_num: int, **kwargs):
    sample_num = int(1e6)
    X = torch.zeros(sample_num, 10)
    Y = torch.zeros(sample_num)
    tensorloader = TensorLoader((X, Y), **kwargs)
    torchloader = DataLoader(TensorDataset(X, Y), **kwargs)

    def loop(loader):
        for _ in loader:
            pass

    t1 = timeit.timeit(lambda: loop(tensorloader), number=epoch_num)
    t2 = timeit.timeit(lambda: loop(torchloader), number=epoch_num)
    print(f'TensorLoader: {t1:.4g}s, TensorDataset + DataLoader: {t2:.4g}s.')
>>> speed_test(epoch_num=10, batch_size=128, shuffle=False)
TensorLoader: 0.363s, TensorDataset + DataLoader: 54.39s.
>>> speed_test(epoch_num=10, batch_size=128, shuffle=True)
TensorLoader: 0.9296s, TensorDataset + DataLoader: 56.54s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=False)
TensorLoader: 0.005262s, TensorDataset + DataLoader: 55.57s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=True)
TensorLoader: 0.5682s, TensorDataset + DataLoader: 57.71s.
File details
Details for the file tensorloader-0.1.0.tar.gz.
File metadata
- Download URL: tensorloader-0.1.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e9b8eb224ef90807538fb92fef183430b883592eb47f71086e1935bae058ab5
MD5 | 133cf138388e44bcce72332231085fe5
BLAKE2b-256 | 8f40ee6150f7986d784c9d70f0af58a0eda09f936c9d9359f1508afb3cc8c198