A PyTorch-based library focused on data processing and input pipelines in general.
Project description
torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.
It extends torch.utils.data.Dataset and equips it with functionalities known from tensorflow.data, like map or cache (with some additions unavailable there).
All of that with minimal interference (a single call to super().__init__()) in your original PyTorch datasets.
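For instance, adapting an existing PyTorch dataset usually amounts to switching the base class and adding that single super().__init__() call; the sketch below is only illustrative (the CSVDataset name and the pandas usage are not part of torchdata):

import pandas as pd
import torch
import torchdata

# Illustrative sketch, not torchdata code: the only torchdata-specific changes
# are the base class and the super().__init__() call, after which .map() and
# .cache() become available on instances.
class CSVDataset(torchdata.Dataset):  # previously: torch.utils.data.Dataset
    def __init__(self, path):
        super().__init__()  # the single required call mentioned above
        self.frame = pd.read_csv(path)

    def __getitem__(self, index):
        return torch.tensor(self.frame.iloc[index].values)

    def __len__(self):
        return len(self.frame)

dataset = CSVDataset("data.csv").cache()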
Functionalities overview:
- map or apply arbitrary functions to a dataset
- cache allows you to cache data in memory or on disk (even partially, say the first 20%)
- Full torch.utils.data.IterableDataset and torch.utils.data.Dataset support
- Easy to create custom methods of caching, choosing elements to cache, maps and datasets
- Concrete and base classes designed for file reading and other general tasks
Quick examples
- Create an image dataset, convert it to Tensors, cache it and concatenate with smoothed labels:
import torchdata
import torchvision
from PIL import Image

# Example dataset returning 1 as the label for every sample
class Labels(torchdata.Dataset):
    def __init__(self, length):
        self.length = length
        super().__init__()

    def __getitem__(self, _):
        return 1

    def __len__(self):
        return self.length
# Convenience class; torchdata.Files is based on torchdata.Dataset and keeps a list of files
class ImageDataset(torchdata.Files):
    def __getitem__(self, index):
        return Image.open(self.files[index])

images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor()).cache()
)
smoothed_labels = Labels(len(images)).map(lambda label: label - 0.1)
# That's how you concatenate sample-wise
for image, label in images | smoothed_labels:
    pass
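Since torchdata datasets extend torch.utils.data.Dataset, the concatenated dataset should also work with a regular DataLoader; a minimal sketch (batch size and shuffling are arbitrary illustrative choices):

import torch

# The | concatenation above yields (image, label) pairs, so the default collate works.
dataloader = torch.utils.data.DataLoader(images | smoothed_labels, batch_size=32, shuffle=True)
for batched_images, batched_labels in dataloader:
    pass  # training step would go here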
- Cache the first 1000 samples in memory, save the rest on disk in the folder ./cache:
images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor())
    # First 1000 samples in memory
    .cache(torchdata.modifiers.UpToIndex(torchdata.cachers.Memory(), 1000))
    # Samples from 1000 to the end saved with Pickle on disk
    .cache(torchdata.modifiers.FromIndex(torchdata.cachers.Pickle("./cache"), 1000))
    # You can define your own cachers and modifiers, see docs (and the sketch below)
)
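The caching behaviour itself is pluggable. Assuming cachers follow a dict-like protocol of __contains__, __setitem__ and __getitem__ keyed by sample index (check the documentation for the authoritative interface), a custom in-memory cacher could be sketched like this:

# Hedged sketch of a custom cacher; DictCacher is a hypothetical helper and the
# assumed protocol is __contains__/__setitem__/__getitem__ keyed by sample index.
class DictCacher:
    def __init__(self):
        self.cache = {}

    def __contains__(self, index):
        # Whether the sample at this index was cached already
        return index in self.cache

    def __setitem__(self, index, data):
        # Store a freshly produced sample
        self.cache[index] = data

    def __getitem__(self, index):
        # Return a previously cached sample
        return self.cache[index]

images = ImageDataset.from_folder("./data").cache(DictCacher())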
To see what else you can do, please check the torchdata documentation.
Installation
pip
Latest release:
pip install --user torchdata
Nightly:
pip install --user torchdata-nightly
Docker
CPU standalone and various versions of GPU-enabled images are available at dockerhub.
For a CPU quickstart, issue:
docker pull szymonmaszke/torchdata:18.04
Nightly builds are also available, just prefix the tag with nightly_. If you are going for a GPU image, make sure you have nvidia/docker installed and its runtime set.
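For example, a nightly CPU image could be pulled and a GPU container started roughly as follows (the GPU tag below is a placeholder, pick an actual CUDA-enabled tag from dockerhub):
docker pull szymonmaszke/torchdata:nightly_18.04
docker run --runtime=nvidia -it szymonmaszke/torchdata:<gpu-tag> python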
Contributing
If you find any issue or think some functionality may be useful to others and fits this library, please open a new Issue or create a Pull Request.
To get an overview of what one can do to help this project, see the Roadmap.