
A PyTorch-based library focused on data processing and input pipelines in general.

Project description


torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.

It extends torch.utils.data.Dataset and equips it with functionality known from tensorflow.data, such as map or cache (plus some additions unavailable there). All of this requires minimal interference with original PyTorch datasets: a single call to super().__init__().

Functionalities overview:

  • map or apply arbitrary functions to a dataset
  • cache allows you to cache data in memory or on disk (even partially, say the first 20%)
  • Full torch.utils.data.IterableDataset and torch.utils.data.Dataset support
  • Easy to create custom caching methods, rules for choosing which elements to cache, maps and datasets
  • Concrete and base classes designed for file reading and other general tasks
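
To make the map and cache ideas from the list above concrete, here is a minimal pure-Python sketch of a chainable dataset. It is not torchdata's implementation: the names SketchDataset, _Mapped, _Cached and expensive_square are hypothetical and exist only for this illustration. It shows that map can be applied lazily on access and that cache stores each sample after it is first computed.

```python
# Hypothetical sketch, not torchdata's implementation: a tiny dataset
# wrapper showing how chainable map() and cache() could be built.


class SketchDataset:
    """Index-based dataset with chainable map() and in-memory cache()."""

    def __init__(self, samples):
        self._samples = samples

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        return self._samples[index]

    def map(self, function):
        # Return a new dataset that applies `function` lazily on access.
        return _Mapped(self, function)

    def cache(self):
        # Return a new dataset that stores each sample after first access.
        return _Cached(self)


class _Mapped(SketchDataset):
    def __init__(self, parent, function):
        self._parent = parent
        self._function = function

    def __len__(self):
        return len(self._parent)

    def __getitem__(self, index):
        return self._function(self._parent[index])


class _Cached(SketchDataset):
    def __init__(self, parent):
        self._parent = parent
        self._store = {}

    def __len__(self):
        return len(self._parent)

    def __getitem__(self, index):
        if index not in self._store:
            self._store[index] = self._parent[index]
        return self._store[index]


calls = []


def expensive_square(x):
    calls.append(x)  # record every real computation
    return x * x


dataset = SketchDataset([1, 2, 3]).map(expensive_square).cache()
first_pass = [dataset[i] for i in range(len(dataset))]
second_pass = [dataset[i] for i in range(len(dataset))]  # served from cache
```

In this sketch the second pass never calls expensive_square again, which is the behaviour one would expect from caching placed after a costly map.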

Quick examples

  • Create image dataset, convert it to Tensors, cache and concatenate with smoothed labels:
import torchdata
import torchvision
from PIL import Image


# Example dataset that returns 1 as every label
class Labels(torchdata.Dataset):
    def __init__(self, length):
        self.length = length
        super().__init__()

    def __getitem__(self, _):
        return 1

    def __len__(self):
        return self.length


# Convenience class based on torchdata.Dataset
class ImageDataset(torchdata.Files):
    def __getitem__(self, index):
        return Image.open(self.files[index])


images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor()).cache()
)

smoothed_labels = Labels(len(images)).map(lambda label: label - 0.1)

# That's how you concatenate sample-wise
for image, label in images | smoothed_labels:
    pass
  • Cache first 1000 samples in memory, save the rest on disk in folder ./cache:
images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor())
    # First 1000 samples in memory
    .cache(torchdata.modifiers.UpToIndex(torchdata.cachers.Memory(), 1000))
    # Samples from index 1000 to the end are saved on disk with Pickle
    .cache(torchdata.modifiers.FromIndex(torchdata.cachers.Pickle("./cache"), 1000))
    # You can define your own cachers, modifiers, see docs
)
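
The split between memory and disk in the example above can be pictured with a small pure-Python sketch. This is an illustration under assumptions, not torchdata's code: MemoryCacher, PickleCacher, IndexRange and cached_get are hypothetical names standing in for the roles that cachers and index modifiers play, and a temporary folder replaces ./cache.

```python
import os
import pickle
import tempfile

# Hypothetical sketch, not torchdata's implementation: cachers store samples,
# and an index-based wrapper decides which samples a cacher is allowed to keep.


class MemoryCacher:
    """Keep samples in a dictionary."""

    def __init__(self):
        self._store = {}

    def __contains__(self, index):
        return index in self._store

    def __getitem__(self, index):
        return self._store[index]

    def __setitem__(self, index, sample):
        self._store[index] = sample


class PickleCacher:
    """Keep each sample in its own pickle file inside `folder`."""

    def __init__(self, folder):
        self._folder = folder
        os.makedirs(folder, exist_ok=True)

    def _path(self, index):
        return os.path.join(self._folder, f"{index}.pkl")

    def __contains__(self, index):
        return os.path.exists(self._path(index))

    def __getitem__(self, index):
        with open(self._path(index), "rb") as file:
            return pickle.load(file)

    def __setitem__(self, index, sample):
        with open(self._path(index), "wb") as file:
            pickle.dump(sample, file)


class IndexRange:
    """Let `cacher` handle only indices in [start, stop)."""

    def __init__(self, cacher, start, stop):
        self.cacher, self.start, self.stop = cacher, start, stop

    def accepts(self, index):
        return self.start <= index < self.stop


def cached_get(source, index, rules):
    """Fetch source[index], using the first rule whose range covers index."""
    for rule in rules:
        if rule.accepts(index):
            if index not in rule.cacher:
                rule.cacher[index] = source[index]
            return rule.cacher[index]
    return source[index]  # no rule applies, so read without caching


source = [value * 10 for value in range(6)]
memory = MemoryCacher()
with tempfile.TemporaryDirectory() as folder:
    disk = PickleCacher(folder)
    # First 3 samples in memory, the rest pickled to disk.
    rules = [IndexRange(memory, 0, 3), IndexRange(disk, 3, len(source))]
    loaded = [cached_get(source, i, rules) for i in range(len(source))]
    on_disk = sorted(os.listdir(folder))
```

Keeping the storage backend separate from the rule that selects indices is what lets the two be combined freely, which matches the way the example above pairs a cacher with an index modifier.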

To see what else you can do, please check the torchdata documentation.

Installation

pip

Latest release:

pip install --user torchdata

Nightly:

pip install --user torchdata-nightly

Docker

A standalone CPU image and various GPU-enabled images are available on Docker Hub.

For CPU quickstart, issue:

docker pull szymonmaszke/torchdata:18.04

Nightly builds are also available; just prefix the tag with nightly_. If you are going for a GPU image, make sure you have nvidia/docker installed and its runtime set.

Contributing

If you find an issue, or you think some functionality may be useful to others and fits this library, please open a new Issue or create a Pull Request.

To get an overview of what can be done to help this project, see the Roadmap.
