Simple dataset to dataloader library for PyTorch

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It provides pipelining of functions in a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.
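To make the "index to example" idea concrete, here is a minimal pure-Python sketch of a mapping with chained functions. SketchDataset is a hypothetical stand-in written for illustration only; it is not the library's Dataset class and does not reflect how the library is implemented.

```python
from typing import Any, Callable, Sequence


class SketchDataset:
    """Hypothetical stand-in for illustration, not the library's Dataset."""

    def __init__(self, source: Sequence, fn: Callable[[Any], Any] = lambda x: x):
        self.source = source
        self.fn = fn

    def __len__(self) -> int:
        return len(self.source)

    def __getitem__(self, index: int) -> Any:
        # The pipeline runs lazily, only when an example is requested.
        return self.fn(self.source[index])

    def map(self, fn: Callable[[Any], Any]) -> "SketchDataset":
        # Return a new dataset that applies fn after the existing pipeline.
        previous = self.fn
        return SketchDataset(self.source, lambda x: fn(previous(x)))


fruits_and_cost = [("apple", 5), ("pear", 7), ("banana", 14)]
dataset = (
    SketchDataset(fruits_and_cost)
    .map(lambda pair: (pair[0], pair[1] * 2))
    .map(lambda pair: f"{pair[0]}: {pair[1]}")
)
print(dataset[2])  # banana: 28
```

Each call to map returns a new object, so intermediate pipelines can be reused without affecting each other.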

Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a torch.utils.data.DataLoader.
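As a rough illustration of pairing a dataset with a sampler, the sketch below draws indices in proportion to per-example weights and yields the matching examples. The function sketch_stream and its parameters are assumptions made for this example; the library's own sampling strategy may differ (for instance in how it handles replacement and epochs).

```python
import random
from itertools import islice
from typing import Iterator, Sequence


def sketch_stream(
    dataset: Sequence, weights: Sequence[float], seed: int = 0
) -> Iterator:
    """Hypothetical helper: yield examples forever, weighted by `weights`."""
    rng = random.Random(seed)
    indices = range(len(dataset))
    while True:
        # Draw one index with probability proportional to its weight.
        index = rng.choices(indices, weights=weights, k=1)[0]
        yield dataset[index]


dataset = ["common", "rare", "ignored"]
# "rare" is drawn four times as often as "common"; "ignored" has zero weight.
stream = sketch_stream(dataset, weights=[1.0, 4.0, 0.0])
batch = list(islice(stream, 8))
print(batch)
```

Raising the weight of an under-represented example is one simple way to oversample it.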

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more extensive list of API and usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
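The interleaving pattern in the output above can be reproduced with plain iterators. The following is a sketch only, assuming each input stream is infinite; sketch_merge is a hypothetical helper and not the implementation behind Datastream.merge.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, Sequence, Tuple


def sketch_merge(streams_and_counts: Sequence[Tuple[Iterable, int]]) -> Iterator:
    """Hypothetical helper: take `count` items from each stream in turn."""
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                # Assumes the stream never runs out of examples.
                yield next(iterator)


# Stand-ins for the fruit datastreams: each repeats its fruit name forever.
apple = cycle(["apple"])
pear = cycle(["pear"])
banana = cycle(["banana"])

merged = sketch_merge([(apple, 2), (pear, 1), (banana, 1)])
batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```

With counts of 2, 1, and 1, half of every batch comes from the apple stream regardless of how many apples the underlying dataset holds.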

Zip independently sampled datastreams

Each fruit datastream below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
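The pairing shown above can likewise be imitated with the built-in zip over independent iterators. This is a sketch under the assumption of infinite streams; sketch_alternate is a hypothetical helper standing in for an unweighted merge, not the library's code.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator


def sketch_alternate(*streams: Iterable) -> Iterator:
    """Hypothetical helper: yield one item from each stream in turn."""
    iterators = [iter(stream) for stream in streams]
    while True:
        for iterator in iterators:
            # Assumes the stream never runs out of examples.
            yield next(iterator)


apple = cycle(["apple"])
pear_or_banana = sketch_alternate(cycle(["pear"]), cycle(["banana"]))

# Each side advances independently; zip pairs up one example from each.
zipped = zip(apple, pear_or_banana)
batch = list(islice(zipped, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```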

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh
