Simple dataset to dataloader library for PyTorch

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It lets you chain functions in a readable syntax adapted from TensorFlow 2's tf.data.Dataset.
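To make the "index to example" idea concrete, here is a minimal standard-library sketch. TinyDataset is a hypothetical stand-in written for illustration, not the library's Dataset class; it only borrows the names from_subscriptable and map from the list in the Usage section, and the real signatures may differ.

```python
from typing import Callable, Generic, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class TinyDataset(Generic[A]):
    """Hypothetical stand-in: maps an integer index to an example."""

    def __init__(self, length: int, get_item: Callable[[int], A]) -> None:
        self._length = length
        self._get_item = get_item

    @classmethod
    def from_subscriptable(cls, data: Sequence[A]) -> "TinyDataset[A]":
        # Wrap anything that supports len() and integer indexing.
        return cls(len(data), lambda index: data[index])

    def map(self, fn: Callable[[A], B]) -> "TinyDataset[B]":
        # Compose lazily: fn runs only when an example is requested.
        get_item = self._get_item
        return TinyDataset(self._length, lambda index: fn(get_item(index)))

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> A:
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._get_item(index)


# Chain two transformations; nothing is computed until an index is read.
dataset = (
    TinyDataset.from_subscriptable([1, 2, 3])
    .map(lambda x: x * 10)
    .map(lambda x: x + 1)
)
examples = [dataset[index] for index in range(len(dataset))]
```

Because each map call returns a new object, the pipeline reads top to bottom and the original data is never modified.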

Datastream combines a Dataset and a sampler into a stream of examples. It offers a simple way to oversample or stratify, to sample by weight, and finally to convert the stream to a torch.utils.data.DataLoader.
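The idea of pairing examples with a weighted sampler can be sketched with the standard library alone. The function weighted_stream below is a hypothetical helper for illustration; it assumes sampling with replacement in proportion to per-example weights and does not reflect how the library implements its sampler.

```python
import random
from collections import Counter
from itertools import islice
from typing import Iterator, Sequence


def weighted_stream(
    examples: Sequence[str], weights: Sequence[float], seed: int = 0
) -> Iterator[str]:
    """Yield examples forever, each drawn in proportion to its weight."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    while True:
        yield rng.choices(examples, weights=weights, k=1)[0]


# "rare" is oversampled ninefold relative to "common";
# a weight of zero removes "ignored" from the stream entirely.
stream = weighted_stream(["common", "rare", "ignored"], [1.0, 9.0, 0.0])
counts = Counter(islice(stream, 1000))
```

Changing a weight changes how often the example appears without touching the underlying data, which is the appeal of keeping the sampler separate from the dataset.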

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases; it is not exhaustive. See the documentation for a fuller account of the API and its usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
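The batch above is consistent with a round-robin that takes the given number of consecutive examples from each stream in turn. The sketch below reproduces that ordering with plain iterators; the merge function here is a hypothetical illustration, and the library's actual interleaving may differ.

```python
from itertools import islice, repeat
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def merge(streams_and_counts: List[Tuple[Iterable[T], int]]) -> Iterator[T]:
    """Cycle over the streams, taking `count` consecutive examples from each."""
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                try:
                    yield next(iterator)
                except StopIteration:
                    # Stop the merged stream once any source runs dry.
                    return


# Infinite stand-ins for the fruit datastreams.
merged = merge([
    (repeat("apple"), 2),
    (repeat("pear"), 1),
    (repeat("banana"), 1),
])
batch = list(islice(merged, 8))
```

With counts of 2, 1, and 1, half of every batch is apples regardless of how many apple examples the underlying dataset holds.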

Zip independently sampled datastreams

Each fruit datastream below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
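Zipping pairs up one example from each stream per step, with each stream advancing on its own. A minimal sketch using only built-ins follows; itertools.cycle stands in for the merged pear and banana stream, which is an assumption made to reproduce the ordering shown above.

```python
from itertools import cycle, islice, repeat

# Infinite stand-ins for the datastreams in the example above.
apple_stream = repeat("apple")
pear_or_banana_stream = cycle(["pear", "banana"])

# Each step draws one example from every stream and groups them in a tuple.
zipped = zip(apple_stream, pear_or_banana_stream)
batch = list(islice(zipped, 4))
```

Each element of the batch is a tuple with one entry per zipped stream, so downstream code can unpack the components by position.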

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh
