Simple dataset-to-dataloader library for PyTorch

Project description

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It lets you pipeline functions in a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.
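
To make the "mapping between an index and an example" idea concrete, here is a minimal pure-Python sketch. MiniDataset is a hypothetical stand-in written for illustration, not the library's Dataset class. Only the method names from_subscriptable and map are borrowed from this README, and the real signatures and behavior may differ.

```python
from typing import Any, Callable, Sequence


class MiniDataset:
    """Hypothetical stand-in for illustration; not pytorch-datastream's Dataset."""

    def __init__(self, source: Sequence[Any], fn: Callable[[Any], Any] = lambda x: x):
        self._source = source  # anything supporting len() and integer indexing
        self._fn = fn          # composed function, applied lazily on access

    @classmethod
    def from_subscriptable(cls, source: Sequence[Any]) -> "MiniDataset":
        return cls(source)

    def map(self, fn: Callable[[Any], Any]) -> "MiniDataset":
        # Return a new dataset; the original is left untouched.
        previous = self._fn
        return MiniDataset(self._source, lambda x: fn(previous(x)))

    def __len__(self) -> int:
        return len(self._source)

    def __getitem__(self, index: int) -> Any:
        # Look up the raw example, then run it through the pipeline.
        return self._fn(self._source[index])


dataset = (
    MiniDataset.from_subscriptable(["apple", "pear", "banana"])
    .map(str.upper)
    .map(lambda name: (name, len(name)))
)
print(dataset[2])  # ('BANANA', 6)
```

Because each map call returns a new object, intermediate datasets can be reused as the starting point for several pipelines.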

Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a torch.utils.data.DataLoader.
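
The idea of pairing a dataset with a sampler can be sketched without any dependencies. The helpers weighted_stream and batches below are hypothetical names chosen for this example. They only illustrate weighted sampling followed by batching and are not how the library implements Datastream or data_loader.

```python
import random
from itertools import islice
from typing import Any, Iterator, List, Sequence


def weighted_stream(
    examples: Sequence[Any], weights: Sequence[float], seed: int = 0
) -> Iterator[Any]:
    """Yield examples forever, drawing each index in proportion to its weight."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    while True:
        (index,) = rng.choices(indices, weights=weights, k=1)
        yield examples[index]


def batches(stream: Iterator[Any], batch_size: int) -> Iterator[List[Any]]:
    """Group a stream into fixed-size lists, roughly what a data loader does."""
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch


fruits = ["apple", "pear", "banana"]
# A weight of 0 means the example is never drawn.
stream = weighted_stream(fruits, weights=[3.0, 1.0, 0.0])
first_batch = next(batches(stream, batch_size=8))
print(first_batch)
```

In this sketch, changing the weights between batches would shift what the stream yields, which is the intuition behind weighted sampling and oversampling.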

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more extensive list of the API and its usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
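
The batch above is consistent with taking two apples, then one pear, then one banana, and repeating. A dependency-free sketch of that interleaving follows. The merge function here is a hypothetical illustration built on plain iterators, not the library's Datastream.merge, and it assumes every input stream is infinite.

```python
from itertools import islice, repeat
from typing import Any, Iterable, Iterator, List, Tuple


def merge(streams_and_counts: List[Tuple[Iterable[Any], int]]) -> Iterator[Any]:
    """Interleave streams, taking `count` examples from each in turn.

    Assumes every stream is infinite, as the fruit streams are.
    """
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                yield next(iterator)


apple_stream = repeat("apple")
pear_stream = repeat("pear")
banana_stream = repeat("banana")

merged = merge([(apple_stream, 2), (pear_stream, 1), (banana_stream, 1)])
batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```

The counts act as per-stream proportions, so a rare class can be oversampled simply by giving its stream a larger count.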

Zip independently sampled datastreams

Each fruit datastream used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
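
The same result can be reproduced with plain iterators to show what zipping means here: each stream advances independently and every example becomes a tuple with one element per stream. This is only a sketch; the cycle call is an assumed stand-in for the merged pear and banana datastream, not the library's implementation.

```python
from itertools import cycle, islice, repeat

apple_stream = repeat("apple")
# Stand-in for Datastream.merge([pear_datastream, banana_datastream]):
# alternate between the two fruits.
pear_banana_stream = cycle(["pear", "banana"])

# Each zipped example is a tuple with one element per input stream.
zipped = zip(apple_stream, pear_banana_stream)
batch = list(islice(zipped, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```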

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

File                                          Size     Python
pytorch_datastream-0.4.4-py39-none-any.whl    28.9 kB  3.9
pytorch_datastream-0.4.4-py38-none-any.whl    28.9 kB  3.8
pytorch_datastream-0.4.4-py37-none-any.whl    28.9 kB  3.7
pytorch_datastream-0.4.4-py36-none-any.whl    28.2 kB  3.6

File details

Details for the file pytorch_datastream-0.4.4-py39-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.4-py39-none-any.whl
  • Upload date:
  • Size: 28.9 kB
  • Tags: Python 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for pytorch_datastream-0.4.4-py39-none-any.whl
Algorithm Hash digest
SHA256 94f1c691fe9e6120e65b3b0a739f6414f8d963f9e6585e2a53dd1755ff566a3a
MD5 a5bdd23ca3e3a2915931937bfd0db14e
BLAKE2b-256 947daa8b601f8bd388f1bd1b8a5f94ccea1edbb7f13ece7107cbcbcd7ef925c4
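
One way to check a downloaded wheel against a SHA256 digest like the one listed above is to hash the file locally and compare. The sketch below uses only the standard library. The helper names sha256_of and matches are hypothetical, chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling matches with the path of the downloaded file and the published SHA256 value returns True only when the two digests agree.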

File details

Details for the file pytorch_datastream-0.4.4-py38-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.4-py38-none-any.whl
  • Upload date:
  • Size: 28.9 kB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for pytorch_datastream-0.4.4-py38-none-any.whl
Algorithm Hash digest
SHA256 623bac0f5d5356b9f8421c25a687208cb511e21b5a268994694cc7427780b91e
MD5 2e1723886f8c5188046f5a266f286311
BLAKE2b-256 fb72fd3d3cf127752c60a4183a8131f3bb52f48c9b231f77f9e06da89128386d

File details

Details for the file pytorch_datastream-0.4.4-py37-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.4-py37-none-any.whl
  • Upload date:
  • Size: 28.9 kB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10

File hashes

Hashes for pytorch_datastream-0.4.4-py37-none-any.whl
Algorithm Hash digest
SHA256 de5d288230345c440ac2edc93058586dd5ecdc6bce1cfd753a2f416a0c268aba
MD5 5e9e04dcea6f5a5b849f29c43f5fef33
BLAKE2b-256 3323d30037a905704cf72f23da2772fe8b01d975f43e31d07c3746b0bdb36d9d

File details

Details for the file pytorch_datastream-0.4.4-py36-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.4-py36-none-any.whl
  • Upload date:
  • Size: 28.2 kB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.13

File hashes

Hashes for pytorch_datastream-0.4.4-py36-none-any.whl
Algorithm Hash digest
SHA256 7deaea8c3bc93bfce33f9218fde7985cd13ac66bb705cbffbe6a65c5c7efcafb
MD5 bb2fecd7f22201150ad3c08004b04a74
BLAKE2b-256 6459445c22b2426aa2b35b05df9d0211d7576b9d99f4154b18e1732350483c36
