Simple dataset to dataloader library for PyTorch

Project description

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It lets you chain functions into a pipeline using a readable syntax originally adapted from TensorFlow 2’s tf.data.Dataset.
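To make the "index to example mapping with pipelined functions" idea concrete, here is a minimal pure-Python sketch. TinyDataset is a hypothetical stand-in written for illustration only; it is not the library's Dataset class and does not reflect how the library is implemented.

```python
from typing import Any, Callable, Sequence


class TinyDataset:
    """Hypothetical toy class: maps an index to an example."""

    def __init__(
        self,
        source: Sequence[Any],
        fn: Callable[[Any], Any] = lambda example: example,
    ) -> None:
        self.source = source
        self.fn = fn

    def __len__(self) -> int:
        return len(self.source)

    def __getitem__(self, index: int) -> Any:
        # The pipeline runs lazily, only when an example is requested.
        return self.fn(self.source[index])

    def map(self, fn: Callable[[Any], Any]) -> "TinyDataset":
        # Compose the new function after the existing pipeline and
        # return a new dataset, leaving this one unchanged.
        previous = self.fn
        return TinyDataset(self.source, lambda example: fn(previous(example)))


# Chained calls read top to bottom, in the order they are applied.
dataset = TinyDataset([1, 2, 3]).map(lambda x: x * 10).map(str)
examples = [dataset[index] for index in range(len(dataset))]
```

Because each map call returns a new object, intermediate pipelines can be reused for different purposes, for example a shared loading step followed by separate training and evaluation steps.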

Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a torch.utils.data.DataLoader.
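The combination of a dataset and a sampler can be pictured with the sketch below, which uses only the standard library. The names weighted_stream and take_batch are hypothetical helpers, and drawing indices with random.choices is an assumption made for illustration; the library's own sampler may work differently.

```python
import itertools
import random
from typing import Any, Iterator, List, Sequence


def weighted_stream(
    examples: Sequence[Any],
    weights: Sequence[float],
    seed: int = 0,
) -> Iterator[Any]:
    """Hypothetical helper: endlessly sample examples by per-example weight."""
    rng = random.Random(seed)
    indices = range(len(examples))
    while True:
        # Draw one index with probability proportional to its weight.
        (index,) = rng.choices(indices, weights=weights, k=1)
        yield examples[index]


def take_batch(stream: Iterator[Any], batch_size: int) -> List[Any]:
    """Hypothetical helper: collect the next batch_size examples."""
    return list(itertools.islice(stream, batch_size))


# A weight of zero means the example is never drawn.
stream = weighted_stream(["rare", "common"], weights=[1.0, 0.0])
batch = take_batch(stream, 4)
```

Raising the weight of under-represented examples is one way to oversample them without duplicating data in the underlying dataset.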

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more complete reference on the API and its usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
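The 2:1:1 pattern in the batch above can be reproduced with a small standard-library sketch. merge_streams is a hypothetical function written for illustration, assuming infinite input streams and a simple round-robin that takes a fixed number of examples from each stream in turn; it is not the library's Datastream.merge.

```python
import itertools
from typing import Any, Iterable, Iterator, List, Tuple


def merge_streams(
    streams_and_counts: List[Tuple[Iterable[Any], int]],
) -> Iterator[Any]:
    """Hypothetical helper: interleave infinite streams by per-stream counts."""
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            # Take `count` consecutive examples from this stream
            # before moving on to the next one.
            for _ in range(count):
                yield next(iterator)


merged = merge_streams([
    (itertools.repeat("apple"), 2),
    (itertools.repeat("pear"), 1),
    (itertools.repeat("banana"), 1),
])
batch = list(itertools.islice(merged, 8))
```

With these counts, half of every batch comes from the apple stream regardless of how many apples the underlying dataset holds, which is the stratification effect described above.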

Zip independently sampled datastreams

Each fruit datastream used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
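The pairing shown above can be sketched with plain iterators. This is an illustration only: itertools.cycle stands in for an equally weighted merge of the pear and banana streams, and the built-in zip stands in for Datastream.zip; neither reflects the library's implementation.

```python
import itertools

# Each stream advances independently of the other.
apples = itertools.repeat("apple")

# Stand-in for merging the pear and banana streams with equal counts.
pears_and_bananas = itertools.cycle(["pear", "banana"])

# Pair one example from each stream per step.
zipped = zip(apples, pears_and_bananas)
batch = list(itertools.islice(zipped, 4))
```

Each element of the batch is a tuple with one example per zipped stream, so downstream code can unpack the two sources separately.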

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

pytorch_datastream-0.4.1-py38-none-any.whl (28.8 kB)

Uploaded Python 3.8

pytorch_datastream-0.4.1-py37-none-any.whl (28.8 kB)

Uploaded Python 3.7

pytorch_datastream-0.4.1-py36-none-any.whl (28.1 kB)

Uploaded Python 3.6

File details

Details for the file pytorch_datastream-0.4.1-py38-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.1-py38-none-any.whl
  • Upload date:
  • Size: 28.8 kB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.6

File hashes

Hashes for pytorch_datastream-0.4.1-py38-none-any.whl
Algorithm Hash digest
SHA256 fc408b68a86fd623e6183e5caa1eef613ef8b3e6b7988dd45efead67a06aab8a
MD5 2c32b3231db0e3751dda1ccfe7b0f213
BLAKE2b-256 86c60beefef6c515a80d8c89c9ef8a26ca7f2294d390fd22d81cd846cf11d74e

See more details on using hashes here.

File details

Details for the file pytorch_datastream-0.4.1-py37-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.1-py37-none-any.whl
  • Upload date:
  • Size: 28.8 kB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.9

File hashes

Hashes for pytorch_datastream-0.4.1-py37-none-any.whl
Algorithm Hash digest
SHA256 ad14370e3e9590c8b28ae8894cbfd4f56fbba31aa81cde880404c4a13b989a05
MD5 8b40ff956a17b554c2a62fc341ba483f
BLAKE2b-256 e046ae79b0bebf090ca70719032141e158fc6e929cf7c219bfed2171ae4f6f42

File details

Details for the file pytorch_datastream-0.4.1-py36-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.1-py36-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.6.12

File hashes

Hashes for pytorch_datastream-0.4.1-py36-none-any.whl
Algorithm Hash digest
SHA256 8a2c6e4683bb2192c07c4dd0bc7a2f9d7517b909807b0e05bc4fb8e9d9bf9296
MD5 a9a760403db03043f7b3f9cc08af15d9
BLAKE2b-256 f6201af79d3dd8180fe99efb8cbaa1b78af5d1fa689fd71ac48e610c08673d3f
