
A simple dataset-to-dataloader library for PyTorch


This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It provides pipelining of functions in a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.
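To make the index-to-example idea concrete, here is a minimal pure-Python sketch. MiniDataset and its methods are hypothetical stand-ins that only imitate the chaining style of Dataset.from_subscriptable, .map and .subset; they are not the library's implementation, and the lazy evaluation shown (functions run only when an example is indexed) is an assumption of this sketch.

```python
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class MiniDataset(Generic[T]):
    """Hypothetical stand-in for Dataset: maps an index to an example, lazily."""

    def __init__(self, length: int, get_item: Callable[[int], T]):
        self._length = length
        self._get_item = get_item

    @classmethod
    def from_subscriptable(cls, source: Sequence[T]) -> "MiniDataset[T]":
        # Wrap anything that supports len() and [] indexing.
        return cls(len(source), lambda index: source[index])

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> T:
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._get_item(index)

    def map(self, function: Callable[[T], U]) -> "MiniDataset[U]":
        # Returns a new dataset; function runs only when an example is requested.
        return MiniDataset(self._length, lambda index: function(self[index]))

    def subset(self, indices: Sequence[int]) -> "MiniDataset[T]":
        # Re-index into the parent dataset through the given indices.
        indices = list(indices)
        return MiniDataset(len(indices), lambda index: self[indices[index]])


dataset = (
    MiniDataset.from_subscriptable(["apple", "pear", "banana"])
    .map(str.upper)
    .subset([0, 2])
)
print(len(dataset), dataset[0], dataset[1])  # 2 APPLE BANANA
```

Because every method returns a new object, the original dataset is never modified and steps can be chained top to bottom, which is what makes such pipelines readable.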

Datastream combines a Dataset and a sampler into a stream of examples. It provides simple solutions for oversampling and stratification, weighted sampling, and conversion to a torch.utils.data.DataLoader.
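The weighted sampling idea can be sketched without the library: keep one weight per example and draw indices in proportion to those weights. The helper weighted_index_stream and the weights below are hypothetical, and Python's random.choices stands in for whatever sampler Datastream actually uses.

```python
import random
from typing import Iterator, Sequence


def weighted_index_stream(weights: Sequence[float], seed: int = 0) -> Iterator[int]:
    """Yield dataset indices forever, each drawn with probability proportional to its weight."""
    rng = random.Random(seed)
    population = range(len(weights))
    while True:
        yield rng.choices(population, weights=weights, k=1)[0]


examples = ["common", "common", "common", "rare"]
# Hypothetical weights: upweight the single "rare" example to half of the total weight.
weights = [1.0, 1.0, 1.0, 3.0]

stream = weighted_index_stream(weights, seed=0)
sample = [examples[next(stream)] for _ in range(10_000)]
print(sample.count("rare") / len(sample))  # roughly 0.5
```

Raising an example's weight is one way to oversample it: here the rare example carries 3 of the 6 units of weight, so it appears in roughly half of the draws even though it is only one of four examples.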

Install

poetry add pytorch-datastream

Or, for the old-timers:

pip install pytorch-datastream

Usage

The list below is meant to showcase functions that are useful in most standard and non-standard cases. It is not meant to be an exhaustive list. See the documentation for a more extensive list of the API and usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream below repeatedly yields the name of its fruit as a string.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
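One simple way to reproduce the 2:1:1 pattern above is a round-robin that takes n examples from each stream in turn. This is a sketch assuming infinite streams; merge_streams is a hypothetical helper and itertools.repeat stands in for the fruit datastreams, so it does not reflect how Datastream.merge is implemented.

```python
from itertools import islice, repeat
from typing import Iterator, Sequence, Tuple


def merge_streams(streams_and_ns: Sequence[Tuple[Iterator[str], int]]) -> Iterator[str]:
    """Interleave streams by taking n examples from each one in turn, forever.

    Assumes every stream is infinite; an exhausted stream would make this loop spin.
    """
    while True:
        for stream, n in streams_and_ns:
            yield from islice(stream, n)


# Stand-ins for the fruit datastreams: each repeats its fruit name forever.
apple_stream = repeat("apple")
pear_stream = repeat("pear")
banana_stream = repeat("banana")

merged = merge_streams([(apple_stream, 2), (pear_stream, 1), (banana_stream, 1)])
batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```

The integers act as relative proportions: every cycle of four examples contains two apples, one pear and one banana, regardless of how many examples each underlying source holds.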

Zip independently sampled datastreams

Each fruit datastream below repeatedly yields the name of its fruit as a string.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
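Zipping can be sketched the same way: each stream keeps its own sampling, and one example is drawn from every stream per output tuple. In this sketch itertools.cycle is a hypothetical stand-in for the merged pear/banana datastream, assuming a 1:1 alternation; it is not the library's implementation of Datastream.zip.

```python
from itertools import cycle, islice, repeat

apple_stream = repeat("apple")
# Stand-in for Datastream.merge([pear_datastream, banana_datastream]),
# assumed here to alternate pear and banana 1:1.
pear_or_banana_stream = cycle(["pear", "banana"])

# zip pulls one example from each stream per tuple; each stream advances on its own.
zipped = zip(apple_stream, pear_or_banana_stream)
batch = list(islice(zipped, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```

Because the streams are sampled independently, the first position in each pair is always an apple while the second keeps its own pear/banana pattern, which is useful when, for example, pairing a labeled example with an independently drawn unlabeled one.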

More usage examples

See the documentation for more usage examples.

Install from source



Download files

Download the file for your platform.

Source Distribution

pytorch_datastream-0.4.9.tar.gz (23.5 kB)

Built Distribution

pytorch_datastream-0.4.9-py3-none-any.whl (29.0 kB)

File details

Details for the file pytorch_datastream-0.4.9.tar.gz.

File metadata

  • Download URL: pytorch_datastream-0.4.9.tar.gz
  • Size: 23.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.16 Linux/5.15.0-1036-azure

File hashes

Hashes for pytorch_datastream-0.4.9.tar.gz
Algorithm    Hash digest
SHA256       6fea6295f3325a1bcc322bdaaee38aabaa372bb683e33f80c9a09366f4068cb4
MD5          1bc0b3ac730303dc8e47c315ca3f365e
BLAKE2b-256  8475bee62515346259a4dc70d0265abe2cc1838d3905e45b519ce3593e0bfc81


File details

Details for the file pytorch_datastream-0.4.9-py3-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.9-py3-none-any.whl
  • Size: 29.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.16 Linux/5.15.0-1036-azure

File hashes

Hashes for pytorch_datastream-0.4.9-py3-none-any.whl
Algorithm    Hash digest
SHA256       cf6a5d210d0e2d63310dac4086225c366dcd58f619e39cc922dd685d5a8271cc
MD5          5cf49621f02cbed684ae339c51bd239e
BLAKE2b-256  6a38072ebe0a743d93d1a6e144f8f1dc62e77bec753b7a227c8949ee6a8c32b5
