Simple dataset to dataloader library for PyTorch

Project description

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It provides pipelining of functions in a readable syntax originally adapted from TensorFlow 2’s tf.data.Dataset.
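
The idea of an index-to-example mapping with chained functions can be sketched in plain Python. This is only an illustration of the concept: ToyDataset and its internals are hypothetical names invented here and are not the library's Dataset class or its implementation.

```python
from typing import Any, Callable, Sequence, Tuple


class ToyDataset:
    """Hypothetical stand-in for illustration; not the library's Dataset."""

    def __init__(
        self,
        source: Sequence[Any],
        functions: Tuple[Callable[[Any], Any], ...] = (),
    ) -> None:
        self.source = source
        self.functions = functions

    def __len__(self) -> int:
        return len(self.source)

    def __getitem__(self, index: int) -> Any:
        # Look up the raw example, then apply each mapped function in order.
        example = self.source[index]
        for function in self.functions:
            example = function(example)
        return example

    def map(self, function: Callable[[Any], Any]) -> "ToyDataset":
        # Return a new dataset so that pipelines can be chained and reused.
        return ToyDataset(self.source, self.functions + (function,))


numbers = ToyDataset([1, 2, 3])
pipeline = numbers.map(lambda x: x * 10).map(lambda x: x + 1)
first_example = pipeline[0]  # (1 * 10) + 1
```

Because map returns a new object, the original numbers dataset is left unchanged and can feed several different pipelines.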

Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a torch.utils.data.DataLoader.
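
As a rough mental model, a stream can be thought of as a dataset plus a rule for choosing the next index. The sketch below uses only the standard library; toy_stream and toy_batches are hypothetical helpers made up for this illustration and say nothing about how the library samples internally.

```python
import random
from typing import Any, Iterator, List, Sequence


def toy_stream(
    dataset: Sequence[Any], weights: Sequence[float], seed: int = 0
) -> Iterator[Any]:
    """Endlessly yield examples, picking each index in proportion to its weight."""
    rng = random.Random(seed)
    indices = range(len(dataset))
    while True:
        index = rng.choices(indices, weights=weights, k=1)[0]
        yield dataset[index]


def toy_batches(stream: Iterator[Any], batch_size: int) -> Iterator[List[Any]]:
    """Group a stream of examples into lists of batch_size examples."""
    while True:
        yield [next(stream) for _ in range(batch_size)]


# Giving an example weight 0 means it is never drawn.
stream = toy_stream(["a", "b", "c"], weights=[0.0, 1.0, 0.0])
batch = next(toy_batches(stream, batch_size=4))
```

Raising the weight of rare examples is one way to oversample them, which is the use case the paragraph above describes.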

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more complete API reference and usage guide.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
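
The interleaving pattern in the output above can be reproduced with a small standard-library sketch. toy_merge is a hypothetical helper written for this illustration, and it assumes every input stream is infinite; it is not the library's Datastream.merge.

```python
from itertools import cycle, islice
from typing import Any, Iterable, Iterator, List, Tuple


def toy_merge(streams_and_counts: List[Tuple[Iterable[Any], int]]) -> Iterator[Any]:
    """Repeatedly take `count` examples from each stream in turn.

    Assumes every stream is infinite, as the fruit streams are.
    """
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                yield next(iterator)


merged = toy_merge([
    (cycle(["apple"]), 2),
    (cycle(["pear"]), 1),
    (cycle(["banana"]), 1),
])
batch = list(islice(merged, 8))
```

With counts of 2, 1 and 1, half of every four consecutive examples are apples, regardless of how many apples the underlying dataset holds.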

Zip independently sampled datastreams

Each fruit datastream used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
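
The pairing in the output above can be mimicked with the built-in zip over two independent iterators. In this sketch, cycle(["pear", "banana"]) is an assumed stand-in for an equally weighted merge of the pear and banana streams; none of this is the library's Datastream.zip.

```python
from itertools import cycle, islice

# Two independent, endless sources of examples.
apples = cycle(["apple"])
pears_and_bananas = cycle(["pear", "banana"])  # stand-in for an equal merge

# Each step draws one example from each source and pairs them.
zipped = zip(apples, pears_and_bananas)
batch = list(islice(zipped, 4))
```

Each source advances on its own, so the pairing does not depend on the two sources having the same length or the same sampling weights.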

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh

Built Distributions

pytorch_datastream-0.4.0-py38-none-any.whl (28.5 kB view details)

Uploaded Python 3.8

pytorch_datastream-0.4.0-py37-none-any.whl (28.5 kB view details)

Uploaded Python 3.7

pytorch_datastream-0.4.0-py36-none-any.whl (27.8 kB view details)

Uploaded Python 3.6

File details

Details for the file pytorch_datastream-0.4.0-py38-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.0-py38-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.6

File hashes

Hashes for pytorch_datastream-0.4.0-py38-none-any.whl
Algorithm Hash digest
SHA256 1718c0bc6218090d7f2caf1d3c608c8cbef5f52a053e70755f340ce4884d4171
MD5 87fd1c0a4dfa059329537b8d5e028a88
BLAKE2b-256 63a416957e6acc63cf0d3e5eef107b749c4d86512b883d7fb5b7f0b7056b89f6

File details

Details for the file pytorch_datastream-0.4.0-py37-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.0-py37-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.9

File hashes

Hashes for pytorch_datastream-0.4.0-py37-none-any.whl
Algorithm Hash digest
SHA256 eea4570d9b40593ba0eac71b97d7aff292dae4232edbe13cbb9a26042d64aa64
MD5 e4823d138ce1b24e3789f8faf328dc8e
BLAKE2b-256 a2e8dbea710fc0e9fe41502d4557745efa465594d9c87f21db6563c090e67243

File details

Details for the file pytorch_datastream-0.4.0-py36-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.0-py36-none-any.whl
  • Upload date:
  • Size: 27.8 kB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.6.12

File hashes

Hashes for pytorch_datastream-0.4.0-py36-none-any.whl
Algorithm Hash digest
SHA256 049cb2f6fce2422400dc37b1e9d8af8ce90caf7fe37ad2c78867e7ea9772b999
MD5 3305d438ad03747acd3ad43afcb48c90
BLAKE2b-256 e94b42e7d321e99ebc64b819e00f68a5aeb10f8023cd04ed1f82342e4eb76685
