Simple dataset-to-dataloader library for PyTorch


This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It provides pipelining of functions in a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.

Datastream combines a Dataset and a sampler into a stream of examples. It provides simple solutions for oversampling / stratification and weighted sampling, and it can be converted to a torch.utils.data.DataLoader.
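To make the "mapping between an index and an example" idea concrete, here is a minimal pure-Python sketch. The class IndexedDataset and its from_sequence helper are hypothetical names invented for this illustration; they are not part of pytorch-datastream, and the library's own Dataset may be implemented quite differently.

```python
from typing import Callable, Generic, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class IndexedDataset(Generic[A]):
    """Hypothetical stand-in (not the library class): index -> example."""

    def __init__(self, length: int, get_item: Callable[[int], A]) -> None:
        self._length = length
        self._get_item = get_item

    @classmethod
    def from_sequence(cls, items: Sequence[A]) -> "IndexedDataset[A]":
        # Wrap anything that supports len() and integer indexing.
        return cls(len(items), lambda index: items[index])

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> A:
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._get_item(index)

    def map(self, function: Callable[[A], B]) -> "IndexedDataset[B]":
        # Compose lazily: the function only runs when an index is requested.
        return IndexedDataset(
            self._length, lambda index: function(self._get_item(index))
        )


# Chained calls read top to bottom, like a pipeline.
dataset = (
    IndexedDataset.from_sequence([1, 2, 3])
    .map(lambda x: x * 10)
    .map(lambda x: x + 1)
)
examples = [dataset[index] for index in range(len(dataset))]
```

Because each map returns a new object and defers the work, the pipeline stays cheap to build and each example is only computed when it is indexed.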

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more extensive overview of the API and its usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each fruit datastream used below is assumed to repeatedly yield the string of its fruit type.

>>> from datastream import Datastream
>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
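The ordering in the batch above can be reproduced with a small pure-Python sketch that takes a fixed number of examples from each stream in turn. This is only an illustration of the interleaving pattern; the function name interleave is hypothetical, and the library's actual sampling logic (which also handles weights and samplers) is not shown here.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, List, Tuple


def interleave(
    streams_and_counts: List[Tuple[Iterable[str], int]]
) -> Iterator[str]:
    """Hypothetical helper: take `count` examples from each stream in turn."""
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                # Streams are assumed endless; an exhausted stream would
                # surface as a RuntimeError from this generator.
                yield next(iterator)


# Endless stand-ins for the fruit datastreams.
merged = interleave([
    (cycle(["apple"]), 2),
    (cycle(["pear"]), 1),
    (cycle(["banana"]), 1),
])
batch = list(islice(merged, 8))
```

With counts of 2, 1, and 1, half of every four examples come from the apple stream, which is how oversampling one source relative to the others shows up in a batch.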

Zip independently sampled datastreams

Each fruit datastream used below is assumed to repeatedly yield the string of its fruit type.

>>> from datastream import Datastream
>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
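The pairing in the batch above can be mimicked with plain iterators: each stream advances on its own, and one example from each is combined into a tuple. This is a rough sketch using only the standard library; the alternating pear/banana iterator is a simplified stand-in for the merged datastream, not the library's implementation.

```python
from itertools import cycle, islice

# Endless stand-in for the apple datastream.
apples = cycle(["apple"])

# Alternating pear/banana stands in for merging the two streams one-to-one.
pears_and_bananas = cycle(["pear", "banana"])

# Each stream is advanced independently; one example from each forms a tuple.
zipped = zip(apples, pears_and_bananas)
batch = list(islice(zipped, 4))
```

The point of zipping is that the two sides keep separate sampling state, so the choice of example on one side does not constrain the other.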

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh
