Simple dataset to dataloader library for PyTorch

This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.

Dataset is a simple mapping between an index and an example. It provides pipelining of functions in a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.
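To make the "index to example" idea concrete, here is a minimal pure-Python sketch. MiniDataset is a hypothetical stand-in written for this illustration, not the library's Dataset class; it only mirrors the from_subscriptable and map names listed under Usage and assumes that mapped functions are applied lazily, when an example is requested.

```python
from typing import Callable, Generic, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class MiniDataset(Generic[A]):
    """Hypothetical stand-in (not the library class): index -> example."""

    def __init__(self, length: int, get_item: Callable[[int], A]) -> None:
        self._length = length
        self._get_item = get_item

    @classmethod
    def from_subscriptable(cls, data: Sequence[A]) -> "MiniDataset[A]":
        # Any object supporting len() and integer indexing works here.
        return cls(len(data), lambda index: data[index])

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> A:
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._get_item(index)

    def map(self, fn: Callable[[A], B]) -> "MiniDataset[B]":
        # Compose lazily: fn runs only when an example is requested.
        return MiniDataset(self._length, lambda index: fn(self._get_item(index)))


# Chained calls read top to bottom, like a pipeline.
fruits = (
    MiniDataset.from_subscriptable(["apple", "pear", "banana"])
    .map(str.upper)
    .map(lambda name: (name, len(name)))
)
print(fruits[0])  # ('APPLE', 5)
```

Each map call returns a new object, so intermediate pipelines can be reused without affecting one another.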

Datastream combines a Dataset and a sampler into a stream of examples. It provides simple solutions for oversampling / stratification and weighted sampling, and converts to a torch.utils.data.DataLoader.
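The "dataset plus sampler" idea can be sketched with the standard library alone. The functions weighted_stream and batches below are hypothetical names for this illustration and do not reproduce the library's implementation; the sketch assumes sampling with replacement in proportion to per-example weights.

```python
import random
from itertools import islice
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def weighted_stream(
    examples: Sequence[T], weights: Sequence[float], seed: int = 0
) -> Iterator[T]:
    """Endlessly yield examples, drawing indices in proportion to weights."""
    rng = random.Random(seed)
    indices = range(len(examples))
    while True:
        (index,) = rng.choices(indices, weights=weights, k=1)
        yield examples[index]


def batches(stream: Iterator[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream into fixed-size batches, roughly like a data loader."""
    while True:
        batch = list(islice(stream, batch_size))
        if len(batch) < batch_size:
            return
        yield batch


examples = ["rare", "common_a", "common_b", "ignored"]
weights = [2.0, 1.0, 1.0, 0.0]  # oversample "rare", never draw "ignored"

first_batch = next(batches(weighted_stream(examples, weights), batch_size=8))
print(first_batch)
```

Raising the weight of an under-represented example is one way to oversample it, and a weight of zero removes an example from the stream without touching the underlying data.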

Install

pip install pytorch-datastream

Usage

The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive; see the documentation for a fuller reference on the API and usage.

Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
    .map
    .subset
    .split
    .cache
    .with_columns

Datastream.merge
Datastream.zip
Datastream
    .map
    .data_loader
    .zip_index
    .update_weights_
    .update_example_weight_
    .weight
    .state_dict
    .load_state_dict

Merge / stratify / oversample datastreams

Each of the fruit datastreams used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.merge([
...     (apple_datastream, 2),
...     (pear_datastream, 1),
...     (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
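The interleaving pattern in the output above can be reproduced with a small pure-Python sketch. The merge function here is a hypothetical illustration built on itertools, not the library's Datastream.merge; it assumes each (stream, count) pair means "take count examples from this stream before moving to the next one".

```python
from itertools import cycle, islice, repeat
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def merge(streams_and_counts: List[Tuple[Iterable[T], int]]) -> Iterator[T]:
    """Cycle over the streams, taking `count` examples from each in turn."""
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    for iterator, count in cycle(iterators):
        for _ in range(count):
            try:
                example = next(iterator)
            except StopIteration:
                return  # stop as soon as any stream runs out
            yield example


# Stand-ins for the fruit datastreams: each repeats its fruit name forever.
merged = merge([
    (repeat("apple"), 2),
    (repeat("pear"), 1),
    (repeat("banana"), 1),
])
batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```

With counts of 2, 1 and 1, half of every batch comes from the first stream regardless of how many examples each underlying dataset holds, which is the stratification effect.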

Zip independently sampled datastreams

Each of the fruit datastreams used below repeatedly yields the string of its fruit type.

>>> datastream = Datastream.zip([
...     apple_datastream,
...     Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
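A rough pure-Python analogue of the zipped output above is shown below. The alternate helper is a hypothetical equal-weight merge written for this sketch, and the built-in zip stands in for Datastream.zip; the point is only that each component advances on its own while examples are paired position by position.

```python
from itertools import cycle, islice, repeat
from typing import Iterable, Iterator, List


def alternate(streams: List[Iterable[str]]) -> Iterator[str]:
    """Equal-weight merge: take one example from each stream in turn."""
    iterators = [iter(stream) for stream in streams]
    for iterator in cycle(iterators):
        try:
            example = next(iterator)
        except StopIteration:
            return  # stop as soon as any stream runs out
        yield example


# The left component always yields "apple"; the right one alternates fruits.
pairs = zip(repeat("apple"), alternate([repeat("pear"), repeat("banana")]))
batch = list(islice(pairs, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```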

More usage examples

See the documentation for more usage examples.

Install from source

To patch the code locally for Python 3.6, run patch-python3.6.sh.

$ ./patch-python3.6.sh

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

pytorch_datastream-0.4.3-py39-none-any.whl (28.9 kB, Python 3.9)
pytorch_datastream-0.4.3-py38-none-any.whl (28.9 kB, Python 3.8)
pytorch_datastream-0.4.3-py37-none-any.whl (28.9 kB, Python 3.7)
pytorch_datastream-0.4.3-py36-none-any.whl (28.2 kB, Python 3.6)

File details

Details for the file pytorch_datastream-0.4.3-py39-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.3-py39-none-any.whl
  • Upload date:
  • Size: 28.9 kB
  • Tags: Python 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for pytorch_datastream-0.4.3-py39-none-any.whl
Algorithm    Hash digest
SHA256       704c7d0551f3c6ed17f654164b19713bdd5c7f923bd3daa063a5ba248547e779
MD5          6d350941eb6fc37c4f61c68b7e42a30e
BLAKE2b-256  57fb5c5a962aee0ce863addd8344cc4ea0d9f81e507e60f1a9c8bac95c602efa

File details

Details for the file pytorch_datastream-0.4.3-py38-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.3-py38-none-any.whl
  • Upload date:
  • Size: 28.9 kB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for pytorch_datastream-0.4.3-py38-none-any.whl
Algorithm    Hash digest
SHA256       5aa7bb3b07257ed923d6fe14f340707fc1a520c77d37df44e12de7de6ad020e3
MD5          aae13c62bd193f647eaf90921aa5a81f
BLAKE2b-256  b32c05a30977223bf05ede5409b893159c96f430e4567e7190afd64ed6f117ce

File details

Details for the file pytorch_datastream-0.4.3-py37-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.3-py37-none-any.whl
  • Upload date:
  • Size: 28.9 kB
  • Tags: Python 3.7
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.7.10

File hashes

Hashes for pytorch_datastream-0.4.3-py37-none-any.whl
Algorithm    Hash digest
SHA256       8a8d810146d7dbb7d6fafe43a6a532083c12a17a5735c756f15502d38de8101f
MD5          43e8e543ec70752e3598fa505069c63a
BLAKE2b-256  c70bccce9fccb93449d0e2d686f09131e88fda4677febadcab664948b48280fb

File details

Details for the file pytorch_datastream-0.4.3-py36-none-any.whl.

File metadata

  • Download URL: pytorch_datastream-0.4.3-py36-none-any.whl
  • Upload date:
  • Size: 28.2 kB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.13

File hashes

Hashes for pytorch_datastream-0.4.3-py36-none-any.whl
Algorithm    Hash digest
SHA256       29fa31b58135942ae2094eed85c36df281118bff3abca8fe2c76bef382eb6134
MD5          061ed9084eaa0b790d2024108d7d766a
BLAKE2b-256  795740988edb5756845f161bc6235f16cf248c5fb0ad97978c96f91f1b311782
