Simple dataset to dataloader library for PyTorch
This is a simple library for creating readable dataset pipelines and reusing best practices for issues such as imbalanced datasets. There are just two components to keep track of: Dataset and Datastream.
Dataset is a simple mapping between an index and an example. It provides pipelining of functions in a readable syntax originally adapted from TensorFlow 2's tf.data.Dataset.
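To make the index-to-example idea concrete, here is a minimal plain-Python sketch. TinyDataset and its map method are hypothetical stand-ins written for illustration only. They are not the library's Dataset class and make no claim about how it is implemented.

```python
from typing import Any, Callable, Sequence


class TinyDataset:
    """Hypothetical stand-in: maps an integer index to an example."""

    def __init__(self, source: Sequence[Any], fn: Callable[[Any], Any] = lambda x: x):
        self._source = source
        self._fn = fn

    def __len__(self) -> int:
        return len(self._source)

    def __getitem__(self, index: int) -> Any:
        # Functions run lazily, only when an example is requested.
        return self._fn(self._source[index])

    def map(self, fn: Callable[[Any], Any]) -> "TinyDataset":
        # Return a new dataset that applies the earlier functions, then `fn`.
        previous = self._fn
        return TinyDataset(self._source, lambda x: fn(previous(x)))


fruits_and_cost = [("apple", 5), ("pear", 7), ("banana", 14)]

dataset = (
    TinyDataset(fruits_and_cost)
    .map(lambda pair: (pair[0], pair[1] * 2))   # double the cost
    .map(lambda pair: f"{pair[0]}: {pair[1]}")  # format as text
)

print(dataset[2])  # banana: 28
```

Chaining map calls keeps each step small and readable, and nothing is computed until an index is looked up.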
Datastream combines a Dataset and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a torch.utils.data.DataLoader.
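The pairing of a dataset with a sampler can be sketched in plain Python as follows. The functions sample_stream and batches are hypothetical names used only here. The sketch assumes per-example weights and uses the standard library's random.choices rather than the library's own sampler or a torch.utils.data.DataLoader.

```python
import random
from itertools import islice


def sample_stream(dataset, weights, seed=0):
    """Hypothetical sketch: endlessly yield examples drawn by weight."""
    rng = random.Random(seed)
    indices = range(len(dataset))
    while True:
        # Draw one index; examples with a larger weight are drawn more often.
        (index,) = rng.choices(indices, weights=weights, k=1)
        yield dataset[index]


def batches(stream, batch_size):
    """Group an endless stream of examples into lists of `batch_size`."""
    while True:
        yield list(islice(stream, batch_size))


fruits = ["apple", "pear", "banana"]

# A weight of zero means the example is never drawn.
stream = sample_stream(fruits, weights=[1.0, 1.0, 0.0])
first_batch = next(batches(stream, batch_size=4))
print(first_batch)
```

Because sampling is driven by weights rather than by a pass over the indices, rare examples can be oversampled simply by raising their weight.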
Install
poetry add pytorch-datastream
Or, for the old-timers:
pip install pytorch-datastream
Usage
The list below showcases functions that are useful in most standard and non-standard cases. It is not exhaustive. See the documentation for a more extensive overview of the API and its usage.
Dataset.from_subscriptable
Dataset.from_dataframe
Dataset
  .map
  .subset
  .split
  .cache
  .with_columns
Datastream.merge
Datastream.zip
Datastream
  .map
  .data_loader
  .zip_index
  .update_weights_
  .update_example_weight_
  .weight
  .state_dict
  .load_state_dict
Merge / stratify / oversample datastreams
Each fruit datastream below repeatedly yields the string of its fruit type.
>>> datastream = Datastream.merge([
... (apple_datastream, 2),
... (pear_datastream, 1),
... (banana_datastream, 1),
... ])
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
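The interleaving pattern in the batch above can be reproduced with a plain-Python sketch. merge_streams is a hypothetical helper, not the library's Datastream.merge, and the fruit streams are assumed to be constant, endless iterators as described in the text.

```python
from itertools import cycle, islice


def merge_streams(streams_and_counts):
    """Hypothetical sketch: take `count` examples from each stream in turn."""
    iterators = [(iter(stream), count) for stream, count in streams_and_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                yield next(iterator)


# Stand-ins for the fruit datastreams: each yields its fruit name forever.
apple_stream = cycle(["apple"])
pear_stream = cycle(["pear"])
banana_stream = cycle(["banana"])

merged = merge_streams([
    (apple_stream, 2),
    (pear_stream, 1),
    (banana_stream, 1),
])

batch = list(islice(merged, 8))
print(batch)
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```

The count next to each stream fixes its share of every batch regardless of how many examples the underlying dataset holds, which is what makes this usable for stratification and oversampling.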
Zip independently sampled datastreams
Each fruit datastream below repeatedly yields the string of its fruit type.
>>> datastream = Datastream.zip([
... apple_datastream,
... Datastream.merge([pear_datastream, banana_datastream]),
... ])
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
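The pairing above can be sketched with the built-in zip over independent iterators. The alternate helper is hypothetical and stands in for a merge that takes one example from each stream in turn. Neither it nor the constant streams are part of the library.

```python
from itertools import cycle, islice


def alternate(*streams):
    """Hypothetical helper: take one example from each stream in turn."""
    iterators = [iter(stream) for stream in streams]
    while True:
        for iterator in iterators:
            yield next(iterator)


# Stand-ins for the fruit datastreams: each yields its fruit name forever.
apples = cycle(["apple"])
pears_and_bananas = alternate(cycle(["pear"]), cycle(["banana"]))

# Each stream advances independently; zip pairs up their examples.
zipped = zip(apples, pears_and_bananas)

batch = list(islice(zipped, 4))
print(batch)
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```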
More usage examples
See the documentation for more usage examples.
Install from source