# Extended Iterable Dataset for PyTorch
An extension of PyTorch's IterableDataset, this package adds functionality for shuffling, limiting, and offsetting data.
## Installation

Directly from PyPI:

```shell
pip install torch-exid
```

Or using Poetry:

```shell
poetry add torch-exid
```
## Usage

Begin by subclassing `ExtendedIterableDataset` and implement the `generator` method to yield items. Here's a simple example using an `IntegersDataset`:
```python
from typing import Iterator

from torch_exid import ExtendedIterableDataset

class IntegersDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            yield n
            n += 1

# Will print out integers 0, 1, ..., 9:
for n in IntegersDataset(limit=10):
    print(n)
```
## Constructor Parameters

`ExtendedIterableDataset` introduces several parameters to provide additional control:

### `limit: int`

Sets the maximum number of data points to return. If negative, all data points are returned. Default is `-1` (return all data).

```python
# Will print out "0, 1, 2":
for n in IntegersDataset(limit=3):
    print(n)
```
### `offset: int`

Determines the number of initial data points to skip. Default is `0`.

```python
# Will print out "2, 3, 4":
for n in IntegersDataset(limit=3, offset=2):
    print(n)
```
### `shuffle_buffer: int`

Specifies the buffer size for shuffling. If greater than `1`, data is buffered and shuffled before being returned. If set to `1` (the default), no shuffling occurs.

```python
# Might print out "0, 1, 3, 2" the first time...
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

# ...and "1, 0, 2, 3" the second time
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)
```
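The buffered-shuffle technique behind `shuffle_buffer` can be sketched in plain Python. This is a simplified illustration of the general idea, not the package's actual implementation:

```python
import random

def buffered_shuffle(iterable, buffer_size, seed=None):
    # Hold up to buffer_size items; once the buffer is full, emit one
    # randomly chosen item per new arrival, then drain the remainder.
    rng = random.Random(seed)
    buffer = []
    for item in iterable:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    rng.shuffle(buffer)
    yield from buffer

# With a buffer of 1 the order is unchanged, mirroring the default:
print(list(buffered_shuffle(range(4), 1)))  # [0, 1, 2, 3]
```

Items can only move within the window of the buffer, so small buffers give a cheap, approximate shuffle while large buffers approach a full shuffle.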
### `shuffle_seed: int`

Defines the seed for the random number generator used in shuffling. If not provided, a random seed is used.

```python
# Will print out "1, 0, 3, 2" both times:
for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)
```
### `transforms: List[Callable[[Any], Any]]`

A list of transformations applied to each data point in order. Default is an empty list.

```python
ds = IntegersDataset(
    limit=3,
    transforms=[
        lambda n: n + 1,
        lambda n: n ** 2,
    ],
)

# Will print out "1, 4, 9":
for n in ds:
    print(n)
```
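As the output shows, each transform receives the result of the previous one — here 0, 1, 2 become `(n + 1) ** 2`. The same chaining can be expressed with `functools.reduce` (a plain-Python illustration, independent of the package):

```python
from functools import reduce

def apply_transforms(value, transforms):
    # Feed the value through each transform in list order.
    return reduce(lambda acc, fn: fn(acc), transforms, value)

transforms = [lambda n: n + 1, lambda n: n ** 2]
print([apply_transforms(n, transforms) for n in range(3)])  # [1, 4, 9]
```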
In addition to the above, any positional or keyword arguments for the `IterableDataset` superclass can also be passed.
## Methods

### `skip_next: Callable[[], None]`

This method skips the next item that would be yielded by the `generator`. Using `skip_next` will not affect the `limit` or `offset`.
```python
class EvensDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            if n % 2 != 0:
                self.skip_next()
            yield n
            n += 1

ds = EvensDataset(limit=5)

# Will print out "0, 2, 4, 6, 8":
for n in ds:
    print(n)
```
In other words, it allows you to bypass the next item without modifying the overall iteration parameters.
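The mechanism can be sketched with a small stand-in class. This is a hypothetical reimplementation for illustration only — the point is that a skipped item is discarded before it is counted against `limit`:

```python
class SkippableDataset:
    # Hypothetical stand-in for ExtendedIterableDataset, showing how a
    # skip flag can drop the next yielded item without counting it.
    def __init__(self, limit=-1):
        self.limit = limit
        self._skip = False

    def generator(self):
        raise NotImplementedError

    def skip_next(self):
        self._skip = True

    def __iter__(self):
        yielded = 0
        for item in self.generator():
            if self._skip:
                # Discard this item; it does not count toward limit.
                self._skip = False
                continue
            if 0 <= self.limit <= yielded:
                return
            yield item
            yielded += 1

class Evens(SkippableDataset):
    def generator(self):
        n = 0
        while True:
            if n % 2 != 0:
                self.skip_next()
            yield n
            n += 1

print(list(Evens(limit=5)))  # [0, 2, 4, 6, 8]
```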
## Contributing

Contributions are greatly appreciated! You can help by submitting issues, proposing new features, or opening pull requests with bug fixes or new functionality.
### Getting started with contributing

Here are the steps to get started with development:

```shell
# Clone the repository:
git clone https://github.com/arlegotin/torch_exid.git
cd torch_exid

# Install the project and its dependencies using Poetry:
poetry install

# Spawn a shell within the virtual environment:
poetry shell

# Run tests to ensure everything is working correctly:
pytest tests/
```
Please ensure all changes are accompanied by relevant unit tests, and that all tests pass before submitting a pull request. This helps maintain the quality and reliability of the project.