

Extended Iterable Dataset for PyTorch

An extension of PyTorch IterableDataset, this package introduces functionalities for shuffling, limiting, and offsetting data.

Installation

Directly from PyPI:

pip install torch-exid

Or using Poetry:

poetry add torch-exid

Usage

Begin by subclassing ExtendedIterableDataset and implementing the generator method to yield items.

Here's a simple example using an IntegersDataset:

from typing import Iterator

from torch_exid import ExtendedIterableDataset

class IntegersDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        # Endless stream of integers; limit/offset bound what is returned
        n = 0
        while True:
            yield n
            n += 1

# Will print out integers 0, 1, ..., 9:
for n in IntegersDataset(limit=10):
    print(n)

Constructor Parameters

ExtendedIterableDataset introduces several parameters to provide additional control:

limit: int

Sets the maximum number of data points to return. If negative, all data points are returned. Default is -1 (return all data).

# Will print out "0, 1, 2"
for n in IntegersDataset(limit=3):
    print(n)

offset: int

Determines the number of initial data points to skip. Default is 0.

# Will print out "2, 3, 4"
for n in IntegersDataset(limit=3, offset=2):
    print(n)
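To make the interaction between limit and offset concrete, here is a minimal plain-Python sketch of the same semantics. The helper name window is hypothetical and this is not the library's implementation; it only assumes, as the examples above suggest, that offset items are skipped first and that at most limit items follow.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def window(items: Iterable[T], limit: int = -1, offset: int = 0) -> Iterator[T]:
    """Hypothetical helper: skip `offset` items, then yield at most `limit`.

    A negative `limit` means "no upper bound", mirroring the default of -1.
    """
    stop = None if limit < 0 else offset + limit
    return islice(items, offset, stop)


print(list(window(count(), limit=3)))            # [0, 1, 2]
print(list(window(count(), limit=3, offset=2)))  # [2, 3, 4]
```

Because islice is lazy, the sketch also works on an endless source such as count(), as long as limit is non-negative.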

shuffle_buffer: int

Specifies the buffer size for shuffling. If greater than 1, items are collected into a buffer of this size and shuffled before being returned. If set to 1 (default), no shuffling occurs.

# The order is random; this might print "0, 1, 3, 2" the first time...
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

# ...and "1, 0, 2, 3" the second time
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

shuffle_seed: int

Defines the seed for the random number generator used in shuffling. If not provided, a random seed is used:

# Will print out "1, 0, 3, 2" both times:
for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)
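The example outputs above are consistent with shuffling each consecutive block of shuffle_buffer items independently. The sketch below illustrates that idea in plain Python. It is an assumption about the behavior, not the library's actual algorithm, and chunk_shuffle is a hypothetical name; the exact order produced for a given seed may differ from what torch-exid prints.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def chunk_shuffle(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical helper: shuffle items within consecutive blocks."""
    rng = random.Random(seed)  # a private RNG keeps seeded runs reproducible
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            rng.shuffle(buffer)
            yield from buffer
            buffer.clear()
    # Flush a final, partially filled block
    rng.shuffle(buffer)
    yield from buffer


first = list(chunk_shuffle(range(4), buffer_size=2, seed=42))
second = list(chunk_shuffle(range(4), buffer_size=2, seed=42))
print(first == second)  # True: the same seed gives the same order
```

With this scheme an item never moves outside its own block, so a larger buffer gives a more thorough shuffle at the cost of more memory.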

transforms: List[Callable[[Any], Any]]

A list of transformations to apply to the data. Default is an empty list.

ds = IntegersDataset(
    limit=3,
    transforms=[
        lambda n: n + 1,
        lambda n: n ** 2,
    ],
)

# Will print out "1, 4, 9"
for n in ds:
    print(n)
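The output "1, 4, 9" implies that transforms run in list order, each receiving the previous one's result. A minimal sketch of that composition, independent of the library (apply_transforms is a hypothetical helper, not part of torch-exid):

```python
from functools import reduce
from typing import Any, Callable, Iterable, Iterator, List


def apply_transforms(
    items: Iterable[Any], transforms: List[Callable[[Any], Any]]
) -> Iterator[Any]:
    """Hypothetical helper: pipe every item through the transforms in order."""
    for item in items:
        # reduce feeds the running value into each transform, left to right
        yield reduce(lambda value, fn: fn(value), transforms, item)


squares = apply_transforms(range(3), [lambda n: n + 1, lambda n: n ** 2])
print(list(squares))  # [1, 4, 9]
```

An empty transforms list leaves every item unchanged, which matches the documented default.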

In addition to the above, any arguments or keyword arguments for the IterableDataset superclass can also be passed.

Methods

skip_next: Callable[[], None]

Skips the next item that the generator yields. Skipped items do not count toward limit or offset.

class EvensDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            if n % 2 != 0:
                self.skip_next()

            yield n
            n += 1

ds = EvensDataset(limit=5)

# Will print out "0, 2, 4, 6, 8"
for n in ds:
    print(n)

In other words, a skipped item is dropped without consuming any of the limit or offset.
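One way to picture this behavior is a flag that the iteration loop checks before counting an item. The class below is a hypothetical, torch-free stand-in written only to illustrate the idea; it assumes a non-negative limit and is not how torch-exid is actually implemented.

```python
from typing import Iterator


class SkippableEvens:
    """Hypothetical stand-in that mimics skip_next with a simple flag."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._skip = False

    def skip_next(self) -> None:
        # Mark the next yielded item to be dropped
        self._skip = True

    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            if n % 2 != 0:
                self.skip_next()
            yield n
            n += 1

    def __iter__(self) -> Iterator[int]:
        self._skip = False  # start every pass with a clean flag
        returned = 0
        for item in self.generator():
            if returned >= self.limit:
                return
            if self._skip:
                # Dropped items are not counted against the limit
                self._skip = False
                continue
            returned += 1
            yield item


print(list(SkippableEvens(limit=5)))  # [0, 2, 4, 6, 8]
```

The limit is checked only against items that were actually returned, which is why five even numbers come out even though nine integers were generated.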

Contributing

Contributions are greatly appreciated! You can help by opening issues, proposing new features, or submitting pull requests with bug fixes or new functionality.

Getting started with contributing

Here are the steps to get started with development:

# Clone the repository:
git clone https://github.com/arlegotin/torch_exid.git
cd torch_exid

# Install the project and its dependencies using Poetry:
poetry install

# Spawn a shell within the virtual environment:
poetry shell

# Run tests to ensure everything is working correctly:
pytest tests/

Please ensure all changes are accompanied by relevant unit tests, and that all tests pass before submitting a pull request. This helps maintain the quality and reliability of the project.
