Extended Iterable Dataset for PyTorch

An extension of PyTorch IterableDataset, this package introduces functionalities for shuffling, limiting, and offsetting data.

Installation

Directly from PyPI:

pip install torch_exid

Or using Poetry:

poetry add torch_exid

Usage

Begin by subclassing ExtendedIterableDataset and implementing the generator method to yield items.

Here's a simple example using an IntegersDataset:

from typing import Iterator

from torch_exid import ExtendedIterableDataset

class IntegersDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            yield n
            n += 1

# Will print out integers 0, 1, ..., 9:
for n in IntegersDataset(limit=10):
    print(n)

Constructor Parameters

ExtendedIterableDataset introduces several parameters to provide additional control:

limit: int

Sets the maximum number of data points to return. If negative, all data points are returned. Default is -1 (return all data).

# Will print out "0, 1, 2"
for n in IntegersDataset(limit=3):
    print(n)

offset: int

Determines the number of initial data points to skip. Default is 0.

# Will print out "2, 3, 4"
for n in IntegersDataset(limit=3, offset=2):
    print(n)
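
The combined effect of limit and offset can be pictured with itertools.islice from the standard library. This is only a sketch of the behavior described above, not the package's actual implementation. The helper name limit_and_offset is hypothetical, and the sketch assumes the offset is applied before the limit, which is consistent with the "2, 3, 4" example.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limit_and_offset(items: Iterable[T], limit: int = -1, offset: int = 0) -> Iterator[T]:
    """Hypothetical helper: skip `offset` items, then yield at most `limit` items.

    A negative `limit` means "no limit", mirroring the documented default of -1.
    """
    stop = None if limit < 0 else offset + limit
    return islice(items, offset, stop)


# count() yields 0, 1, 2, ... forever, much like IntegersDataset.generator().
print(list(limit_and_offset(count(), limit=3)))            # [0, 1, 2]
print(list(limit_and_offset(count(), limit=3, offset=2)))  # [2, 3, 4]
```

Because islice is lazy, the skipped items are still generated and discarded, so a large offset on an expensive generator is not free.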

shuffle_buffer: int

Specifies the buffer size for shuffling. If greater than 1, data is buffered and shuffled before being returned. If set to 1 (the default), no shuffling occurs.

# Might print out "0, 1, 3, 2" the first time...
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

# ...and "1, 0, 2, 3" the second time
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

shuffle_seed: int

Defines the seed for the random number generator used in shuffling. If not provided, a random seed is used:

# Will print out "1, 0, 3, 2" both times:
for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)
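
A fixed-size shuffle buffer is a common way to shuffle a stream without loading all of it into memory. The sketch below illustrates that general technique using only the standard library. It is an assumption about how shuffle_buffer and shuffle_seed might work, not the package's actual code; the function name buffered_shuffle is hypothetical, and the order produced by the real package may differ.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T],
    buffer_size: int = 1,
    seed: Optional[int] = None,
) -> Iterator[T]:
    """Hypothetical helper: shuffle a stream using a bounded buffer."""
    rng = random.Random(seed)  # seed=None lets Python pick an unpredictable seed
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Once the buffer is full, emit one randomly chosen element.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # The stream is exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


first = list(buffered_shuffle(range(10), buffer_size=4, seed=42))
second = list(buffered_shuffle(range(10), buffer_size=4, seed=42))
print(first == second)                   # True: same seed, same order
print(sorted(first) == list(range(10)))  # True: nothing lost or duplicated
```

With a buffer of size 1 the only element available is the one just read, so the input order is preserved. A larger buffer gives a more thorough shuffle at the cost of holding more items in memory.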

transforms: List[Callable[[Any], Any]]

A list of transformations to apply to the data. Default is an empty list.

ds = IntegersDataset(
    limit=3,
    transforms=[
        lambda n: n + 1,
        lambda n: n ** 2,
    ],
)

# Will print out "1, 4, 9"
for n in ds:
    print(n)
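
The order of the transforms matters: in the example above each item is incremented first and squared second. A minimal sketch of applying a list of callables in sequence, assuming each transform receives the previous transform's output (the helper apply_transforms is hypothetical and not part of torch_exid):

```python
from functools import reduce
from typing import Any, Callable, Iterable, Iterator, List


def apply_transforms(
    items: Iterable[Any],
    transforms: List[Callable[[Any], Any]],
) -> Iterator[Any]:
    """Hypothetical helper: pass each item through every transform, in list order."""
    for item in items:
        # reduce feeds the result of one transform into the next.
        yield reduce(lambda value, fn: fn(value), transforms, item)


def add_one(n: int) -> int:
    return n + 1


def square(n: int) -> int:
    return n ** 2


print(list(apply_transforms(range(3), [add_one, square])))  # [1, 4, 9]
print(list(apply_transforms(range(3), [square, add_one])))  # [1, 2, 5]
```

An empty list leaves every item unchanged, which matches the documented default.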

In addition to the above, any positional or keyword arguments accepted by the IterableDataset superclass can also be passed.

Contributing

Contributions are greatly appreciated! You can help by submitting issues, proposing new features, or opening pull requests with bug fixes or new functionality.

Getting started with contributing

Here are the steps to get started with development:

# Clone the repository:
git clone https://github.com/arlegotin/torch_exid.git
cd torch_exid

# Install the project and its dependencies using Poetry:
poetry install

# Spawn a shell within the virtual environment:
poetry shell

# Run tests to ensure everything is working correctly:
pytest tests/

Please ensure all changes are accompanied by relevant unit tests, and that all tests pass before submitting a pull request. This helps maintain the quality and reliability of the project.
