
Extended Iterable Dataset for PyTorch

An extension of PyTorch IterableDataset, this package introduces functionalities for shuffling, limiting, and offsetting data.

Installation

Directly from PyPI:

pip install torch-exid

Or using Poetry:

poetry add torch-exid

Usage

Begin by subclassing ExtendedIterableDataset and implementing the generator method to yield items.

Here's a simple example using an IntegersDataset:

from typing import Iterator

from torch_exid import ExtendedIterableDataset

class IntegersDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            yield n
            n += 1

# Will print out integers 0, 1, ..., 9:
for n in IntegersDataset(limit=10):
    print(n)
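
The generator method can wrap any existing iterable, so in-memory data works just as well as an infinite stream. Below is a minimal, hypothetical sketch (ListDataset is not part of the package; it simply follows the generator contract shown above and forwards the documented constructor parameters):

from typing import Any, Iterator, List

from torch_exid import ExtendedIterableDataset

class ListDataset(ExtendedIterableDataset):
    # Hypothetical dataset that yields items from an in-memory list.
    def __init__(self, items: List[Any], **kwargs) -> None:
        super().__init__(**kwargs)
        self.items = items

    def generator(self) -> Iterator[Any]:
        yield from self.items

# Will print out "a, b":
for item in ListDataset(["a", "b", "c", "d"], limit=2):
    print(item)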

Constructor Parameters

ExtendedIterableDataset introduces several parameters to provide additional control:

limit: int

Sets the maximum number of data points to return. If negative, all data points are returned. Default is -1 (return all data).

# Will print out "0, 1, 2"
for n in IntegersDataset(limit=3):
    print(n)

offset: int

Determines the number of initial data points to skip. Default is 0.

# Will print out "2, 3, 4"
for n in IntegersDataset(limit=3, offset=2):
    print(n)

shuffle_buffer: int

This specifies the buffer size for shuffling. If greater than 1, data is buffered and shuffled prior to being returned. If set to 1 (default), no shuffling occurs.

# Will print out "0, 1, 3, 2" for the first time...
for n in IntegersDataset(limit=4, shuffle_buffer=2)
    print(n)

# ...and "1, 0, 2, 3" the second time
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)
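
For intuition, buffered shuffling can be pictured as keeping a small window of pending items and emitting a random one whenever the window is full. The sketch below illustrates this general technique on a plain iterable; it is a generic illustration, not necessarily the package's exact implementation:

import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

def buffered_shuffle(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    # Yield items in a locally shuffled order using a fixed-size buffer.
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Once the buffer is full, emit a random element from it.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer

# Possible output: [1, 0, 3, 2]
print(list(buffered_shuffle(range(4), buffer_size=2, seed=42)))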

shuffle_seed: int

Defines the seed for the random number generator used in shuffling. If not provided, a random seed is used:

# Will print out "1, 0, 3, 2" both times:
for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

transforms: List[Callable[[Any], Any]]

A list of transformations to apply to the data. Default is an empty list.

ds = IntegersDataset(
    limit=3,
    transforms=[
        lambda n: n + 1,
        lambda n: n ** 2,
    ],
)

# Will print out "1, 4, 9"
for n in ds:
    print(n)

In addition to the parameters above, any positional or keyword arguments accepted by the IterableDataset superclass can also be passed through.
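
Since the class extends IterableDataset, instances can be consumed by a standard DataLoader. A minimal sketch using the IntegersDataset defined above (the batch_size value is an arbitrary illustration):

from torch.utils.data import DataLoader

ds = IntegersDataset(limit=8, shuffle_buffer=4, shuffle_seed=0)

# IterableDataset subclasses plug into a standard DataLoader.
loader = DataLoader(ds, batch_size=4)

for batch in loader:
    print(batch)  # two batches, each a tensor of shape (4,)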

Contributing

Contributions are greatly appreciated! You can help by submitting issues, proposing new features, or opening pull requests with bug fixes or new functionality.

Getting started with contributing

Here are the steps to get started with development:

# Clone the repository:
git clone https://github.com/arlegotin/torch_exid.git
cd torch_exid

# Install the project and its dependencies using Poetry:
poetry install

# Spawn a shell within the virtual environment:
poetry shell

# Run tests to ensure everything is working correctly:
pytest tests/

Please ensure all changes are accompanied by relevant unit tests, and that all tests pass before submitting a pull request. This helps maintain the quality and reliability of the project.

