Extended Iterable Dataset for PyTorch

An extension of PyTorch IterableDataset, this package introduces functionalities for shuffling, limiting, and offsetting data.

Installation

Directly from PyPI:

pip install torch-exid

Or using Poetry:

poetry add torch-exid

Usage

Begin by subclassing ExtendedIterableDataset and implementing the generator method to yield items.

Here's a simple example using an IntegersDataset:

from typing import Iterator

from torch_exid import ExtendedIterableDataset

class IntegersDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            yield n
            n += 1

# Will print out integers 0, 1, ..., 9:
for n in IntegersDataset(limit=10):
    print(n)

Constructor Parameters

ExtendedIterableDataset introduces several parameters to provide additional control:

limit: int

Sets the maximum number of data points to return. If negative, all data points are returned. Default is -1 (return all data).

# Will print out "0, 1, 2"
for n in IntegersDataset(limit=3):
    print(n)

offset: int

Determines the number of initial data points to skip. Default is 0.

# Will print out "2, 3, 4"
for n in IntegersDataset(limit=3, offset=2):
    print(n)
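
The way limit and offset combine can be pictured with itertools.islice. The sketch below is not the library's implementation: take is a hypothetical helper, and it assumes the offset is applied first and the limit counts only the items returned after skipping, which is what the example above suggests.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take(items: Iterable[T], limit: int = -1, offset: int = 0) -> Iterator[T]:
    """Hypothetical helper: skip `offset` items, then yield at most `limit`."""
    # A negative limit means "no limit", mirroring the documented default of -1.
    stop = None if limit < 0 else offset + limit
    return islice(items, offset, stop)


print(list(take(count(), limit=3, offset=2)))  # [2, 3, 4]
```

Because islice is lazy, this also works on an infinite stream such as count(), just as IntegersDataset never terminates on its own.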

shuffle_buffer: int

This specifies the buffer size for shuffling. If greater than 1, data is buffered and shuffled prior to being returned. If set to 1 (default), no shuffling occurs.

# May print "0, 1, 3, 2" on one run...
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

# ...and "1, 0, 2, 3" on another
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

shuffle_seed: int

Defines the seed for the random number generator used in shuffling. If not provided, a random seed is used:

# Will print out "1, 0, 3, 2" both times:
for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)
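
To illustrate what a shuffle buffer and a seed do together, here is a minimal sketch of one common approach: collect items into chunks of the buffer size, shuffle each chunk, and emit it. This is an assumption made for illustration, not the library's documented algorithm; buffered_shuffle is a hypothetical helper, and its exact ordering will generally differ from what torch-exid produces.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical helper: shuffle a stream in chunks of `buffer_size`."""
    rng = random.Random(seed)  # seed=None lets Python pick a random seed
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            rng.shuffle(buffer)
            yield from buffer
            buffer = []
    # Flush whatever is left when the stream ends mid-buffer.
    rng.shuffle(buffer)
    yield from buffer


first = list(buffered_shuffle(range(4), buffer_size=2, seed=42))
second = list(buffered_shuffle(range(4), buffer_size=2, seed=42))
print(first == second)                       # True: same seed, same order
print(sorted(first[:2]), sorted(first[2:]))  # [0, 1] [2, 3]
```

With this scheme an item can only move within its own chunk, so a larger buffer mixes the data more thoroughly at the cost of holding more items in memory. A buffer size of 1 leaves the order unchanged.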

transforms: List[Callable[[Any], Any]]

A list of transformations applied to each item, in the order given. Default is an empty list.

ds = IntegersDataset(
    limit=3,
    transforms=[
        lambda n: n + 1,
        lambda n: n ** 2,
    ],
)

# Will print out "1, 4, 9"
for n in ds:
    print(n)
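
The chaining behaviour in the example above can be reproduced in plain Python. The following is only a sketch under the assumption that each item passes through the transforms from first to last; apply_transforms is a hypothetical helper, not part of torch-exid.

```python
from functools import reduce
from typing import Any, Callable, Iterable, Iterator, List


def apply_transforms(
    items: Iterable[Any], transforms: List[Callable[[Any], Any]]
) -> Iterator[Any]:
    """Hypothetical helper: feed each item through every transform in order."""
    for item in items:
        # reduce threads the item through the list: t2(t1(item)), and so on.
        yield reduce(lambda value, fn: fn(value), transforms, item)


result = list(apply_transforms(range(3), [lambda n: n + 1, lambda n: n ** 2]))
print(result)  # [1, 4, 9]
```

Order matters here: swapping the two lambdas would square first and then add one, giving 1, 2, 5 instead.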

In addition to the above, any positional or keyword arguments accepted by the IterableDataset superclass can also be passed.

Contributing

Contributions are greatly appreciated! You can help by opening issues, proposing new features, or submitting pull requests with bug fixes or new functionality.

Getting started with contributing

Here are the steps to get started with development:

# Clone the repository:
git clone https://github.com/arlegotin/torch_exid.git
cd torch_exid

# Install the project and its dependencies using Poetry:
poetry install

# Spawn a shell within the virtual environment:
poetry shell

# Run tests to ensure everything is working correctly:
pytest tests/

Please ensure all changes are accompanied by relevant unit tests, and that all tests pass before submitting a pull request. This helps maintain the quality and reliability of the project.
