Extended Iterable Dataset for PyTorch

An extension of PyTorch IterableDataset, this package introduces functionalities for shuffling, limiting, and offsetting data.

Installation

Directly from PyPI:

pip install torch-exid

Or using Poetry:

poetry add torch-exid

Usage

Begin by subclassing ExtendedIterableDataset and implementing the generator method to yield items.

Here's a simple example using an IntegersDataset:

from typing import Iterator

from torch_exid import ExtendedIterableDataset

class IntegersDataset(ExtendedIterableDataset):
    def generator(self) -> Iterator[int]:
        n = 0
        while True:
            yield n
            n += 1

# Will print out integers 0, 1, ..., 9:
for n in IntegersDataset(limit=10):
    print(n)

Constructor Parameters

ExtendedIterableDataset introduces several parameters to provide additional control:

limit: int

Sets the maximum number of data points to return. If negative, all data points are returned. Default is -1 (return all data).

# Will print out "0, 1, 2"
for n in IntegersDataset(limit=3):
    print(n)

offset: int

Determines the number of initial data points to skip. Default is 0.

# Will print out "2, 3, 4"
for n in IntegersDataset(limit=3, offset=2):
    print(n)
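
The interaction between limit and offset can be illustrated without the library. The following is a minimal sketch in plain Python, not the package's actual implementation: the helper name limit_offset is hypothetical, and it assumes the offset is applied before the limit, which is what the "2, 3, 4" example above suggests.

```python
from itertools import count, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limit_offset(items: Iterable[T], limit: int = -1, offset: int = 0) -> Iterator[T]:
    """Skip `offset` items, then yield at most `limit` items.

    A negative `limit` means "no limit", mirroring the documented default.
    This helper is hypothetical and only illustrates the semantics.
    """
    stop = None if limit < 0 else offset + limit
    return islice(items, offset, stop)


# An infinite stream of integers, like IntegersDataset in the examples above.
result = list(limit_offset(count(), limit=3, offset=2))
print(result)  # [2, 3, 4]
```

Because islice is lazy, skipped items are still generated and discarded, so a large offset on an expensive generator is not free.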

shuffle_buffer: int

This specifies the buffer size for shuffling. If greater than 1, data is buffered and shuffled prior to being returned. If set to 1 (default), no shuffling occurs.

# Output order varies between runs, e.g. "0, 1, 3, 2" the first time...
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

# ...and "1, 0, 2, 3" the second time
for n in IntegersDataset(limit=4, shuffle_buffer=2):
    print(n)

shuffle_seed: int

Defines the seed for the random number generator used in shuffling. If not provided, a random seed is used:

# Will print out "1, 0, 3, 2" both times:
for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)

for n in IntegersDataset(limit=4, shuffle_buffer=2, shuffle_seed=42):
    print(n)
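
To make the idea of a seeded shuffle buffer concrete, here is a rough sketch in plain Python. It is an assumption about how such a buffer could work, not the package's actual algorithm: the function buffered_shuffle is hypothetical, and it shuffles fixed-size chunks, which is only one of several schemes consistent with the outputs shown above.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Shuffle a stream chunk by chunk using a bounded buffer.

    Hypothetical helper: fills a buffer of `buffer_size` items, shuffles it,
    emits it, then repeats. Items never move further than one buffer away.
    """
    rng = random.Random(seed)  # a private RNG keeps the global state untouched
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, buffer_size))
        if not chunk:
            return
        if buffer_size > 1:  # a buffer of 1 leaves the order unchanged
            rng.shuffle(chunk)
        yield from chunk


first = list(buffered_shuffle(range(4), buffer_size=2, seed=42))
second = list(buffered_shuffle(range(4), buffer_size=2, seed=42))
print(first == second)  # True: the same seed gives the same order
```

A small buffer only mixes neighbouring items, so a stream that arrives sorted stays roughly sorted. A larger buffer mixes more thoroughly at the cost of memory.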

transforms: List[Callable[[Any], Any]]

A list of transformations to apply to the data. Default is an empty list.

ds = IntegersDataset(
    limit=3,
    transforms=[
        lambda n: n + 1,
        lambda n: n ** 2,
    ],
)

# Will print out "1, 4, 9"
for n in ds:
    print(n)
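
The example above implies that transforms are applied in list order, each one receiving the previous one's output. A minimal sketch of that composition in plain Python follows. The helper apply_transforms is hypothetical and is not taken from the package.

```python
from functools import reduce
from typing import Any, Callable, Iterable, Iterator, List


def apply_transforms(
    items: Iterable[Any], transforms: List[Callable[[Any], Any]]
) -> Iterator[Any]:
    """Pipe every item through each transform, in list order.

    Hypothetical helper that only illustrates the ordering semantics.
    """
    for item in items:
        # reduce threads the value through the transforms from left to right
        yield reduce(lambda value, fn: fn(value), transforms, item)


result = list(apply_transforms(range(3), [lambda n: n + 1, lambda n: n ** 2]))
print(result)  # [1, 4, 9]
```

Order matters here: swapping the two lambdas would square first and then add one.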

In addition to the above, any arguments or keyword arguments for the IterableDataset superclass can also be passed.

Contributing

Contributions are greatly appreciated! You can help by submitting issues, proposing new features, or opening pull requests with bug fixes or new functionality.

Getting started with contributing

Here are the steps to get started with development:

# Clone the repository:
git clone https://github.com/arlegotin/torch_exid.git
cd torch_exid

# Install the project and its dependencies using Poetry:
poetry install

# Spawn a shell within the virtual environment:
poetry shell

# Run tests to ensure everything is working correctly:
pytest tests/

Please ensure all changes are accompanied by relevant unit tests, and that all tests pass before submitting a pull request. This helps maintain the quality and reliability of the project.
