A custom iterable dataset wrapper for use with Hugging Face's SFTTrainer.

Custom Iterable Dataset

A Python package for wrapping iterable datasets so they can be used with Hugging Face's SFTTrainer. Pass your iterable dataset and its length to the wrapper.

Installation

pip install custom_iterable_dataset

Usage

from torch.utils.data import IterableDataset

from custom_iterable_dataset import CustomIterableDataset

# Example usage: a toy iterable dataset that yields one tokenized example.
# Replace it with your own data stream.
class MyIterableDataset(IterableDataset):
    def __iter__(self):
        yield {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}

my_dataset = MyIterableDataset()
my_dataset_len = 1000  # the length you supply as the wrapper's second argument
custom_dataset = CustomIterableDataset(my_dataset, my_dataset_len)

# Pass custom_dataset to SFTTrainer
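
The package's internals are not shown in this description, so the following is only a rough sketch of the general idea, assuming the wrapper does little more than pair an iterable with a caller-supplied length. An iterable dataset typically does not define `__len__`, and a wrapper of this kind can supply one. The names `SizedIterableSketch` and `ToyExamples` are hypothetical and are not part of `custom_iterable_dataset`; the sketch is kept dependency-free, whereas the real wrapper presumably builds on `torch.utils.data.IterableDataset`.

```python
from typing import Any, Dict, Iterable, Iterator, List


class SizedIterableSketch:
    """Hypothetical stand-in: pairs an iterable with a caller-supplied length."""

    def __init__(self, dataset: Iterable[Any], length: int) -> None:
        if length < 0:
            raise ValueError("length must be non-negative")
        self.dataset = dataset
        self.length = length

    def __iter__(self) -> Iterator[Any]:
        # Delegate iteration to the wrapped dataset unchanged.
        return iter(self.dataset)

    def __len__(self) -> int:
        # Report the declared length; it is not checked against the data.
        return self.length


class ToyExamples:
    """Hypothetical re-iterable source of tokenized examples."""

    def __init__(self, count: int) -> None:
        self.count = count

    def __iter__(self) -> Iterator[Dict[str, List[int]]]:
        for _ in range(self.count):
            yield {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}


wrapped = SizedIterableSketch(ToyExamples(3), length=3)
print(len(wrapped))            # 3
print(sum(1 for _ in wrapped)) # 3
```

Because the declared length is taken on trust in this sketch, a value that does not match the number of examples actually yielded would go unnoticed, so it is worth keeping the two in sync.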

Download files

Source Distribution

custom_iterable_dataset-0.1.0.tar.gz (7.8 kB)

Uploaded Source

Built Distribution

custom_iterable_dataset-0.1.0-py3-none-any.whl (10.6 kB)

Uploaded Python 3

File details

Details for the file custom_iterable_dataset-0.1.0.tar.gz.

File metadata

  • Download URL: custom_iterable_dataset-0.1.0.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for custom_iterable_dataset-0.1.0.tar.gz
Algorithm Hash digest
SHA256 010ccb540443c411ef9f0a7d5c428b049bdbfa6fe74833c117e8d72de79b0be4
MD5 d0a722365bbb4fd0aa7bfb74d7362272
BLAKE2b-256 f727aa6fe43771f263e91f69f54b3d201ac6373636ab213e8ea3d60557945813

File details

Details for the file custom_iterable_dataset-0.1.0-py3-none-any.whl.

File hashes

Hashes for custom_iterable_dataset-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8d3508fbe30eab9e923df872b58ebe7e6348cfb6fbdb91a975395e4d6241e832
MD5 a10614012c33500dee6fae6738908b90
BLAKE2b-256 100973b1ef0cb87cc6b4ae6ebfdfd031577955b5a94e7af9fb19627d04b7254c
