Simple ETL Pipeline for PyTorch
Project description
PyTorch Pipeline: Simple ETL Pipeline for PyTorch
PyTorch Pipeline is a simple ETL framework for PyTorch. It is an alternative to tf.data in TensorFlow
Requirements
- Python 3.6+
- PyTorch 1.2+
Installation
To install PyTorch Pipeline:
pip install pytorch_pipeilne
Basic Usage
import pytorch_pipeilne as pp d = pp.TextDataset('/path/to/your/text') d.shuffle(buffer_size=100).batch(batch_size=10).first()
Usage with PyTorch
from torch.utils.data import DataLoader import pytorch_pipeilne as pp d = pp.Dataset(range(1_000)).parallel().shuffle(100).batch(10) loader = DataLoader(d, num_workers=4, collate_fn=lambda x: x) for x in loader: ...
Usage with LineFlow
You can use PyTorch Pipeline with pre-defined datasets in LineFlow:
from torch.utils.data import DataLoader from lineflow.datasets.wikitext import cached_get_wikitext import pytorch_pipeilne as pp dataset = cached_get_wikitext('wikitext-2') # Preprocessing dataset train_data = pp.Dataset(dataset['train']) \ .flat_map(lambda x: x.split() + ['<eos>']) \ .window(35) \ .parallel() \ .shuffle(64 * 100) \ .batch(64) # Iterating dataset loader = DataLoader(train_data, num_workers=4, collate_fn=lambda x: x) for x in loader: ...
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pytorch-pipeline-0.0.1.tar.gz
(14.4 kB
view hashes)