
Simple ETL Pipeline for PyTorch

Project description

PyTorch Pipeline: Simple ETL Pipeline for PyTorch

PyTorch Pipeline is a simple ETL framework for PyTorch. It is an alternative to tf.data in TensorFlow.

Requirements

  • Python 3.6+
  • PyTorch 1.2+

Installation

To install PyTorch Pipeline:

pip install pytorch_pipeline

Basic Usage

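import pytorch_pipeline as pp

# Build a dataset from a text file
d = pp.TextDataset('/path/to/your/text')
# Shuffle through a 100-item buffer, group into batches of 10, and take the first batch
d.shuffle(buffer_size=100).batch(batch_size=10).first()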
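
The source does not spell out what `shuffle(buffer_size=...)` and `batch(batch_size=...)` do internally. As a rough illustration only, and assuming semantics close to the tf.data operations of the same name, the two steps can be sketched in plain Python. The helper names `buffered_shuffle` and `batched` and the `seed` parameter are hypothetical and are not part of PyTorch Pipeline:

```python
import random
from itertools import islice


def buffered_shuffle(iterable, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer.

    Hypothetical helper: a sketch of buffer-based shuffling, not the
    library's implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in iterable:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Buffer is full: emit a random buffered item and put the new one in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input is exhausted: emit the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


def batched(iterable, batch_size):
    """Group consecutive items into lists of at most batch_size items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


shuffled = buffered_shuffle(range(1000), buffer_size=100, seed=0)
first_batch = next(batched(shuffled, batch_size=10))
```

Because only `buffer_size` items are held at once, the shuffle is approximate: an item can only move a limited distance from its original position, which is the usual trade-off for shuffling a stream that does not fit in memory.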

Usage with PyTorch

from torch.utils.data import DataLoader
import pytorch_pipeline as pp


d = pp.Dataset(range(1_000)).parallel().shuffle(100).batch(10)
loader = DataLoader(d, num_workers=4, collate_fn=lambda x: x)
for x in loader:
    ...
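
The source does not describe what `parallel()` does. When an iterable-style dataset is read with `num_workers > 0`, each `DataLoader` worker process receives its own copy of the dataset, so items normally have to be divided between workers to avoid duplicates. The sketch below shows one common scheme, round-robin splitting by worker index, in plain Python. It is an assumption about the general technique, not a description of PyTorch Pipeline's implementation, and `shard`, `worker_id` and `num_workers` are hypothetical names:

```python
from itertools import islice


def shard(iterable, worker_id, num_workers):
    """Return every num_workers-th item, starting at offset worker_id.

    Hypothetical helper illustrating round-robin sharding.
    """
    return islice(iterable, worker_id, None, num_workers)


num_workers = 4
# Each simulated worker sees a disjoint slice of the same source.
shards = [list(shard(range(10), worker_id, num_workers))
          for worker_id in range(num_workers)]
```

Inside a real worker process, the worker index and worker count would come from `torch.utils.data.get_worker_info()` (its `id` and `num_workers` attributes) rather than from a loop.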

Usage with LineFlow

You can use PyTorch Pipeline with pre-defined datasets in LineFlow:

from torch.utils.data import DataLoader
from lineflow.datasets.wikitext import cached_get_wikitext
import pytorch_pipeline as pp

dataset = cached_get_wikitext('wikitext-2')
# Preprocessing dataset
train_data = pp.Dataset(dataset['train']) \
    .flat_map(lambda x: x.split() + ['<eos>']) \
    .window(35) \
    .parallel() \
    .shuffle(64 * 100) \
    .batch(64)

# Iterating dataset
loader = DataLoader(train_data, num_workers=4, collate_fn=lambda x: x)
for x in loader:
    ...
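
To make the preprocessing steps above concrete, here is a small plain-Python sketch of what `flat_map` followed by `window` plausibly produces on a toy corpus. It assumes that `window(n)` yields consecutive, non-overlapping groups of `n` items and keeps a shorter final group; the source does not confirm either point. The helpers `flat_map` and `window` below are hypothetical stand-ins, not the library's methods:

```python
from itertools import chain, islice


def flat_map(func, iterable):
    """Apply func to each item and flatten the resulting sequences into one stream."""
    return chain.from_iterable(map(func, iterable))


def window(iterable, size):
    """Yield consecutive, non-overlapping lists of up to `size` items (assumed semantics)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


lines = ["the cat sat", "on the mat"]
# Split each line into tokens and mark the end of every line with '<eos>'.
tokens = flat_map(lambda x: x.split() + ["<eos>"], lines)
# Cut the token stream into fixed-length sequences (4 here, 35 in the example above).
windows = list(window(tokens, 4))
```

In the WikiText example, the same idea turns a corpus of lines into one long token stream and then into 35-token sequences, which are shuffled and grouped into batches of 64.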

Download files


Source Distribution

pytorch-pipeline-0.0.1.tar.gz (14.4 kB)

Uploaded Source

File details

Details for the file pytorch-pipeline-0.0.1.tar.gz.

File metadata

  • Download URL: pytorch-pipeline-0.0.1.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.6.7

File hashes

Hashes for pytorch-pipeline-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       8c0c421aaf73cb279d5891d3e89f4527fbe144c5d1ee4f6967d4616a9f90a4a2
MD5          95526c1bd0a7d5e5d789140a66b21781
BLAKE2b-256  3dbd4d2d422bdbba7836008ff35bec0b4682d6fd865db0991154d4dcb67862d4
