Skip to main content

Creates dataset builder objects

Project description

DPIPE

Documentation

With dpipe you can create ready to use datasets from paths or list of files. You should specify the type and the location of the input and target. The labels are assumed to be the name of the folder containing the file, if you need a dataset for classification.

The inputs and targets can be a list of paths, a path to be explored containing images or videos. For example:

./dataset
  |
  |--cat/img1.png|
  |--cat/img2.png
  |--dog/img1.png
  |--dog/img2.png

The function make_dataset outputs a dpipe.dataset_builder object that has the method to predefined multiprocessing setups based on the recomendation of tensorflow.

method action
dataset_builder.prefetch() Preloads samples on memory
dataset_builder.batch() Creates a batch dataset
dataset_builder.enumerate() Creates a appends an index to the output
dataset_builder.filter() Applies a filter concurrently
dataset_builder.map() Applies a function to each element concurrently
dataset_builder.repeat() Creates a repeated dataset
dataset_builder.shuffle() Shuffles the dataset after a complete run

The dataset can be specified as:

from dpipe import make_dataset
dataset = make_dataset('image','label',x_path='./dataset',x_size=(128,128)).build()

Creating dataset (more options)

Additionally, we defined the dataset from functions or objects. Two use cases are presented here. A dataset can be created from a function and a list of element to parse, for example a list of files and a reading function. For example, if we need are training a denoising autoencoder, we need image noisy and clean image pairs; this can be handled with the function dpipe.from_function:

import glob # to find the files
import matplotlib.image as mpimg # to read the images (you need to install it.)
import numpy as np
from dpipe import from_function

filelist = glob.glob('./dataset','*.png')
def read_file(filename):
    target = mpimg.imread(filename) # read the image
    noisy_image = np.random.randn(target.shape)
    return noisy_image, target
# undetermined shape is used to define dimentions that vary across shamples, in this case the height and the width of the images
dataset = from_function(read_file, filelist, undetermined_shape=((1,2),(1,2))).build()

If you are accessing your data in an object oriented way, you can use dpipe.from_object. In the next example lets consider you want use consume a list of files with records on it via generator function, this can also be handled with dpipe.from_function though. The code should look like this

import os
import pandas as pd
from dpipe import from_object

class Reader():
    def __init__(self,datapath='./dataset'):
        self.filelist = os.listdir(datapath)
    def __len__(self):
        return len(self.filelist)
    def my_reading_function(self,filename):
        df = pd.read_csv(filename)
        for v, t in zip(df.values, df.targets):
            yield v, t
reader = Reader()
dataset = from_object(reader, 'my_reading_function','filelist').build()

The build() function that creates a dataset with arguments ready to use with the fit() method of and tf.model object. This is used like this:

training_ds = from_object(reader_training, 'my_reading_function').shuffle(len(reader_training), reshuffle_each_iteration=True).batch(32).repeat().build()
validation_ds = from_object(reader_validation, 'my_reading_function',training=False).batch(32).build()
model.fit(x=training_ds,validation_data=validation_ds, epochs=10,**training_ds.built_args,**validation_ds_ds.built_args)

Installation

pip install dapipe

It requires to install FFMPEG (here) to work with video formats.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dapipe-0.2.1.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

dapipe-0.2.1-py3.7.egg (39.8 kB view details)

Uploaded Source

File details

Details for the file dapipe-0.2.1.tar.gz.

File metadata

  • Download URL: dapipe-0.2.1.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for dapipe-0.2.1.tar.gz
Algorithm Hash digest
SHA256 370d9342b33caccda881e6cc290d898753c5ddce14c0c01986803d6e9fb6d97a
MD5 40431e570e28fe4d1210ce19018f8ff2
BLAKE2b-256 fa131b6b7acc8a69fe4e2d863b65373efc6414965af323bcea70f2b519149c2a

See more details on using hashes here.

File details

Details for the file dapipe-0.2.1-py3.7.egg.

File metadata

  • Download URL: dapipe-0.2.1-py3.7.egg
  • Upload date:
  • Size: 39.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for dapipe-0.2.1-py3.7.egg
Algorithm Hash digest
SHA256 39a486a4aa32dcb6cc0df342184498abd6a4f0460f500ca2e8530772fe052c57
MD5 8c15c287acaf2fd87f0943dce0efc724
BLAKE2b-256 27b302936d264075f27f256ca721d1a4037de48697128821ab1a382f52d08901

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page