Lightweight package meant to simplify data processing for Deep Learning

Melon

Melon is a lightweight package meant to simplify data processing for Deep Learning.
It removes the need for boilerplate code to pre-process the data prior to (model) training, testing and inference.
It aims at standardizing data serialization and manipulation approaches.

The default formats align with the requirements of frameworks such as TensorFlow and PyTorch.
The tool also provides several levels of customization, depending on the use case.

Installation

Install and update using pip:

$ pip install melon

Requires Python 3.4.0 or later.

Examples

Images

With default options:

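import tensorflow as tf  # the example uses the TensorFlow 1.x Session API
from melon import ImageReader

def train():
    source_dir = "resources/images"
    reader = ImageReader(source_dir)
    # Read every image in source_dir together with its labels.
    X, Y = reader.read()
    ...
    # X_placeholder and Y_placeholder are defined in the elided model code.
    with tf.Session() as s:
        s.run(..., feed_dict={X_placeholder: X, Y_placeholder: Y})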
The source_dir directory should contain the images to be read; see the sample directory for reference.
The sample directory also contains an optional labels.txt file, which is described in Labeling.

Since the number of images may be too large to fit into memory, the tool supports batch processing.

from melon import ImageReader

def train():
    source_dir = "resources/images"
    options = { "batch_size": 32 }
    reader = ImageReader(source_dir, options)
    while reader.has_next():
        X, Y = reader.read()
        ...
This reads images in batches of 32 until all images have been read. If batch_size is not specified, reader.read() reads all images at once.
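The has_next() / read() loop can be illustrated without Melon installed. The sketch below uses a hypothetical stand-in class (ToyBatchReader is not part of Melon) that serves a list of items in fixed-size batches. It assumes the final batch may be smaller than batch_size; how Melon itself handles a partial final batch is not stated in this document.

```python
class ToyBatchReader:
    """Hypothetical stand-in for a batched reader; not Melon's implementation."""

    def __init__(self, items, batch_size=None):
        self._items = list(items)
        # Mirror the documented default: no batch_size means "read everything".
        self._batch_size = batch_size or len(self._items)
        self._offset = 0

    def has_next(self):
        # True while at least one unread item remains.
        return self._offset < len(self._items)

    def read(self):
        # Return the next batch and advance the cursor.
        batch = self._items[self._offset:self._offset + self._batch_size]
        self._offset += len(batch)
        return batch


def batch_sizes(num_items, batch_size):
    """Drain a ToyBatchReader and report the size of each batch."""
    reader = ToyBatchReader(range(num_items), batch_size)
    sizes = []
    while reader.has_next():
        sizes.append(len(reader.read()))
    return sizes
```

With 70 items and a batch size of 32, batch_sizes(70, 32) yields batches of 32, 32 and 6 items.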

With custom options:

from melon import ImageReader

def train():
    source_dir = "resources/images"
    options = { "data_format": "channels_last", "normalize": False }
    reader = ImageReader(source_dir, options)
    ...
This changes the data format to channels-last (each sample is Height x Width x Channel) and disables normalization. See Options for the available options.
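To make the difference between the two layouts concrete, here is a minimal pure-Python sketch that converts one sample from Height x Width x Channel to Channel x Height x Width. The helpers to_channels_first and shape are illustrative names, not Melon functions, and nested lists stand in for whatever array type the reader returns.

```python
def to_channels_first(image):
    """Convert a Height x Width x Channel nested list to Channel x Height x Width."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def shape(nested):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A 2 x 3 image with 3 channels, stored in channels_last layout.
hwc = [[[10 * y + x, 100 + x, 200 + y] for x in range(3)] for y in range(2)]
chw = to_channels_first(hwc)
```

Here shape(hwc) is (2, 3, 3) and shape(chw) is (3, 2, 3). With NumPy arrays, the same conversion is np.transpose(image, (2, 0, 1)).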

Options

Images

width

Width of the output (pixels). default: 255

height

Height of the output (pixels). default: 255

batch_size

Batch size of each read. default: All images in a directory

data_format

Format of the image data

channels_first - Channel x Height x Width (default)
channels_last - Height x Width x Channel

label_format

Format of the label data

one_hot - as a matrix, with one one-hot vector per image (default)
label - as a vector, with a single label per image

normalize

Normalize data. default: True

num_threads

Number of threads for parallel processing. default: Number of cores of the machine
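The defaults listed above can be summarized as a single dictionary that user options override. The sketch below is a hypothetical helper (resolve_options, DEFAULT_OPTIONS and ALLOWED_VALUES are illustrative names, not Melon's API), and the validation step is an assumption: this document does not say whether Melon rejects unknown keys or values.

```python
import os

# Defaults as listed in the Options section.
DEFAULT_OPTIONS = {
    "width": 255,
    "height": 255,
    "batch_size": None,  # None stands for "all images in the directory"
    "data_format": "channels_first",
    "label_format": "one_hot",
    "normalize": True,
    "num_threads": os.cpu_count(),
}

# Documented choices for the enumerated options.
ALLOWED_VALUES = {
    "data_format": {"channels_first", "channels_last"},
    "label_format": {"one_hot", "label"},
}


def resolve_options(user_options=None):
    """Overlay user options on the defaults and reject unknown keys or values."""
    options = dict(DEFAULT_OPTIONS)
    for key, value in (user_options or {}).items():
        if key not in DEFAULT_OPTIONS:
            raise KeyError("unknown option: " + key)
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError("invalid value for " + key + ": " + repr(value))
        options[key] = value
    return options
```

For example, resolve_options({"data_format": "channels_last", "normalize": False}) keeps the default width, height and label_format while changing only the two given keys.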

Labeling

In supervised learning each image needs to be mapped to a label.
While the tool supports reading images without labels (e.g. for inference), it also provides a way to label them.

Generating labels file

To generate a labels file, use the following command:
$ melon generate
> Source dir:
Once you provide the source directory, the tool generates a labels file with blank labels in that directory.
The final step is to add a label to each row of the generated file.
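As a rough illustration of what such a generated file might contain, the sketch below writes one unlabeled row per image under a #map header. Everything here is an assumption rather than Melon's actual behavior: the helper name write_blank_labels, the labels.txt file name (taken from the sample directory description), the set of image extensions, and the exact layout of a blank row.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of image types


def write_blank_labels(source_dir, filename="labels.txt"):
    """Write a labels file with one unlabeled '#map' row per image in source_dir."""
    images = sorted(
        name for name in os.listdir(source_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    path = os.path.join(source_dir, filename)
    with open(path, "w") as f:
        f.write("#map\n")
        for name in images:
            f.write(name + ":\n")  # label left blank, to be filled in by hand
    return path


def demo():
    """Create a temporary directory with two empty image files and label it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("img2.png", "img1.jpg", "notes.md"):
            open(os.path.join(tmp, name), "w").close()
        with open(write_blank_labels(tmp)) as f:
            return f.read()
```

Calling demo() returns a #map header followed by rows for img1.jpg and img2.png; the non-image file notes.md is skipped.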

For reference, see the sample labels file:
#legend
pedestrian:0
cat:1
parrot:2
car:3
apple tree:4

#map
img275.jpg:1
img324.jpg:2
img551.jpg:3
img928.jpg:1
img999.png:0
img736.png:4
The #legend section is optional, but the #map section is required because it maps each image to a label.
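The file format shown in the sample can be read with a few lines of Python. The sketch below is a hypothetical parser (parse_labels is not a Melon function) that assumes each row has the form name:number and that section headers start with #. It does not handle rows whose label is still blank.

```python
def parse_labels(text):
    """Parse '#legend' and '#map' sections into two dictionaries."""
    legend, mapping = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            section = line[1:]
            continue
        # Split on the last colon so names containing ':' stay intact.
        name, _, value = line.rpartition(":")
        if section == "legend":
            legend[int(value)] = name
        elif section == "map":
            mapping[name] = int(value)
    if not mapping:
        raise ValueError("the #map section is required")
    return legend, mapping


SAMPLE = """\
#legend
pedestrian:0
cat:1
parrot:2
car:3
apple tree:4

#map
img275.jpg:1
img324.jpg:2
img551.jpg:3
img928.jpg:1
img999.png:0
img736.png:4
"""

legend, mapping = parse_labels(SAMPLE)
```

After parsing, mapping gives the numeric label for each file and legend translates it back to a name, so legend[mapping["img275.jpg"]] is "cat".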

Format of the labels

The output format of the labels can be set with the label_format option (see Options). It defaults to one-hot.
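The two label formats differ only in shape. The following sketch converts a label vector into a one-hot matrix using plain lists; to_one_hot is an illustrative helper, and inferring the number of classes from the largest label is an assumption, since this document does not say how Melon determines it.

```python
def to_one_hot(labels, num_classes=None):
    """Return one one-hot row per label; num_classes defaults to max label + 1."""
    if num_classes is None:
        num_classes = max(labels) + 1
    return [
        [1 if c == label else 0 for c in range(num_classes)]
        for label in labels
    ]


# Labels from the sample #map section, in file order ("label" format).
labels = [1, 2, 3, 1, 0, 4]
# The same labels as a 6 x 5 matrix ("one_hot" format).
one_hot = to_one_hot(labels)
```

The first image (label 1) becomes the row [0, 1, 0, 0, 0], and every row contains exactly one 1.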

Roadmap

  • Support for textual data (Q1 2019)

  • Support for video data (Q1 2019)

  • Support for reading from AWS S3 (Q2 2019)
