|Build-Status|

Melon
=====

| Melon is a lightweight package meant to simplify data processing for Deep Learning.

| It removes the need for boilerplate code to pre-process the data prior to (model) training, testing and inference.
| It aims at standardizing data serialization and manipulation approaches.
|
| The default formats align with the requirements of frameworks such as **TensorFlow** / **PyTorch**.
| The tool also provides various levels of customization depending on the use case.

Installation
------------

Install and update using `pip`_:

.. code-block:: text

    $ pip install melon

Supported in Python >= 3.4.0

.. _pip: https://pip.pypa.io/en/stable/quickstart/

Examples
----------------

**Images**

With default options_:

.. code-block:: python

    import tensorflow as tf  # TensorFlow 1.x API (tf.Session)
    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        reader = ImageReader(source_dir)
        # Read all images and their labels in a single call
        X, Y = reader.read()
        ...
        with tf.Session() as s:
            s.run(..., feed_dict={X_placeholder: X, Y_placeholder: Y})

| The ``source_dir`` directory should contain the images to be read. See |sample-directory| for reference.
| The sample directory contains an optional ``labels.txt`` file, which is described in Labeling_.

-------

Since the number of images may be too large to fit into memory, the tool supports batch processing.

.. code-block:: python

    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        options = {"batch_size": 32}
        reader = ImageReader(source_dir, options)
        while reader.has_next():
            X, Y = reader.read()
            ...

| This reads images in batches of 32 until all images are read. If ``batch_size`` is not specified, ``reader.read()`` reads all images at once.
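The ``has_next()`` / ``read()`` loop above can be illustrated without Melon itself. The following is a minimal sketch, not Melon's implementation: ``BatchReader`` is a hypothetical stand-in that serves items from an in-memory list, and it assumes that a missing ``batch_size`` means "return everything in one read", as described above.

```python
class BatchReader:
    """Hypothetical stand-in that mimics the has_next()/read() pattern."""

    def __init__(self, items, batch_size=None):
        self._items = list(items)
        # No batch size means a single read returns everything.
        self._batch_size = batch_size or len(self._items)
        self._pos = 0

    def has_next(self):
        # True while there are items that have not been returned yet.
        return self._pos < len(self._items)

    def read(self):
        # Return the next slice; the last batch may be smaller.
        batch = self._items[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch


reader = BatchReader(range(70), batch_size=32)
sizes = []
while reader.has_next():
    sizes.append(len(reader.read()))
print(sizes)  # [32, 32, 6]
```

Note that the final batch holds only the remaining items, so training code should not assume that every batch has exactly ``batch_size`` samples.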

---------------

.. _Custom options:

With custom options_:

.. code-block:: python

    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        options = {"data_format": "channels_last", "normalize": False}
        reader = ImageReader(source_dir, options)
        ...

| This changes the data format to ``channels_last`` (each sample will be ``Height x Width x Channel``) and disables normalization. See options_ for the available options.
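To make the difference between the two layouts concrete, here is a small sketch that reorders one sample from ``Height x Width x Channel`` to ``Channel x Height x Width``. It uses plain nested lists to stay dependency-free; ``to_channels_first`` is a hypothetical helper written for illustration and is not part of Melon.

```python
def to_channels_first(image):
    """Convert a nested list shaped H x W x C into C x H x W."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 image with 3 channels (e.g. RGB): H=1, W=2, C=3.
hwc = [[[10, 20, 30], [40, 50, 60]]]
chw = to_channels_first(hwc)
print(chw)  # [[[10, 40]], [[20, 50]], [[30, 60]]]
```

In ``channels_last`` the innermost list holds the channel values of one pixel; in ``channels_first`` each outermost entry is a full ``Height x Width`` plane for one channel.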

.. _options:

Options
------------------

**Images**

width
    Width of the output (pixels). default: ``255``

height
    Height of the output (pixels). default: ``255``

batch_size
    Batch size of each read. default: All images in a directory

data_format
    Format of the image data

    | ``channels_first`` - `Channel x Height x Width` (default)
    | ``channels_last`` - `Height x Width x Channel`

label_format
    Format of the label data

    | ``one_hot`` - as a matrix, with a one-hot vector per image (default)
    | ``label`` - as a vector, with a single label per image

normalize
    Normalize data. default: ``True``

num_threads
    Number of threads for parallel processing. default: Number of cores of the machine
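The two ``label_format`` values listed above differ only in the shape of the label data. The sketch below shows the relationship under the assumption that labels are consecutive integer class ids starting at 0; ``one_hot`` here is a hypothetical helper for illustration, not a Melon function.

```python
def one_hot(labels, num_classes):
    """Turn a vector of integer labels into a matrix of one-hot rows."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1  # mark the position of this image's class
        rows.append(row)
    return rows


labels = [1, 2, 0]  # "label" format: one class id per image
encoded = one_hot(labels, num_classes=3)  # "one_hot" format: one row per image
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```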

.. _Labeling:

Labeling
-----------------

| In supervised learning each image needs to be mapped to a label.
| While the tool supports reading images without labels (e.g. for inference) it also provides a way to label them.

-----

**Generating labels file**

| To generate a ``labels`` file, use the following command:

.. code-block:: text

    $ melon generate
    > Source dir:

| After you provide the source directory, the tool generates a ``labels`` file in that directory with blank labels.
| The final step is to add a label to each row in the generated file.
|
| For reference see |sample-labels|:

.. code-block:: text

    #legend
    pedestrian:0
    cat:1
    parrot:2
    car:3
    apple tree:4

    #map
    img275.jpg:1
    img324.jpg:2
    img551.jpg:3
    img928.jpg:1
    img999.png:0
    img736.png:4

| The ``#legend`` section is optional, but the ``#map`` section is required to map a label to each image.
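For readers who want to inspect or validate a labels file by hand, the format above is simple to parse. The following is a rough sketch, not Melon's own parser: ``parse_labels`` is a hypothetical function, and it assumes every row already has an integer label filled in after the last colon.

```python
def parse_labels(text):
    """Parse the '#legend' and '#map' sections into two dicts."""
    legend, mapping = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            section = line[1:]  # e.g. "legend" or "map"
            continue
        # Split on the last colon; the label id is whatever follows it.
        name, _, label = line.rpartition(":")
        if section == "legend":
            legend[int(label)] = name
        elif section == "map":
            mapping[name] = int(label)
    return legend, mapping


sample = """#legend
cat:1
apple tree:4

#map
img275.jpg:1
img736.png:4
"""
legend, mapping = parse_labels(sample)
print(legend)   # {1: 'cat', 4: 'apple tree'}
print(mapping)  # {'img275.jpg': 1, 'img736.png': 4}
```

A freshly generated file with blank labels would raise a ``ValueError`` in this sketch, which can serve as a quick check that no row was left unlabeled.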

-----

**Format of the labels**

The label format can be specified with the ``label_format`` option (see `Custom options`_). It defaults to ``one_hot``.

Roadmap
-------

- Support for video data

- Support for textual data



.. |Build-Status| image:: https://travis-ci.com/evoneutron/melon.svg?branch=master
    :target: https://travis-ci.com/evoneutron/melon

.. |sample-directory| raw:: html

    <a href="https://github.com/evoneutron/melon/tree/master/tests/resources/images/sample/" target="_blank">sample directory</a>

.. |sample-labels| raw:: html

    <a href="https://github.com/evoneutron/melon/tree/master/tests/resources/images/sample/labels.txt" target="_blank">sample labels</a>


