|Build-Status|
Melon
=====
| Melon is a lightweight package meant to simplify data processing for Deep Learning.
| It removes the need for boilerplate code to pre-process the data prior to (model) training, testing and inference.
| It aims at standardizing data serialization and manipulation approaches.
|
| The default formats align with the requirements of frameworks such as **TensorFlow** / **PyTorch**.
| The tool also provides various levels of customization depending on the use case.
Installation
------------
Install and update using `pip`_:

.. code-block:: text

    $ pip install melon

Supported in Python >= 3.4.0

.. _pip: https://pip.pypa.io/en/stable/quickstart/

Examples
----------------
**Images**
With default options_:

.. code-block:: python

    import tensorflow as tf  # the example uses the TensorFlow 1.x Session API
    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        reader = ImageReader(source_dir)
        # Read all images and their labels in a single call
        X, Y = reader.read()
        ...
        with tf.Session() as s:
            s.run(..., feed_dict={X_placeholder: X, Y_placeholder: Y})

| The ``source_dir`` directory should contain the images to be read. See |sample-directory| for reference.
| The sample directory also contains an optional ``labels.txt`` file, which is described in Labeling_.
-------
Since the number of images may be too large to fit into memory, the tool supports batch processing.

.. code-block:: python

    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        options = {"batch_size": 32}
        reader = ImageReader(source_dir, options)
        # Each read() returns the next batch until no images are left
        while reader.has_next():
            X, Y = reader.read()
            ...

| This reads images in batches of 32 until all images are read. If ``batch_size`` is not specified, ``reader.read()`` reads all images.
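
The loop pattern above can be tried without Melon installed. The sketch below is only an illustration: ``ListBatchReader`` is a hypothetical stand-in class, not part of Melon, and it mimics nothing more than the ``has_next()`` / ``read()`` behaviour described above over in-memory lists.

```python
class ListBatchReader:
    """Hypothetical stand-in for the has_next()/read() pattern; not Melon's implementation."""

    def __init__(self, samples, labels, batch_size=None):
        self._samples = list(samples)
        self._labels = list(labels)
        # Without a batch size, a single read() returns everything.
        self._batch_size = batch_size or len(self._samples)
        self._pos = 0

    def has_next(self):
        # True while at least one unread sample remains.
        return self._pos < len(self._samples)

    def read(self):
        # Return the next slice of samples and labels and advance the cursor.
        end = self._pos + self._batch_size
        X = self._samples[self._pos:end]
        Y = self._labels[self._pos:end]
        self._pos = end
        return X, Y


reader = ListBatchReader(range(70), [0] * 70, batch_size=32)
batch_sizes = []
while reader.has_next():
    X, Y = reader.read()
    batch_sizes.append(len(X))
print(batch_sizes)  # [32, 32, 6]
```

With 70 samples and a batch size of 32, the last batch is smaller than the others, so training code should not assume a fixed batch dimension.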
---------------
.. _Custom options:
With custom options_:

.. code-block:: python

    from melon import ImageReader

    def train():
        source_dir = "resources/images"
        options = {"data_format": "channels_last", "normalize": False}
        reader = ImageReader(source_dir, options)
        ...

| This changes the data format to ``channels_last`` (each sample will be ``Height x Width x Channel``) and disables normalization. See options_ for the available options.
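
To make the two layouts concrete, here is a small NumPy sketch that converts a batch from ``channels_first`` to ``channels_last``. It does not use Melon. The scaling step at the end is an assumption: this document does not say how ``normalize`` is computed, so dividing 8-bit pixel values by 255 is shown only as one common convention.

```python
import numpy as np

# A fake batch of 2 RGB images, 4 pixels high and 5 pixels wide, in channels_first layout.
batch_first = np.arange(2 * 3 * 4 * 5, dtype=np.uint8).reshape(2, 3, 4, 5)

# channels_first (N, C, H, W) -> channels_last (N, H, W, C)
batch_last = np.transpose(batch_first, (0, 2, 3, 1))
print(batch_last.shape)  # (2, 4, 5, 3)

# ASSUMPTION: normalization scales 8-bit pixel values into the range [0, 1].
normalized = batch_last.astype(np.float32) / 255.0
```

The transpose only reorders axes, so pixel ``(row, col)`` of channel ``c`` in the first layout is the same value as ``[row, col, c]`` in the second.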
.. _options:
Options
------------------
**Images**

width
    Width of the output (pixels). default: ``255``

height
    Height of the output (pixels). default: ``255``

batch_size
    Batch size of each read. default: all images in a directory

data_format
    Format of the image data.

    | ``channels_first`` - ``Channel x Height x Width`` (default)
    | ``channels_last`` - ``Height x Width x Channel``

label_format
    Format of the label data.

    | ``one_hot`` - as a matrix, with a one-hot vector per image (default)
    | ``label`` - as a vector, with a single label per image

normalize
    Normalize data. default: ``True``

num_threads
    Number of threads for parallel processing. default: number of cores of the machine

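
The documented defaults can be summarized as a dictionary that user options override. The sketch below is a hypothetical helper, not Melon's code: ``resolve_options`` and ``DOCUMENTED_DEFAULTS`` are names invented for illustration, ``None`` is used here to stand for "all images in a directory", and rejecting unknown keys is an assumption rather than documented behaviour.

```python
import os

# Defaults as listed above; None stands for "all images in a directory".
DOCUMENTED_DEFAULTS = {
    "width": 255,
    "height": 255,
    "batch_size": None,
    "data_format": "channels_first",
    "label_format": "one_hot",
    "normalize": True,
    "num_threads": os.cpu_count(),
}


def resolve_options(overrides=None):
    """Hypothetical helper: reject unknown keys and fill in the documented defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise ValueError("Unknown option(s): " + ", ".join(sorted(unknown)))
    resolved = dict(DOCUMENTED_DEFAULTS)
    resolved.update(overrides)
    return resolved


options = resolve_options({"batch_size": 32, "data_format": "channels_last"})
print(options["batch_size"], options["data_format"], options["normalize"])
# 32 channels_last True
```

Checking option names up front catches typos such as ``channels-last`` versus ``channels_last`` before any image is read.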
.. _Labeling:
Labeling
-----------------
| In supervised learning, each image needs to be mapped to a label.
| While the tool supports reading images without labels (e.g. for inference), it also provides a way to label them.
-----
**Generating labels file**
| To generate a ``labels`` file, use the following command:

.. code-block:: text

    $ melon generate
    > Source dir:

| After you provide the source directory, the tool generates a ``labels`` file in that directory with blank labels.
| The final step is to add a label to each row in the generated file.
|
| For reference see |sample-labels|:

.. code-block:: text

    #legend
    pedestrian:0
    cat:1
    parrot:2
    car:3
    apple tree:4
    #map
    img275.jpg:1
    img324.jpg:2
    img551.jpg:3
    img928.jpg:1
    img999.png:0
    img736.png:4

| The ``#legend`` section is optional, but the ``#map`` section is required because it maps a label to each image.
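
To show how the two sections relate, here is a minimal sketch of a parser for the layout above. ``parse_labels`` is a hypothetical helper written for this example, not Melon's own parser, and it assumes every row has the form ``name:number``; a row with a blank or non-numeric label raises ``ValueError``.

```python
def parse_labels(text):
    """Parse the '#legend' / '#map' layout into two dicts (hypothetical helper)."""
    legend, mapping = {}, {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A header line such as '#legend' or '#map' switches the current section.
            section = line[1:].strip().lower()
            continue
        # Split on the last colon so that names may contain spaces or colons.
        name, _, value = line.rpartition(":")
        if section == "legend":
            legend[int(value)] = name.strip()
        elif section == "map":
            mapping[name.strip()] = int(value)
    if not mapping:
        raise ValueError("the #map section is required")
    return legend, mapping


sample = """#legend
pedestrian:0
cat:1
apple tree:4
#map
img275.jpg:1
img999.png:0
img736.png:4
"""
legend, mapping = parse_labels(sample)
print(legend[mapping["img736.png"]])  # apple tree
```

The legend maps numeric labels back to readable names, while the map is what ties each image file to its numeric label.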
-----
**Format of the labels**
The label format can be specified through `Custom options`_ with the ``label_format`` option. It defaults to ``one_hot``.
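
The difference between the two label formats can be illustrated with NumPy. This sketch does not use Melon. It takes the labels from the sample ``#map`` section and assumes that classes are numbered from ``0`` to the largest label present.

```python
import numpy as np

# "label" format: one class index per image (values from the sample #map section).
labels = np.array([1, 2, 3, 1, 0, 4])

# ASSUMPTION: classes are numbered 0..max, so the number of classes is max + 1.
num_classes = int(labels.max()) + 1

# "one_hot" format: one row per image with a single 1 in the column of its class.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot.shape)  # (6, 5)

# Converting one-hot rows back to class indices.
recovered = one_hot.argmax(axis=1)
```

The one-hot matrix is what a softmax classifier typically compares its output against, while the plain vector suits losses that accept class indices directly.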
Roadmap
-------
- Support for video data
- Support for textual data
.. |Build-Status| image:: https://travis-ci.com/evoneutron/melon.svg?branch=master
   :target: https://travis-ci.com/evoneutron/melon

.. |sample-directory| raw:: html

   <a href="https://github.com/evoneutron/melon/tree/master/tests/resources/images/sample/" target="_blank">sample directory</a>

.. |sample-labels| raw:: html

   <a href="https://github.com/evoneutron/melon/tree/master/tests/resources/images/sample/labels.txt" target="_blank">sample labels</a>