MLDatasetBuilder is a Python package that helps prepare images for your ML dataset.

MLDatasetBuilder

MLDatasetBuilder version 1.0.0: a Python package for building datasets for machine learning.

Whenever we begin a machine learning project, the first thing we need is a dataset. The dataset is the foundation on which the model is trained. You can build a dataset either automatically or manually.

Author: Karthick Nagarajan

Email: karthick965938@gmail.com

Installation

We can install the MLDatasetBuilder package using this command:

pip install MLDatasetBuilder
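To confirm that the install worked before going further, a quick check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name is_installed is made up for this example and is not part of MLDatasetBuilder.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module called `module_name` can be found."""
    # find_spec returns None when the module is not on the import path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "MLDatasetBuilder"
    print(f"{name} installed: {is_installed(name)}")
```

If this reports False, check that pip installed the package into the same Python environment that you are running.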

How to test?

When you run python3 in the terminal, it will produce output like this:

Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Run the following code to see the initialization output of the MLDatasetBuilder package:

>>> from MLDatasetBuilder import *
>>> MLDatasetBuilder()

Available Operations

  1. PrepareImage  —  Remove unwanted format images and Rename your images
#PrepareImage(folder_name, image_name)
PrepareImage('images', 'dog')
  2. ExtractImages  —  Extract images from video file
#ExtractImages(video_path, file_name, frame_size)
ExtractImages('video.mp4', 'frame', 10)
#OR
#ExtractImages(video_path, file_name)
ExtractImages('video.mp4', 'frame')
#Default FPS will be 5

Step1 — Get images from Google

Yes, we can get images from Google. With the Download All Images browser extension, we can easily collect images in a few minutes. See the extension's page for more details.

Step2 — Create a Python file

Once you have downloaded the images using this extension, create a Python file called test.py in the same directory, as shown below.

download_image_folder/
   _14e839ba-9691-11ea-a968-2ed746e9a968.jpg
   5e5f7af12600004018b602c0.jpeg
   A471529_Alice_b-1.jpg
   image1.png
   image2.png
   ...
test.py

Inside the image folder, you will see many images, including PNG files, with random filenames.
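Before cleaning the folder, it can be useful to see which file formats the download actually produced. The snippet below is a small standard-library sketch, not part of MLDatasetBuilder; the function name count_extensions is an assumption made for this example.

```python
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count the files directly inside `folder`, grouped by lowercase extension."""
    return Counter(
        p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()
    )
```

For example, count_extensions('download_image_folder') returns a Counter keyed by extension, so unexpected formats are easy to spot before you run PrepareImage.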

Step3 — PrepareImage

MLDatasetBuilder provides a method called PrepareImage. Using this method, we can remove unwanted images and rename the image files that you have already downloaded with the browser extension.

#PrepareImage(folder_name, image_name)
PrepareImage('images', 'dog')

As the code above shows, we need to pass the image folder path and the class name.

After the process completes, your image folder structure will look like this:

download_image_folder/
   dog_0.jpg
   dog_1.jpg
   dog_2.jpg
   dog_3.png
   dog_4.png
   ...
test.py

This step makes it much easier to annotate your images during labeling, and it gives your files a standardized naming scheme.
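To make the filter-and-rename idea concrete, here is a minimal standard-library sketch of the same kind of operation. It is not the package's implementation. The set of extensions to keep, the lowercase output extension, and the names plan_prepare and apply_prepare are all assumptions made for this example. Because apply_prepare deletes files, try it on a copy of your folder first.

```python
import uuid
from pathlib import Path

# Assumption: the formats worth keeping; the package's real list may differ.
KEEP_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def plan_prepare(folder, class_name):
    """Return (renames, unwanted) for the files directly inside `folder`."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    wanted = [p for p in files if p.suffix.lower() in KEEP_EXTENSIONS]
    unwanted = [p for p in files if p.suffix.lower() not in KEEP_EXTENSIONS]
    renames = [
        (p, p.with_name(f"{class_name}_{i}{p.suffix.lower()}"))
        for i, p in enumerate(wanted)
    ]
    return renames, unwanted


def apply_prepare(folder, class_name):
    """Delete unwanted files, then rename the rest to <class_name>_<index><ext>."""
    renames, unwanted = plan_prepare(folder, class_name)
    for p in unwanted:
        p.unlink()
    # Two-phase rename: move everything to temporary names first, so a new
    # name can never overwrite a file that is still waiting to be renamed.
    token = uuid.uuid4().hex
    staged = []
    for src, dst in renames:
        tmp = src.with_name(f".{token}_{src.name}")
        src.rename(tmp)
        staged.append((tmp, dst))
    for tmp, dst in staged:
        tmp.rename(dst)
    return [dst.name for _, dst in staged]
```

Calling plan_prepare first acts as a dry run: it shows which files would be renamed and which would be removed, without touching the disk.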

Step4 — ExtractImages

MLDatasetBuilder also provides a method called ExtractImages. Using this method we can extract the images from the video files.

download_image_folder/
video.mp4
test.py

As the code below shows, we need to pass the video path, the folder name, and the framesize. The folder name will be the class name. The framesize is optional and defaults to 5.

#ExtractImages(video_path, folder_name, framesize)
ExtractImages('video.mp4', 'frame', 10)
#OR
#ExtractImages(video_path, folder_name)
ExtractImages('video.mp4', 'frame')

After the process completes, your image folder structure will look like this:

download_image_folder/
dog/
   dog_0.jpg
   dog_1.jpg
   dog_2.jpg
   dog_3.png
   dog_4.png
   ...
dog.mp4
test.py
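This page does not describe how ExtractImages decides which frames to save. The sketch below shows one plausible scheme, under the loud assumption that framesize is a target sampling rate in frames per second, which the "Default FPS will be 5" comment hints at but does not confirm. The functions sample_frame_indices and frame_filenames are hypothetical helpers written for this example. They only do the arithmetic, and decoding the video itself would need a video library that is not shown here.

```python
def sample_frame_indices(total_frames, video_fps, sample_fps=5):
    """Pick evenly spaced frame indices so roughly `sample_fps` frames
    are kept per second of video (assumed meaning of framesize)."""
    if total_frames < 0 or video_fps <= 0 or sample_fps <= 0:
        raise ValueError("total_frames must be >= 0 and both rates must be > 0")
    # Keep every `step`-th frame; never skip below one frame per step.
    step = max(1, round(video_fps / sample_fps))
    return list(range(0, total_frames, step))


def frame_filenames(indices, class_name, ext=".jpg"):
    """Name the saved frames <class_name>_0, <class_name>_1, ... in order."""
    return [f"{class_name}_{i}{ext}" for i in range(len(indices))]
```

For a 2-second clip at 30 frames per second (60 frames) with the default rate of 5, this keeps every 6th frame, which gives 10 images.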

Contributing

All issues and pull requests are welcome! To run the code locally, first, fork the repository and then run the following commands on your computer:

git clone https://github.com/<your-username>/ML-Dataset-Builder.git
cd ML-Dataset-Builder
# Creating a virtual environment before the next step is recommended
pip3 install -r requirements.txt

When adding code, be sure to write unit tests where necessary.

Contact

MLDatasetBuilder was created by Karthick Nagarajan. Feel free to reach out on Twitter or through Email!
