MLDatasetBuilder is a Python package that helps you prepare images for your ML dataset.

Project description

MLDatasetBuilder

MLDatasetBuilder, version 1.0.0: a Python package to build datasets for machine learning.

Whenever we begin a machine learning project, the first thing we need is a dataset. The dataset is the foundation on which the model is trained. You can build the dataset either automatically or manually. MLDatasetBuilder is a Python package that helps you prepare images for your ML dataset.

Author: Karthick Nagarajan

Email: karthick965938@gmail.com

Installation

Install the MLDatasetBuilder package with this command:

pip install MLDatasetBuilder
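
To confirm from a script that the install worked, one option is to ask Python whether it can locate the module. This is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration, and the import name MLDatasetBuilder is taken from the import statement used later in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    # find_spec returns None when no such top-level module can be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Import name assumed to match the one used in this README.
    print(is_installed("MLDatasetBuilder"))
```

If this prints False, the package was probably installed into a different Python environment than the one running the script.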

How to test?

When you run python3 in the terminal, it produces output like this:

Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Run the following code to see the initialization output of the MLDatasetBuilder package:

>>> from MLDatasetBuilder import *
>>> MLDatasetBuilder()

[Image: sample output of the package]

Available Operations

  1. PrepareImage  —  Remove images in unwanted formats and rename the remaining images
#PrepareImage(folder_name, image_name)
PrepareImage('images', 'dog')
  2. ExtractImages  —  Extract images from a video file
#ExtractImages(video_path, file_name, frame_size)
ExtractImages('video.mp4', 'frame', 10)
#OR
#ExtractImages(video_path, file_name)
ExtractImages('video.mp4', 'frame')
#When frame_size is omitted, the default FPS is 5

Step1 — Get images from google

Yes, we can get images from Google. With the Download All Images browser extension, we can collect images in a few minutes. See the extension's page for more details.

Step2 — Create a Python file

Once you have downloaded the images using this extension, create a Python file called test.py in the same directory as the image folder, as shown below.

download_image_folder/
   _14e839ba-9691-11ea-a968-2ed746e9a968.jpg
   5e5f7af12600004018b602c0.jpeg
   A471529_Alice_b-1.jpg
   image1.png
   image2.png
   ...
test.py

Inside the image folder, you will see images in mixed formats (for example jpg, jpeg, and png) with random filenames.
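
Before cleaning the folder, it can help to see which file types the download actually produced. The sketch below is not part of MLDatasetBuilder; count_by_extension is a hypothetical helper that uses only the standard library to tally files by extension.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder_name):
    """Count the files directly inside folder_name, grouped by lowercase extension."""
    counts = Counter()
    for path in Path(folder_name).iterdir():
        if path.is_file():
            # Lowercase so that ".JPG" and ".jpg" are counted together.
            counts[path.suffix.lower()] += 1
    return dict(counts)
```

Calling count_by_extension('download_image_folder') returns a dictionary mapping each extension to the number of files that have it, which makes unexpected formats easy to spot.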

Step3 — PrepareImage

MLDatasetBuilder provides a method called PrepareImage. Using this method, we can remove the unwanted images and rename the image files that you have already downloaded with the browser extension.

PrepareImage(folder_name, image_name)
#PrepareImage('images', 'dog')

As the code above shows, we need to pass the image folder path and the class name.

After the process completes, your image folder structure will look like this:

download_image_folder/
   dog_0.jpg
   dog_1.jpg
   dog_2.jpg
   dog_3.png
   dog_4.png
   ...
test.py

This step makes it much easier to annotate your images while labeling, and it gives the dataset a standardized naming scheme.
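
To make the filter-and-rename idea concrete, here is a minimal sketch of one way such a step could work. It is not the package's actual implementation: the function name prepare_images_sketch and the set of allowed extensions are assumptions chosen for illustration. Note that it deletes files, so try it on a copy of your folder first.

```python
import uuid
from pathlib import Path

# Assumption: these are the formats worth keeping; the package may use a different set.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def prepare_images_sketch(folder_name, image_name):
    """Delete files with other extensions, then rename the rest to <image_name>_<i><ext>."""
    folder = Path(folder_name)
    kept = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() in ALLOWED_EXTENSIONS:
            kept.append(path)
        else:
            path.unlink()  # remove files in unwanted formats

    # Rename in two passes so a file already called e.g. dog_0.jpg is never overwritten.
    staged = []
    for path in kept:
        temporary = path.with_name(uuid.uuid4().hex + path.suffix.lower())
        path.rename(temporary)
        staged.append(temporary)

    new_names = []
    for index, temporary in enumerate(staged):
        final = temporary.with_name(f"{image_name}_{index}{temporary.suffix}")
        temporary.rename(final)
        new_names.append(final.name)
    return new_names
```

The returned list holds the new filenames in the order they were assigned, which is convenient for logging or for building a label file afterwards.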

Step4 — ExtractImages

MLDatasetBuilder also provides a method called ExtractImages. Using this method, we can extract images from a video file.

download_image_folder/
video.mp4
test.py

As the code below shows, we need to pass the video path, the folder name, and the framesize. The folder name will be the class name. The framesize is optional and defaults to 5.

ExtractImages(video_path, folder_name, framesize)
#ExtractImages('video.mp4', 'frame', 10)
ExtractImages(video_path, folder_name)
#ExtractImages('video.mp4', 'frame')
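
This README does not spell out how the framesize value selects frames. As an illustration only, the sketch below assumes it means "frames to save per second of video" and computes which frame indices would be kept. The function name and parameters are hypothetical and not part of the package.

```python
def frame_indices_to_keep(total_frames, video_fps, target_fps=5):
    """Return the indices of frames to save when sampling about target_fps frames per second."""
    if total_frames < 0 or video_fps <= 0 or target_fps <= 0:
        raise ValueError("total_frames must be >= 0; video_fps and target_fps must be > 0")
    # Keep every step-th frame; never use a step below 1.
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))
```

For a 30 fps video and the default target of 5, every sixth frame is kept. A video reader such as OpenCV could then save only the frames at these indices.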

After the process completes, your image folder structure will look like this:

download_image_folder/
dog/
   dog_0.jpg
   dog_1.jpg
   dog_2.jpg
   dog_3.png
   dog_4.png
   ...
dog.mp4
test.py

Contributing

All issues and pull requests are welcome! To run the code locally, first fork the repository, then run the following commands on your computer:

git clone https://github.com/<your-username>/ML-Dataset-Builder.git
cd ML-Dataset-Builder
# Creating a virtual environment before the next step is recommended
pip3 install -r requirements.txt

When adding code, be sure to write unit tests where necessary.

Contact

MLDatasetBuilder was created by Karthick Nagarajan. Feel free to reach out on Twitter or through Email!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

MLDatasetBuilder-1.0.0.tar.gz (4.9 kB view details)

Uploaded Source

Built Distribution

MLDatasetBuilder-1.0.0-py3-none-any.whl (5.9 kB view details)

Uploaded Python 3

File details

Details for the file MLDatasetBuilder-1.0.0.tar.gz.

File metadata

  • Download URL: MLDatasetBuilder-1.0.0.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.9

File hashes

Hashes for MLDatasetBuilder-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d3da9bc79b1cb197b3b3ddccfa1d21869859fd03fb9d797a2bc4ce83ed0d3d87
MD5 80a5a9891f9eeddce11a9828bba7ded4
BLAKE2b-256 e97a673dc76911dfe131b440124f5f7529386eef854462411cf624d804677b35

File details

Details for the file MLDatasetBuilder-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: MLDatasetBuilder-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.9

File hashes

Hashes for MLDatasetBuilder-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 db7215d13cdf9d6f2db7918af2ccb2534499c5c1acbc3b69b5cce07252676b74
MD5 f05d0bcd8bcf4e5a28058eadc58ac544
BLAKE2b-256 b9c4fbaf0a137eb260a43099c4717154748f8877c6aacf8027bd142339fcfe28
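
The SHA256 digests listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_of_file and matches_expected are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, pass the path of the downloaded MLDatasetBuilder-1.0.0.tar.gz together with the SHA256 value listed for that file. A False result means the file differs from the one that was uploaded.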
