
Tool to download large datasets from lists of URLs


Dataset Downloader

Preview

Dataset_downloader lets you download large datasets from multiple lists of URLs, for example from ImageNet. You can split the download into two folders, one for training and one for testing. Files are saved into a folder named after their class, which suits model training. The result looks something like this:

root:.
|
├───test
│   ├───accerola
│   ├───apple
│   └───lemon
├───train
│   ├───accerola
│   ├───apple
│   └───lemon
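
Because each image sits in a folder named after its class, the layout above can be read back into (file, label) pairs with a few lines of Python. The following is a minimal sketch using only the standard library; `list_samples` is a hypothetical helper written for illustration and is not part of dataset_downloader.

```python
from pathlib import Path


def list_samples(split_dir):
    """Return sorted (file_path, class_name) pairs for one split folder.

    Hypothetical helper: assumes the layout shown above, where every
    sub-folder of split_dir is a class and contains that class's files.
    """
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file():
                samples.append((image_path, class_dir.name))
    return samples
```

For example, `list_samples("D:/dataset/train")` would return one pair per downloaded training file, with the folder name as the label.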

Installation

Simply install from pip:

pip install dataset_downloader

Config

Create a dataset.json file with the following content:

{
  "outputTrain": "...",
  "outputTest": "...",
  "ratio": ...,
  "classes": {
    "class1": [
      "http://url1",
      "http://url2"
    ],
    "class2": [
      "http://url1",
      "http://url2"
    ],
    "class3": "list_images.txt"
  }
}
  • outputTrain: Output folder for the training images
  • outputTest: Output folder for the testing images
  • ratio: The fraction of images used for training. 0.8 corresponds to 80% training images and 20% testing images.
  • classes: The classes and their URLs. Each class can be given as a list of URLs, a file containing a list of URLs, or a URL pointing to a list of URLs.
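
To make the three accepted forms of a class entry and the meaning of ratio concrete, here is a rough sketch of how they could be interpreted. It is not the package's actual code: `resolve_urls` and `split_urls` are hypothetical names, the one-URL-per-line file format is an assumption, and the real tool may split the images differently (for example after shuffling).

```python
import urllib.request
from pathlib import Path


def resolve_urls(entry):
    """Turn one "classes" value into a list of image URLs (hypothetical)."""
    if isinstance(entry, list):
        # Form 1: the URLs are written directly in dataset.json.
        return list(entry)
    if entry.startswith(("http://", "https://")):
        # Form 3: a URL to a text file, assumed to hold one URL per line.
        with urllib.request.urlopen(entry) as response:
            text = response.read().decode("utf-8")
    else:
        # Form 2: a local text file, also assumed to hold one URL per line.
        text = Path(entry).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_urls(urls, ratio):
    """Split urls so that roughly `ratio` of them go to training.

    Assumption: a simple in-order cut; the first part is for training,
    the rest for testing.
    """
    cut = round(len(urls) * ratio)
    return urls[:cut], urls[cut:]
```

Under these assumptions, a class with 10 URLs and a ratio of 0.8 would yield 8 training URLs and 2 testing URLs.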

An example file on a Windows computer:

{
  "outputTrain": "D:/dataset/train",
  "outputTest": "D:/dataset/test",
  "ratio": 0.8,
  "classes": {
    "accerola": [
      "http://tiachea.files.wordpress.com/2008/10/acerolas.jpg",
      "http://www.jardimdeflores.com.br/floresefolhas/JPEGS/A56acerola5.JPG",
      "http://farm2.staticflickr.com/1353/4602150961_177e096984_z.jpg"
    ],
    "apple": [
      "http://www.naturalhealth365.com/images/apple.jpg",
      "http://urbanext.illinois.edu/fruit/images/apple1.jpg",
      "https://www.aroma-zone.com/cms//sites/default/files/plante-acerola.jpg"
    ],
    "lemon": "list_images.txt",
    "watermelon": "https://gist.githubusercontent.com/johnrazeur/645787bc08a5aedd82da9573fbfa169a/raw/49cea1ee1438cecef8ac213b20f24e5ae02d4d78/watermelon.txt"
  }
}
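
Python's standard json module is strict about syntax (it rejects trailing commas and missing braces, for instance), so it can be worth checking dataset.json before starting a long download. The sketch below is one possible check, not something shipped with the package: `check_config` is a hypothetical helper, and treating all four keys as required and ratio as a value between 0 and 1 are assumptions drawn from the description above.

```python
import json


def check_config(text):
    """Parse dataset.json content and return a list of problems found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error.msg} (line {error.lineno})"]
    if not isinstance(config, dict):
        return ["top level should be a JSON object"]

    problems = []
    # Assumption: all four documented keys are required.
    for key in ("outputTrain", "outputTest", "ratio", "classes"):
        if key not in config:
            problems.append(f"missing key: {key}")

    ratio = config.get("ratio")
    if isinstance(ratio, (int, float)) and not 0 <= ratio <= 1:
        problems.append("ratio should be between 0 and 1")

    classes = config.get("classes")
    if isinstance(classes, dict):
        for name, entry in classes.items():
            # Each class is either a list of URLs or a single string
            # (a file path or a URL to a list of URLs).
            if not isinstance(entry, (list, str)):
                problems.append(f"class {name}: expected a list or a string")
    return problems
```

A typical use would be `print(check_config(open("dataset.json").read()))`, where an empty list means no problem was detected by this check.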

Run

Simply call the dataset_downloader command:

cd yourdirectory
# You must create the dataset.json file beforehand
dataset_downloader

Download files


Files for dataset-downloader, version 1.0.0

Filename                                   Size    File type  Python version
dataset_downloader-1.0.0-py3-none-any.whl  5.7 kB  Wheel      py3
dataset_downloader-1.0.0.tar.gz            3.9 kB  Source     None
