Data downloader

data-downloader

1. Installation

It is recommended to use the latest version of pip when installing data_downloader.

pip install data_downloader

2. Usage

All download functions are in data_downloader.downloader, so import downloader first.

from data_downloader import downloader

2.1 Netrc

If the website requires a login, you can add a record to the .netrc file in your home directory.

To view the existing hosts in the .netrc file:

netrc = downloader.Netrc()
print(netrc.hosts)

To add a record:

netrc.add(host, login, password, account=None)

For NASA data users:

netrc.add('urs.earthdata.nasa.gov','your_username','your_password')

Example:

In [2]: netrc = downloader.Netrc()                                                                                                                    

In [3]: netrc.hosts                                                                                                                                   
Out[3]: {}

In [4]: netrc.add('urs.earthdata.nasa.gov','username','passwd')                                                                            

In [5]: netrc.hosts                                                                                                                                   
Out[5]: {'urs.earthdata.nasa.gov': ('username', None, 'passwd')}
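
The tuple in Out[5] has the same (login, account, password) layout that Python's standard netrc module returns. As a minimal sketch, assuming the record is stored in the standard .netrc format, the snippet below writes such a record to a temporary file (not your real ~/.netrc) and reads it back with the standard library. The host and credentials are the placeholder values from the example above.

```python
import netrc
import os
import tempfile

# Write a throwaway .netrc-style file so the real ~/.netrc is never touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, '.netrc')
    with open(path, 'w') as f:
        f.write('machine urs.earthdata.nasa.gov login username password passwd\n')

    # authenticators() returns a (login, account, password) tuple for the host.
    login, account, password = netrc.netrc(path).authenticators(
        'urs.earthdata.nasa.gov')

print(login, password)
```

The value of the unused account field depends on the Python version, so the sketch does not rely on it.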

2.1 download_data

Download a single file.

downloader.download_data(url, folder=None, file_name=None, session=None)

Parameters:

url: str
    URL of the file to download
folder: str
    the folder to store output files. Defaults to the current folder.
file_name: str
    the file name. If None, it is parsed from the web response or the URL.
session: requests.Session object
    session used to maintain the connection. Default None.

Example:

In [6]: url = 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif'
   ...: folder = 'D:\\data'
   ...: downloader.download_data(url, folder)

20141117_20141211.geo.unw.tif:   2%|▌                         | 455k/22.1M [00:52<42:59, 8.38kB/s]
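
When file_name is None, the name comes from the web response or the URL. The sketch below illustrates one way such a fallback could work. It is not the library's implementation: guess_file_name is a hypothetical helper, and the Content-Disposition handling is a simplified assumption.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse


def guess_file_name(url, content_disposition=None):
    """Hypothetical helper: prefer the response header, else use the URL path."""
    if content_disposition:
        # Simplified parsing of e.g. 'attachment; filename="a.tif"'.
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            return match.group(1).strip()
    # Fall back to the last component of the URL path.
    return posixpath.basename(unquote(urlparse(url).path))


url = ('http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/'
       '106/106D_05049_131313/interferograms/20141117_20141211/'
       '20141117_20141211.geo.unw.tif')
print(guess_file_name(url))
```

Pass file_name explicitly whenever the URL does not end in a meaningful file name.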

2.2 download_datas

Download files from an iterable of URLs.

downloader.download_datas(urls, folder=None, file_names=None)

Parameters:

urls: iterator
    iterable of URLs
folder: str
    the folder to store output files. Defaults to the current folder.
file_names: iterator
    iterable of file names. Leave it as None if you want the program
    to parse them from the website.

Examples:

In [12]: from data_downloader import downloader
    ...:
    ...: urls = ['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
    ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
    ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
    ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
    ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
    ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
    ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif']
    ...:
    ...: folder = 'D:\\data'
    ...: downloader.download_datas(urls, folder)

20141117_20141211.geo.unw.tif:   6%|█▍                     | 1.37M/22.1M [03:09<2:16:31, 2.53kB/s]
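
The URLs in the example above share one pattern: a base path, a date pair as the folder name, and the same date pair as the file name prefix. As a small sketch, the list passed as urls can be generated instead of typed by hand. build_url is a hypothetical helper, and the pattern is inferred only from the example URLs, not from any documented scheme.

```python
BASE = ('http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/'
        '106/106D_05049_131313/interferograms')


def build_url(pair, product='geo.unw.tif'):
    """Hypothetical helper: build one URL from a date pair and a product suffix."""
    return f'{BASE}/{pair}/{pair}.{product}'


pairs = ['20141117_20141211', '20141024_20150221']
urls = [build_url(pair) for pair in pairs]
urls.append(build_url('20141024_20150128', product='geo.cc.tif'))

for url in urls:
    print(url)
```

The resulting list can then be passed to downloader.download_datas(urls, folder).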

2.3 async_download_datas

Download files simultaneously.

downloader.async_download_datas(urls, folder=None, file_names=None, limit=30)

Parameters:

urls: iterator
    iterable of URLs
folder: str
    the folder to store output files. Defaults to the current folder.
file_names: iterator
    iterable of file names. Leave it as None if you want the program
    to parse them from the website.
limit: int
    the maximum number of files downloaded simultaneously

Examples:

In [3]: from data_downloader import downloader
   ...:
   ...: urls = ['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
   ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
   ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
   ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
   ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
   ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
   ...:         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif']
   ...:
   ...: folder = 'D:\\data'
   ...: downloader.async_download_datas(urls, folder, limit=3)

>>> Total:   0%|                                                           | 0/7 [00:00<?, ?it/s]
20141024_20150221.geo.unw.tif:   1%|▏                        | 136k/21.2M [00:39<45:24, 7.75kB/s]
20141024_20150128.geo.cc.tif:   2%|▌                         | 119k/5.42M [01:02<6:47:45, 217B/s]
20141211_20150128.geo.cc.tif:   3%|▊                         | 159k/5.44M [00:36<13:02, 6.75kB/s]
20141117_20141211.geo.unw.tif:   0%|                                 | 0.00/22.1M [00:00<?, ?B/s]
20141117_20150317.geo.cc.tif:   0%|                                  | 0.00/5.44M [00:00<?, ?B/s]
20141117_20150221.geo.cc.tif:   0%|                                  | 0.00/5.47M [00:00<?, ?B/s]
20141024_20150128.geo.unw.tif:   0%|                                 | 0.00/23.4M [00:00<?, ?B/s]
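
With limit=3, only three progress bars advance at a time in the output above. The sketch below shows the general idea of capping concurrency with asyncio.Semaphore. It is an illustration under assumptions, not the library's implementation: fake_download is a hypothetical stand-in that only sleeps, and no network access takes place.

```python
import asyncio


async def fake_download(name, semaphore, state):
    # Hypothetical stand-in for a real download; it only sleeps briefly.
    async with semaphore:  # at most `limit` tasks run this block at once
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
        await asyncio.sleep(0.01)
        state['active'] -= 1
    return name


async def download_all(names, limit=3):
    semaphore = asyncio.Semaphore(limit)
    state = {'active': 0, 'peak': 0}
    # gather() returns results in the same order as the inputs.
    results = await asyncio.gather(
        *(fake_download(name, semaphore, state) for name in names))
    return results, state['peak']


names = [f'file_{i}.tif' for i in range(7)]
results, peak = asyncio.run(download_all(names, limit=3))
print(len(results), 'files finished; peak concurrency:', peak)
```

A lower limit is gentler on the server, while a higher one helps most when individual transfers are slow.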

2.4 status_ok

Check concurrently whether the given links are accessible.

downloader.status_ok(urls, limit=200)

Parameters:

urls: iterator
    iterable of URLs
limit: int
    the maximum number of URLs checked simultaneously

Return:

a list of booleans, one per URL (True if the URL is accessible, False otherwise)

Example:

In [1]: from data_downloader import downloader
   ...: import numpy as np
   ...:
   ...: urls = np.array(['https://www.baidu.com',
   ...:                  'https://www.bai.com/wrongurl',
   ...:                  'https://cn.bing.com/',
   ...:                  'https://bing.com/wrongurl',
   ...:                  'https://bing.com/'])
   ...:
   ...: status_ok = downloader.status_ok(urls)
   ...: urls_accessable = urls[status_ok]
   ...: print(urls_accessable)

['https://www.baidu.com' 'https://cn.bing.com/' 'https://bing.com/']
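
The example above uses NumPy boolean indexing to keep the accessible links. If NumPy is not available, the same filtering can be sketched with plain lists. split_by_status is a hypothetical helper, and the status list is hard-coded here to match the output shown above rather than fetched from the network.

```python
def split_by_status(urls, status):
    """Hypothetical helper: separate URLs by their True/False check result."""
    ok = [url for url, good in zip(urls, status) if good]
    failed = [url for url, good in zip(urls, status) if not good]
    return ok, failed


urls = ['https://www.baidu.com',
        'https://www.bai.com/wrongurl',
        'https://cn.bing.com/',
        'https://bing.com/wrongurl',
        'https://bing.com/']
# Assumed result of downloader.status_ok(urls), consistent with the output above.
status = [True, False, True, False, True]

ok, failed = split_by_status(urls, status)
print(ok)
print(failed)
```

Keeping the failed list as well makes it easy to retry or report the links that could not be reached.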

Release history

1.2 (Jul 28, 2024)
1.1 (Apr 14, 2024)
1.0 (Mar 20, 2024)
0.5.2 (Mar 20, 2024)
0.5.1 (Dec 4, 2023)
0.4.1 (Nov 15, 2022)
0.4.0 (Nov 11, 2022)
0.3.1 (Nov 11, 2022)
0.3.0 (Oct 20, 2021)
0.2.6 (Aug 30, 2021)
0.2.5 (May 3, 2021)
0.2.4 (May 2, 2021)
0.2.3 (Dec 19, 2020)
0.2.2 (Dec 7, 2020)
0.2.1 (Dec 7, 2020)
0.2.0 (Dec 4, 2020)
0.1.4 (Oct 30, 2020)
0.1.3 (Jun 15, 2020)
0.1.2 (Jun 14, 2020)
0.1.1 (May 28, 2020)
0.1.0 (May 27, 2020)
0.0.9 (May 25, 2020)
0.0.8 (May 25, 2020)
0.0.7 (May 25, 2020)
0.0.6 (May 23, 2020)
0.0.5 (May 23, 2020)
0.0.4 (May 18, 2020)
0.0.3 (May 18, 2020)
0.0.2 (May 17, 2020)
0.0.1 (May 16, 2020, this version)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

data-downloader-0.0.1.tar.gz (6.8 kB)

Uploaded May 16, 2020 Source

Built Distribution

data_downloader-0.0.1-py3-none-any.whl (7.4 kB)

Uploaded May 16, 2020 Python 3

Hashes for data-downloader-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`d595f082c2be91e68925bbb32f226617b06790124486fd724974f96a71799659`
MD5	`d815718a0aff0b0e44653ebc78f59277`
BLAKE2b-256	`b28cef5227edffc5627937fe50730231af0e9c223be460ba5e43fb20633aefb9`

Hashes for data_downloader-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`789ea052911dd4798e2a215b7ac593aead8b1197f1852eb6215bcd278d741cc8`
MD5	`cedc1a9c020619aa13277f3e13a39b1e`
BLAKE2b-256	`5bb54fbcaec6003ac0ceb35e1ba9ae3b6bf6c52465cacd5363148af68cc8f7d0`