DARTH

The Satellite Imagery DataSet Toolkit

Install

pip install d-arth

Additional Huawei OBS Server Package (Optional)

Since version 0.0.3, the install process installs the OBS package automatically. To install it manually:

cd obs-src
python setup.py install

Requirements:

Based on my test installations, I recommend installing 'GDAL' with conda/miniconda first, not pip.

If you want to see the detailed requirements or install them manually:

pip install -r requirement.txt

Usage:

Getting Started Guide

Satellite imagery datasets are an important part of training and validating models for different missions. This toolkit downloads imagery from different data sources and uses a specific layer (class) in OSM vector data to generate datasets for model training or validation.

Supported vector datasource types:

  • MBTiles
  • Shapefile
  • PBF
  • GeoJSON

First of all, the layer names and classes should be known as prior knowledge, because the same class may be described by different keywords in the OSM data and definitions. For example, 'water' as a class name may appear in OSM data as 'waterway', 'water', 'lake', and so on.
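
As a simple illustration (a hypothetical mapping, not part of the toolkit's API), that prior knowledge can be kept as a plain dict that translates your own class names into the OSM keywords you treat as equivalent:

# Hypothetical class-name-to-OSM-keyword mapping (illustration only, not a DARTH API).
CLASS_KEYWORDS = {
    "water": ["water", "waterway", "lake", "reservoir"],
    "building": ["building"],
    "road": ["highway", "transportation"],
}

def osm_keywords(class_name):
    """Return the OSM layer/class keywords treated as the given class name."""
    return CLASS_KEYWORDS.get(class_name, [class_name])

print(osm_keywords("water"))  # ['water', 'waterway', 'lake', 'reservoir']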

Supported raster dataset keys:

  • Google
  • Google China
  • Google Maps
  • Google Satellite
  • Google Terrain
  • Google Terrain Hybrid
  • Google Satellite Hybrid
  • Stamen Terrain
  • Stamen Toner
  • Stamen Toner Light
  • Stamen Watercolor
  • Wikimedia Map
  • Wikimedia Hike Bike Map
  • Esri Boundaries Places
  • Esri Gray (dark)
  • Esri Gray (light)
  • Esri National Geographic
  • Esri Ocean
  • Esri Satellite
  • Esri Standard
  • Esri Terrain
  • Esri Transportation
  • Esri Topo World
  • OpenStreetMap Standard
  • OpenStreetMap H.O.T.
  • OpenStreetMap Monochrome
  • OpenTopoMap
  • Strava All
  • Strava Run
  • Open Weather Map Temperature
  • Open Weather Map Clouds
  • Open Weather Map Wind Speed
  • CartoDb Dark Matter
  • CartoDb Positron
  • Bing VirtualEarth

Usage:

Step 1:

Downloading the tile files is the first step. However, most data providers do not write projection information into the tile files, so computing each tile's projection information and writing it to the file is the most important part of the download flow.
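
As a rough illustration of what that projection step involves (standard slippy-map tile math, not DARTH's internal code), the Web Mercator bounds of a tile (x, y, z) can be computed like this:

ORIGIN = 20037508.342789244  # half the Web Mercator world extent in meters (EPSG:3857)

def tile_mercator_bounds(x, y, z):
    """Return (min_x, min_y, max_x, max_y) of tile (x, y, z) in EPSG:3857 meters."""
    tile_size = 2 * ORIGIN / (2 ** z)   # width/height of one tile in meters
    min_x = -ORIGIN + x * tile_size
    max_y = ORIGIN - y * tile_size      # tile y counts from the top (north) edge
    return (min_x, max_y - tile_size, min_x + tile_size, max_y)

print(tile_mercator_bounds(6743, 3104, 13))

Together with the 256x256 pixel size of each tile, these bounds give the geotransform that can be written into the downloaded file so GIS tools read it as a georeferenced raster.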

Optional:

from darth.downloader import downloader
Google=downloader("Google Satellite",thread_count=8)

Use 8 threads (set more if your machine has more) to process the download; this speeds up the whole flow of Step 1.


from darth.downloader import downloader
Google=downloader("Google Satellite")
# ---------------------------------------------------------------------------- #
#                            MAP Production Toolkit                            #
# ---------------------------------------------------------------------------- #
# ---------------------- MAP Serverv Init Successful by ---------------------- #
# ---------------------- Google Satellite ------------------------------------ #

Demo:

We can choose a location such as Salt Lake City, Utah.

Salt Lake City is located in the United States, in North America. DMS latitude/longitude coordinates for Salt Lake City: 40°45'38.81"N, 111°53'27.78"W. Latitude: 4532 km (2816 mi) north of the Equator, 5475 km (3402 mi) south of the North Pole. Longitude: 8644 km (5371 mi) from the Prime Meridian. GMT: -6h. Local time in Salt Lake City: Friday 1:35 am, May 22, 2020. [*time info]

We need to define an area described by WGS84 lon/lat coordinates, like:

  • Cord1=(-111.89105,40.76078) # Left Top Lonlat

  • Cord2=(-111.8,40.7)# Right Bottom Lonlat

In addition, we need to set the zoom level, which determines the resolution of each map tile. Related info:
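
As a rough rule of thumb (generic Web Mercator math, not a DARTH function), a 256-pixel tile covers about 156543 / 2^z meters per pixel at the equator, scaled by the cosine of the latitude:

import math

def ground_resolution(zoom, latitude=0.0):
    """Approximate meters per pixel of a 256px Web Mercator tile at the given latitude."""
    return 156543.03392804097 * math.cos(math.radians(latitude)) / (2 ** zoom)

for z in (13, 15, 17):
    print(z, round(ground_resolution(z, 40.76), 2), "m/pixel")  # around Salt Lake City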

The data will be generated as tiles (256x256 images); you can also use DOWNLOADER_INSTANCE.merge() to merge all the tiles into a single TIFF file.

add_cord() takes as input the WGS coordinates of the left-top and right-bottom points (x1, y1, x2, y2), plus a zoom level, which corresponds to different densities of the data grid.

left, top : left-top coordinate, for example (100.361,38.866)

right, bottom : right-bottom coordinate

z : zoom level
filePath : file path for storing results, TIFF format

Google.add_cord(116.3, 39.9, 116.6, 39.7, 13)  # WGS lon/lat form
Google.download()
# Google.merge()  # merge all tiles into one tif file
# -----WGS BoundingBox: (116.27325422704708, 39.90850398784923, 116.62506415757257, 39.67233079805734)
# -----Mercator BoudingBox: (12943479.451629978, 4852655.222148937, 12982642.753946641, 4818439.909406773)
# -----Total tiles number:8 X 7


  0%|          | 0/56 [00:00<?, ?it/s]

# -----Url Queue size: 56


 88%|████████▊ | 49/56 [00:00<00:00, 30.20it/s]
100%|██████████| 56/56 [00:00<00:00, 93877.31it/s]
100%|██████████| 56/56 [00:00<00:00, 57.96it/s]

# ===== Decode Downloading...
# ------------------------------- Download Done ------------------------------ #

# ===== Save description done Google China-Sun Jun 28 10:51:01 2020-(116.3, 39.9, 116.6, 39.7)-13.json

The new version of the data toolkit keeps a dict to record info during the process, in this format:

tileinfo={
    "time":str(time.asctime(time.localtime(time.time()))),
    "left":self.left,
    "top":self.top,
    "right":self.right,
    "bottom":self.bottom,
    "zoom":self.zoom,
    "server":self.server,
    "data":self.result
    }

Each downloaded item has a self-description entry like:

{
    'server': 'Google Satellite', 
    'info': [6743, 3104, 13], 
    'url': 'https://mt1.google.com/vt/lyrs=s&x=6743&y=3104&z=13', 
    'path': './images/Google Satellite-6743-3104-13.tif'
}

tiles=[i["path"] for i in Google.result]
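
The same paths can also be recovered later from the saved JSON description file, without keeping the downloader instance around; a minimal sketch (the filename here is hypothetical, use the one printed after the download finishes):

import json

with open("Google Satellite-description.json") as f:  # hypothetical filename
    tileinfo = json.load(f)

print(tileinfo["server"], tileinfo["zoom"])
tiles = [item["path"] for item in tileinfo["data"]]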

The Vector and Raster classes handle I/O and transform objects to raster or vector objects.

For instance, we can use a shapefile downloaded from https://gis.utah.gov/ as the label source to generate ground truth.

If the timestamps of the two data sources (vector and raster) are almost the same, you can get a high-quality dataset.

Additionally, https://www.openstreetmap.org/ provides global vector products, as shown below:


from darth.vector import Vector
Building=Vector('/home/winshare/Downloads/2017-07-03_asia_china.mbtiles')  # 3.7 GB SQLite-based MBTiles; the main vector object for China
# ---------------------------------------------------------------------------- #
#                                Vector Toolkit                                #
# ---------------------------------------------------------------------------- #
# ---------------------------------------------------------------------------- #
#                            TIFF process Toolkit                              #
# ---------------------------------------------------------------------------- #
# -----Class TIF init without filename
# -----Valid vector format : mbtiles

# ----------------------------- Meta Information ----------------------------- #
# -----ZOOM_LEVEL : 14                                                         #
# -----attribution : <a href="http://www.openmaptiles.org/" target="_blank">&co#
# -----center : 116.4025,39.92,14                                              #
# -----description : Extract from https://openmaptiles.org                     #
# -----maxzoom : 14                                                            #
# -----minzoom : 0                                                             #
# -----name : OpenMapTiles                                                     #
# -----pixel_scale : 256                                                       #
# -----mtime : 1499626373833                                                   #
# -----format : pbf                                                            #
# -----id : openmaptiles                                                       #
# -----version : 3.6.1                                                         #
# -----maskLevel : 5                                                           #
# -----bounds : 115.686,39.414,117.119,40.426                                  #
# -----planettime : 1499040000000                                              #
# -----basename : china_beijing.mbtiles                                        #
# ----------------------------- Meta Information ----------------------------- #

# -----Description :  /workspace/osm-2017-07-03-v3.6.1-china_beijing.mbtiles
# -----LayerCount: 15
# -----Layer : 0  LayerName :  water 
# -----Layer : 1  LayerName :  waterway 
# -----Layer : 2  LayerName :  landcover 
# -----Layer : 3  LayerName :  landuse 
# -----Layer : 4  LayerName :  mountain_peak 
# -----Layer : 5  LayerName :  park 
# -----Layer : 6  LayerName :  boundary 
# -----Layer : 7  LayerName :  aeroway 
# -----Layer : 8  LayerName :  transportation 
# -----Layer : 9  LayerName :  building 
# -----Layer : 10  LayerName :  water_name 

Most SQLite-based MBTiles vector databases have multiple layers, while WKT-based shapefiles and GeoJSON files usually have a single layer. Normally, the layer name is the class name, and it must be set as the default layer with the getDefaultLayerbyName function. So we need to choose a default layer by 'LayerName'; that layer will be used to generate binary labels for deep learning training.

  • Of course, multi-layer support will be added in the next version.

Building.getDefaultLayerbyName("building")
----- Set Default Layer  building  :  <osgeo.ogr.Layer; proxy of <Swig Object of type 'OGRLayerShadow *' at 0x7fd4b2566660> >
<osgeo.ogr.Layer; proxy of <Swig Object of type 'OGRLayerShadow *' at 0x7fd4b2566660> >

Step 2:

If the data is used for model training, we need labels, which can be generated by rasterizing the vector file. Normally, data is labeled by hand, but human labeling does not scale to huge numbers of objects in high-resolution imagery. The OSM vector data has a worldwide version stored in SQLite-based MBTiles files, which can be decoded with the GDAL library.

The Vector and Raster classes are an important part of data I/O. Rasterisation (or rasterization) is the task of taking an image described in a vector graphics format (shapes) and converting it into a raster image (a series of pixels, dots or lines, which, when displayed together, create the image which was represented via shapes).[1][2] The rasterised image may then be displayed on a computer display, video display or printer, or stored in a bitmap file format. Rasterisation may also refer to the technique of drawing 3D models, or the conversion of 2D rendering primitives such as polygons and line segments into a rasterized format.

Map data has better relative accuracy than ad-hoc human labeling, which means the vector map has the potential to serve as ground truth. So transforming existing vector data into raster data is an indispensable method for generating training data in deep-learning-based computer vision missions.

Rasterize:
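
The toolkit wraps this step internally; as a generic sketch of the idea with GDAL/OGR (not DARTH's own implementation, and with hypothetical paths, sizes, and bounds), burning one vector layer into a tile-sized binary raster looks roughly like this:

from osgeo import gdal, ogr, osr

# Open the vector source (hypothetical path) and pick the layer to burn.
src = ogr.Open("buildings.shp")
layer = src.GetLayer(0)  # assumed to already be in the target projection

# Create a 256x256 single-band GeoTIFF covering one tile's Mercator bounding box.
min_x, min_y, max_x, max_y = 12943479.45, 4818439.91, 12948371.42, 4823331.88
target = gdal.GetDriverByName("GTiff").Create("label.tif", 256, 256, 1, gdal.GDT_Byte)
target.SetGeoTransform((min_x, (max_x - min_x) / 256, 0, max_y, 0, -(max_y - min_y) / 256))
srs = osr.SpatialReference()
srs.ImportFromEPSG(3857)  # Web Mercator
target.SetProjection(srs.ExportToWkt())

# Burn value 1 wherever a feature covers a pixel, producing a binary label.
gdal.RasterizeLayer(target, [1], layer, burn_values=[1])
target.FlushCache()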


Notes:

If the vector layer covers a huge geographic space and contains an enormous number of items, indexing becomes extremely inefficient during rasterization.

We set a rectangle, such as the bounding box of Beijing within the geographic area of China, and apply that same area as a spatial filter over the whole space.

This lets us compare the speed of the same task with and without the filter; the table below shows the result, followed by a sketch of the filtering idea.

Experiment       item/sec     100000 samples
withoutFilter    34           94 hours
withFilter       3            8.3 hours
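
Conceptually, the filter corresponds to OGR's spatial filter; a minimal sketch (assuming an already-open OGR layer as in the rasterize example above, with a hypothetical Mercator rectangle):

# Restrict iteration to features intersecting the Beijing-sized rectangle.
min_x, min_y, max_x, max_y = 12944152.12, 4818590.26, 12983287.88, 4852834.05
layer.SetSpatialFilterRect(min_x, min_y, max_x, max_y)
print(layer.GetFeatureCount())  # far fewer features than the full dataset

# layer.SetSpatialFilter(None) clears the filter again.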

Although I have already sped up the rasterize flow, it still runs in a single process; I still want to find a way to run GDAL rasterize with Python multiprocessing.

Building.crop_default_layer_by_rect(Google.mercator_cord)#FILTER to speed up
label=Building.generate(tiles)
  0%|          | 0/56 [00:00<?, ?it/s]

-----Set filter Rect: (12944152.11792486, 4852834.051769256, 12983287.876406869, 4818590.263097501)
-----Start Generate.....


100%|██████████| 56/56 [00:18<00:00,  2.99it/s]

If we write the 'image' and 'label' paths to CSV/JSON, that makes a great dataset for a deep learning training workflow.
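
A minimal sketch of writing such an index file with the standard csv module (pairing the tiles and label lists from above):

import csv

# Write one row per (image tile, label tile) pair.
with open("dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "label"])
    writer.writerows(zip(tiles, label))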

And with that, the processing workflow is finished. We can show the label and image like this:

import tifffile as tif
import matplotlib.pyplot as plt

image = tif.imread(tiles[1])
label_img = tif.imread(label[1])  # keep the label path list intact
plt.imshow(image), plt.show()
plt.imshow(label_img), plt.show()

[image: sample tile (png)]

[image: generated building label (png)]


Neural Network for GIS, let's fuck!

Copyright 2020 winshare

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
