Skip to main content

Keras Data Generator class for remote spatial data.

Project description

Keras Spatial

Keras Spatial includes data generators and tools designed to simplify the preprocessing of spatial data for deep learning applications.

Keras Spatial provides a data generator that reads samples directly from a raster data source and eliminates the need to create small, individual raster files prior to model execution. The raster data source may be local or remote service. Any necessary reprojections and scaling is handled automatically.

Central to the use of Keras Spatial is a GeoPandas GeoDataFrame which defines a virtual sample set, a list of samples that drives the data generator. The dataframe can also be used to filter samples based on different aspects such as the existance of nodata, handling imbalanced label distributions, and storing sample attributes used in normalization amoung other data augmentation functions.

Features include:

  • Sample extraction from local or remote data sources -- no intermediate files
  • Automatic reprojection and resampling as needed
  • Sample augmentation using user-defined callback system
  • Flexible structure improves organization and data management

Try Keras Spatial is avialble at Binder.

Installation

To install the package from PyPi repository you can execute the following command:

pip install keras-spatial

or directly from GitHub

$ pip install git+https://github.com/ncsa/keras-spatial#egg=keras-spatial --process-dependency-links

Quickstart

  1. Create a SpatialDataGen and set the source raster
  2. Create a geodataframe with 200x200 (in projection units) samples covering the spatial extent of the raster
  3. Create the generator producing arrays with shape [32, 128, 128, 1]
  4. Fit model
from keras_spatial.datagen import SpatialDataGenerator

sdg = SpatialDataGenerator(source='/path/to/file.tif')
geodataframe = sdg.regular_grid(200, 200)
generator = sdg.flow_from_dataframe(geodataframe, 128, 128, batch_size=32)
model(generator, ...)

Usage

Keras Spatial provides a SpatialDataGenerator (SDG) modeled on the Keras ImageDataGenerator. The SDG allows user to work in spatial coorindates rather than pixels and easily integrate data from different coordinates systems. Reprojection and resampling is handled automatically as needed. Because Keras Spatial is based on the rasterio package, raster data source may either local files or remote resources referenced by URL.

Because the SDG reads directly from larger raster data sources rather than small, preprocessed images files, SDG makes use of a GeoDataFrame to identify each sample area. The geometry associated with the datafame is expected to be a polygon but extraction is done using a windowed read based on the bounds. As with the ImageDataGenerator, the flow_from_dataframe method returns the generator that can be passed to the Keras model.

SpatialDataGenerator class

The SDG is similar to the ImageDataGenerator albeit missing the .flow and the .flow_from_directory methods. SDG also moves more configutation and setting to the instance and with the .flow_from_dataframe having few arguments.

Arguments
  • source (path or url): raster source
  • width (int): array size produced by generator
  • height (int): array size produced by generator
  • indexes (int or tuple of ints): one or more raster bands to sampled
  • interleave (str): type of interleave 'band' or 'pixel' (default='pixel')
  • resampling (int): One of the values from rasterio.enums.Resampling (default=Resampling.nearest)

Raises RasterioIOError when the source is set if the file does not exist or remote resource is not available.

Examples
from keras_spatial import SpatialDataGenerator

sdg = SpatialDataGenerator(source='/path/to/file.tif')
sdg.width, sdg.height = 128,128

The source must be set prior to calling flow_from_dataframe. Width and height can set as attributes to the SDG or as arguments to flow_from _dataframe but specifying as arguments to flow_from_dataframe is preferred.

The indexes argument selects bands in a multiband raster. By default all bands are read and the indexes argument is updated when the raster source is set.

In multiband situations, if interleave is set to 'band' (the default) the numpy array will have the shape [batch_size, bands, height, width] and is compatible with TensorFlow. If interleave is set to 'pixel', the shape will be [batch_size, height, width, bands] which is not generally what you want, use with care.

# file.tif is a 5 band raster
sdg = SpatialDataGenerator('/path/to/file.tif')
sdg.interleave, sdg.indexes = 'band', -1
arr = next(sdg.flow_from_dataframe(df, 128, 128, batch_size=1))
print(arr.shape)
> [1, 5, 128, 128]

sdg.interleave, sdg.indexes = 'band', 1
arr = next(sdg.flow_from_dataframe(df, 128, 128, batch_size=1))
print(arr.shape)
> [1, 1, 128, 128]

sdg.interleave, sdg.indexes = 'pixel', [1,2,3]
arr = next(sdg.flow_from_dataframe(df, 128, 128, batch_size=1))
print(arr.shape)
> [1, 128, 128, 3]

sdg.interleave, sdg.indexes = 'pixel', 1
arr = next(sdg.flow_from_dataframe(df, 128, 128, batch_size=1))
print(arr.shape)
> [1, 128, 128]

Because more than one SDG is expected to be used simultaneously and SDGs are expected to having matching spatial requirements, the SDG class provides a profile attribute that can be easily share arguments across instances as shown below. Note: source is not part of the profile.

sdg = SpatialDataGenerator(source='/path/to/file.tif')
sdg2 = SpatialDataGenerator()
sdg2.profile = sdg.profile
sdg2.source = '/path/to/file2.tif'

SpatialDataGenerator methods

flow_from_dataframe

flow_from_dataframe(geodataframe, width, height, batch_size)

Creates a generator that returns a numpy ndarray of samples read from the SDG source.

Arguments
  • geodataframe (GeoDataFrame): a geodataframe with sample boundaries
  • width (int): width of array
  • height (int): height of array
  • batch_size (int): number of samples to returned by generator
Returns

A generator of numpy ndarrays of the shape [batch_size, height, width, bands].

Example
sdg = SpatialDataGenerator(source='/path/to/file.tif')
gen = sdg.flow_from_dataframe(df, 128, 128)
arr = next(gen)

random_grid

random_grid(width, height, count, units='native')

Creates a geodataframe suitable to passing to the flow_from_dataframe method. The grid module provides a similar function using passed using spatial extents.

Arguments
  • width (int): width in pixels
  • height (int): height in pixels
  • count (int): number of samples
  • units (str): units for width and height, either native (projection units) or in pixels
Returns

A GeoDataFrame defining the polygon boundary of each sample.

Example
sdg = SpatialDataGenerator(source='/path/to/file.tif')
df = sdg.random_grid(200, 200, 1000)

regular_grid

regular_grid(width, height, overlap=0.0, units='native')

Creates a geodataframe suitable to passing to the flow_from_dataframe method. The sample module provides a similar function using passed using spatial extents.

Arguments
  • width (int): width in pixels
  • height (int): width in pixels
  • overlap (float): percentage of overlap (default=0.0)
  • units (str): units for width and height, either native or in pixels
Returns

A GeoDataFrame defining the polygon boundary of each sample.

Example
sdg = SpatialDataGenerator(source='/path/to/file.tif')
df = sdg.regular_grid(200, 200)

File Generation and File Caching

Keras Spatial provides tools to utilize SDGs and GeoDataFrames to simplify the generation of individual samples in the form of numpy arrays. A FileCache class provides a flow_from_files method to create a generator that reads directly from numpy files.

The following code will generate 10 samples from the source TIF file and save them as numpy arrays. The filenames is a pandas Series using the same index as the initial GeoDataFrame so that it can be easily joined.

sdg = SpatialDataGenerator(source='/path/to/file.tif')
df = sdg.random_grid(200, 200, count=10)
filenames = keras_spatial.filecache.flow_to_numpy('/local/path')

This second examples finds the files on the numpy files on the local disk and create a generator.

fc = keras_spatial.filecache.FileCache('looal/path')
fc.find_file()
model(fc.flow_from_files())

Full Example

from keras_spatial import SpatialDataGenerator
labels = SpatialDataGenerator()
labels.source = '/path/to/labels.tif'
labels.width, labels.height = 128, 128
df = labels.regular_grid(200,200)

samples = SpatialDataGenerator()
samples.source = 'https://server.com/files/data.tif'
samples.width, samples.height = labels.width, label.height

train_gen = zip(labels.flow_from_dataframe(df), patches.flow_from_dataframe(df))
model(train_gen)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

keras-spatial-1.0.7.tar.gz (177.2 kB view details)

Uploaded Source

Built Distribution

keras_spatial-1.0.7-py2.py3-none-any.whl (22.5 kB view details)

Uploaded Python 2Python 3

File details

Details for the file keras-spatial-1.0.7.tar.gz.

File metadata

  • Download URL: keras-spatial-1.0.7.tar.gz
  • Upload date:
  • Size: 177.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.5.2

File hashes

Hashes for keras-spatial-1.0.7.tar.gz
Algorithm Hash digest
SHA256 6dead252bc2c4b36bd996321a7e73f7c705ff8f036d9efd0d4692248c2223334
MD5 f7416f81e8eca0c3a0620b3bbeb5cd7d
BLAKE2b-256 257829cadb7877b03315caef4d94555741c1ae1edb30624d0c3d904353e2f944

See more details on using hashes here.

File details

Details for the file keras_spatial-1.0.7-py2.py3-none-any.whl.

File metadata

  • Download URL: keras_spatial-1.0.7-py2.py3-none-any.whl
  • Upload date:
  • Size: 22.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.5.2

File hashes

Hashes for keras_spatial-1.0.7-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 a7ab09640f1ccbcd6d53aaa61393fd1466b3ddcd1d0a1cdf650dfeeec5a5b561
MD5 04a7a4167a6d015b87ccfabd69aaab44
BLAKE2b-256 65660271332a691b1739409d7f7d67d45c94e407d6050bb89aef5d11ca57dd74

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page