Skip to main content

Dealing with Geopspatial Data, faster computation of ndvi indices using dask

Project description

Geodata-preprocess-IIITB-SCL

This is official documaentation of geodata-preprocess-IIITB-SCL. The package has many useful functions for dealing with geospatial data, also few functions like computation of NDVI, MNDWI are integrated with dask to speed up it's computation.

Installation

This can installed using pip using the following command in both Windows and Linux OS

$ pip install geodata-preprocess-IIITB-SCL

Usage

All Sorted File Names

The geospatial data file names are moslty represented by date and time. For few specific tasks like time series forecasting , it might be necessary to get all files in sequential form. This function returns list of all file names in sorted order.

Function

get_file_names(folder_path)

Parameters

  1. folder_path: Folder path where geospatial Images exist

Return Type

Ordered name list of Geospatial Images


Number of Bands

The functions finds number of bands in the image. Red, Green, Blue, Infrared etc.

Function

number_of_bands(filepath)

Parameters

  1. file path: Path of Image.

Return Type

Integer value, number of bands.


Numpy Array of the Image

The functions converts the file in numpy array format with all it's bands.

Function

numpy_image(filepath):

Parameters

  1. file path: Path of Image.

Return Type

Numpy Array.


Dataframe of the Image

Converts the geospatial file in pandas dataframe.

Function

dataframe_image(filepath)

Parameters

  1. file path: Path of Image.

Return Type

Pandas dataframe.


Min Max Scaling of Dataframe

This functions performs min-max scaling of the dataframe.

Function

min_max_scaled(df_raw)

Parameters

  1. df_raw: Input pandas dataframe.

Return Type

Numpy array representing scaled values.


Convert the numpy to dask array

This function converts numpy array to dask array with specified chunks of the same bandwidth.

Function

numpy_to_dask_array(df,chunk_len)

Parameters

  1. df: Input dataframe
  2. chunk_len:specifies the chunk size

Return Type

Dask array.


One hot to label

Some of the geospatial data may be segmented (each pixel being classified to a label). Generally the open source labelled data is one hot encoded. This functions converts the it in labelled form.

Function

one_hot_to_label(file_path)

Parameters

  1. file path: Path of Image.

Return Type

Numpy array representing labelled data with only one band.


Ordered labels

Some of the labels of an image might not be following a sequential form. For eg there is bunch of images whose pixel labels are from 2,4, 7. To make it sequential this function would be helpful

Function

get_ordered_labels(y)

Parameters

  1. _y: Labelled numpy array.

Return Type

Ordered numpy array.


Normalized difference

This is a key functions used for NDVI and MNDWI indices. With specifying band values as Red and Near Red Infrared bands we can find NDVI index , and by specifying Short Wave Infrared and Green bands whe can get MNDWI index for any geospatial image.

Function

normalized_difference( b1, b2):

NDVI Computation (Returning list)

Functions here are used for finding NDVI indices of list of geospatial image

Without Dask

Function

find_ndvi_list(file_path_list)

Parameters

  1. file path_list: List of path of Images.

Return Type

List of NDVI index (numpy array) in the same order of values in input list.


With Dask

Function

find_ndvi_list_with_dask(worker_nodes,file_path_list)

Parameters

  1. file path_list: List of path of Images.
  2. worker_nodes: Number of dask worker nodes in a cluster

Return Type

List of NDVI index (numpy array) in the same order of values in input list.


NDVI Computation (Saving the values in folder)

Without Dask

Function

find_and_write_ndvi_list(file_path_list,destination_folder)

Parameters

  1. file path_list: List of path of Images.
  2. destination_folder: path where indices will be saved.

Return Type

None


With Dask

Function

find_and_write_ndvi_list_with_dask(worker_nodes,file_path_list,destination_folder)

Parameters

  1. file path_list: List of path of Images.
  2. worker_nodes: Number of dask worker nodes in a cluster
  3. destination_folder: path where indices will be saved.

Return Type

None

Contributing

The following are the core contributors:

  1. Pratyush Upadhyay
  2. Deeksha Agarwal

License

geodata-preprocess-IIITB-SCL was created by IITB-SCL. It is licensed under the terms of the MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geodata-preprocess-IIITB-SCL-0.1.1.tar.gz (3.5 kB view hashes)

Uploaded Source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page