
Geoprocessing framework for geographical & Earth Observation (EO) data.

GeodataFlow

Geospatial processing framework for geographical & Earth Observation (EO) data in Python.

GeodataFlow is a geoprocessing framework for fetching, translating and manipulating geospatial data (raster, vector, EO/STAC collections) by running a pipeline, i.e. a sequence of operations, on the input data. In that respect it is similar to the GDAL library, which handles raster and vector data.

The project is split up into several namespace packages or components:

  • geodataflow.core

    The main subpackage of GeodataFlow which implements basic building blocks (Pipeline engine & Modules) and commonly used functionalities.

  • geodataflow.api

    Web API component built with FastAPI which provides access to the GeodataFlow backend via REST API calls.


  • workbench/ui

    GeodataFlow Workbench is a static JavaScript application that lets users easily draw and run their own workflows in the web browser.


    NOTE: There is no installer for GeodataFlow Workbench yet, but you can test it by loading the docker-compose.yml. See the "Using docker-compose" section below.

Backends:

  • spatial

    Installs the geodataflow.spatial backend implementation for GeodataFlow using GDAL/OGR.

  • dataframes

    Installs the geodataflow.dataframes backend implementation for GeodataFlow using GeoPandas.

  • pySpark, Geospatial SQL, ... ?

Videos demonstrating GeodataFlow:

Workflow examples

Assuming you are using geodataflow.spatial (GDAL/OGR) as the active backend implementation, GeodataFlow can run workflows such as the following:

  • Converting a Shapefile to GeoPackage:

    # ==============================================================
    # Pipeline sample to convert a Shapefile to GeoPackage.
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
          "connectionString": "input.shp"
        },
        # Extract the Centroid of input geometries.
        {
          "type": "GeometryCentroid"
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
    
  • Fetching metadata of an S2L2A product (STAC):

    # ==============================================================
    # Pipeline sample to fetch metadata of an S2L2A product (STAC).
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
    
          # Define the input AOI in an embedded GeoJson.
          "connectionString": {
            "type": "FeatureCollection",
            "crs": {
              "type": "name",
              "properties": { "name": "EPSG:4326" }
            },
            "features": [
              {
                "type": "Feature",
                "properties": { "id": 0, "name": "My AOI for testing" },
                "geometry": {
                  "type": "Polygon",
                  "coordinates": [[
                      [-1.746826,42.773227],
                      [-1.746826,42.860866],
                      [-1.558685,42.860866],
                      [-1.558685,42.773227],
                      [-1.746826,42.773227]
                  ]]
                }
              }
            ]
          }
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Fetch metadata of EO Products that match the spatiotemporal criteria.
        {
          "type": "EOProductCatalog",
    
          "driver": "STAC",
          "provider": "https://earth-search.aws.element84.com/v0/search",
          "product": "sentinel-s2-l2a-cogs",
    
          "startDate": "2021-09-25",
          "endDate": "2021-10-05",
          "closestToDate": "2021-09-30",
          "filter": "",
    
          "preserveInputCrs": true
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
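The pipeline samples above contain `#` comment lines, which standard JSON parsers reject. As a minimal sketch for inspecting such a file outside GeodataFlow, the helper below drops full-line comments before parsing and lists the stage types. `load_pipeline` is a hypothetical helper written for this illustration, not part of the GeodataFlow API, and GeodataFlow's own loader may treat comments differently.

```python
import json


def load_pipeline(text: str) -> dict:
    """Parse a pipeline definition, ignoring full-line '#' comments.

    Hypothetical helper for illustration only; it does not handle
    comments that follow JSON content on the same line.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return json.loads("\n".join(kept))


# Shortened copy of the Shapefile-to-GeoPackage sample shown above.
SAMPLE = """
# Pipeline sample to convert a Shapefile to GeoPackage.
{
  "pipeline": [
    { "type": "FeatureReader", "connectionString": "input.shp" },
    # Extract the Centroid of input geometries.
    { "type": "GeometryCentroid" },
    { "type": "GeometryTransform", "sourceCrs": 4326, "targetCrs": 32630 },
    { "type": "FeatureWriter", "connectionString": "output.gpkg" }
  ]
}
"""

# Collect the "type" of each stage in execution order.
stages = [stage["type"] for stage in load_pipeline(SAMPLE)["pipeline"]]
print(stages)
```

Listing the stage types this way is a quick sanity check that a reader comes first and a writer comes last before handing the file to the CLI.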
    

Installation

GeodataFlow is composed of several namespace packages, and some of them are optional (e.g. backend implementations). Install the ones you want by adding them as extras to the install command.

To read and write Cloud Optimized GeoTIFFs (COG), GDAL version 3.1 or greater is required. If your system GDAL is older than version 3.1, consider using Docker or Conda to get a modern GDAL.
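One way to check the GDAL requirement from Python is sketched below, assuming the GDAL Python bindings (`osgeo`) are importable in the environment you plan to use. `supports_cog` is a hypothetical helper name; it only compares the major and minor version numbers against 3.1.

```python
def supports_cog(version: str) -> bool:
    """Return True if a GDAL version string such as '3.4.1' is >= 3.1."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 1)


try:
    from osgeo import gdal  # GDAL Python bindings, if installed
    installed = gdal.VersionInfo("RELEASE_NAME")  # e.g. "3.4.1"
except ImportError:
    installed = None

if installed is None:
    print("GDAL Python bindings not found")
else:
    print(installed, "COG support:", supports_cog(installed))
```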

Using PyPI

To install the latest stable version from PyPI, run this in the command-line:

> pip install geodataflow[api,dataframes,eodag,gee]

The geodataflow package installs geodataflow.core and geodataflow.spatial by default. You can use the namespace package installers as well (e.g. api); they have the same effect as the generic one.
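To see which components ended up in the current environment, a small check like the following can help. It is a sketch that assumes the importable module names match the component names listed earlier (geodataflow.core, geodataflow.spatial, geodataflow.api, geodataflow.dataframes); `is_installed` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names, taken from the component list in this README.
COMPONENTS = [
    "geodataflow.core",
    "geodataflow.spatial",
    "geodataflow.api",
    "geodataflow.dataframes",
]


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'geodataflow') is missing.
        return False


for name in COMPONENTS:
    print(f"{name}: {'installed' if is_installed(name) else 'missing'}")
```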

Optional extras for Backends:

  • eodag

    EODAG - Earth Observation Data Access Gateway is a Python package for searching and downloading remotely sensed images while offering a unified API for data access regardless of the data provider.

  • gee

    GEE - Google Earth Engine API is a geospatial processing service. With Earth Engine, you can perform geospatial processing at scale, powered by Google Cloud Platform. GEE requires authentication; please read the available documentation.

To view all available CLI tool commands and options:

> geodataflow --help

Listing all available modules:

> geodataflow --modules

Run a workflow in the command-line interface:

> geodataflow --pipeline_file "/geodataflow/spatial/tests/data/test_eo_stac_catalog.json"
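The same CLI call can be driven from a Python script, for example in a batch job. The sketch below uses only the `--pipeline_file` option shown above and runs the command only if a `geodataflow` executable is found on the PATH. `geodataflow_command` is a hypothetical helper, and nothing is assumed here about the tool's output or exit codes.

```python
import shutil
import subprocess


def geodataflow_command(pipeline_file: str) -> list:
    """Build the argument list for running one pipeline file."""
    return ["geodataflow", "--pipeline_file", pipeline_file]


command = geodataflow_command(
    "/geodataflow/spatial/tests/data/test_eo_stac_catalog.json"
)

if shutil.which("geodataflow"):
    # check=False: report the exit code instead of raising on failure.
    result = subprocess.run(command, check=False)
    print("exit code:", result.returncode)
else:
    print("geodataflow not on PATH; would run:", " ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.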

Using docker-compose

docker-compose.yml builds images and starts GeodataFlow API and Workbench components to easily run Workflows with GeodataFlow.

PACKAGE_WITH_GEODATAFLOW_PIPELINE_CONTEXT in the yml file indicates the backend implementation to load. The default value is geodataflow.spatial. If you prefer to use another backend, change it before starting.

Run this in the command-line from the root folder of the project:
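As an illustration of this kind of setting, the sketch below resolves the backend name with the documented default. It assumes the value reaches the process as an environment variable of the same name, which this README does not state explicitly, and `selected_backend` is a hypothetical helper rather than GeodataFlow's own loading code.

```python
import os

ENV_NAME = "PACKAGE_WITH_GEODATAFLOW_PIPELINE_CONTEXT"
DEFAULT_BACKEND = "geodataflow.spatial"  # default value documented above


def selected_backend(environ=os.environ) -> str:
    """Return the configured backend package name, or the default.

    Assumption: the setting is exposed as an environment variable.
    An empty value falls back to the default as well.
    """
    return environ.get(ENV_NAME) or DEFAULT_BACKEND


print(selected_backend())
```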

> docker-compose up

Then, type in your favorite Web Browser:

To remove all resources:

> docker-compose down --rmi all -v --remove-orphans

Testing

Each package provides a collection of tests. Run the tests in each package's tests folder to validate it.

Contribute

Have you spotted a typo in our documentation? Have you observed a bug while running GeodataFlow? Do you have a suggestion for a new feature?

Don't hesitate to open an issue or submit a pull request; contributions are most welcome!

License

GeodataFlow is licensed under the Apache License v2.0. See the LICENSE file for details.

Credits

GeodataFlow is built on top of amazingly useful open source projects. See the NOTICE file for details about those projects and their licenses.

Thank you to all the authors of these projects!

Authors

GeodataFlow has been created by Alvaro Huarte https://www.linkedin.com/in/alvarohuarte.
