Geoprocessing framework for geographical & Earth Observation (EO) data.

GeodataFlow

Geospatial processing framework for geographical & Earth Observation (EO) data in Python.

GeodataFlow is a Geoprocessing framework for fetching, translating and manipulating Geospatial data (Raster, Vector, EO/STAC collections) by applying a Pipeline, or sequence of operations, to input data. It is much like the GDAL library, which handles raster and vector data.

The project is split up into several namespace packages or components:

  • geodataflow.core

    The main subpackage of GeodataFlow which implements basic building blocks (Pipeline engine & Modules) and commonly used functionalities.

  • geodataflow.api

    WebAPI component built with FastAPI, which provides access to the GeodataFlow backend via REST API calls.

  • workbench/ui

    GeodataFlow Workbench is a static JavaScript application that lets users easily draw and run their own Workflows in the Web Browser.

    NOTE: There is no installer for GeodataFlow Workbench yet, but you can try it by loading the docker-compose.yml. Please read the related section below.

Backends:

  • spatial

    Installs the geodataflow.spatial backend implementation for GeodataFlow using GDAL/OGR.

  • dataframes

    Installs the geodataflow.dataframes backend implementation for GeodataFlow using Geopandas.

  • pySpark, Geospatial SQL, ... ?

Workflow examples

Assuming you are using geodataflow.spatial (GDAL/OGR) as the active backend implementation, GeodataFlow can run workflows such as the following:

  • Converting a Shapefile to GeoPackage:

    # ==============================================================
    # Pipeline sample to convert a Shapefile to GeoPackage.
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
          "connectionString": "input.shp"
        },
        # Extract the Centroid of input geometries.
        {
          "type": "GeometryCentroid"
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
    
  • Fetching metadata of an S2L2A Product (STAC):

    # ==============================================================
    # Pipeline sample to fetch metadata of an S2L2A Product (STAC).
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
    
          # Define the input AOI in an embedded GeoJson.
          "connectionString": {
            "type": "FeatureCollection",
            "crs": {
              "type": "name",
              "properties": { "name": "EPSG:4326" }
            },
            "features": [
              {
                "type": "Feature",
                "properties": { "id": 0, "name": "My AOI for testing" },
                "geometry": {
                  "type": "Polygon",
                  "coordinates": [[
                      [-1.746826,42.773227],
                      [-1.746826,42.860866],
                      [-1.558685,42.860866],
                      [-1.558685,42.773227],
                      [-1.746826,42.773227]
                  ]]
                }
              }
            ]
          }
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Fetch metadata of EO Products that match the spatiotemporal criteria.
        {
          "type": "EOProductCatalog",
    
          "driver": "STAC",
          "provider": "https://earth-search.aws.element84.com/v0/search",
          "product": "sentinel-s2-l2a-cogs",
    
          "startDate": "2021-09-25",
          "endDate": "2021-10-05",
          "closestToDate": "2021-09-30",
          "filter": "",
    
          "preserveInputCrs": true
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
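
The embedded AOI and the target CRS in the second sample can also be generated programmatically. The following is a minimal sketch in plain Python, not part of GeodataFlow: the helper names `utm_epsg_for` and `bbox_to_aoi` are hypothetical, and the zone formula ignores the special UTM zones around Norway and Svalbard. It shows why EPSG:32630 is a natural target CRS for this AOI and reproduces the polygon ring used above.

```python
import json
import math


def utm_epsg_for(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Hypothetical helper; ignores the Norway/Svalbard zone exceptions.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # lon == 180 would otherwise give zone 61
    return (32600 if lat >= 0 else 32700) + zone


def bbox_to_aoi(min_lon, min_lat, max_lon, max_lat, name="My AOI for testing"):
    """Build a GeoJSON FeatureCollection holding one rectangular polygon."""
    ring = [
        [min_lon, min_lat],
        [min_lon, max_lat],
        [max_lon, max_lat],
        [max_lon, min_lat],
        [min_lon, min_lat],  # a GeoJSON ring must end where it starts
    ]
    return {
        "type": "FeatureCollection",
        "crs": {"type": "name", "properties": {"name": "EPSG:4326"}},
        "features": [
            {
                "type": "Feature",
                "properties": {"id": 0, "name": name},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


# Bounding box taken from the sample pipeline above.
bbox = (-1.746826, 42.773227, -1.558685, 42.860866)
aoi = bbox_to_aoi(*bbox)

center_lon = (bbox[0] + bbox[2]) / 2.0
center_lat = (bbox[1] + bbox[3]) / 2.0
target_crs = utm_epsg_for(center_lon, center_lat)
print(target_crs)  # 32630

# First two stages of the sample pipeline, assembled as a Python dict.
pipeline = {
    "pipeline": [
        {"type": "FeatureReader", "connectionString": aoi},
        {"type": "GeometryTransform", "sourceCrs": 4326, "targetCrs": target_crs},
    ]
}
pipeline_json = json.dumps(pipeline, indent=2)
```

Writing `pipeline_json` to a file would give a comment-free JSON document with the same stages as the sample.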
    

Installation

GeodataFlow is composed of several namespace packages, some of which are optional (e.g. Backend implementations). Install the ones you want by adding them as extras to the installer command line.

In order to read and write Cloud Optimized Geotiffs (COG), GDAL version 3.1 or greater is required. If your system GDAL is older than version 3.1, consider using Docker or Conda to get a modern GDAL.
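
A quick way to check this requirement from Python is sketched below. The helper `gdal_supports_cog` is a hypothetical name introduced here for illustration; the only GDAL call used is `gdal.VersionInfo("RELEASE_NAME")` from the GDAL Python bindings, which is assumed to return a string such as "3.4.1".

```python
import re


def gdal_supports_cog(version: str) -> bool:
    """Return True if a GDAL release string is 3.1 or greater."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized GDAL version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles two-digit minors correctly (3.10 > 3.9).
    return (major, minor) >= (3, 1)


try:
    from osgeo import gdal
except ImportError:
    print("GDAL Python bindings are not installed")
else:
    installed = gdal.VersionInfo("RELEASE_NAME")
    print(installed, "COG support:", gdal_supports_cog(installed))
```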

Using PyPI

To install the latest stable version from PyPI, run this in the command line:

> pip install geodataflow[api,dataframes,eodag,gee]

The geodataflow package installs geodataflow.core and geodataflow.spatial by default. You can also use the namespace package installers (e.g. api); they have the same effect as the generic one.

Optional extras for Backends:

  • eodag

    EODAG - Earth Observation Data Access Gateway is a Python package for searching and downloading remotely sensed images while offering a unified API for data access regardless of the data provider.

  • gee

    GEE - Google Earth Engine API is a geospatial processing service. With Earth Engine, you can perform geospatial processing at scale, powered by Google Cloud Platform. GEE requires authentication; please read the Earth Engine documentation for details.

To view all available CLI tool commands and options:

> geodataflow --help

Listing all available modules:

> geodataflow --modules

Run a workflow in the command-line interface:

> geodataflow --pipeline_file "/geodataflow/spatial/tests/data/test_eo_stac_catalog.json"
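
The same command can be driven from a Python script. The sketch below is an illustration rather than a GeodataFlow API: the function names are hypothetical, and the comment handling is an assumption based only on the full-line `#` comments shown in the samples above (how GeodataFlow itself parses pipeline files is not described here). It lists the stage types in a pipeline as a sanity check and builds the CLI call using only the `--pipeline_file` option shown above.

```python
import json
import subprocess


def strip_hash_comments(text: str) -> str:
    """Drop lines whose first non-blank character is '#'."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return "\n".join(kept)


def stage_types(pipeline_text: str) -> list:
    """Return the 'type' of each stage, in pipeline order."""
    document = json.loads(strip_hash_comments(pipeline_text))
    return [stage["type"] for stage in document["pipeline"]]


def build_command(pipeline_file: str) -> list:
    """Build the argument list for the geodataflow CLI."""
    return ["geodataflow", "--pipeline_file", pipeline_file]


def run_pipeline(pipeline_file: str) -> None:
    """Run the workflow; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(pipeline_file), check=True)


SAMPLE = """
# Pipeline sample to convert a Shapefile to GeoPackage.
{
  "pipeline": [
    { "type": "FeatureReader", "connectionString": "input.shp" },
    # Save features to GeoPackage.
    { "type": "FeatureWriter", "connectionString": "output.gpkg" }
  ]
}
"""

print(stage_types(SAMPLE))  # ['FeatureReader', 'FeatureWriter']
print(build_command("my_pipeline.json"))
```

Calling `run_pipeline("my_pipeline.json")` (a hypothetical file name) would then start the workflow, provided the geodataflow command is installed and on the PATH.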

Using docker-compose

docker-compose.yml builds the images and starts the GeodataFlow API and Workbench components so you can easily run Workflows with GeodataFlow.

The PACKAGE_WITH_GEODATAFLOW_PIPELINE_CONTEXT variable in the yml file indicates the backend implementation to load. The default value is geodataflow.spatial. If you prefer to use another backend, change it before starting.

Run this in the command line from the root folder of the project:

> docker-compose up

Then, type in your favorite Web Browser:

To remove all resources:

> docker-compose down --rmi all -v --remove-orphans

Testing

Each package provides a collection of tests; run the tests in each package's tests folder to validate it.

Contribute

Have you spotted a typo in our documentation? Have you observed a bug while running GeodataFlow? Do you have a suggestion for a new feature?

Don't hesitate to open an issue or submit a pull request; contributions are most welcome!

License

GeodataFlow is licensed under Apache License v2.0. See LICENSE file for details.

Credits

GeodataFlow is built on top of amazingly useful open source projects. See NOTICE file for details about those projects and their licenses.

Thank you to all the authors of these projects!

Authors

GeodataFlow has been created by Alvaro Huarte https://www.linkedin.com/in/alvarohuarte.
