Project description

GeodataFlow (Work in progress)

Toolkit to run workflows on Geospatial & Earth Observation (EO) data.

GeodataFlow is a Python library for fetching, translating and manipulating Geospatial data (Raster, Vector, EO/STAC collections) by using a Pipeline, a sequence of operations applied to input data. It is very much like the GDAL library, which handles raster and vector data. This page provides a high-level overview of the library and its philosophy. Visit the docs for the list of data formats it supports (Readers & Writers), and see Filters for the filtering operations you can apply with GeodataFlow.

In addition to the library code, GeodataFlow provides a command-line application and a REST WebAPI (FastAPI) endpoint that users can conveniently use to process, filter, translate, and query Geospatial data. See Applications for more information on that topic.

GeodataFlow also provides a Workbench UI designer that lets users easily draw and run their own workflows.

Developers can extend GeodataFlow with new custom modules as well.

Workflow examples

  • Converting a Shapefile to GeoPackage:

    # ==============================================================
    # Pipeline sample to convert a Shapefile to GeoPackage.
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
          "connectionString": "input.shp"
        },
        # Extract the Centroid of input geometries.
        {
          "type": "GeometryCentroid"
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Save features to GeoPackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
    
  • Fetching metadata of an S2L2A Product (STAC):

    # ==============================================================
    # Pipeline sample to fetch metadata of an S2L2A Product (STAC).
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
    
          # Define the input AOI as an embedded GeoJSON.
          "connectionString": {
            "type": "FeatureCollection",
            "crs": {
              "type": "name",
              "properties": { "name": "EPSG:4326" }
            },
            "features": [
              {
                "type": "Feature",
                "properties": { "id": 0, "name": "My AOI for testing" },
                "geometry": {
                  "type": "Polygon",
                  "coordinates": [[
                      [-1.746826,42.773227],
                      [-1.746826,42.860866],
                      [-1.558685,42.860866],
                      [-1.558685,42.773227],
                      [-1.746826,42.773227]
                  ]]
                }
              }
            ]
          }
        },
        # Fetch metadata of EO Products that match a spatiotemporal criterion.
        {
          "type": "EOProductCatalog",
    
          "driver": "STAC",
          "provider": "https://earth-search.aws.element84.com/v0/search",
          "product": "sentinel-s2-l2a-cogs",
    
          "startDate": "2021-09-25",
          "endDate": "2021-10-05",
          "closestToDate": "2021-09-30",
          "filter": "",
    
          "preserveInputCrs": true
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Save features to GeoPackage.
        {
          "type": "FeatureWriter",
          "mode": "CREATE",
          "connectionString": "output.gpkg"
        }
      ]
    }
    

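Any of the pipeline definitions above can be saved to a JSON file and run with the command-line interface described below (a minimal sketch; the file name shp_to_gpkg.json is arbitrary):

> geodataflow --pipeline_file "shp_to_gpkg.json"
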
Workbench

GeodataFlow Workbench is a JavaScript application that lets users easily draw and run their own workflows.

Workbench

Demo videos:

REST WebAPI

GeodataFlow provides a WebAPI based on FastAPI to access GeodataFlow capabilities via REST calls.

WebAPI
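
To give an idea of how a workflow could be submitted over REST, the sketch below posts a pipeline definition with curl. The host, port and endpoint path here are assumptions for illustration only, not the actual API; since the WebAPI is built with FastAPI, the real routes can be browsed in its interactive OpenAPI documentation (served under the /docs path by default).

> curl -X POST "http://localhost:8080/pipeline" \
       -H "Content-Type: application/json" \
       --data @tests/data/test_eo_stac_catalog.json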

Installation

To install the latest stable version:

> pip install geodataflow[eodag]

From the source repository:

> git clone https://github.com/ahuarte47/geodataflow.git
> cd geodataflow
> pip install .
> geodataflow --help
Usage: geodataflow [OPTIONS] COMMAND [ARGS]...

NOTE: In order to read and write Cloud Optimized Geotiffs (COG), GDAL version 3.1 or greater is required. If your system GDAL is older than version 3.1, consider using Docker or Conda to get a modern GDAL.
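For example, one way to get a recent GDAL stack is through the conda-forge channel (a sketch, not the only option; the environment name gdf is arbitrary):

> conda create -n gdf -c conda-forge python=3.9 "gdal>=3.1"
> conda activate gdf
> pip install geodataflow[eodag]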

Using Docker

Build the Docker image with:

> docker build -f ./Dockerfile -t geodataflow/pipelineapp:1.0.0 .

Getting started:

> docker run --rm --name gdf geodataflow/pipelineapp:1.0.0 --help
> docker run --rm --name gdf geodataflow/pipelineapp:1.0.0 --modules

To run workflows on Linux:

> docker run \
    --rm --name gdf -v "$PWD/tests/data:/tests/data" geodataflow/pipelineapp:1.0.0 \
    --pipeline_file "/tests/data/test_eo_stac_catalog.json"

To run workflows on Windows:

> docker run ^
    --rm --name gdf -v "%cd%/tests/data:/tests/data" geodataflow/pipelineapp:1.0.0 ^
    --pipeline_file "/tests/data/test_eo_stac_catalog.json"

For an interactive session:

> docker run --rm -it --entrypoint "bash" geodataflow/pipelineapp:1.0.0

Using docker-compose

docker-compose.yml builds the images and starts the REST WebAPI and Workbench components so you can easily run workflows with GeodataFlow.

> docker-compose up

Type in your favorite Web Browser:

To remove all resources:

> docker-compose down --rmi all -v --remove-orphans

Usage (Command line interface)

Getting started with the GeodataFlow commands:

  • To see all the available options and commands:

    > geodataflow --help
    
  • To list all supported modules:

    > geodataflow --modules
    
  • To run a workflow from the command-line interface:

    > geodataflow --pipeline_file "/tests/data/test_eo_stac_catalog.json"
    

Contribute

Have you spotted a typo in our documentation? Have you observed a bug while running GeodataFlow? Do you have a suggestion for a new feature?

Don't hesitate and open an issue or submit a pull request, contributions are most welcome!

License

GeodataFlow is licensed under the Apache License v2.0. See the LICENSE file for details.

Authors

GeodataFlow has been created by Alvaro Huarte (https://www.linkedin.com/in/alvarohuarte).

Credits

GeodataFlow is built on top of amazingly useful open source projects. See NOTICE file for details about those projects and their licenses. Thank you to all the authors of these projects!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geodataflow-0.1.1.tar.gz (57.2 kB)

Uploaded Source

Built Distribution

geodataflow-0.1.1-py3-none-any.whl (114.0 kB)

Uploaded Python 3

File details

Details for the file geodataflow-0.1.1.tar.gz.

File metadata

  • Download URL: geodataflow-0.1.1.tar.gz
  • Upload date:
  • Size: 57.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.8.2 requests/2.27.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.3

File hashes

Hashes for geodataflow-0.1.1.tar.gz:

  • SHA256: 2a1a8a45dbf341d4c6cd5c4f34a571cc7556c29674593e2a202a1e68cebb15ff
  • MD5: 08684d7dd226cd997a089ce67e0e5073
  • BLAKE2b-256: b835e6c53a759c5198d6165c836519be9cffb52ada067e9f91012b73b27ae003

See more details on using hashes here.
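
For instance, to check a downloaded source distribution against the SHA256 digest listed above (using standard tooling; the download and checksum commands below are one possible way to do it):

> pip download geodataflow==0.1.1 --no-deps --no-binary :all:
> sha256sum geodataflow-0.1.1.tar.gz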

File details

Details for the file geodataflow-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: geodataflow-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 114.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.8.2 requests/2.27.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.3

File hashes

Hashes for geodataflow-0.1.1-py3-none-any.whl:

  • SHA256: 369840c40b2c5ff1cceb2ef87aa54905a02efb4b37e2e92e530f28c4a98a2b7d
  • MD5: a3061a8eaae0c3a5aceca810282391d4
  • BLAKE2b-256: dc53eb74222b3766eb99a8206e6450a9193d11ee57540dd739080962965aac64

See more details on using hashes here.
