Toolkit to run workflows on Geospatial & Earth Observation (EO) data.
GeodataFlow (Work in progress)
GeodataFlow is a Python library for fetching, translating and manipulating Geospatial data (Raster, Vector, EO/STAC collections) by running a Pipeline, that is, a sequence of operations applied to the input data. It is similar in spirit to the GDAL library, which handles raster and vector data. This page provides a high-level overview of the library and its philosophy. Visit the docs for the list of data formats it supports (Readers & Writers), and see Filters for the filtering operations that you can apply with GeodataFlow.
In addition to the library code, GeodataFlow provides a command-line application and a REST WebAPI (FastAPI) endpoint that users can conveniently use to process, filter, translate, and query Geospatial data. See Applications for more information on that topic.
GeodataFlow also provides a Workbench UI designer that lets users easily draw and run their own workflows.
Developers can extend GeodataFlow with new custom modules too.
Workflow examples
- Converting a Shapefile to GeoPackage:
    # ==============================================================
    # Pipeline sample to convert a Shapefile to GeoPackage.
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
          "connectionString": "input.shp"
        },
        # Extract the Centroid of input geometries.
        {
          "type": "GeometryCentroid"
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
- Fetching metadata of an S2L2A Product (STAC):
    # ==============================================================
    # Pipeline sample to fetch metadata of an S2L2A Product (STAC).
    # ==============================================================
    {
      "pipeline": [
        {
          "type": "FeatureReader",
          # Define the input AOI in an embedded GeoJson.
          "connectionString": {
            "type": "FeatureCollection",
            "crs": {
              "type": "name",
              "properties": { "name": "EPSG:4326" }
            },
            "features": [
              {
                "type": "Feature",
                "properties": { "id": 0, "name": "My AOI for testing" },
                "geometry": {
                  "type": "Polygon",
                  "coordinates": [[
                    [-1.746826,42.773227],
                    [-1.746826,42.860866],
                    [-1.558685,42.860866],
                    [-1.558685,42.773227],
                    [-1.746826,42.773227]
                  ]]
                }
              }
            ]
          }
        },
        # Transform CRS of geometries.
        {
          "type": "GeometryTransform",
          "sourceCrs": 4326,
          "targetCrs": 32630
        },
        # Fetch metadata of EO Products that match the spatiotemporal criteria.
        {
          "type": "EOProductCatalog",
          "driver": "STAC",
          "provider": "https://earth-search.aws.element84.com/v0/search",
          "product": "sentinel-s2-l2a-cogs",
          "startDate": "2021-09-25",
          "endDate": "2021-10-05",
          "closestToDate": "2021-09-30",
          "filter": "",
          "preserveInputCrs": true
        },
        # Save features to Geopackage.
        {
          "type": "FeatureWriter",
          "connectionString": "output.gpkg"
        }
      ]
    }
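Since a pipeline is a plain JSON document, it can also be generated from a script instead of being written by hand. The following is a minimal sketch that uses only the Python standard library and reuses the stage types and keys from the STAC example; the helper names `bbox_to_aoi` and `build_stac_pipeline` are hypothetical and are not part of GeodataFlow.

```python
import json
import tempfile
from pathlib import Path


def bbox_to_aoi(min_x, min_y, max_x, max_y, name="My AOI for testing"):
    """Return a GeoJSON FeatureCollection holding one rectangular polygon."""
    ring = [
        [min_x, min_y],
        [min_x, max_y],
        [max_x, max_y],
        [max_x, min_y],
        [min_x, min_y],  # a polygon ring ends on its first vertex
    ]
    return {
        "type": "FeatureCollection",
        "crs": {"type": "name", "properties": {"name": "EPSG:4326"}},
        "features": [
            {
                "type": "Feature",
                "properties": {"id": 0, "name": name},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


def build_stac_pipeline(aoi, start_date, end_date, closest_to_date,
                        output="output.gpkg"):
    """Assemble the same four stages as the STAC example (keys copied from it)."""
    return {
        "pipeline": [
            {"type": "FeatureReader", "connectionString": aoi},
            {"type": "GeometryTransform", "sourceCrs": 4326, "targetCrs": 32630},
            {
                "type": "EOProductCatalog",
                "driver": "STAC",
                "provider": "https://earth-search.aws.element84.com/v0/search",
                "product": "sentinel-s2-l2a-cogs",
                "startDate": start_date,
                "endDate": end_date,
                "closestToDate": closest_to_date,
                "filter": "",
                "preserveInputCrs": True,
            },
            {"type": "FeatureWriter", "connectionString": output},
        ]
    }


aoi = bbox_to_aoi(-1.746826, 42.773227, -1.558685, 42.860866)
pipeline = build_stac_pipeline(aoi, "2021-09-25", "2021-10-05", "2021-09-30")

# Write the pipeline to a temporary file so it can be handed to the CLI.
pipeline_file = Path(tempfile.gettempdir()) / "stac_pipeline.json"
pipeline_file.write_text(json.dumps(pipeline, indent=2), encoding="utf-8")
print(pipeline_file)
```

The printed path can then be passed to `geodataflow --pipeline_file`.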
Workbench
GeodataFlow Workbench is a JavaScript application that lets users easily draw and run their own workflows.
Demo videos:
- What is GeodataFlow
- Transforming Features
- Reading items from a STAC Catalog
- Downloading a NDVI raster from a STAC Catalog
- Filtering Features applying Spatial Relations
- Plotting a NDVI Time Series Graph of a polygon
REST WebAPI
GeodataFlow provides a WebAPI based on FastAPI that exposes GeodataFlow capabilities via REST calls.
Installation
To install the latest stable version:
> pip install geodataflow[eodag,gee]
Optional extras:
- EODAG
  EODAG - Earth Observation Data Access Gateway is a Python package for searching and downloading remotely sensed images while offering a unified API for data access regardless of the data provider.
  Installing the EODAG extra gives the EOProductCatalog and EOProductDataset modules access to more EO Products from different providers.
- GEE
  GEE - Google Earth Engine API is a geospatial processing service. With Earth Engine, you can perform geospatial processing at scale, powered by Google Cloud Platform. GEE requires authentication; please read the available documentation here.
  Installing the GEE extra gives the GEEProductCatalog and GEEProductDataset modules access to Google Cloud Platform.
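To check which optional extras are usable in the current environment, one option is to look for their top-level modules. This is a sketch under an assumption that is not stated in this README: that the `eodag` extra installs a package importable as `eodag`, and that the `gee` extra installs the Earth Engine client importable as `ee`. The helper name `installed_extras` is hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level module it installs.
EXTRA_MODULES = {"eodag": "eodag", "gee": "ee"}


def installed_extras(modules=None):
    """Map each extra name to True when its top-level module can be found."""
    modules = EXTRA_MODULES if modules is None else modules
    return {extra: find_spec(name) is not None for extra, name in modules.items()}


for extra, available in installed_extras().items():
    print(f"{extra}: {'available' if available else 'not installed'}")
```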
To install from the source repository:
> git clone https://github.com/ahuarte47/geodataflow.git
> cd geodataflow
> pip install .
> geodataflow --help
Usage: geodataflow [OPTIONS] COMMAND [ARGS]...
NOTE: In order to read and write Cloud Optimized GeoTIFFs (COG), GDAL version 3.1 or greater is required. If your system GDAL is older than version 3.1, consider using Docker or Conda to get a modern GDAL.
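A quick way to verify the GDAL requirement from Python is to compare the version reported by the GDAL bindings against 3.1. The sketch below assumes the `osgeo` bindings are installed and degrades gracefully when they are not; the helper names `gdal_major_minor` and `supports_cog` are hypothetical.

```python
import re

MIN_GDAL_FOR_COG = (3, 1)  # minimum version given in the note above


def gdal_major_minor(release_name):
    """Extract (major, minor) from a version string such as '3.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", release_name)
    if match is None:
        raise ValueError(f"Unrecognized GDAL version: {release_name!r}")
    return int(match.group(1)), int(match.group(2))


def supports_cog(release_name):
    """True when the given GDAL version meets the COG requirement."""
    return gdal_major_minor(release_name) >= MIN_GDAL_FOR_COG


try:
    from osgeo import gdal  # GDAL Python bindings; may be absent
except ImportError:
    print("GDAL Python bindings are not installed")
else:
    version = gdal.VersionInfo("RELEASE_NAME")
    status = "supports COG" if supports_cog(version) else "is too old for COG"
    print(f"GDAL {version} {status}")
```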
Using docker
Build the container with:
> docker build -f ./Dockerfile -t geodataflow/pipelineapp:1.0.0 .
Getting started:
> docker run --rm --name gdf geodataflow/pipelineapp:1.0.0 --help
> docker run --rm --name gdf geodataflow/pipelineapp:1.0.0 --modules
To run workflows in Linux:
> docker run \
--rm --name gdf -v "$PWD/tests/data:/tests/data" geodataflow/pipelineapp:1.0.0 \
--pipeline_file "/tests/data/test_eo_stac_catalog.json"
To run workflows in Windows:
> docker run ^
--rm --name gdf -v "%cd%/tests/data:/tests/data" geodataflow/pipelineapp:1.0.0 ^
--pipeline_file "/tests/data/test_eo_stac_catalog.json"
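The Linux and Windows commands differ only in shell syntax (`$PWD` versus `%cd%`, `\` versus `^`). One way to sidestep that difference is to build the argument list in Python and let `subprocess` start Docker without a shell. The following is a sketch, assuming the image tag from the build step above; the helper name `docker_run_args` is hypothetical.

```python
from pathlib import Path

IMAGE = "geodataflow/pipelineapp:1.0.0"  # tag used in the build step above


def docker_run_args(data_dir, pipeline_name, image=IMAGE):
    """Build the `docker run` argument list for a pipeline stored in data_dir."""
    host_dir = Path(data_dir).resolve()  # absolute path works on any platform
    return [
        "docker", "run", "--rm", "--name", "gdf",
        "-v", f"{host_dir}:/tests/data",
        image,
        "--pipeline_file", f"/tests/data/{pipeline_name}",
    ]


args = docker_run_args("tests/data", "test_eo_stac_catalog.json")
print(args)

# To actually launch the container (requires Docker and the built image):
#   import subprocess
#   subprocess.run(args, check=True)
```

Passing a list rather than a single command string means no quoting or line-continuation characters are needed.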
For an interactive session:
> docker run --rm -it --entrypoint "bash" geodataflow/pipelineapp:1.0.0
Using docker-compose
docker-compose.yml builds the images and starts the REST WebAPI and Workbench components so that you can easily run workflows with GeodataFlow.
> docker-compose up
Then open the following URLs in your web browser:
- http://localhost:9630/docs to check the REST WebAPI service.
- http://localhost:9640/workbench.html to access the Workbench UI designer, where you can design and run workflows.
To remove all resources:
> docker-compose down --rmi all -v --remove-orphans
Usage (Command line interface)
Getting started with the GeodataFlow commands:
- To see all the available options and commands:
  > geodataflow --help
- To list all the supported modules:
  > geodataflow --modules
- To run a workflow from the command line:
  > geodataflow --pipeline_file "/tests/data/test_eo_stac_catalog.json"
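The same command can be driven from a Python script, for example as part of a larger job. This is a minimal sketch that relies only on the `--pipeline_file` option shown above and on the standard library; the helper names `pipeline_command` and `run_cli` are hypothetical, and nothing here uses a GeodataFlow Python API.

```python
import shutil
import subprocess


def pipeline_command(pipeline_file):
    """Return the CLI invocation that runs one pipeline file."""
    return ["geodataflow", "--pipeline_file", str(pipeline_file)]


def run_cli(command):
    """Run the command and return its standard output as text."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} was not found on PATH")
    # check=True raises CalledProcessError when the exit code is non-zero.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


command = pipeline_command("/tests/data/test_eo_stac_catalog.json")
print(command)
# run_cli(command) would execute it once GeodataFlow is installed.
```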
Contribute
Have you spotted a typo in our documentation? Have you observed a bug while running GeodataFlow? Do you have a suggestion for a new feature?
Don't hesitate to open an issue or submit a pull request; contributions are most welcome!
License
GeodataFlow is licensed under Apache License v2.0. See LICENSE file for details.
Authors
GeodataFlow was created by Alvaro Huarte (https://www.linkedin.com/in/alvarohuarte).
Credits
GeodataFlow is built on top of amazingly useful open source projects. See NOTICE file for details about those projects and their licenses. Thank you to all the authors of these projects!