Skip to main content

Batch process Open Data Cube datasets

Project description

Datacube Alchemist - ODC Dataset to Dataset Converter

Scan Test Push codecov

PURPOSE

Datacube Alchemist is a command line application for performing Dataset to Dataset transformations in the context of an Open Data Cube system.

It uses a configuration file which specifies an input Product or Products, a Transformation to perform, and output parameters and destination.

Features

  • Writes output to Cloud Optimised GeoTIFFs
  • Easily runs within a Docker Container
  • Parallelism using AWS SQS queues and Kubernetes
  • Write output data to S3 or a file system
  • Generates eo3 format dataset metadata, along with processing information
  • Generates STAC 1.0.0.beta2 dataset metadata
  • Configurable thumbnail generation
  • Pass any command line options as Environment Variables

INSTALLATION

You can build the docker image locally with Docker or Docker Compose. The commands are docker build --tag opendatacube/datacube-alchemist . or docker-compose build.

There's a Python setup file, so you can do pip3 install . in the root folder. You will need to ensure that the Open Data Cube and all its dependencies happily install though.

USAGE

Development environment

To run some example processes you can use the Docker Compose file to create a local workspace. To start the workspace and run an example, you can do the following:

  • Export the environment variables ODC_ACCESS_KEY and ODC_SECRET_KEY with valid AWS credentials
  • Run make up or docker-compose up to start the postgres and datacube-alchemist Docker containers
  • make initdb to initialise the ODC database (or see the Makefile for the specific command)
  • make metadata will add the metadata that the Landsat example product needs
  • make product will add the Landsat product definitions
  • make index will index a range of Landsat scenes to test processing with
  • make wofs-one or make fc-one will process a single Fractional Cover or Water Observations from Space scene and output the results to the ./examples folder in this project directory

Commands

Note that the --config-file can be a local path or a URI.

datacube-alchemist run-one

Note that --dryrun is optional, and will run a 1/10 scale load and will not write output to the final destination.

datacube-alchemist run-one \
  --config-file ./examples/c3_config_wo.yaml \
  --uuid 7b9553d4-3367-43fe-8e6f-b45999c5ada6 \
  --dryrun \

datacube-alchemist run-many

Note that the final argument is a datacube expression , see Datacube Search documentation.

datacube-alchemist run-many \
  --config-file ./examples/c3_config_wo.yaml \
  --limit=2 \
  --dryrun \
  time in 2020-01

datacube-alchemist run-from-queue

Notes on queues. To run jobs from an SQS queue, good practice is to create a deadletter queue as well as a main queue. Jobs (messages) get picked up off the main queue, and if they're successful, then they're deleted. If they aren't successful, they're not deleted, and they go back on the main queue after a defined amount of time. If this happens more than the defined number of times then the message is moved to the deadletter queue. In this way, you can track work completion.

datacube-alchemist run-from-queue \
  --config-file ./examples/c3_config_wo.yaml \
  --queue example-queue-name \
  --limit=1 \
  --queue-timeout=600 \
  --dryrun

datacube-alchemist add-to-queue

The --limit is the total number of datasets to limit to, whereas the --product-limit is the number of datasets per product, in the case that you have multiple input products.

datacube-alchemist add-to-queue \
  --config-file ./examples/c3_config_wo.yaml \
  --queue example-queue-name \
  --limit=300 \
  --product-limit=100

datacube-alchemist redrive-to-queue

This will get items from a deadletter queue and push them to an alive queue. Be careful, because it doesn't know what queue is what. You need to know that!

datacube-alchemist redrive-to-queue \
  --queue example-from-queue \
  --to-queue example-to-queue

License

Apache License 2.0

Copyright

© 2021, Open Data Cube Community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datacube-alchemist-0.6.7.tar.gz (50.1 kB view hashes)

Uploaded Source

Built Distribution

datacube_alchemist-0.6.7-py2.py3-none-any.whl (27.5 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page