
Batch process Open Data Cube datasets

Project description

Datacube Alchemist - ODC Dataset to Dataset Converter


PURPOSE

Datacube Alchemist is a command line application for performing Dataset to Dataset transformations in the context of an Open Data Cube system.

It uses a configuration file which specifies an input Product or Products, a Transformation to perform, and output parameters and destination.

Features

  • Writes output to Cloud Optimised GeoTIFFs
  • Runs easily within a Docker container
  • Parallelises work using AWS SQS queues and Kubernetes
  • Writes output data to S3 or a file system
  • Generates eo3 format dataset metadata, along with processing information
  • Generates STAC 1.0.0-beta.2 dataset metadata
  • Configurable thumbnail generation
  • Accepts any command line option as an environment variable

INSTALLATION

You can build the Docker image locally with Docker or Docker Compose, for example:
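
# Build the image with Docker
docker build --tag opendatacube/datacube-alchemist .

# Or build it with Docker Compose
docker-compose build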

There's a Python setup file, so you can do pip3 install . in the root folder. You will need to ensure that the Open Data Cube and all of its dependencies install correctly, though.
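
For example, inside a fresh virtual environment (assuming Python 3 and pip are available):

# Install datacube-alchemist and its dependencies from the repository root
pip3 install .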

USAGE

Development environment

To run some example processes you can use the Docker Compose file to create a local workspace. To start the workspace and run an example, do the following (the full command sequence is sketched after this list):

  • Export the environment variables ODC_ACCESS_KEY and ODC_SECRET_KEY with valid AWS credentials
  • Run make up or docker-compose up to start the postgres and datacube-alchemist Docker containers
  • make initdb to initialise the ODC database (or see the Makefile for the specific command)
  • make metadata will add the metadata that the Landsat example product needs
  • make product will add the Landsat product definitions
  • make index will index a range of Landsat scenes to test processing with
  • make wofs-one or make fc-one will process a single Water Observations from Space or Fractional Cover scene and output the results to the ./examples folder in this project directory
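
Putting it together, the full sequence looks something like the following (assuming GNU Make and Docker Compose are installed; the credential values are placeholders):

# AWS credentials used by the containers
export ODC_ACCESS_KEY=<your-aws-access-key>
export ODC_SECRET_KEY=<your-aws-secret-key>

# Start the postgres and datacube-alchemist containers
make up

# Initialise the ODC database, then add metadata, product definitions and test scenes
make initdb
make metadata
make product
make index

# Process a single scene and write the results to ./examples
make wofs-one   # Water Observations from Space
make fc-one     # Fractional Cover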

Commands

Note that the --config-file can be a local path or a URI.
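
For example, both of these point run-one at a valid configuration (the remote URL is illustrative only):

# A local path
datacube-alchemist run-one --config-file ./examples/c3_config_wo.yaml --uuid <dataset-uuid> --dryrun

# A remote URI (illustrative URL)
datacube-alchemist run-one --config-file https://example.com/configs/c3_config_wo.yaml --uuid <dataset-uuid> --dryrun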

datacube-alchemist run-one

Note that --dryrun is optional; it runs a 1/10 scale load and does not write output to the final destination.

datacube-alchemist run-one \
  --config-file ./examples/c3_config_wo.yaml \
  --uuid 7b9553d4-3367-43fe-8e6f-b45999c5ada6 \
  --dryrun

datacube-alchemist run-many

Note that the final argument is a datacube search expression; see the Datacube Search documentation.

datacube-alchemist run-many \
  --config-file ./examples/c3_config_wo.yaml \
  --limit=2 \
  --dryrun \
  time in 2020-01

datacube-alchemist run-from-queue

Notes on queues: to run jobs from an SQS queue, good practice is to create a deadletter queue as well as a main queue. Jobs (messages) get picked up off the main queue and are deleted if they complete successfully. If they fail, they are not deleted, and they return to the main queue after a defined visibility timeout. If this happens more than a defined number of times, the message is moved to the deadletter queue. In this way, failures are captured and you can track work completion.
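
As a sketch, the queue pair can be created up front with the AWS CLI; the queue names, account ID, region, and maxReceiveCount below are illustrative only:

# Create the deadletter queue first
aws sqs create-queue --queue-name example-queue-name-deadletter

# Create the main queue, moving messages to the deadletter queue after
# three failed receives (the ARN must match your account and region)
aws sqs create-queue \
  --queue-name example-queue-name \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:ap-southeast-2:123456789012:example-queue-name-deadletter\",\"maxReceiveCount\":\"3\"}"
  }'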

datacube-alchemist run-from-queue \
  --config-file ./examples/c3_config_wo.yaml \
  --queue example-queue-name \
  --limit=1 \
  --queue-timeout=600 \
  --dryrun

datacube-alchemist add-to-queue

--limit is the total number of datasets to add to the queue, whereas --product-limit is the maximum number of datasets per product, for the case where you have multiple input products.

datacube-alchemist add-to-queue \
  --config-file ./examples/c3_config_wo.yaml \
  --queue example-queue-name \
  --limit=300 \
  --product-limit=100

datacube-alchemist redrive-to-queue

This will get items from a deadletter queue and push them back onto a live queue. Be careful: the command doesn't know which queue is which, so make sure you pass the queues in the right order.

datacube-alchemist redrive-to-queue \
  --queue example-from-queue \
  --to-queue example-to-queue

License

Apache License 2.0

Copyright

© 2021, Open Data Cube Community

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datacube-alchemist-0.6.7.tar.gz (50.1 kB)

Uploaded: Source

Built Distribution

datacube_alchemist-0.6.7-py2.py3-none-any.whl (27.5 kB)

Uploaded: Python 2, Python 3

File details

Details for the file datacube-alchemist-0.6.7.tar.gz.

File metadata

  • Download URL: datacube-alchemist-0.6.7.tar.gz
  • Upload date:
  • Size: 50.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for datacube-alchemist-0.6.7.tar.gz
Algorithm Hash digest
SHA256 33c17a7a4491c681f9fa1f3dbeb8a8a2412cd370ff669c577c176d3d771cff2a
MD5 18655792ccfd7a3f5b412a3de00d5d17
BLAKE2b-256 ca7728e7e3c746d7f44ae8e80754de17596ee6d0ddd1e5ed390f69b1fd548654

See more details on using hashes here.
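
For example, to check a downloaded source distribution against the SHA256 digest listed above:

# Compute the SHA256 digest of the downloaded file; it should match the value above
sha256sum datacube-alchemist-0.6.7.tar.gz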

File details

Details for the file datacube_alchemist-0.6.7-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for datacube_alchemist-0.6.7-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 6748eedbdfef3feaa3b2a32d844c1a6cb40062128716e1e52ae300819a7c4325
MD5 4fd44e0a179800a3ab791b13132dea3e
BLAKE2b-256 79a9f874af4520a30ed9305a03dba942d662f9296b94e0f469200eb06f1ec940

See more details on using hashes here.
