Datacube Alchemist - ODC Dataset to Dataset Converter
Batch process Open Data Cube datasets
Purpose
Datacube Alchemist is a command line application for performing Dataset to Dataset transformations in the context of an Open Data Cube system.
It uses a configuration file, which specifies an input Product or Products, a Transformation to perform, and the output parameters and destination; a sketch of such a file is shown below.
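As a hedged illustration only, the sketch below shows the general shape such a configuration file might take, based purely on the description above. The field names are illustrative assumptions, not the documented schema; see ./examples/c3_config_wo.yaml in the repository for a real configuration.

# Illustrative only: field names are assumptions, not the documented
# schema -- see ./examples/c3_config_wo.yaml for a real configuration.
specification:
  products:
    - an_input_product_name               # hypothetical input Product
  transform: mypackage.MyTransformation   # hypothetical Transformation class
output:
  location: s3://example-bucket/outputs/  # hypothetical destination
  dtype: uint8                            # example output parameter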
Features
- Writes output to Cloud Optimised GeoTIFFs
- Easily runs within a Docker Container
- Parallelism using AWS SQS queues and Kubernetes
- Writes output data to S3 or a file system
- Generates eo3 format dataset metadata, along with processing information
- Generates STAC 1.0.0-beta.2 dataset metadata
- Configurable thumbnail generation
- Any command line option can be passed as an environment variable (see the sketch below)
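As a sketch of the environment variable feature: Click-based tools conventionally map an option such as --config-file to an upper-snake-case variable with an application prefix. The ALCHEMIST_ prefix below is an assumption, not confirmed by this document; check datacube-alchemist --help for the actual variable names.

# Assumption: Click-style auto environment variables with an
# ALCHEMIST_ prefix -- verify the real names against --help.
export ALCHEMIST_CONFIG_FILE=./examples/c3_config_wo.yaml
export ALCHEMIST_LIMIT=2
datacube-alchemist run-many time in 2020-01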
Installation
You can build the Docker image locally with Docker or Docker Compose; both commands are shown below. There is also a Python setup file, so you can run pip3 install . in the repository root, although you will need the Open Data Cube and all of its dependencies to install successfully.
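Both installation paths, consolidated from the description above:

# Build the Docker image (either command works)
docker build --tag opendatacube/datacube-alchemist .
docker-compose build

# Or install with pip from the repository root (requires the Open Data
# Cube and its dependencies to install cleanly)
pip3 install .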
Usage
Development environment
To run some example processes, you can use the Docker Compose file to create a local workspace. To start the workspace and run an example, do the following (a consolidated shell session follows the list):
- Export the environment variables ODC_ACCESS_KEY and ODC_SECRET_KEY with valid AWS credentials
- Run make up or docker-compose up to start the postgres and datacube-alchemist Docker containers
- make initdb initialises the ODC database (or see the Makefile for the specific command)
- make metadata adds the metadata that the Landsat example product needs
- make product adds the Landsat product definitions
- make index indexes a range of Landsat scenes to test processing with
- make wofs-one or make fc-one processes a single Water Observations from Space or Fractional Cover scene and outputs the results to the ./examples folder in this project directory
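The same steps as a single shell session; the credential values are placeholders:

export ODC_ACCESS_KEY=<your-access-key>
export ODC_SECRET_KEY=<your-secret-key>
make up        # or: docker-compose up
make initdb    # initialise the ODC database
make metadata  # add metadata for the Landsat example product
make product   # add the Landsat product definitions
make index     # index a range of Landsat scenes to test with
make wofs-one  # or: make fc-one -- results land in ./examples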
Commands
Note that the --config-file can be a local path or a URI.
datacube-alchemist run-one
Note that --dryrun is optional; it runs a load at 1/10 scale and does not write output to the final destination.
datacube-alchemist run-one \
  --config-file ./examples/c3_config_wo.yaml \
  --uuid 7b9553d4-3367-43fe-8e6f-b45999c5ada6 \
  --dryrun
datacube-alchemist run-many
Note that the final argument is a datacube search expression; see the Datacube Search documentation.
datacube-alchemist run-many \
--config-file ./examples/c3_config_wo.yaml \
--limit=2 \
--dryrun \
time in 2020-01
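Search expressions can also be combined in that final position; the spatial range below is an illustrative value, not one from this project's examples:

datacube-alchemist run-many \
  --config-file ./examples/c3_config_wo.yaml \
  --limit=2 \
  --dryrun \
  lat in [-35.0, -30.0] time in 2020-01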
datacube-alchemist run-from-queue
Notes on queues: to run jobs from an SQS queue, good practice is to create a deadletter queue as well as a main queue. Jobs (messages) are picked up from the main queue and deleted once they complete successfully. If a job fails, its message is not deleted and returns to the main queue after a defined amount of time; if this happens more than a defined number of times, the message is moved to the deadletter queue. In this way, you can track work completion. A sketch of this queue setup follows.
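A minimal sketch of that setup using the AWS CLI; the queue names and the maxReceiveCount of 5 are illustrative choices, not values mandated by Datacube Alchemist:

# Create the deadletter queue and look up its ARN
aws sqs create-queue --queue-name example-queue-name-deadletter
DLQ_URL=$(aws sqs get-queue-url --queue-name example-queue-name-deadletter \
    --query QueueUrl --output text)
DLQ_ARN=$(aws sqs get-queue-attributes --queue-url "$DLQ_URL" \
    --attribute-names QueueArn --query Attributes.QueueArn --output text)

# Create the main queue; after 5 failed receives a message moves to
# the deadletter queue
aws sqs create-queue --queue-name example-queue-name \
    --attributes RedrivePolicy='{"deadLetterTargetArn":"'"$DLQ_ARN"'","maxReceiveCount":"5"}'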
datacube-alchemist run-from-queue \
--config-file ./examples/c3_config_wo.yaml \
--queue example-queue-name \
--limit=1 \
--queue-timeout=600 \
--dryrun
datacube-alchemist add-to-queue
The --limit is the total number of datasets to add to the queue, whereas --product-limit is the number of datasets per product, for the case where you have multiple input products.
datacube-alchemist add-to-queue \
--config-file ./examples/c3_config_wo.yaml \
--queue example-queue-name \
--limit=300 \
--product-limit=100
datacube-alchemist redrive-to-queue
This will get items from a deadletter queue and push them to an alive queue. Be careful: the command doesn't know which queue is which, so make sure the source and destination are the right way around.
datacube-alchemist redrive-to-queue \
--queue example-from-queue \
--to-queue example-to-queue
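Before redriving, it can help to confirm how many messages are sitting on each queue so you are sure which is which; a quick check with the AWS CLI, using the example queue name from above:

aws sqs get-queue-attributes \
    --queue-url "$(aws sqs get-queue-url --queue-name example-from-queue \
        --query QueueUrl --output text)" \
    --attribute-names ApproximateNumberOfMessages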
License
Apache License 2.0
Copyright
© 2021, Open Data Cube Community
File details
Details for the file datacube-alchemist-0.6.7.tar.gz.
File metadata
- Download URL: datacube-alchemist-0.6.7.tar.gz
- Upload date:
- Size: 50.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33c17a7a4491c681f9fa1f3dbeb8a8a2412cd370ff669c577c176d3d771cff2a |
| MD5 | 18655792ccfd7a3f5b412a3de00d5d17 |
| BLAKE2b-256 | ca7728e7e3c746d7f44ae8e80754de17596ee6d0ddd1e5ed390f69b1fd548654 |
File details
Details for the file datacube_alchemist-0.6.7-py2.py3-none-any.whl.
File metadata
- Download URL: datacube_alchemist-0.6.7-py2.py3-none-any.whl
- Upload date:
- Size: 27.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6748eedbdfef3feaa3b2a32d844c1a6cb40062128716e1e52ae300819a7c4325 |
| MD5 | 4fd44e0a179800a3ab791b13132dea3e |
| BLAKE2b-256 | 79a9f874af4520a30ed9305a03dba942d662f9296b94e0f469200eb06f1ec940 |