CLI for Cirrus, a serverless STAC-based processing pipeline

Cirrus

Cirrus is a STAC-based processing pipeline. As input, Cirrus takes a GeoJSON FeatureCollection containing one or more STAC Items. This input is run through workflows that generate one or more STAC Items as output. These output Items are added to the Cirrus static STAC catalog and are also broadcast via an SNS topic. Subscribers to that topic can trigger additional workflows, such as keeping a dynamic STAC catalog (for example, STAC-server) up to date.
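
To make the input shape concrete, here is a minimal sketch that wraps STAC Item dictionaries in a GeoJSON FeatureCollection. The helper name `make_feature_collection` and the example Item values are hypothetical, and any extra fields Cirrus may require on its input are not shown.

```python
import json


def make_feature_collection(items):
    """Wrap one or more STAC Item dicts in a GeoJSON FeatureCollection."""
    items = list(items)
    if not items:
        raise ValueError("the input needs at least one STAC Item")
    for item in items:
        # A STAC Item is a GeoJSON Feature, so its "type" must be "Feature".
        if item.get("type") != "Feature":
            raise ValueError("each STAC Item must be a GeoJSON Feature")
    return {"type": "FeatureCollection", "features": items}


# Hypothetical, minimal STAC Item used only for illustration.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "collection": "example-collection",
    "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
    "bbox": [-105.0, 40.0, -105.0, 40.0],
    "properties": {"datetime": "2021-10-05T00:00:00Z"},
    "assets": {},
    "links": [],
}

payload = make_feature_collection([example_item])
print(json.dumps(payload, indent=2))
```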

A Cirrus workflow can be as simple as one with no processing at all, where the input is passed through and published unchanged. A more complex workflow transforms the STAC Items and their underlying data before publishing the results. The current state of each input (QUEUED, PROCESSING, COMPLETED, FAILED) is tracked during processing, which prevents inputs from being ingested more than once and allows a user to follow any input through the pipeline.
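
The idea of state tracking can be illustrated with a small in-memory sketch. This is not Cirrus's implementation: the `StateTracker` class is hypothetical, and the rule that a FAILED input may be resubmitted is an assumption made for the example. Only the four state names come from the description above.

```python
from enum import Enum


class State(Enum):
    QUEUED = "QUEUED"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class StateTracker:
    """In-memory stand-in for a state database (hypothetical, not Cirrus code)."""

    def __init__(self):
        self._states = {}

    def submit(self, input_id):
        """Queue an input; return False if it is already known and has not failed."""
        current = self._states.get(input_id)
        # Assumption: only unseen or FAILED inputs may be (re)queued.
        if current is not None and current is not State.FAILED:
            return False
        self._states[input_id] = State.QUEUED
        return True

    def set_state(self, input_id, state):
        """Record a new state for an input that was previously submitted."""
        if input_id not in self._states:
            raise KeyError(f"unknown input: {input_id}")
        self._states[input_id] = State(state)

    def get_state(self, input_id):
        """Return the current State, or None if the input was never submitted."""
        return self._states.get(input_id)


tracker = StateTracker()
print(tracker.submit("scene-001"))  # True: first submission is queued
print(tracker.submit("scene-001"))  # False: duplicate is rejected
```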

As shown in this high-level overview of Cirrus, users input data to Cirrus through the use of feeders. Feeders are simply programs that get or generate some type of STAC metadata, combine it with processing parameters, and pass it into Cirrus in the format Cirrus expects.
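
A feeder's core job can be sketched in a few lines: take STAC Items, attach processing parameters, and emit one input per Item. The function name `build_payloads`, the `process` key, and its `workflow` field are assumptions made for illustration; the exact input schema is defined by Cirrus itself.

```python
import copy


def build_payloads(items, process):
    """Yield one FeatureCollection-style input per STAC Item.

    `items` is an iterable of STAC Item dicts and `process` holds the
    processing parameters. The "process" key is an assumed name.
    """
    for item in items:
        yield {
            "type": "FeatureCollection",
            # Deep copies keep each payload independent of the caller's data.
            "features": [copy.deepcopy(item)],
            "process": copy.deepcopy(process),
        }


# Hypothetical inputs: one bare-bones Item and a built-in workflow name.
items = [{"type": "Feature", "id": "scene-001", "properties": {}}]
process = {"workflow": "publish-only"}

for payload in build_payloads(items, process):
    print(payload["features"][0]["id"], "->", payload["process"]["workflow"])
```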

Because Cirrus output is published via SNS, a feeder can be configured to subscribe to that SNS topic, allowing workflows to be chained: the output of one workflow becomes the input to another. Chaining creates multiple levels of products, all with published STAC metadata and clear links showing data provenance.
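
A chained feeder might look roughly like the sketch below, which reads the standard SNS-to-Lambda event shape (`Records[].Sns.Message`) and wraps each published Item as input for a second workflow. It assumes each SNS message body is a single STAC Item serialized as JSON; the `handler` function, the `process` key, and the choice of the `mirror` workflow are illustrative only.

```python
import json


def items_from_sns_event(event):
    """Parse the JSON message body of every SNS record in a Lambda event."""
    items = []
    for record in event.get("Records", []):
        # Assumption: the message body is one STAC Item encoded as JSON.
        items.append(json.loads(record["Sns"]["Message"]))
    return items


def handler(event, context=None):
    """Turn Items published by one workflow into inputs for another."""
    next_process = {"workflow": "mirror"}  # hypothetical downstream workflow
    return [
        {"type": "FeatureCollection", "features": [item], "process": next_process}
        for item in items_from_sns_event(event)
    ]


# Simulated SNS event carrying one published Item.
published_item = {"type": "Feature", "id": "scene-001", "properties": {}}
fake_event = {"Records": [{"Sns": {"Message": json.dumps(published_item)}}]}
print(handler(fake_event))
```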

Cirrus Quickstart

A Cirrus project is managed via the cirrus cli tool. Here's everything required to create, modify, and deploy a new project:

# Make a new directory for a project
❯ mkdir cirrus-project; cd cirrus-project

# Create a python virtual environment for isolation
❯ python -m venv .venv

# Activate our venv
❯ . .venv/bin/activate

# Install cirrus-geo
❯ pip install cirrus-geo
...

# Now we should have cirrus on our path
❯ cirrus
Usage: cirrus [OPTIONS] COMMAND [ARGS]...

  cli for cirrus, a severless STAC-based processing pipeline

Options:
  --cirrus-dir DIRECTORY
  -v, --verbose           Increase logging level. Can be specified multiple
                          times.  [x>=0]
  --help                  Show this message and exit.

Commands:
  build             Build the cirrus configuration into a serverless.yml.
  clean             Remove all files from the cirrus build directory.
  create            Create a new component in the project.
  init              Initialize a cirrus project in DIRECTORY.
  serverless (sls)  Run serverless within the cirrus build directory.
  show              Multifunction command to list/show components,
                    component...

# Fantastic!
# We can init our new project and see what all was created
❯ cirrus init
Succesfully initialized project in '/Users/jkeifer/cirrus-project'.

❯ ls
.venv/	cirrus.yml  feeders/  functions/  outputs/  package.json  resources/  tasks/  workflows/

# The cirrus.yml is almost good to go for a minimal install,
# but it does require a few parameters either set in the
# config or as environment variables:
#
#   custom:
#     batch:
#       SecurityGroupIds:
#         - ${env:SECURITY_GROUP_1}
#       Subnets:
#         - ${env:SUBNET_1}
#         - ${env:SUBNET_2}
#         - ${env:SUBNET_3}
#         - ${env:SUBNET_4}
#
# Use your favorite editor to set these values appropriately
# based on your existing AWS resources.

# As we do have node.js dependencies from serverless,
# let's install those with the generated configuration
❯ npm install
...

# We can see all the built in feeders, tasks, and workflows (among others)
❯ cirrus show feeders
feed-rerun (built-in): Rerun items in the database
feed-s3-inventory (built-in): Feed Sentinel AWS inventory data to Cirrus for cataloging
feed-stac-api (built-in): Feed data from a STAC API to Cirrus for processing
feed-stac-crawl (built-in): Crawl static STAC assets

❯ cirrus show tasks
add-preview (built-in, lambda): Create a preview and/or thumbnail from one or more assets
convert-to-cog (built-in, lambda): Convert specified assets into Cloud Optimized GeoTIFFs
copy-assets (built-in, lambda): Copy specified assets from Item(s) to an S3 bucket
post-batch (built-in, lambda): Post process batch job by copying input from S3
pre-batch (built-in, lambda): Pre process batch job by copying input to S3
publish (built-in, lambda): Publish resulting STAC Collections and Items to catalog, and optionally SNS

❯ cirrus show workflows
cog-archive (built-in): Create mirror with some cogified assets
mirror (built-in): Mirror items with selected assets
mirror-with-preview (built-in): Mirror items with selected assets
publish-only (built-in): Simple example that just published input Collections and items

# To create a new task, for example, we can do this
❯ cirrus create task a_task "A task that doesn't do much yet"
task a_task created

❯ cirrus show tasks
add-preview (built-in, lambda): Create a preview and/or thumbnail from one or more assets
convert-to-cog (built-in, lambda): Convert specified assets into Cloud Optimized GeoTIFFs
copy-assets (built-in, lambda): Copy specified assets from Item(s) to an S3 bucket
post-batch (built-in, lambda): Post process batch job by copying input from S3
pre-batch (built-in, lambda): Pre process batch job by copying input to S3
publish (built-in, lambda): Publish resulting STAC Collections and Items to catalog, and optionally SNS
a_task (lambda): A task that doesn't do much yet

# We can see that created a task and its
# associated config inside the tasks directory
❯ tree tasks
tasks
└── a_task
    ├── README.md
    ├── definition.yml
    └── lambda_function.py

# To build our configuration into something
# compatible with serverless, we use the build command
❯ cirrus build

# The output of build is in the .cirrus directory
❯ ls .cirrus
lambdas/  serverless.yml

# To deploy with serverless, we can simply do the following
# (optionally set the stage with `--stage <stage_name>`)
❯ cirrus serverless deploy
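
If the batch settings in cirrus.yml are left as the `${env:...}` references shown in the quickstart comments, a small pre-flight check can catch unset variables before deploying. This is a sketch outside the cirrus cli itself; the helper name `missing_env_vars` is hypothetical, and the variable names are the ones from the generated configuration above.

```python
import os

# Environment variables referenced by the generated cirrus.yml batch config.
REQUIRED_VARS = [
    "SECURITY_GROUP_1",
    "SUBNET_1",
    "SUBNET_2",
    "SUBNET_3",
    "SUBNET_4",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before deploying:", ", ".join(missing))
else:
    print("All batch networking variables are set.")
```

Variables that are hard-coded in cirrus.yml instead of read from the environment do not need this check.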

Cirrus Project Structure

At its most basic, a Cirrus project is a directory containing a cirrus.yml configuration file. Several subfolders are used to organize additional object definitions for custom implementations.

Folder	Purpose
feeders	Feeder Lambda functions used to add data to Cirrus
functions	Misc Lambda functions required by a project
outputs	CloudFormation output templates
resources	CloudFormation resource templates
tasks	Task Lambda functions used within workflows
workflows	AWS Step Function definitions describing data processing workflows
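
The layout in the table can be checked with a short script. This sketch is independent of the cirrus cli: `describe_project` is a hypothetical helper that only reports whether cirrus.yml and each listed folder exist under a given directory.

```python
from pathlib import Path

# Folder names taken from the project structure table.
EXPECTED_FOLDERS = (
    "feeders",
    "functions",
    "outputs",
    "resources",
    "tasks",
    "workflows",
)


def describe_project(root):
    """Report whether `root` has a cirrus.yml and which subfolders exist."""
    root = Path(root)
    return {
        "is_project": (root / "cirrus.yml").is_file(),
        "folders": {name: (root / name).is_dir() for name in EXPECTED_FOLDERS},
    }


report = describe_project(".")
print("cirrus.yml present:", report["is_project"])
for name, present in report["folders"].items():
    print(f"{name}/", "found" if present else "missing")
```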

Cirrus Repositories

Cirrus is divided into several repositories, all under the cirrus-geo organization on GitHub. This repository (cirrus-geo) is the main one of interest to users.

Repository	Purpose
cirrus-geo	Main Cirrus repo implementing the `cirrus` cli tool for managing Cirrus projects. Also provides the base set of lambda functions and workflows.
cirrus-lib	A Python library of convenience functions to interact with Cirrus. Lambda functions are kept lightweight
cirrus-task-images	Dockerfiles and code for publishing Cirrus Docker images to Docker Hub that are used in Cirrus Batch tasks

The cirrus cli utility is used to create, manage, and deploy Cirrus projects, and is pip-installable. The pip-installable Python library cirrus-lib is used by all Cirrus Lambdas and tasks and is available to developers for writing their own tasks.

Documentation

Documentation for deploying, using, and customizing Cirrus is contained within the docs directory:

Understand the architecture of Cirrus and key concepts
Deploy Cirrus to your own AWS account
Use Cirrus to process input data and publish resulting STAC Items
Customize Cirrus by adding tasks, workflows, and compute environments

About

Cirrus is an open-source pipeline for processing geospatial data in AWS. Cirrus was originally developed by Element 84 under a NASA ACCESS project called Community Tools for Analysis of NASA Earth Observation System Data in the Cloud.

Release history

Version	Released
0.14.0	Apr 26, 2024
0.13.0	Mar 4, 2024
0.12.1	Feb 15, 2024
0.12.0	Feb 14, 2024
0.11.4	Feb 14, 2024
0.11.3	Feb 13, 2024
0.11.2	Feb 13, 2024
0.11.1	Feb 12, 2024
0.11.0	Feb 5, 2024
0.10.2rc2024042601 (pre-release)	Apr 26, 2024
0.10.1	Jan 11, 2024
0.10.0	Jul 19, 2023
0.9.0	Jan 26, 2023
0.9.0rc0 (pre-release)	Nov 16, 2022
0.9.0b0 (pre-release)	Nov 15, 2022
0.9.0a0 (pre-release)	Nov 10, 2022
0.8.0	Nov 2, 2022
0.7.0	Sep 13, 2022
0.6.0	Feb 18, 2022
0.5.4	Feb 10, 2022
0.5.3	Feb 10, 2022
0.5.2	Feb 9, 2022
0.5.1	Jan 28, 2022
0.5.0	Jan 13, 2022
0.5.0a5 (pre-release)	Jan 7, 2022
0.5.0a4 (pre-release)	Nov 19, 2021
0.5.0a3 (pre-release)	Nov 19, 2021
0.5.0a2 (pre-release)	Oct 6, 2021
0.5.0a1 (pre-release)	Oct 5, 2021
0.5.0a0 (pre-release, this version)	Oct 5, 2021

Download files

Source Distribution

cirrus-geo-0.5.0a0.tar.gz (45.5 kB)

Uploaded Oct 5, 2021 Source

Built Distribution

cirrus_geo-0.5.0a0-py3-none-any.whl (89.2 kB)

Uploaded Oct 5, 2021 Python 3

Hashes for cirrus-geo-0.5.0a0.tar.gz
Algorithm	Hash digest
SHA256	`75f28f963768d90df1170c90dffea2d35e6c7ca5df88270b474334063283d837`
MD5	`ae2881639236334d525ea422e553ae2b`
BLAKE2b-256	`2809c875535b74e1457a16c1bfac70e7bd1210ab9d7bae23b44a55db656001a1`

Hashes for cirrus_geo-0.5.0a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fa83bef49ab8ee9640e8736b7b39d6d5e92c2923b5517257788e1c333792a7c0`
MD5	`a9a660d1bb1f68b394f46b685891325d`
BLAKE2b-256	`0e213e65de464d2d56348e9c526ff4066994ed74aaaba06da733747ed878b8c3`
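
A downloaded file can be compared against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, and the file path in the commented usage assumes the source distribution was saved to the current directory.

```python
import hashlib

# SHA256 digest of cirrus-geo-0.5.0a0.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = (
    "75f28f963768d90df1170c90dffea2d35e6c7ca5df88270b474334063283d837"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the file has been downloaded next to this script:
# print(verify("cirrus-geo-0.5.0a0.tar.gz", EXPECTED_SDIST_SHA256))
```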