Command-line interface (CLI) to rate and crawl STAC
heystac
A command-line utility (CLI) for rating and crawling STAC catalogs. heystac generates the ratings for https://www.gadom.ski/heystac/.
Usage
python -m pip install heystac
heystac --help
To rate a STAC catalog, collection, or item:
$ heystac rate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st/items/LC09_L2SP_090091_20241118_20241119_02_T2_ST
5.0 ★★★★★
Any issues will be printed to standard output:
$ heystac rate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st
1.7 ★★
High importance issues
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Rule id | Message |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| validate-core | Validation failed for Collection with ID landsat-c2l2-st against schema at https://schemas.stacspec.org/v1.0.0/collection-spec/json-schema/collection.json |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
To run JSON Schema validation on a STAC value:
$ heystac validate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st 2>&1 | tail -n7
Failed validating 'pattern' in schema['allOf'][0]['properties']['license']:
    {'title': 'Collection License Name',
     'type': 'string',
     'pattern': '^[\\w\\-\\.\\+]+$'}

On instance['license']:
    'https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/Landsat_Data_Policy.pdf'
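The failure comes from the pattern printed above: the license field only allows word characters, hyphens, dots, and plus signs, so a URL, with its ":" and "/" characters, can never match. The sketch below checks the same pattern with Python's re module, outside heystac. Python's \w is Unicode-aware and can differ from the ECMA-262 regex dialect that JSON Schema specifies, but the two agree on these ASCII inputs. In STAC 1.0 the license field expects an SPDX identifier, various, or proprietary, and the license text itself is usually referenced through a link with the license relation type.

```python
import re

# Pattern copied from the validation error above. The printed schema shows a
# Python repr, so its doubled backslashes become single ones in a raw string.
LICENSE_PATTERN = r"^[\w\-\.\+]+$"


def license_matches(value: str) -> bool:
    """Return True if `value` satisfies the collection license pattern."""
    return re.search(LICENSE_PATTERN, value) is not None


landsat_license = (
    "https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/"
    "s3fs-public/atoms/files/Landsat_Data_Policy.pdf"
)
print(license_matches(landsat_license))  # False: ':' and '/' are not allowed
print(license_matches("CC-BY-4.0"))      # True: an SPDX identifier
print(license_matches("proprietary"))    # True
```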
To crawl a catalog and save the results to a directory:
heystac crawl https://landsatlook.usgs.gov/stac-server usgs-landsat
Definitions
We've made some opinionated decisions about behavior in this CLI.
Rate
A Rating is generated by applying a set of Rules to a STAC value. This produces one Check per rule.
Each Check has a score between zero and one:
- 0: the STAC value failed the check
- 1: the STAC value passed the check
- Something between 0 and 1: the STAC value partially failed the check, e.g. if the check was for valid links and some links were valid and some were not
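For a check like the valid-links example, one natural way to get a partial score is the fraction of links that pass. The sketch below assumes that approach; the function name and the choice to score a value with no links as 1.0 are hypothetical and not taken from heystac.

```python
def link_check_score(link_ok: list[bool]) -> float:
    """Hypothetical partial score: the fraction of links that are valid.

    Returns 0.0 when every link fails, 1.0 when every link passes (or, as an
    assumption, when there are no links), and a value in between otherwise.
    """
    if not link_ok:
        return 1.0
    return sum(link_ok) / len(link_ok)


print(link_check_score([True, True, False, True]))  # 0.75
```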
Each rule also has an Importance:
- high
- medium
- low
heystac applies a configurable weight to each check, based on its importance, to produce a score for the STAC value. That score is converted to stars by the following formula: 5 * score / total, where total is the maximum possible score.
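Putting these definitions together, here is a minimal sketch of the weighted score and the star formula. The weights (high = 3, medium = 2, low = 1) and the Check class are hypothetical placeholders; heystac's actual defaults are in the configuration printed by heystac config.

```python
from dataclasses import dataclass

# Hypothetical importance weights; heystac reads its real ones from configuration.
WEIGHTS = {"high": 3.0, "medium": 2.0, "low": 1.0}


@dataclass
class Check:
    rule_id: str
    importance: str  # "high", "medium", or "low"
    score: float     # 0 = failed, 1 = passed, in between = partially failed


def stars(checks: list[Check], weights: dict[str, float] = WEIGHTS) -> float:
    """Convert weighted check scores to stars using 5 * score / total."""
    score = sum(weights[check.importance] * check.score for check in checks)
    # total is the maximum possible score: every check scoring 1.
    total = sum(weights[check.importance] for check in checks)
    if total == 0:
        raise ValueError("cannot rate a value with no weighted checks")
    return 5 * score / total


checks = [
    Check("validate-core", "high", 0.0),
    Check("hypothetical-medium-rule", "medium", 1.0),
    Check("hypothetical-low-rule", "low", 1.0),
]
print(stars(checks))  # 5 * (0 + 2 + 1) / 6 = 2.5
```

Because every weight appears in both score and total, multiplying all weights by the same factor leaves the rating unchanged; only their ratios matter.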
Crawl
When heystac crawls a STAC API, it gets every collection and one item from each collection. The catalog is saved to the local filesystem in the following layout:
catalog.json
collection-a/collection.json
collection-a/item-from-collection-a.json
collection-b/collection.json
collection-b/item-from-collection-b.json
The item file names are generated from the item ID, with all "/" characters replaced by "_".
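A minimal sketch of such a crawl against the standard STAC API endpoints (GET /collections, then GET /collections/{id}/items?limit=1 for a single item). It assumes the landing page is saved as catalog.json, that each collection directory is named after the collection ID, and that the collection list fits on one page; heystac's own implementation may differ on any of these. The plan_crawl, write_crawl, and item_file_name names are hypothetical, and the fetch function is injectable so the layout logic can run without network access.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable

Json = dict[str, Any]


def fetch_json(url: str) -> Json:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def item_file_name(item_id: str) -> str:
    """Item ID with every "/" replaced by "_", plus a .json extension."""
    return item_id.replace("/", "_") + ".json"


def plan_crawl(root_url: str, fetch: Callable[[str], Json] = fetch_json) -> dict[str, Json]:
    """Map relative output paths to the STAC documents that would be saved there."""
    root = root_url.rstrip("/")
    files: dict[str, Json] = {"catalog.json": fetch(root)}
    # Assumption: the collection list is not paginated.
    for collection in fetch(f"{root}/collections")["collections"]:
        collection_id = collection["id"]
        files[f"{collection_id}/collection.json"] = collection
        # Request one item; a server may return more, so keep only the first.
        items = fetch(f"{root}/collections/{collection_id}/items?limit=1")
        features = items.get("features", [])
        if features:
            item = features[0]
            files[f"{collection_id}/{item_file_name(item['id'])}"] = item
    return files


def write_crawl(files: dict[str, Json], directory: str) -> None:
    """Write each planned document under `directory`, creating folders as needed."""
    for relative_path, document in files.items():
        path = Path(directory) / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(document, indent=2))
```

Under these assumptions, calling plan_crawl("https://landsatlook.usgs.gov/stac-server") and passing the result to write_crawl with "usgs-landsat" would produce a directory shaped like the layout above.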
Configuration
heystac comes with a default configuration that should work for most use-cases.
If you want to customize anything, such as the importance weights or the rule descriptions, save the default configuration to a file called heystac.toml:
heystac config > heystac.toml
You can then edit that file to your heart's content.
By default, the CLI will read heystac.toml in your current working directory.
To specify a config file in another location:
heystac --config a/nother/path/config.toml
License
MIT