
A package for parsing Earth Observation science granule structure and extracting relations between science variables and their associated metadata, such as coordinates.

Project description

earthdata-varinfo

A Python package developed as part of the NASA Earth Observing System Data and Information System (EOSDIS) for parsing Earth Observation science granule structure and extracting relations between science variables and their associated metadata, such as coordinates. This package also includes the capability to generate variable (UMM-Var) metadata records that are compatible with the NASA EOSDIS Common Metadata Repository (CMR).

For general usage of classes and functions in earthdata-varinfo, see: https://github.com/nasa/earthdata-varinfo/blob/main/docs/earthdata-varinfo.ipynb.

Features:

CFConfig

A class that takes a JSON configuration file and retrieves all related configuration based on the supplied mission name and collection short name. The JSON file is optional; if it is not supplied, the CFConfig instance will be constructed with largely empty attributes.

from varinfo import CFConfig

cf_config = CFConfig('ICESat2', 'ATL03', config_file='config/0.0.1/sample_config_0.0.1.json')
metadata_attributes = cf_config.get_metadata_attributes('/full/variable/path')
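As a rough illustration of the kind of lookup CFConfig performs, the sketch below filters a configuration dictionary down to the entries that apply to a given mission and collection short name. The schema used here (`Overrides`, `Applicability`, `Mission`, `ShortNamePath`, `Attributes`) is a simplified, hypothetical stand-in for the real JSON schema in the config directory, and `get_applicable_entries` is not part of earthdata-varinfo.

```python
import re
from typing import Any


def get_applicable_entries(
    config: dict[str, Any], mission: str, short_name: str
) -> list[dict[str, Any]]:
    """Return config entries whose Applicability matches mission and short name.

    Hypothetical schema: each entry in config['Overrides'] has an
    'Applicability' dict holding regular expressions under 'Mission' and
    'ShortNamePath'. A missing key is treated as matching anything.
    """
    matches = []
    for entry in config.get('Overrides', []):
        applicability = entry.get('Applicability', {})
        mission_pattern = applicability.get('Mission', '.*')
        short_name_pattern = applicability.get('ShortNamePath', '.*')
        if re.fullmatch(mission_pattern, mission) and re.fullmatch(
            short_name_pattern, short_name
        ):
            matches.append(entry)
    return matches


# Made-up configuration content, for illustration only.
sample_config = {
    'Overrides': [
        {'Applicability': {'Mission': 'ICESat2', 'ShortNamePath': 'ATL0[3-8]'},
         'Attributes': [{'Name': 'units', 'Value': 'm'}]},
        {'Applicability': {'Mission': 'GEDI'},
         'Attributes': [{'Name': 'units', 'Value': 'degrees'}]},
    ]
}

icesat2_entries = get_applicable_entries(sample_config, 'ICESat2', 'ATL03')
```

An empty configuration simply yields no matching entries, which loosely mirrors the "largely empty attributes" behaviour described for a CFConfig built without a file.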

VarInfo

A group of classes that contain metadata attributes for all groups and variables in a single granule, and the relations between all variables within that granule. Current classes include:

  • VarInfoBase: An abstract base class that contains core logic and methods used by the child classes that parse different sources of granule information.
  • VarInfoFromDmr: Child class that maps input from a .dmr file downloaded from Hyrax in the cloud. This inherits all the methods and logic of VarInfoBase.
  • VarInfoFromNetCDF4: Child class that maps input directly from a NetCDF-4 file. This inherits all the methods and logic of VarInfoBase.

from varinfo import VarInfoFromDmr

var_info = VarInfoFromDmr('/path/to/local/file.dmr',
                          config_file='config/0.0.1/sample_config_0.0.1.json')

# Retrieve a set of variables with coordinate metadata:
var_info.get_science_variables()

# Retrieve a set of variables without coordinate metadata:
var_info.get_metadata_variables()

# Augment a set of desired variables with all variables required to support
# the requested set, for example coordinate variables.
var_info.get_required_variables({'/path/to/science/variable'})

# Retrieve an ordered list of dimensions associated with all specified variables.
var_info.get_required_dimensions({'/path/to/science/variable'})

# Retrieve all spatial dimensions associated with the specified set of science
# variables.
var_info.get_spatial_dimensions({'/path/to/science/variable'})
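Conceptually, get_required_variables expands the requested set until it is closed under the references variables make to one another, such as the CF `coordinates` attribute. The sketch below shows that idea as a breadth-first traversal over a hypothetical `references` mapping (variable path to the paths it refers to). It illustrates the concept only and is not the package's implementation; real granules may also pull in dimensions, bounds, or grid-mapping variables.

```python
from collections import deque


def expand_required_variables(
    requested: set[str], references: dict[str, set[str]]
) -> set[str]:
    """Return the requested variables plus everything they transitively reference.

    `references` is a hypothetical mapping from a variable path to the paths
    named in its metadata (for example, its CF `coordinates` attribute).
    """
    required = set(requested)
    queue = deque(requested)
    while queue:
        variable = queue.popleft()
        for reference in references.get(variable, set()):
            # Only enqueue unseen variables, so reference cycles terminate.
            if reference not in required:
                required.add(reference)
                queue.append(reference)
    return required


# Illustrative ICESat-2 style paths; the reference graph is made up.
references = {
    '/gt1r/heights/h_ph': {'/gt1r/heights/lat_ph', '/gt1r/heights/lon_ph'},
    '/gt1r/heights/lat_ph': {'/gt1r/heights/delta_time'},
    '/gt1r/heights/lon_ph': {'/gt1r/heights/delta_time'},
}

required = expand_required_variables({'/gt1r/heights/h_ph'}, references)
```

Here `required` contains the science variable, both coordinate variables, and the shared `delta_time` variable reached through them.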

The VarInfoFromDmr and VarInfoFromNetCDF4 classes also have an optional argument, short_name, which can be used upon instantiation to specify the short name of the collection to which the granule belongs. This is the preferred way to specify a collection short name, and is particularly encouraged when a granule does not contain the collection short name within its metadata attributes (e.g., ABoVE collections from ORNL).

var_info = VarInfoFromDmr('/path/to/local/file.dmr', short_name='ATL03')

Note: as there are now two optional parameters, short_name and config_file, it is best to ensure that both are specified as named arguments upon instantiation.
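The preference for an explicit short_name can be pictured as a simple precedence rule: use the argument when it is given, otherwise fall back to the granule's own metadata. The sketch below is hypothetical; in particular, `resolve_short_name` and the candidate attribute names it checks are assumptions, not the names earthdata-varinfo actually searches.

```python
from typing import Optional

# Hypothetical global attribute names that might hold a collection short name.
CANDIDATE_ATTRIBUTES = ('short_name', 'ShortName', 'shortName')


def resolve_short_name(
    global_attributes: dict[str, str], *, short_name: Optional[str] = None
) -> Optional[str]:
    """Return an explicit short_name if given, else one found in the metadata."""
    if short_name is not None:
        return short_name
    for attribute in CANDIDATE_ATTRIBUTES:
        if attribute in global_attributes:
            return global_attributes[attribute]
    # No short name available: the caller needs to supply one explicitly.
    return None
```

Making short_name keyword-only (the bare `*`) is one way to enforce the advice to pass optional parameters as named arguments.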

UMM-Var generation

earthdata-varinfo can generate variable metadata records compatible with the CMR UMM-Var schema:

from varinfo import VarInfoFromNetCDF4
from varinfo.umm_var import export_all_umm_var_to_json, get_all_umm_var

# Instantiate a VarInfoFromNetCDF4 object for a local NetCDF-4 file.
var_info = VarInfoFromNetCDF4('/path/to/local/file.nc4', short_name='ATL03')

# Retrieve a dictionary of UMM-Var JSON records. Keys are the full variable
# paths, values are UMM-Var schema-compatible, JSON-serialisable dictionaries.
umm_var = get_all_umm_var(var_info)

# Write each UMM-Var dictionary to its own JSON file:
export_all_umm_var_to_json(list(umm_var.values()), output_dir='local_dir')
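To make the shape of this output concrete, the following standard-library sketch writes a dictionary shaped like the one returned by get_all_umm_var (variable path to UMM-Var dictionary) to one JSON file per variable. The helper name, the file-naming rule (leading slash dropped, remaining slashes replaced by underscores, `.json` suffix) and the sample record are assumptions for illustration; export_all_umm_var_to_json may name its files differently.

```python
import json
from pathlib import Path


def write_umm_var_files(umm_var: dict[str, dict], output_dir: str) -> list[Path]:
    """Write each UMM-Var record to its own JSON file in output_dir.

    Hypothetical naming: the variable path without its leading slash,
    with remaining '/' characters replaced by '_'.
    """
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for variable_path, record in umm_var.items():
        file_name = variable_path.lstrip('/').replace('/', '_') + '.json'
        output_path = directory / file_name
        output_path.write_text(json.dumps(record, indent=2))
        written.append(output_path)
    return written


# Illustrative record only: real UMM-Var records contain many more fields.
sample_umm_var = {
    '/gt1r/heights/h_ph': {'Name': 'gt1r/heights/h_ph', 'LongName': 'photon height'},
}
```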

End-to-end UMM-Var generation and publication:

from cmr import CMR_OPS
from varinfo.generate_umm_var import generate_collection_umm_var

# Placeholder values: substitute a real collection concept ID and an
# authorization header before running.
uat_collection_id = '<UAT collection concept ID>'
production_collection_id = '<Production collection concept ID>'
auth_header = '<authorization header>'

# Defaults to UAT, and not to publish:
umm_var_json = generate_collection_umm_var(uat_collection_id, auth_header)

# To use a production collection:
umm_var_json = generate_collection_umm_var(production_collection_id,
                                           auth_header,
                                           cmr_env=CMR_OPS)

# To generate and publish records for a UAT collection (note the authorization
# header must contain a LaunchPad token):
umm_var_json = generate_collection_umm_var(uat_collection_id, auth_header,
                                           publish=True)

Expected outputs:

  • publish=False, the default, returns JSON output containing the UMM-Var record for each identified variable.
  • publish=True will return a list of strings. Each string is either the concept ID of a new UMM-Var record, or a string including the full path of a variable that failed to publish and the error messages returned from CMR.
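Because publish=True mixes concept IDs and error strings in a single list, a caller will usually want to separate the two. The sketch below does this with a regular expression, assuming CMR variable concept IDs take the form `V<digits>-<PROVIDER>`. The helper name, the example concept ID `V1234567890-EEDTEST` and the sample failure string are all made up, since the exact error format is not documented here.

```python
import re

# Assumed shape of a CMR variable concept ID: 'V', digits, '-', provider ID.
VARIABLE_CONCEPT_ID = re.compile(r'V\d+-[A-Z0-9_]+')


def split_publication_results(results: list[str]) -> tuple[list[str], list[str]]:
    """Split publish=True output into (concept IDs, failure messages)."""
    concept_ids = [r for r in results if VARIABLE_CONCEPT_ID.fullmatch(r)]
    failures = [r for r in results if not VARIABLE_CONCEPT_ID.fullmatch(r)]
    return concept_ids, failures


# Made-up example values for illustration only.
results = [
    'V1234567890-EEDTEST',
    '/gt1r/heights/h_ph: hypothetical CMR error message',
]
published, failed = split_publication_results(results)
```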

Native IDs for generated UMM-Var records will be of format:

<collection concept ID>-<variable Name>

For hierarchical variables, slashes in the variable name are converted to underscores so that the native ID is compatible with the CMR API.
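The native ID rule can be written as a one-line helper. `make_native_id` is a hypothetical sketch, not a function exported by earthdata-varinfo, and it assumes the variable Name carries no leading slash, which this description does not specify.

```python
def make_native_id(collection_concept_id: str, variable_name: str) -> str:
    """Build '<collection concept ID>-<variable Name>' with '/' replaced by '_'."""
    return f'{collection_concept_id}-{variable_name.replace("/", "_")}'


# Made-up collection concept ID, used only to show the format.
native_id = make_native_id('C1234567890-EEDTEST', 'gt1r/heights/h_ph')
# native_id == 'C1234567890-EEDTEST-gt1r_heights_h_ph'
```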

Configuration file schema:

The configuration file schema is defined as a JSON schema file in the config directory. Each new iteration of the schema should be placed in its own semantically versioned subdirectory, and a sample configuration file should be provided. Additionally, notes on the schema changes should be provided in config/CHANGELOG.md.
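Because each schema iteration lives in its own semantically versioned subdirectory (such as config/0.0.1 used in the examples above), tooling can locate the newest schema by comparing version numbers numerically rather than as strings, so that 0.0.10 sorts after 0.0.9. The helper below is a hypothetical sketch; it assumes plain MAJOR.MINOR.PATCH directory names with no pre-release suffixes.

```python
import re
from pathlib import Path

SEMVER_DIRECTORY = re.compile(r'(\d+)\.(\d+)\.(\d+)')


def latest_schema_directory(config_dir: str) -> Path:
    """Return the subdirectory of config_dir with the highest MAJOR.MINOR.PATCH name."""
    candidates = [
        path
        for path in Path(config_dir).iterdir()
        if path.is_dir() and SEMVER_DIRECTORY.fullmatch(path.name)
    ]
    if not candidates:
        raise FileNotFoundError(f'No versioned subdirectories in {config_dir}')
    # Compare versions as integer tuples, e.g. (0, 0, 10) > (0, 0, 9).
    return max(
        candidates,
        key=lambda path: tuple(int(part) for part in path.name.split('.')),
    )
```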

Installing

Using pip

Install the latest version of the package from PyPI using pip:

$ pip install earthdata-varinfo

Other methods:

For local development, it is possible to clone the repository and then install the version being developed in editable mode:

$ git clone https://github.com/nasa/earthdata-varinfo
$ cd earthdata-varinfo
$ pip install -e .

Contributing

Contributions are welcome! For more information see CONTRIBUTING.md.

Developing

Development within this repository should occur on a feature branch. Pull Requests (PRs) are created with a target of the main branch before being reviewed and merged.

Releases are created when a feature branch is merged to main and that branch also contains an update to the VERSION file.

Development Setup:

Prerequisites:

  • Python 3.9+, ideally installed in a virtual environment, such as pyenv or conda.
  • A local copy of this repository.

Set up conda virtual environment:

conda create --name earthdata-varinfo python=3.11 --channel conda-forge \
    --channel defaults -y
conda activate earthdata-varinfo

Install dependencies:

$ make develop

or

pip install -r requirements.txt -r dev-requirements.txt

Run a linter against package code (preferably do this prior to submitting code for a PR review):

$ make lint

Run the unit test suite (executed with pytest, but written using unittest classes):

$ make test

pre-commit hooks:

This repository uses pre-commit to check code against several coding standard best practices before each commit. These checks include:

  • Removing trailing whitespace.
  • Removing blank lines at the end of a file.
  • Checking that JSON files are validly formatted.
  • ruff Python linting checks.
  • black Python code formatting checks.

To enable these checks:

# Install pre-commit Python package as part of test requirements:
pip install -r dev-requirements.txt

# Install the git hook scripts:
pre-commit install

# (Optional) Run against all files:
pre-commit run --all-files

When you try to make a new commit locally, pre-commit will automatically run. If any of the hooks detects non-compliance (e.g., trailing whitespace), that hook will report a failure and attempt to fix the issue. You will need to review and git add those changes before you can make a commit.

Additional hooks are planned, possibly including tools such as mypy.

pre-commit.ci is configured such that these same hooks will be automatically run for every pull request.

Releasing:

All CI/CD for this repository is defined in the .github/workflows directory:

  • run_tests.yml - A reusable workflow that runs the unit test suite under a matrix of Python versions.
  • run_tests_on_pull_requests.yml - Triggered for all PRs against main. It runs the workflow in run_tests.yml to ensure all tests pass on the new code.
  • publish_to_pypi.yml - Triggered either manually or for commits to the main branch that contain changes to the VERSION file.

The publish_to_pypi.yml workflow will:

  • Run the full unit test suite, to prevent publication of broken code.
  • Extract the semantic version number from VERSION.
  • Extract the release notes for the most recent version from CHANGELOG.md.
  • Build the package to be published to PyPI.
  • Publish the package to PyPI.
  • Publish a GitHub release under the semantic version number, with associated git tag.

Before triggering a release, ensure the VERSION and CHANGELOG.md files are updated accordingly.
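Extracting the release notes for the most recent version, as the publish workflow does, can be done with a short parser. The sketch below assumes a newest-first CHANGELOG.md whose entries start with second-level headings such as `## [v1.1.0]`; that heading format, the helper name and the sample text are assumptions, and the real workflow may use a different tool or pattern.

```python
def latest_release_notes(changelog_text: str) -> str:
    """Return the body of the first '## ' section in a Markdown changelog.

    Assumes newest-first ordering, with each version introduced by a
    second-level heading (for example '## [v1.1.0]').
    """
    notes: list[str] = []
    in_section = False
    for line in changelog_text.splitlines():
        if line.startswith('## '):
            if in_section:
                break  # Reached the heading of the previous version.
            in_section = True
            continue
        if in_section:
            notes.append(line)
    return '\n'.join(notes).strip()


# Made-up changelog content, for illustration only.
sample_changelog = """# Changelog

## [v1.1.0]

### Added

- A new feature.

## [v1.0.0]

- Initial release.
"""
```

Third-level headings such as `### Added` do not start with `'## '` (the third character is `#`, not a space), so they stay inside the extracted section.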

Get in touch:

You can reach out to the maintainers of this repository via email:

Download files

Source Distribution

earthdata_varinfo-3.0.1.tar.gz (35.9 kB)

Uploaded Source

Built Distribution

earthdata_varinfo-3.0.1-py3-none-any.whl (38.0 kB)

Uploaded Python 3

File details

Details for the file earthdata_varinfo-3.0.1.tar.gz.

File metadata

  • Download URL: earthdata_varinfo-3.0.1.tar.gz
  • Size: 35.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for earthdata_varinfo-3.0.1.tar.gz
  • SHA256: 4b6cf81bc2cb392beadc58dc3a26e77f3aa13c7a2520296b274edac0920d6049
  • MD5: f97d5791994ab778866a318eed7d6c0f
  • BLAKE2b-256: 18d51b7939c933048183be3988b80b9ddd24f9742f1fa8d4431d1a503a7690cc

Provenance

The following attestation bundles were made for earthdata_varinfo-3.0.1.tar.gz:

Publisher: publish_to_pypi.yml on nasa/earthdata-varinfo

File details

Details for the file earthdata_varinfo-3.0.1-py3-none-any.whl.

File hashes

Hashes for earthdata_varinfo-3.0.1-py3-none-any.whl
  • SHA256: afc3c55298bcf0234d2f80f3cca24d2a6b458a2e00e89f645269839a5c703327
  • MD5: da61ff9a6c640794918208a85c1242e7
  • BLAKE2b-256: 9f16b0e365ce33961c2e2fefb3db6a6a12df6fd256a6a10a153b9fa261518f8a

Provenance

The following attestation bundles were made for earthdata_varinfo-3.0.1-py3-none-any.whl:

Publisher: publish_to_pypi.yml on nasa/earthdata-varinfo
