
Zarr Store class for working with EOSDIS cloud data


.. warning::

   DEPRECATED: This repository is no longer maintained. Please use `VirtualiZarr <https://github.com/zarr-developers/VirtualiZarr>`_ instead.


.. image:: https://img.shields.io/badge/maintenance-no--longer--maintained-red
   :alt: No Longer Maintained

zarr-eosdis-store

The zarr-eosdis-store library allows NASA EOSDIS Collections to be accessed efficiently by the `Zarr Python library <https://zarr.readthedocs.io/en/stable/index.html>`_, provided that a sidecar DMR++ metadata file has been generated for them.

Installation

This module requires Python 3.8 or greater::

    $ python --version
    Python 3.8.2

Install from PyPI::

    $ pip install zarr-eosdis-store

To install the latest development version::

    $ pip install git+https://github.com/nasa/zarr-eosdis-store.git@main#egg=zarr-eosdis-store

Earthdata Login

To access EOSDIS data, you need to sign in with a free NASA Earthdata Login account, which you can obtain at `<https://urs.earthdata.nasa.gov/>`_.

Once you have an account, you will need to add your credentials to your ~/.netrc file::

    machine urs.earthdata.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD

If you are accessing test data, you will need to use an account from the Earthdata Login test system at `<https://uat.urs.earthdata.nasa.gov/>`_ instead, adding a corresponding line to your ~/.netrc file::

    machine uat.urs.earthdata.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD
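Before opening a store, it can be useful to confirm that the expected ``.netrc`` entry is actually present. The following is a minimal sketch using only Python's standard ``netrc`` module; the helper name ``has_earthdata_login`` is hypothetical and is not part of zarr-eosdis-store. The demonstration writes a throwaway file rather than touching your real ~/.netrc.

```python
import netrc
import os
import tempfile


def has_earthdata_login(netrc_path, host="urs.earthdata.nasa.gov"):
    """Return True if netrc_path holds a login and password for host.

    Hypothetical helper for illustration; not part of zarr-eosdis-store.
    """
    try:
        entry = netrc.netrc(netrc_path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)


# Demonstrate against a temporary file containing only the production entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "netrc")
    with open(demo_path, "w") as fh:
        fh.write(
            "machine urs.earthdata.nasa.gov "
            "login YOUR_USERNAME password YOUR_PASSWORD\n"
        )
    print(has_earthdata_login(demo_path))
    print(has_earthdata_login(demo_path, "uat.urs.earthdata.nasa.gov"))
```

In this demonstration the first check succeeds and the second fails, because the temporary file has no entry for the test system. To check your own credentials, pass ``os.path.expanduser("~/.netrc")`` as the path.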

Usage

To use the library, simply instantiate eosdis_store.EosdisStore with the URL to the data file you would like to access, pass it to the Zarr library as you would with any other store, and use the Zarr API as with any other read-only Zarr file. Note: the URL to the data file will typically end with an HDF5 or NetCDF4 extension, not .zarr.

.. code-block:: python

   from eosdis_store import EosdisStore
   import zarr

   # Assumes you have set up .netrc with your Earthdata Login information
   f = zarr.open(EosdisStore('https://example.com/your/data/file.nc4'))

   # Read metadata and data from f using the Zarr API
   # (the index below assumes a three-dimensional array)
   print(f['parameter_name'][0, 0, 0])

If the data has _FillValue (to flag nodata), scale_factor, or add_offset set (defined in the metadata using CF conventions), these values can be retrieved from the parameter's attributes.

.. code-block:: python

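   import numpy as np

   var = f['parameter_name']
   scale_factor = var.attrs['scale_factor']
   add_offset = var.attrs['add_offset']
   nodata = var.attrs['_FillValue']

   # Read the raw (packed) values and locate nodata before unpacking
   raw = var[:]
   nodata_locs = np.where(raw == nodata)

   # Apply the CF scale and offset to obtain physical values
   arr = raw * scale_factor + add_offset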

A better way to handle these is to use XArray. Rather than reading the data immediately when a slice is requested, XArray defers the read until the data is actually accessed. With the Zarr backend to XArray, the scale and offset can be set so that when the data is accessed it will apply those values. This is more efficient if the data is going to be used in other operations.

The scale_factor and add_offset will be used if specified in the NetCDF/HDF5 file.

.. code-block:: python

   import xarray
   from eosdis_store import EosdisStore

   store = EosdisStore('https://example.com/your/data/file.nc4')

   f = xarray.open_zarr(store)

   # The data is not read yet
   xa = f['parameter_name']

   # Convert to a numpy array; the data is read here
   arr = xa.values

The resulting array will have had scale and offset applied, and any element that is equal to the _FillValue attribute will be set to numpy nan. To use XArray without applying the scale and offset or setting the nodata to nan, set the mask_and_scale keyword of xarray.open_zarr to False:

.. code-block:: python

   store = EosdisStore('https://example.com/your/data/file.nc4')

   f = xarray.open_zarr(store, mask_and_scale=False)
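To make concrete what masking and scaling change, the sketch below reproduces the CF unpacking arithmetic (``unpacked = packed * scale_factor + add_offset``, with fill values replaced by nan) in plain Python. The function ``decode_cf`` and the sample values are hypothetical and chosen for illustration only; this is not how XArray implements the decoding internally.

```python
import math


def decode_cf(raw_values, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Unpack CF-packed values; raw values equal to fill_value become nan.

    Hypothetical helper for illustration only.
    """
    decoded = []
    for raw in raw_values:
        if fill_value is not None and raw == fill_value:
            # The fill value is compared against the raw (packed) value.
            decoded.append(math.nan)
        else:
            decoded.append(raw * scale_factor + add_offset)
    return decoded


# Made-up packed values, as they would be returned with mask_and_scale=False.
raw = [-9999, 0, 100, 250]
print(decode_cf(raw, scale_factor=0.5, add_offset=10.0, fill_value=-9999))
# → [nan, 10.0, 60.0, 135.0]
```

Note that the fill value is tested before the scale and offset are applied, since _FillValue is expressed in the packed units of the stored data.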

Technical Summary

We make use of a technique to read NetCDF4 and some HDF5 files that was prototyped by The HDF Group and USGS, `described here <https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314>`_.

To allow the technique to work with EOSDIS data, we have extended it and optimized access in the following key ways:

  • The EosdisStore reads a DMR++ file generated by OPeNDAP to present the data file's metadata to the Zarr library and to determine byte offsets. By reusing these existing files, we avoid needing to generate new metadata sidecar files to support new data.

  • The store uses HTTPS and authenticates with a .netrc entry, rather than the S3 API, making it compatible with EOSDIS access patterns and requirements.

  • The store caches redirect URLs for a period of time set by the Cache-Control header. Doing this avoids the overhead of repeated redirects when accessing parts of files.

  • The store uses a parallel API that lets it optimize access in two ways:

    • When the Zarr library accesses data that requires reading multiple near-sequential bytes in the file, the store combines these smaller requests into a single larger request.

    • After an initial request to cache any authentication and redirect information, the store runs subsequent requests in parallel.
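The request-combining idea in the list above can be sketched as follows. This is an illustrative outline only, not the store's actual implementation: the function names and the ``max_gap`` threshold are assumptions. It merges ``(offset, length)`` byte ranges that touch or nearly touch, then formats each merged range as an HTTP ``Range`` header value.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (offset, length) byte ranges separated by at most max_gap bytes.

    Hypothetical helper; max_gap is an assumed tuning parameter.
    """
    merged = []
    for offset, length in sorted(ranges):
        end = offset + length
        if merged and offset - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]


def range_header(offset, length):
    """Format a byte range as an HTTP Range header value (inclusive end)."""
    return f"bytes={offset}-{offset + length - 1}"


# Four chunk reads; the first three are near-sequential in the file.
chunks = [(0, 100), (100, 50), (160, 40), (1000, 10)]
combined = coalesce_ranges(chunks, max_gap=16)
print(combined)
# → [(0, 200), (1000, 10)]
print([range_header(offset, length) for offset, length in combined])
# → ['bytes=0-199', 'bytes=1000-1009']
```

A nonzero ``max_gap`` trades a few wasted bytes for fewer round trips, which tends to matter more over HTTPS than the extra transfer does. After coalescing, the remaining independent requests are the ones that could be issued in parallel.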

Development

Clone the repository, then pip install its dependencies::

    pip install -r requirements.txt
    pip install -r requirements-dev.txt

To check code coverage and run tests::

    coverage run -m pytest

To check coding style::

    flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

To build documentation, generated at docs/_build/html/index.html::

    cd docs && make html
