pdr · PyPI

Planetary Data Reader

These details have not been verified by PyPI

Project links

Homepage

Project description

README.md

The Planetary Data Reader (pdr)

This tool provides a single command---read(‘/path/to/file’)---for ingesting all common planetary data types. It is currently in development. Almost every kind of "primary observational data" product currently archived in the PDS (under PDS3 or PDS4) should be covered eventually. Currently-supported datasets are listed here.

If the software fails while attempting to read from datasets that we have listed as supported, please submit an issue with a link to the file and information about the error (if applicable). There might also be datasets that work but are not listed. We would like to hear about those too. If a dataset is not yet supported that you would like us to consider prioritizing, please fill out this request form.

Installation

pdr is not yet on pip or conda. We recommend cloning the repository and installing it into a condaenvironment made from the provided environment.yml file, like:

git clone https://github.com/MillionConcepts/pdr.git
cd pdr
conda env create -f environment.yml
conda activate pdr 
pip install -e .

The minimum supported version of Python is 3.9.

Using environment.yml during installation will install all dependencies (both required and optional) for pdr. If you'd prefer to forego the optional dependencies please use minimal_environment.yml. Optional dependencies and their added functions are listed below:

pvl: allows Data.load("LABEL", as_pvl=True) which will load your label as a pvl object instead of plain text
astropy: allows reading of .fits files
jupyter: allows usage of the Example Jupyter Notebook (and other jupyter notebooks you create)
pillow: allows reading of TIFF files and rendering browse images
matplotlib: allows usage of save_sparklines, an experimental browse function

Usage

(You can check out our example Notebook on Binder for a quick interactive demo of functionality: )

Just run import pdr and then pdr.read(filename) where filename is the full path to a data file or a metadata / label file (extensions .LBL, .lbl, or .xml). read() will look for corresponding data or metadata files in the same path, or read metadata directly from the data file if it has an attached label.

The function will return a pdr.Data object whose attributes include all of the data and metadata. These attributes are named according to the names of the data objects as given in the label. They can be accessed either as attributes or using dict-style [] index notation. For example, PDS3 image objects are often named "IMAGE", so you could examine a PDS3 image as an array with:

>>> data = pdr.read("/path/to/cr0_398560467edr_f0030004ccam02012m1.LBL")
>>> data['IMAGE']
array([[21, 21, 20, ..., 19, 19, 20],
       [21, 21, 21, ..., 19, 20, 20],
       [21, 21, 20, ..., 20, 20, 20],
       ...,
       [25, 25, 25, ..., 26, 26, 26],
       [25, 25, 25, ..., 27, 26, 26],
       [24, 25, 25, ..., 26, 26, 26]], dtype=int16)

The primary metadata is stored within the pdr.Data object as a pdr.Metadata object. The values within the metadata can be accessed using dict-style [] index notation. For example:

>>> data.metadata['INSTRUMENT_HOST_NAME']
'MARS SCIENCE LABORATORY'

Some PDS products (like this one) contain multiple data objects. You can look at all of the objects associated with a product with .keys():

>>> data.keys()
['LABEL',
 'IMAGE_HEADER',
 'ANCILLARY_TABLE',
 'CMD_REPLY_FRAME_SOHB_TABLE',
 'SOH_BEFORE_CHECKSUM_TABLE',
 'TAKE_IMAGE_TIME_TABLE',
 'CMD_REPLY_FRAME_SOHA_TABLE',
 'SOH_AFTER_CHECKSUM_TABLE',
 'MUHEADER_TABLE',
 'MUFOOTER_TABLE',
 'IMAGE_REPLY_TABLE',
 'IMAGE',
 'MODEL_DESC']

Output data types

In general:

Image data are presented as numpy arrays.
Table data are presented as pandas DataFrames.
Header and label data are presented as plain text objects.
Metadata is read from the label and presented as a pdr.Metadata class (behaves as a dict)
Other data are presented as simple python types (str, tuple, dict).
Data loaded from PDS4 .xml labels are presented as whatever object pds4-tools returns. We plan to normalize this behavior in the future.
There might be rare exceptions.

Notes and Caveats

Additional processing

Some data, especially calibrated image data, require the application of additional offsets or scale factors to convert the storage units to meaningful physical units. The information on how and when to apply such adjustments is typically stored (as plain text) in the instrument SIS, and the scale factors themselves are often (but not always) stored in the label. Many calibrated image files also contain special constants (like missing or invalid data), which are often not explicitly specified in the label. pdr is therefore not guaranteed to know anything about or correctly apply these constants.

pdr.Data objects offer a convenience method that attempts to mask invalid values and apply any scaling and offset specified in the label. Use it like: scaled_image = data.get_scaled('IMAGE'). However, we do not perform science validation of these outputs, so do not trust that they are ready for analysis without further processing or validation. Contributions towards making this more effective for specific data product types are very much welcomed.

If you'd like to visualize the outputs that this creates the dump_browse feature creates a separate browse product (.jpg, .txt., or .csv) in the folder you execute from. Use it like: data.dump_browse(). This uses the get_scaled feature for images and will also output browse products for tables and labels.

PDS4 products

All valid PDS4 products should be fully supported. pdr.Data simply wraps pds4-tools. They may not, however, behave in exactly the same way as objects loaded using pdr's native functionality. In general, if a PDS3 label is available for a product, we recommend loading the product from it rather than the PDS4 label. We plan to implement a unified interface for PDS3 and PDS4 metadata later on in the project.

.FMT files

Some PDS3 table data are defined in external reference files (usually with a .FMT extension). You can often find these in the LABEL or DOCUMENT subdirectories of the data archive volumes. If you place the relevant format files in the same directory as the data files you are trying to read, pdr will be able to find them. Otherwise, it will not and a warning will be thrown with the name of the file needed to assist in locating it. Future functionality may make this smoother.

Data attribute naming

The observational and metadata attributes (or keys) of pdr.Data objects take their names directly from the metadata files. We believe that maintaining this strong correlation between the representation of the data in-language and the representation of the data in-file is important, even when it causes us to break strict PEP-8 standards for attribute capitalization. There are three exceptions at present:

Some table formats include repeated column names. For usability and compatibility, we force these to be unique by suffixing 0-indexed increasing integers. So a table definition with two separate columns named "COLUMN" will return a pandas DataFrame with columns named "COLUMN_0" and "COLUMN_1."
PDS3 data object names sometimes contain spaces. pdr replaces the spaces with underscores in order to make them usable as attributes.
PDS4 labels loaded by pds4-tools are renamed "LABEL" for internal consistency. We plan to deprecate this behavior in the future.

Lazy loading

pdr.Data.read has lazy loading as default and will only load data objects from a file when that object is referenced. For example, calling data.IMAGE will load the IMAGE object at that time. You can alternatively load objects by using the load method, like data.load("IMAGE"). You can also pass the 'all' argument to load all data objects, like data.load('all'). Lazy-loading variety of reasons, but one common use case is accessing products with multiple large files (like Chandrayaan-1 M3 L1B and L2 products). It is likely that in many cases you will only want to reference one or two of those files, and not waste time and memory loading all of them on initialization.

Missing files

If a file referenced by a label is missing, pdr will throw warnings and populate the associated attribute from the portion of the label that mentions that file. You are most likely to encounter this for DESCRIPTION files in document formats (like .TXT). These warnings do not prevent you from using objects loaded from files that are actually present in your filesystem.

Big files (like HiRISE)

pdr currently performs no special memory management, so use caution when attempting to read very large files. We intend to implement memory management in the future.

tests

Our testing methodology for pdr currently focuses on end-to-end integration testing to ensure consistency, coverage of supported datasets, and (to the extent we can verify it) correctness of output.

the test suite for pdr lives in a different repository: https://github.com/MillionConcepts/pdr-tests. Its core is an application called ix. It should be considered a fairly complete alpha; we are actively using it both as a regression test suite and an active development tool.

This work is supported by NASA grant No. 80NSSC21K0885.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

1.3.0

Oct 18, 2024

1.2.3

Sep 30, 2024

1.2.2

Sep 30, 2024

1.2.1

Sep 18, 2024

1.2.0

Sep 18, 2024

1.1.2

Jun 18, 2024

1.1.1

Jun 14, 2024

1.1.0

May 21, 2024

1.0.7

Apr 23, 2024

1.0.6

Mar 28, 2024

1.0.5

Dec 8, 2023

1.0.4

Oct 23, 2023

1.0.3

Oct 3, 2023

1.0.2

Aug 1, 2023

1.0.1

Jun 20, 2023

1.0.0

Jun 5, 2023

0.7.5

Mar 16, 2023

0.7.4

Mar 13, 2023

0.7.3

Jan 18, 2023

0.7.2

Oct 30, 2022

0.7.1

Sep 28, 2022

0.7.0

Jul 29, 2022

0.6.3

Jul 14, 2022

0.6.2

Jun 27, 2022

0.6.1

Apr 27, 2022

This version

0.6.0

Apr 26, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdr-0.6.0.tar.gz (43.4 kB view hashes)

Uploaded Apr 26, 2022 Source

Built Distribution

pdr-0.6.0-py3-none-any.whl (45.8 kB view hashes)

Uploaded Apr 26, 2022 Python 3

Hashes for pdr-0.6.0.tar.gz

Hashes for pdr-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`f6d3e96898083a9c908d0a2125a58562f464e8afcb3c88da79048752c4afba0d`
MD5	`f5de8c7888bedce723b96e093e57f0e7`
BLAKE2b-256	`f1a67a4670485b35dc01fd55b0ff6cc8e9bf6789750a1471760968fc0c865443`

Hashes for pdr-0.6.0-py3-none-any.whl

Hashes for pdr-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38842f3564b1efefa47216b721823912f99594508c3ebb2e8f9d40fbbcb77aa1`
MD5	`25726c36f73d4d8131be4a1a88d3221f`
BLAKE2b-256	`33769cbd8c3b5408114dd8b7cdaf4192577fbad8953fdd7a7256dfdd0bf9201a`