
A collection of PDS4 utilities



Utilities for working with NASA Planetary Data System v4 (PDS4) data files


The following dependencies must be met:

  • python 3
  • pandas
  • pyyaml
  • lxml
  • PDS4 tools
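Before installing, it can help to confirm which of these dependencies are already available. The following is a minimal sketch, not part of this package: the helper `missing_modules` is hypothetical, and the import names are assumptions (PyYAML is imported as `yaml`, and the PDS4 tools package is assumed to be importable as `pds4_tools`).

```python
from importlib.util import find_spec

# Import names are assumptions: PyYAML is imported as "yaml" and the
# PDS4 tools package is assumed to be importable as "pds4_tools".
REQUIRED_MODULES = ["pandas", "yaml", "lxml", "pds4_tools"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

An empty list means every module in the list can be located by the current interpreter; it does not check versions.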


First, clone this repository. If you are using conda, the dependencies can be installed in a new environment using the provided environment file:

conda env create -f environment.yml

The newly created environment can be activated with:

conda activate pds4utils

Otherwise, please make sure the dependencies are installed with your system package manager, or a tool like pip. Use of a conda environment or virtualenv is recommended!

The package can then be installed with:

python -m pip install .


The module contains a few simple functions and a class. A brief overview is given here:


read_table

  • reads 2D tables from PDS4 products
  • one level of group fields is supported
  • returns a pandas DataFrame
    • group field data are returned as an array in each pandas cell
    • if table_name is not given, the first table is returned
    • the DataFrame is indexed by the first time field, if any
      • this can be set using the index_col parameter


  • reads multiple tables using read_table
  • useful for building a large dataframe from many similar data products
  • set add_filename=True to add the product name to each row, to track which product the data came from

index_products(directory='.', pattern='*.xml')

  • searches for PDS4 labels recursively in directory matching pattern
  • returns a pandas DataFrame with one row per product
  • returned data include:
    • LID + VID
    • bundle, collection and product identifier
    • start and stop time, if present
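To illustrate the kind of information gathered, here is a rough standard-library sketch of label indexing. It is not the implementation used by index_products: the function `index_labels` and its helper are hypothetical, it returns a list of dictionaries rather than a DataFrame, and it assumes labels use the core PDS4 namespace and a six-part LID of the form urn:agency:archive:bundle:collection:product.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Core PDS4 namespace used by label elements
PDS_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}


def _text(root, path):
    """Return the stripped text of the first match for path, or None."""
    node = root.find(path, PDS_NS)
    return node.text.strip() if node is not None and node.text else None


def index_labels(directory=".", pattern="*.xml"):
    """Collect basic identification fields from every label under directory."""
    rows = []
    for label in sorted(Path(directory).rglob(pattern)):
        try:
            root = ET.parse(label).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        lid = _text(root, "pds:Identification_Area/pds:logical_identifier")
        if lid is None:
            continue  # XML file without a PDS4 logical identifier
        vid = _text(root, "pds:Identification_Area/pds:version_id")
        # Assumed LID layout: urn:agency:archive:bundle:collection:product
        parts = lid.split(":")
        rows.append({
            "filename": label.name,
            "lidvid": f"{lid}::{vid}" if vid else lid,
            "bundle": parts[3] if len(parts) > 3 else None,
            "collection": parts[4] if len(parts) > 4 else None,
            "product_id": parts[5] if len(parts) > 5 else None,
            "start_time": _text(root, ".//pds:Time_Coordinates/pds:start_date_time"),
            "stop_time": _text(root, ".//pds:Time_Coordinates/pds:stop_date_time"),
        })
    return rows
```

If pandas is available, the returned list can be passed to `pandas.DataFrame(rows)` to obtain one row per product.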


  • this class builds one or more DataFrames containing custom metadata from a set of PDS4 products
  • a YAML-formatted configuration file is required to determine which attributes to read
    • the XPath to each attribute must be known
    • see example.yml for more information
    • if no config file is specified when instantiating the class, a default is searched for:
      • pds_dbase.yml in the user's home directory, or in the location pointed to by APPDATA or XDG_CONFIG_HOME
  • each entry in the configuration file produces one database table (one pandas DataFrame)
    • to see which tables have been loaded, use list_tables()
    • to return a table, use get_table(table)
    • to save or restore a database, use save_dbase() or load_dbase()
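The default configuration lookup described above could work roughly as follows. This is only a sketch under stated assumptions, not the class's actual logic: the function `find_default_config` is hypothetical, the search order (home directory first, then APPDATA, then XDG_CONFIG_HOME) is assumed, and the environment variables are assumed to name a directory that contains pds_dbase.yml directly.

```python
import os
from pathlib import Path

CONFIG_NAME = "pds_dbase.yml"


def find_default_config(name=CONFIG_NAME):
    """Return the first existing candidate config path, or None."""
    # Assumed search order: home directory first, then the directories
    # named by APPDATA and XDG_CONFIG_HOME (if those variables are set).
    candidates = [Path.home() / name]
    for var in ("APPDATA", "XDG_CONFIG_HOME"):
        base = os.environ.get(var)
        if base:
            candidates.append(Path(base) / name)
    for path in candidates:
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    config = find_default_config()
    print(config if config else "No default configuration file found.")
```

Passing an explicit configuration file when instantiating the class avoids any dependence on this lookup.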


The Jupyter notebook included with this repository shows an example of pds4_utils in use.
