**********
cdx.client
**********

.. contents::

CDX Client provides an API library and command-line tools for
accessing CDX data. CDX is the Climate Data Exchange, an effort of the Jet
Propulsion Laboratory to create a virtual environment for the sharing of
climate data.

Installation
************

This document tells you how to install cdx.client.


Quick Instructions
==================

As a user with administrative privileges, run::

    easy_install cdx.client

That's it.


Full Instructions
=================

cdx.client requires the Python_ programming language. We recommend version 2.4
or later. As of this writing, 2.6 is the latest stable version. If Python is
not yet installed on your system, you can find binary and source
distributions on the Python website.

To test if a correct version of Python is available on your system, run::

    python -V

You should see output similar to::

    Python 2.6

indicating the version of Python installed. cdx.client also requires `Agile
OODT`_. OODT_ is Object Oriented Data Technology, a framework for metadata
and data grids. Agile OODT is a Python version of OODT that supports higher
performance and easier integration than the Java_ version.

By far the easiest, recommended, and encouraged way to install cdx.client is
to use EasyInstall_. If your Python installation has EasyInstall available to
it, then this one command is all you need to run in order to download, build,
install, and generate command-line tools all in one go for all users on your
system::

    easy_install cdx.client

Be sure to run that command as an administrative user. For example, on Mac OS
X and other Unix systems, you might need to run::

    sudo easy_install cdx.client

That will also download and install all dependencies, including Agile OODT.


Executables
-----------

The commands ``cdxls`` and ``cdxget`` will be generated and placed in your
standard installation directory for Python commands. Usually, this is the
same location as the ``python`` executable itself. For example, on Mac OS X
10.5, the directory is::

    /Library/Frameworks/Python.framework/Versions/Current/bin

You may want to add that directory to your shell's PATH variable and force
your shell to re-scan the PATH for new executables.


Installing EasyInstall
----------------------

If you happen to be on a system where your Python installation lacks
EasyInstall, fret not. Upgrading your system to gain EasyInstall's abilities is
quite simple. Follow these instructions:

1. Download http://peak.telecommunity.com/dist/ez_setup.py
2. As an administrative user, run the freshly-downloaded ez_setup.py file
   using your system's Python.

EasyInstall and its necessary libraries will be downloaded, built, and
installed for you, and the ``easy_install`` executable generated. The
location of the ``easy_install`` executable is as described above.


Installing Without EasyInstall
------------------------------

If EasyInstall is not available on your system, you can still make a proper
installation of cdx.client. Follow these instructions:

1. Download the Agile OODT source distribution from
   http://oodt.jpl.nasa.gov/dist/agile-oodt/oodt-0.0.1.tar.gz.
   Substitute version numbers as appropriate.
2. Download the cdx.client source distribution.
   Substitute version numbers as appropriate.
3. Unpack each archive.
4. Change the current working directory to each newly-created subdirectory,
   ``oodt-0.0.1`` and ``cdx.client-0.0.2``, again substituting version
   numbers as appropriate.
5. As an administrative user, run: ``python setup.py install`` in each
   subdirectory.


Issues and Questions
====================

To report any problems with or ask for help about cdx.client, visit our
contact_ web page.


.. References:
.. _Agile OODT: http://agility.jpl.nasa.gov/
.. _contact: http://cdx.jpl.nasa.gov/contact-info
.. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall
.. _Java: http://tinyurl.com/5kng2h
.. _OODT: http://oodt.jpl.nasa.gov/
.. _Python: http://python.org/

Using CDX Client
****************

Installing the CDX Client package makes available three things on your
computer:

``cdxls`` command
    The ``cdxls`` command lets you list the contents of a CDX server from your
    terminal prompt or a shell script.
``cdxget`` command
    The ``cdxget`` command lets you retrieve data from CDX from either your
    terminal prompt or a shell script.
CDX Library
    The CDX Library is a Python-based API for using CDX servers.

This document describes how to use the above three items, with special
attention to the CDX Library.


Commands
========

After installing the CDX Client package, two new commands are made available on
your system, ``cdxls`` and ``cdxget``. These commands enable you to list the
contents of the data on a CDX server and retrieve selected files from the
server.

To use these commands from your interactive prompt, you just need to make sure
your shell's PATH environment variable includes the directory where the
commands are installed. On most systems, these two commands are installed in::

    /usr/local/bin

However, on Mac OS X, the installation location may be::

    /Library/Frameworks/Python.framework/Versions/Current/bin

And on Windows, it may be::

    c:\Program Files\Python

Note also that some interactive shells create a cache of commands in order to
execute your requests more quickly. You may need to force your shell to
re-build that cache. The csh and tcsh shells are two such examples; you can
make these shells rebuild their caches by running the ``rehash`` command.
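
A quick way to confirm that the commands are reachable is to ask Python where
they resolve on your PATH. The sketch below is a minimal check that assumes a
modern Python 3 interpreter (``shutil.which`` does not exist in the Python 2
versions mentioned in the installation notes). The helper name
``find_commands`` is made up for this example and is not part of cdx.client.

```python
import shutil


def find_commands(names=("cdxls", "cdxget"), path=None):
    """Map each command name to its full path, or None if it is not found.

    `path` is a PATH-style string; when it is None, shutil.which falls back
    to the PATH environment variable of the current process.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_commands().items():
        print(name, "->", location or "not found on PATH")
```

If either command is reported as not found, add its installation directory to
PATH as described above and try again.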


Use from Shell Scripts
----------------------

The ``cdxls`` and ``cdxget`` commands may be used from shell scripts as well. The
only requirement for making these commands available to shell scripts is the
same as for interactive sessions: the shell's PATH environment variable must
include the directory that contains the ``cdxls`` and ``cdxget`` commands.

Here is a sample shell script that retrieves the MLS Aura L2GP data files (and
metadata files) for HO2 and HOCl from day 325 in 2008::

    #!/bin/sh
    # Make sure the shell can find cdxget, and choose the CDX server to use.
    PATH=/usr/local/bin:/usr/bin:/bin; export PATH
    CDX_SERVER=http://mlscdx.jpl.nasa.gov:8080/cdx/prod; export CDX_SERVER

    # Fetch the data file and the metadata file for each kind of data.
    for kind in HO2 HOCl; do
        for extension in he5 he5.met; do
            cdxget 2008/325/MLS-Aura_L2GP-${kind}_v02-23-c01_2008d325.${extension}
        done
    done

The above shell script assumes that ``cdxget`` will be found in
``/usr/local/bin``, ``/usr/bin``, or ``/bin``. It also sets the ``CDX_SERVER``
environment variable to set what CDX server to talk to. It then loops through
two kinds of data (``HO2`` and ``HOCl``), and loops through two kinds of file
extensions (``he5`` and ``he5.met``). The result is that it retrieves four files
to the current working directory, specifically:

* 2008/325/MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5
* 2008/325/MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5.met
* 2008/325/MLS-Aura_L2GP-HOCl_v02-23-c01_2008d325.he5
* 2008/325/MLS-Aura_L2GP-HOCl_v02-23-c01_2008d325.he5.met
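
The same nested loop can be written in Python if you would rather build the
list of paths first and decide later what to do with it. This is only a
sketch: the function name ``granule_paths`` and its parameters are invented
for the example, and the file-name template is copied from the shell script
above rather than from any CDX naming specification.

```python
from itertools import product

KINDS = ("HO2", "HOCl")
EXTENSIONS = ("he5", "he5.met")


def granule_paths(year=2008, day=325):
    """Return the server paths for every kind/extension combination."""
    template = (
        "{year}/{day:03d}/"
        "MLS-Aura_L2GP-{kind}_v02-23-c01_{year}d{day:03d}.{ext}"
    )
    # product() iterates kinds in the outer loop and extensions in the
    # inner loop, matching the order of the shell script.
    return [
        template.format(year=year, day=day, kind=kind, ext=ext)
        for kind, ext in product(KINDS, EXTENSIONS)
    ]


if __name__ == "__main__":
    for path in granule_paths():
        print(path)
```

Each printed path is one that the shell script passes to ``cdxget``.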


The ``cdxsubset`` command may also be used from a shell script. It is
configured by two environment variables:

* CDX_SUBSET_MODE - if set, local data wrapper mode is used (remote mode
  is the default)
* CDX_SERVER - the product server to talk to for subsetting

Some example working commands are:

Subset spatial bounding box from NCAR CCSM model output::

    cdxsubset -b /esg/data18/commit/atm/da/hfls/ncar_ccsm3_0/run1/hfls_A2.Commit_1.CCSM.atmd.2000-01-01_cat_2039-12-31.nc

Subset time range from NCAR CCSM model output::

    cdxsubset -t /esg/data18/commit/atm/da/hfls/ncar_ccsm3_0/run1/hfls_A2.Commit_1.CCSM.atmd.2000-01-01_cat_2039-12-31.nc

Get time array variable data from the MLS L2 granule::

    cdxsubset -p Time /mls/2005/100/MLS-Aura_L2GP-BrO_v01-51-c01_2005d100.he5

Get spatial bounding box from AIRS level 2 granule::

    cdxsubset -b /airs/data/s4pa/Aqua_AIRS_Level2/AIRX2RET.003/2007/005/AIRS.2007.01.05.240.L2.RetStd.v4.0.9.0.G07007180718.hdf

Subset by lat lon and variable for an AIRS level 2 granule::

    cdxsubset -p TAirStd --latitude-range=67.35:78.40 --longitude-range=172.226:176.10 /airs/2009/01/01/airx2ret/AIRS.2009.01.01.001.L2.RetStd.v5.2.2.0.G09002135510.hdf

CDX Library
===========

The CDX Library is a Python-based application programming interface (API) for
communicating with CDX servers. In fact, the two commands ``cdxls`` and
``cdxget`` are implemented using the CDX Library. If shell-script programming
is not to your taste, and you know Python, then using the CDX Library may be
right for you.

The CDX Library uses an object-oriented approach to model the contents of a
CDX server. Objects represent CDX files and directories, and you call methods
on those objects to determine file attributes, directory contents, or retrieve
a file's contents.

The remainder of this document describes the modules, classes, and functions
that comprise the CDX Library. If you don't know Python, you may wish to skip
the rest.


The ``cdx`` Module
------------------

The ``cdx`` module is a namespace module. It provides no classes or functions.
Rather, it contains a single, nested module called ``client``.


The ``cdx.client`` Module
-------------------------

The ``cdx.client`` module contains nested modules that provide the CDX Library.
It also contains implementations of the ``cdxls`` and ``cdxget`` commands.


The ``cdx.client.cdxfile`` Module
---------------------------------

The ``cdx.client.cdxfile`` module is where all the action is. It contains
classes and functions for communicating with and modeling the contents of CDX
servers. It contains the following items:

``CDXDirectory``
    Objects of this class represent directories on a CDX server. You can use
    Python's iterator, length, and containment protocols to examine the
    contents of the directory. They can also be sorted.
``CDXFile``
    Objects of this class represent files on a CDX server. While you can
    instantiate objects of this class, you'd typically instantiate a
    CDXDirectory and examine its contents which will include CDXFile objects
    for files in the directory and nested CDXDirectory objects for
    subdirectories. A CDXFile object also provides a method to let you
    retrieve its data.
``findFile``
    The ``findFile`` function is a utility function that, given a starting
    CDXDirectory and a path name, yields the matching CDXDirectory or CDXFile
    on a CDX server.
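
To make the idea of a path lookup concrete, here is a minimal sketch of
walking a path one component at a time through nested containers. It is not
the library's ``findFile`` implementation: plain dictionaries stand in for
directories, integers stand in for files, and the name ``find_entry`` is
hypothetical.

```python
def find_entry(root, path):
    """Walk `path` component by component, starting from the mapping `root`."""
    node = root
    for component in path.split("/"):
        if not component:
            continue  # skip the empty pieces produced by leading or doubled slashes
        node = node[component]  # raises KeyError when a component is missing
    return node


# A tiny stand-in tree: dict values are directories, the integer is a file size.
tree = {
    "2008": {
        "325": {"MLS-Aura_L2GP-HO2_v02-23-c01_2008d325.he5": 1024},
    },
    "2007": {},
}

day = find_entry(tree, "/2008/325")
print(sorted(day))
```

Indexing by name mirrors the ``root['2005']`` style of access that
``CDXDirectory`` objects support.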


CDXDirectory Objects
~~~~~~~~~~~~~~~~~~~~

``CDXDirectory`` objects represent directories in a CDX server. You can
create these objects directly or you can use the ``findFile`` function in the
``cdx.client.cdxfile`` module.

class CDXDirectory(*path*, *cdxURL* = None)
    Create a ``CDXDirectory`` object with the given *path*. You can also
    specify the URL to a CDX server to use by passing in a string for
    *cdxURL*.

    sort(*cmp* = cmp, *reverse* = False)
        Return the contents of the directory, sorted using the comparison
        function *cmp*, which defaults to Python's built-in ``cmp``. If
        *reverse* is True, reverse the order of the sort. The default
        comparison of ``CDXFile`` and ``CDXDirectory`` objects is by CDX
        server URL and by name. You can pass in your own *cmp* that, for
        example, sorts by file size.
    isFile()
        Always returns False.
    path
        The path name of the directory.
    name
        The name of the directory; this is the last element of the path.
    size
        By convention, sizes for directories are always zero.
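
As an illustration of the "sort by file size" case, the sketch below writes an
old-style comparison function and applies it to stand-in objects. The
``Entry`` tuple is hypothetical and only mimics the ``name`` and ``size``
attributes described above. The example targets Python 3, where the built-in
``cmp`` no longer exists, so it adapts the comparison with
``functools.cmp_to_key``. Passing ``by_size`` as the *cmp* argument of
``sort`` is the presumed usage with the real library and has not been
verified here.

```python
from collections import namedtuple
from functools import cmp_to_key

# Stand-in for directory entries; not the real CDXFile class.
Entry = namedtuple("Entry", ["name", "size"])


def by_size(a, b):
    """Return a negative, zero, or positive number, like Python 2's cmp."""
    return (a.size > b.size) - (a.size < b.size)


entries = [
    Entry("b.he5", 300),
    Entry("a.he5.met", 20),
    Entry("c.he5", 100),
]

smallest_first = sorted(entries, key=cmp_to_key(by_size))
largest_first = sorted(entries, key=cmp_to_key(by_size), reverse=True)

print([entry.name for entry in smallest_first])
```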

``CDXDirectory`` objects obey Python's protocols for hashing, comparison,
containment testing, iteration, indexing, and length query. Containment
testing on directories works with ``CDXDirectory`` objects, ``CDXFile``
objects, or plain strings::

    >>> from cdx.client.cdxfile import CDXDirectory
    >>> root = CDXDirectory('/', 'http://localhost:8192/cdx/prod')
    >>> len(root)
    3
    >>> subdir = root['2005']
    >>> subdir
    CDXDirectory(path=/2005)
    >>> subdir in root
    True
    >>> '2005' in root
    True
    >>> subdir < root
    False
    >>> subdir > root
    True
    >>> for i in root:
    ...     print i
    ...
    /2008
    /2007
    /2005
    >>> root.sort()
    [CDXDirectory(path=/2005), CDXDirectory(path=/2007), CDXDirectory(path=/2008)]


``CDXFile`` Objects
~~~~~~~~~~~~~~~~~~~

TBD.


Changelog
*********

0.0.9 - 03/24/2010
==================

This release includes improvements to cdxsubset, specifically the ability
to print out the full numpy array returned from a DataWrapper. See
CDX-82 for specific details. Additionally cdxsubset has been updated to expose
the subset by LatLon functionality per CDX-84 and CDX-85. Subset by range query
allowing constraints to be specified was also included in this release (see
CDX-86 for more information).

For the issue tracker, see
http://oodt.jpl.nasa.gov/jira/browse/CDX.


0.0.8 - Inclusion of improvements to cdxcd, virtual roots and new tools
=======================================================================

This release includes improvements to cdxcd to make it work nicely
with cdx virtual roots, and includes integration with the other cdx client
toolkit including cdxls, cdxsubset and cdxget. See CDX-70 and CDX-71 for
further details.

For the issue tracker, see
http://oodt.jpl.nasa.gov/jira/browse/CDX.


0.0.7 - Add Resource Files
==========================

Release 0.0.6 was mis-configured and didn't include some important resource
files. This emergency release includes them!


0.0.6 - Inclusion of cdxsubset and other tools, and some minor bug fixes
========================================================================

This release includes the cdxsubset tool, as described in CDX-56.
This release also includes the cdxcd tool, as described in CDX-69.
This release also includes minor aesthetic bug fixes that address pathing
issues in cdxls, e.g., CDX-29.

For the issue tracker, see
http://oodt.jpl.nasa.gov/jira/browse/CDX.


0.0.5 - Repaired Unit Tests
===========================

This release updates the unit tests and test data based on the changes in
0.0.4 and the new behavior of actual product servers. In addition, it
fixes some documentation problems (incorrect package name cdx-client
which should've been cdx.client) in the INSTALL.txt file.

The sole bug report addressed in this release is CDX-45, "Unit tests in
cdx-client failing". For the issue tracker, see
http://oodt.jpl.nasa.gov/jira/browse/CDX.


0.0.4 - Bugfix to 0.0.3 release
===============================

This is a bugfix release to 0.0.3, which includes some error checking to
deal with some data format inconsistencies on the OODT OFSN product server
end.

JIRA issues addressed (see http://oodt.jpl.nasa.gov/jira/browse/CDX):

* CDX-43 Directory structure shouldn't be preserved if cdxget is called
  without the -r parameter
* CDX-42 cdxget -r fails to retrieve MLS data
* CDX-41 cdxls -R chokes if dir size not provided


0.0.3 - Directory caching
=========================

The major feature of this release is the ``cdx.client.dircache`` module which
enables local-disk caching of a subset of a remote CDX product server's
contents. It also introduces the concept of a ``cdx:`` scheme URL. Such a
URL has this form::

    cdx://hostname[:port]/endpoint/prod/path/to/a/directory

where hostname is the name or IP address of a CDX product server, port is an
optional port number on which the server is listening, endpoint is the
WebGrid_ service identifier (typically just the string ``cdx``), prod is the
fixed keyword ``prod``, and ``path/to/a/directory`` is an absolute path to a
directory within that product server.
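
The parts of such a URL can be pulled apart with Python's standard library.
The following is a sketch based only on the form given above, not the parsing
code used by ``cdx.client.dircache``. The function name ``parse_cdx_url`` and
the dictionary keys are invented for the example, and it assumes Python 3's
``urllib.parse``.

```python
from urllib.parse import urlsplit


def parse_cdx_url(url):
    """Split a cdx: URL into hostname, port, endpoint, and directory path."""
    parts = urlsplit(url)
    if parts.scheme != "cdx":
        raise ValueError("not a cdx: URL: %r" % (url,))
    # The URL path looks like /endpoint/prod/path/to/a/directory, so splitting
    # on "/" gives ['', endpoint, 'prod', 'path', 'to', 'a', 'directory'].
    segments = parts.path.split("/")
    if len(segments) < 3 or not segments[1] or segments[2] != "prod":
        raise ValueError("expected /endpoint/prod/... in %r" % (url,))
    return {
        "hostname": parts.hostname,
        "port": parts.port,  # None when the URL gives no port
        "endpoint": segments[1],
        "path": "/" + "/".join(segments[3:]),
    }


if __name__ == "__main__":
    print(parse_cdx_url("cdx://mlscdx.jpl.nasa.gov:8080/cdx/prod/2008/325"))
```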

Such caching is intended to support the CCMValDiag_ software.


0.0.2 - Bug fix for cdxls
=========================

This release repairs a bug in cdxls that caused directories with only one item
in them to not be listed properly.


0.0.1 - URL specification
=========================

This release provides support for a (-u url, --url=url) pair of command-line
options that enable specification of a specific URL to use, falling back to
the URL specified in the CDX_SERVER environment variable (and, if that's
unset, then http://mlscdx.jpl.nasa.gov:8080/cdx/prod). This supports two
ideas suggested in CDX-16 (the first two, not the third with a cdx: style
URL).
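
The fallback order described above (command-line option, then the
``CDX_SERVER`` environment variable, then the built-in default) can be
sketched as follows. This is an illustration of the precedence rule only, not
the code used by ``cdxls`` or ``cdxget``; the function name
``resolve_server_url`` and its parameters are hypothetical.

```python
import os

DEFAULT_CDX_SERVER = "http://mlscdx.jpl.nasa.gov:8080/cdx/prod"


def resolve_server_url(option_url=None, environ=None):
    """Pick the server URL: explicit option, then CDX_SERVER, then the default."""
    if environ is None:
        environ = os.environ
    if option_url is not None:
        return option_url  # a -u/--url value always wins
    # Fall back to the environment, and to the default when CDX_SERVER is unset.
    return environ.get("CDX_SERVER", DEFAULT_CDX_SERVER)


if __name__ == "__main__":
    print(resolve_server_url())
```

Passing the environment as a parameter keeps the function easy to exercise
without modifying the real process environment.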


0.0.0 - Initial
===============

This is an initial release of cdx-client supporting minimal ``cdxls`` and
``cdxget`` functionality.


.. References:
.. _WebGrid: http://agility.jpl.nasa.gov/products/agile-oodt/
.. _CCMValDiag: http://www.pa.op.dlr.de/CCMVal/CCMVal_DiagnosticTool.html
