
# pyncml [![Build Status](https://travis-ci.org/axiom-data-science/pyncml.svg?branch=master)](https://travis-ci.org/axiom-data-science/pyncml)

#### A simple python library to apply NcML logic to NetCDF files


## Installation

##### Stable

```
pip install pyncml
```

##### Development

```
pip install git+https://github.com/axiom-data-science/pyncml.git
```

## Supported

The following NcML elements are handled (a combined example follows this list):

* Adding things
  * Attributes: `<attribute name="some_new_attribute" type="string" value="some_standard_name" />`

* Renaming things
  * Variables: `<variable name="new_var" orgName="old_var" />`
  * Attributes: `<attribute name="new_attr" orgName="old_attr" />`
  * Dimensions: `<dimension name="new_dim" orgName="old_dim" />`

* Removing things
  * Variables: `<remove name="some_variable" type="variable" />`
  * Attributes: `<remove name="some_attribute" type="attribute" />`

* Aggregating things
  * Scans: `<scan location="some_directory/foo/bar/" suffix=".nc" subdirs="true" />`
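
A complete NcML document combining several of these elements might look like the sketch below. The element contents are taken from the bullets above, and the root `<netcdf>` element and namespace match the examples later in this README; the variable and attribute names are purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Add a new global attribute -->
  <attribute name="some_new_attribute" type="string" value="some_standard_name" />
  <!-- Rename a variable and a dimension -->
  <variable name="new_var" orgName="old_var" />
  <dimension name="new_dim" orgName="old_dim" />
  <!-- Remove a variable and a global attribute -->
  <remove name="some_variable" type="variable" />
  <remove name="some_attribute" type="attribute" />
</netcdf>
```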

## Not supported

* Adding variables (could be implemented in the future)
* Groups (could be implemented in the future)
* Setting actual data values on variables (could be implemented in the future)
* Creating a file from scratch (could be implemented in the future)
* Removing Dimensions (not implemented in the C library)
* Aggregation scans that utilize the `dateFormatMark` attribute (most likely will never be implemented)

## Usage

### Apply

The `apply` function takes a path to an input NetCDF file (`input_file`), an `ncml` object (a string, file path, or Python `etree` object), and an optional `output_file`. **If an `output_file` is not specified, the `input_file` is edited in place**. The object returned from `apply` is a netcdf4-python object, ready to be used.

Any `location` attributes in the NcML are **ignored** and the NcML is applied against the file specified as the `input_file`.

##### Editing a file in place
```python
netcdf = '/some/file/path/in.nc'
ncml = '/some/file/path/foo.ncml'
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml)
```

##### Using an NcML file
```python
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = '/some/file/path/foo.ncml'
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```

##### Using an NcML string
```python
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = """<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
<attribute name="new_history" orgName="history" />
<attribute name="new_file_format" orgName="file_format" value="New Format" />
<remove name="source" type="attribute" />
</netcdf>
"""
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```

##### Using an `etree` object
```python
import pyncml
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = pyncml.etree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
</netcdf>
""")
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
```
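
The object handed back by `apply` can be used immediately. A minimal sketch, assuming it behaves like a standard `netCDF4.Dataset` (the attribute name comes from the examples above):

```python
# 'nc' is the object returned by the pyncml.apply() call above
print(nc.ncattrs())                   # all global attributes after the edits
print(nc.getncattr('new_attribute'))  # 'works'
nc.close()                            # close the dataset when finished
```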

### Scan

The `scan` function takes an `ncml` object (a string, file path, or Python `etree` object). The object returned from `scan` is a metadata object describing the scan aggregation; **it is not a netcdf4-python object of the aggregation**. You can create a `netcdf4-python` object from the scan aggregation yourself (example below).


##### Obtaining aggregation metadata
```python
ncml = '/some/file/path/foo.ncml'
import pyncml
agg = pyncml.scan(ncml=ncml)

print(agg.starting)
2014-06-20 00:00:00+00:00

print(agg.ending)
2014-07-19 23:00:00+00:00

print(agg.timevar_name)
'time'

print(agg.standard_names)
['time',
 'projection_y_coordinate',
 'projection_x_coordinate',
 'eastward_wind_velocity']

print(agg.members)  # These are already sorted by the 'starting' date
[
  {
    'starting': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
    'ending': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
    'path': '/path/to/aggregation/defined/in/ncml/first_member.nc',
    'standard_names': ['time',
                       'projection_y_coordinate',
                       'projection_x_coordinate',
                       'eastward_wind_velocity'],
  },
  {
    'starting': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
    'ending': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
    'path': '/path/to/aggregation/defined/in/ncml/second_member.nc',
    'standard_names': ['time',
                       'projection_y_coordinate',
                       'projection_x_coordinate',
                       'eastward_wind_velocity'],
  },
  ...
]
```
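
Because each member reports its own `starting` and `ending` timestamps, the metadata can be used to pick a subset of files before anything is opened. A minimal sketch, assuming members expose `starting` as an attribute in the same way `path` is used in the next example, with a purely illustrative cutoff date (`pytz` provides the UTC timezone, matching the tz-aware values shown above):

```python
import datetime
import pytz

# Keep only members that begin on or after an arbitrary cutoff (illustrative)
cutoff = datetime.datetime(2014, 6, 21, tzinfo=pytz.utc)
subset = [m for m in agg.members if m.starting >= cutoff]
subset_files = [m.path for m in subset]
```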


##### Creating `netcdf4-python` Aggregation object

<sup>**Note: This will not work with aggregations whose members overlap in time!**</sup>

```python
ncml = '/some/file/path/foo.ncml'
import netCDF4
import pyncml
agg = pyncml.scan(ncml=ncml)
files = [ f.path for f in agg.members ]
nc = netCDF4.MFDataset(files)
time = nc.variables.get(agg.timevar_name)

print(time)
<class 'netCDF4._Variable'>
float64 time('time',)
long_name: date time
units: hours since 1970-01-01 00:00:00
_CoordinateAxisType: Time
unlimited dimensions = ('time',)
current size = (14,)

print(time[:])
[ 389784. 389785. 389786. 389787. 389788. 389789. 389790. 389791.
389792. 389793. 390500. 390501. 390502. 390503.]

print(netCDF4.num2date(time[:], units=time.units))
[datetime.datetime(2014, 6, 20, 0, 0) datetime.datetime(2014, 6, 20, 1, 0)
datetime.datetime(2014, 6, 20, 2, 0) datetime.datetime(2014, 6, 20, 3, 0)
datetime.datetime(2014, 6, 20, 4, 0) datetime.datetime(2014, 6, 20, 5, 0)
datetime.datetime(2014, 6, 20, 6, 0) datetime.datetime(2014, 6, 20, 7, 0)
datetime.datetime(2014, 6, 20, 8, 0) datetime.datetime(2014, 6, 20, 9, 0)
datetime.datetime(2014, 7, 19, 20, 0)
datetime.datetime(2014, 7, 19, 21, 0)
datetime.datetime(2014, 7, 19, 22, 0)
datetime.datetime(2014, 7, 19, 23, 0)]
```
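
Data variables in the aggregation are read the same way as the time variable. A minimal sketch that looks up a variable by its `standard_name` attribute, using the `eastward_wind_velocity` name reported by the scan above (whether a given variable carries that attribute depends on the files):

```python
# Search the aggregated variables for a matching standard_name
wind = None
for name, var in nc.variables.items():
    if 'standard_name' in var.ncattrs() and var.standard_name == 'eastward_wind_velocity':
        wind = var
        break

if wind is not None:
    print(wind.shape)    # aggregated shape along the time dimension, e.g. (14, ny, nx)
    first_step = wind[0] # numpy array for the first time step
```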
