numpy array with labelled dimensions and axes, NaN handling and netCDF I/O
Numpy array with dimensions
dimarray is a package to handle numpy arrays with labelled dimensions and axes. Inspired by pandas, it includes advanced alignment and reshaping features as well as missing-value (NaN) handling.
The main difference with pandas is that it is generalized to N dimensions and behaves more like a numpy array. The axes do not have fixed names ('index', 'columns', etc.) but are given a meaningful name by the user (e.g. 'time', 'items', 'lon', ...). This is especially useful for high-dimensional problems such as sensitivity analyses.
A natural I/O format for such an array is netCDF, which is common in geophysics and supports metadata; dimarray's netCDF I/O relies on the netCDF4 package.
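The sketch below illustrates what "alignment with NaN handling" means. It uses plain numpy and is not dimarray's implementation. Two labelled 1-D arrays are placed onto the union of their axis labels, missing positions are filled with NaN, and only then are the arrays combined. The helpers `reindex` and `align_1d` are hypothetical names chosen for this example.

```python
import numpy as np

def reindex(values, labels, new_labels):
    """Place `values` (indexed by `labels`) onto `new_labels`, filling gaps with NaN."""
    pos = {lab: i for i, lab in enumerate(labels)}
    out = np.full(len(new_labels), np.nan)
    for j, lab in enumerate(new_labels):
        if lab in pos:
            out[j] = values[pos[lab]]
    return out

def align_1d(values1, labels1, values2, labels2):
    """Hypothetical helper: align two labelled 1-D arrays on the sorted union of their labels."""
    union = sorted(set(labels1) | set(labels2))
    return union, reindex(values1, labels1, union), reindex(values2, labels2, union)

# Two time series that only partially overlap in time
years, x, y = align_1d(np.array([1., 2., 3.]), [1950, 1960, 1970],
                       np.array([10., 20.]), [1960, 1980])
total = x + y  # NaN wherever either series has no value
print(years)   # [1950, 1960, 1970, 1980]
```

Only 1960 is present in both series, so it is the only year where `total` holds a number (12.0).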
License
dimarray is distributed under a 3-clause (“Simplified” or “New”) BSD license. Parts of basemap which have BSD compatible licenses are included. See the LICENSE file, which is distributed with the dimarray package, for details.
Getting started
A DimArray can be defined just like a numpy array, with additional information about its dimensions, which can be provided via its axes and dims parameters:
>>> from dimarray import DimArray
>>> a = DimArray([[1.,2,3], [4,5,6]], axes=[['a', 'b'], [1950, 1960, 1970]], dims=['variable', 'time'])
>>> a
dimarray: 6 non-null elements (0 null)
0 / variable (2): a to b
1 / time (3): 1950 to 1970
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
Indexing now works on axis labels:
>>> a['b', 1970]
6.0
Or it can be done à la numpy, via integer indices:
>>> a.ix[0, -1]
3.0
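The difference between label-based and integer indexing can be sketched with a toy class. This is a minimal illustration built on numpy, assuming nothing about dimarray's internals. The class `LabelledArray` is hypothetical and is not dimarray's `DimArray`.

```python
import numpy as np

class LabelledArray:
    """Hypothetical minimal labelled array, for illustration only (not dimarray's class)."""

    def __init__(self, values, axes, dims):
        self.values = np.asarray(values)
        self.axes = [list(ax) for ax in axes]  # one list of labels per dimension
        self.dims = list(dims)                 # one name per dimension

    def __getitem__(self, labels):
        # Label-based indexing: translate each label into its integer position
        idx = tuple(ax.index(lab) for ax, lab in zip(self.axes, labels))
        return self.values[idx]

    @property
    def ix(self):
        # Integer (numpy-style) indexing is delegated to the underlying array
        return self.values

a = LabelledArray([[1., 2, 3], [4, 5, 6]],
                  axes=[['a', 'b'], [1950, 1960, 1970]], dims=['variable', 'time'])
print(a['b', 1970])  # 6.0
print(a.ix[0, -1])   # 3.0
```

`list.index` does a linear search. A real implementation would probably keep a label-to-position dictionary per axis for faster lookups.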
Basic numpy transformations are also available, with axes referred to by name:
>>> a.mean(axis='time')
dimarray: 2 non-null elements (0 null)
0 / variable (2): a to b
array([ 2.,  5.])
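Reducing along a named axis comes down to two steps. First, translate the dimension name into an integer axis. Second, drop the matching entry from the axes and dimension names. The helper `mean_along` below is a hypothetical plain-numpy sketch, not dimarray's code. Its `skipna` flag is also an assumption, added to show how NaN values could either propagate (`np.mean`) or be ignored (`np.nanmean`).

```python
import numpy as np

def mean_along(values, axes, dims, dim, skipna=False):
    """Hypothetical helper: average `values` along the dimension named `dim`.

    Returns the reduced array together with the remaining axes and dimension
    names. With skipna=True, NaN values are ignored instead of propagated."""
    i = dims.index(dim)  # dimension name -> integer axis
    func = np.nanmean if skipna else np.mean
    reduced = func(np.asarray(values, dtype=float), axis=i)
    return reduced, axes[:i] + axes[i + 1:], dims[:i] + dims[i + 1:]

values = [[1., 2, 3], [4, 5, 6]]
axes = [['a', 'b'], [1950, 1960, 1970]]
dims = ['variable', 'time']

m, m_axes, m_dims = mean_along(values, axes, dims, 'time')
print(m_dims)  # ['variable']
print(m)       # [2. 5.]

# With a missing value, skipna decides whether NaN propagates
values_nan = [[1., np.nan, 3], [4, 5, 6]]
print(mean_along(values_nan, axes, dims, 'time', skipna=True)[0])  # [2. 5.]
```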
It can also be exported to pandas for pretty printing:
>>> a.to_pandas()
time      1950  1960  1970
variable
a            1     2     3
b            4     5     6
Useful links
Documentation
Code on github (bleeding edge)
Code on pypi (releases)
Mailing List
Issues Tracker
Install
Requirements:
python 2.7
numpy 1.7
Optional:
netCDF4 1.0.8 (netCDF archiving) (see notes below)
matplotlib 1.1 (plotting)
pandas 0.11 (interface with pandas, plotting)
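To see which of the optional dependencies above are missing from an environment, you could try importing each one. The sketch below is one way to do this. The helper `missing_optional` is a hypothetical name. It only tests whether each module can be imported, not whether its version meets the minimums listed above. `importlib.import_module` is available in both Python 2.7 and Python 3.

```python
import importlib

# Optional dependencies listed above, mapped to the feature each one enables
OPTIONAL = {
    "netCDF4": "netCDF archiving",
    "matplotlib": "plotting",
    "pandas": "interface with pandas, plotting",
}

def missing_optional(modules):
    """Hypothetical helper: return the modules (and their features) that fail to import."""
    missing = {}
    for name, feature in modules.items():
        try:
            importlib.import_module(name)
        except ImportError:
            missing[name] = feature
    return missing

for name, feature in sorted(missing_optional(OPTIONAL).items()):
    print("not installed: %s (needed for %s)" % (name, feature))
```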
Download the latest version from github and extract the archive. Then, from the dimarray repository, type (possibly preceded by sudo):
python setup.py install
Alternatively, you can use pip to download and install the version from pypi (which may be slightly out of date):
pip install dimarray
Notes on installing netCDF4
Installing the netCDF4 python module from source can be cumbersome, because it depends on the netCDF4 and (especially) HDF5 C libraries, which need to be compiled with specific flags (http://unidata.github.io/netcdf4-python).
For Windows, binaries are available, which is handy. On Ubuntu, I tried Anaconda and it worked well (Enthought and Python(x,y) might work as well). Download the full Anaconda distribution (http://continuum.io/downloads) or the Miniconda installer (http://conda.pydata.org/miniconda.html). This should make the conda command available. Then just run:
conda install netCDF4
The drawback is that everything then needs to happen within the anaconda/miniconda folder. I did not succeed in using conda via a simple pip install conda followed by conda init.
Contributions
All suggestions for improvement and direct contributions are very welcome. You can ask a question or start a discussion on the mailing list, or open an issue on github for precise requests. See the Useful links section above.