xarray MSv4 views over MSv2 Measurement Sets
xarray-ms presents a Measurement Set v4 view (MSv4) over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4 compliant applications to be developed on well-understood MSv2 data.
>>> import xarray_ms
>>> from xarray.backends.api import open_datatree
>>> dt = open_datatree("/data/L795830_SB001_uv.MS/",
...                    chunks={"time": 2000, "baseline": 1000})
>>> dt
<xarray.DataTree>
Group: /
└── Group: /DATA_DESC_ID=0,FIELD_ID=0,OBSERVATION_ID=0
│ Dimensions: (time: 28760, baseline: 2775, frequency: 16,
│ polarization: 4, uvw_label: 3)
│ Coordinates:
│ antenna1_name (baseline) object 22kB ...
│ antenna2_name (baseline) object 22kB ...
│ baseline_id (baseline) int64 22kB ...
│ * frequency (frequency) float64 128B 1.202e+08 ... 1.204e+08
│ * polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│ * time (time) float64 230kB 1.601e+09 ... 1.601e+09
│ Dimensions without coordinates: baseline, uvw_label
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline) float64 638MB ...
│ FLAG (time, baseline, frequency, polarization) uint8 5GB ...
│ TIME_CENTROID (time, baseline) float64 638MB ...
│ UVW (time, baseline, uvw_label) float64 2GB ...
│ VISIBILITY (time, baseline, frequency, polarization) complex64 41GB ...
│ WEIGHT (time, baseline, frequency, polarization) float32 20GB ...
│ Attributes:
│ version: 0.0.1
│ creation_date: 2024-09-18T10:49:55.133908+00:00
│ data_description_id: 0
└── Group: /DATA_DESC_ID=0,FIELD_ID=0,OBSERVATION_ID=0/ANTENNA
Dimensions: (antenna_name: 74,
cartesian_pos_label/ellipsoid_pos_label: 3)
Coordinates:
baseline_antenna1_name (baseline) object 22kB ...
baseline_antenna2_name (baseline) object 22kB ...
baseline_id (baseline) int64 22kB ...
* frequency (frequency) float64 128B 1.202e+08 1.202e+08 ... 1.204e+08
* polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
* time (time) float64 230kB 1.601e+09 1.601e+09 ... 1.601e+09
* antenna_name (antenna_name) object 592B 'CS001HBA0' ... 'IE613HBA'
mount (antenna_name) object 592B 'X-Y' 'X-Y' ... 'X-Y' 'X-Y'
station (antenna_name) object 592B 'LOFAR' 'LOFAR' ... 'LOFAR'
Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 2kB ...
Measurement Set v4
NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:
xarray is used to define the specification.
MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.
xarray Datasets are self-describing and they are therefore easier to reason about and work with. Additionally, the regularity of data will make writing MSv4-based software less complex.
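As a rough illustration of the difference between the tabular and gridded layouts, the sketch below scatters MSv2-style rows onto a regular (time, baseline) grid with NumPy. The helper name `rows_to_grid` and the toy arrays are hypothetical, and this is not how xarray-ms or xradio perform the conversion; rows missing from the table are simply left as NaN padding.

```python
import numpy as np


def rows_to_grid(time, antenna1, antenna2, values):
    """Scatter per-row values onto a regular (time, baseline) grid.

    Hypothetical helper for illustration only: rows missing from the
    table are left as NaN in the output grid.
    """
    # Unique timestamps define the time axis; the inverse maps row -> time index
    utime, time_idx = np.unique(time, return_inverse=True)
    # Unique (antenna1, antenna2) pairs define the baseline axis
    pairs = np.stack([antenna1, antenna2], axis=1)
    ubl, bl_idx = np.unique(pairs, axis=0, return_inverse=True)
    bl_idx = bl_idx.reshape(-1)  # some NumPy versions return a 2D inverse
    grid = np.full((utime.size, ubl.shape[0]), np.nan)
    grid[time_idx, bl_idx] = values
    return utime, ubl, grid


# Toy table: 2 timestamps x 3 baselines, with the (0, 2) row missing at t=1
time = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
antenna1 = np.array([0, 0, 1, 0, 1])
antenna2 = np.array([1, 2, 2, 1, 2])
values = np.array([10.0, 20.0, 30.0, 40.0, 60.0])

utime, ubl, grid = rows_to_grid(time, antenna1, antenna2, values)
```

Once the data sits on a grid like this, each axis can carry a coordinate, which is what makes the Dataset self-describing.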
xradio
casangi/xradio provides a reference implementation that converts CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore package.
Why xarray-ms?
By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then transition seamlessly to newer formats. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view looks the same to the software developer.
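To make the point about format independence concrete, here is a minimal sketch of a function written purely against the xarray view. It assumes only the FLAG variable and the dimension names shown in the example output above; the function name `flagged_fraction` and the synthetic Dataset are hypothetical stand-ins, not part of xarray-ms.

```python
import numpy as np
import xarray as xr


def flagged_fraction(ds: xr.Dataset) -> xr.DataArray:
    """Fraction of flagged samples per baseline.

    Depends only on the FLAG variable and its dimension names, not on
    whether the Dataset is backed by MSv2, zarr or memory.
    """
    return (ds["FLAG"] != 0).mean(dim=("time", "frequency", "polarization"))


# Tiny synthetic stand-in for one partition (made-up values)
flag = np.zeros((4, 3, 2, 4), dtype=np.uint8)
flag[:, 1] = 1  # flag every sample on the second baseline
ds = xr.Dataset(
    {"FLAG": (("time", "baseline", "frequency", "polarization"), flag)}
)

frac = flagged_fraction(ds)
```

The same function should apply unchanged to a Dataset node taken from an opened Measurement Set, since it never touches the storage layer.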
xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:
xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.
Similarly, xarray’s lazy loading mechanism dispatches through the backend.
Automatic access to any chunked array types supported by xarray, including but not limited to dask.
Arbitrary chunking along any xarray dimension.
xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.
Some limited support for irregular MSv2 data via padding.
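When picking chunk sizes, it helps to estimate how large each chunk will be in memory. The following sketch does the arithmetic for the VISIBILITY variable using the dimension sizes and the `chunks` argument from the opening example. The helper `chunk_stats` is hypothetical, and it assumes that dimensions without an explicit chunk size are kept whole.

```python
import math

# Dimension sizes and chunk sizes taken from the opening example
dims = {"time": 28760, "baseline": 2775, "frequency": 16, "polarization": 4}
chunks = {"time": 2000, "baseline": 1000}
itemsize = 8  # bytes per complex64 VISIBILITY sample


def chunk_stats(dims, chunks, itemsize):
    """Return (number of chunks, bytes in the largest chunk, total bytes)."""
    # Assumption: dimensions without an explicit chunk size form one chunk
    sizes = {name: min(chunks.get(name, n), n) for name, n in dims.items()}
    nchunks = math.prod(math.ceil(dims[name] / sizes[name]) for name in dims)
    chunk_bytes = math.prod(sizes.values()) * itemsize
    total_bytes = math.prod(dims.values()) * itemsize
    return nchunks, chunk_bytes, total_bytes


nchunks, chunk_bytes, total_bytes = chunk_stats(dims, chunks, itemsize)
print(f"{nchunks} chunks, {chunk_bytes / 1e9:.3f} GB per full chunk, "
      f"{total_bytes / 1e9:.1f} GB in total")
```

Under these assumptions the 41GB VISIBILITY array splits into 45 chunks of roughly 1 GB each, which gives a feel for the per-task memory a chunked array library would need.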
Work in Progress
The Measurement Set v4 specification is currently under active development. xarray-ms is also currently under active development and does not yet have feature parity with MSv4 or xradio. Most measures information and many secondary sub-tables are currently missing.
However, the most important parts of the MSv2 MAIN tables, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.