Arrow bindings for casacore

arcae implements a limited subset of functionality from the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export into newer cloud-native formats such as Apache Arrow and Zarr.

Rationale

casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:

Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS system.

In the time since the CTDS was developed, newer, open-source formats such as Apache Arrow and Zarr have been developed that are suitable for representing Radio Astronomy data.

  • The Apache Arrow project defines a language-independent, in-memory columnar format.

  • Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.

  • It’s easy to pass Arrow Tables between many different programming languages.

  • Once in Apache Arrow format, it is easy to store data in modern, cloud-native disk formats such as parquet and Zarr.

  • Converting CASA Tables to Arrow in the C++ layer avoids the GIL.

  • Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.

  • It also allows us to write astrometry routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
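The single-thread constraint mentioned above can be illustrated in pure Python. This is only a sketch of the general pattern, not arcae's C++ implementation: SerializedResource and FakeTable are hypothetical names introduced for illustration, and FakeTable stands in for a table object that is not thread-safe.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedResource:
    """Hypothetical wrapper: every call on the resource runs on one dedicated thread."""

    def __init__(self, factory):
        # A pool with exactly one worker serialises all access to the resource.
        self._pool = ThreadPoolExecutor(max_workers=1)
        # The resource is also created on the worker thread.
        self._resource = self._pool.submit(factory).result()

    def call(self, fn, *args):
        """Schedule fn(resource, *args) on the worker thread and return a Future."""
        return self._pool.submit(fn, self._resource, *args)

    def close(self):
        self._pool.shutdown(wait=True)


class FakeTable:
    """Stand-in for a non-thread-safe table; records which threads touch it."""

    def __init__(self):
        self.rows = list(range(100))
        self.thread_ids = set()

    def read(self, start, stop):
        self.thread_ids.add(threading.get_ident())
        return self.rows[start:stop]


table = SerializedResource(FakeTable)

# Many client threads may submit requests concurrently,
# but only the single worker thread ever touches the table.
with ThreadPoolExecutor(max_workers=8) as clients:
    futures = list(
        clients.map(
            lambda i: table.call(FakeTable.read, i * 10, (i + 1) * 10),
            range(10),
        )
    )

chunks = [f.result() for f in futures]
touching_threads = table.call(lambda t: set(t.thread_ids)).result()
table.close()
```

Callers receive futures and can overlap their own work, while the table itself only ever sees one thread.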

Limitations

Arrow supports both 1D arrays and nested structures:

  1. Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.

  2. Variably shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.

  3. Complex values are represented as an extra FixedSizeListArray nesting of two floats.

  4. Currently, it is not trivially trivial (repetition intended here) to convert between the above and numpy via to_numpy calls on Arrow Arrays, but it is relatively trivial to reinterpret the underlying data buffers from either API. This is done transparently in getcol and putcol functions (see usage below).

Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.

Some other edge cases have not yet been implemented, but could be with some thought.

  • Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.

  • TpRecord columns are not yet handled. It is probably simplest to convert these rows to JSON and store them as strings.

  • TpQuantity columns are not yet handled. It may be possible to represent them as a run-time parametric Arrow DataType.
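The JSON idea for record-valued rows could look roughly like the following. This is a sketch of the suggestion only, since the feature is not implemented; the record contents are invented for illustration.

```python
import json

# Hypothetical record-valued rows, standing in for a TpRecord column
records = [
    {"unit": "Hz", "ref": "TOPO", "values": [1.0e9, 1.1e9]},
    {"unit": "Hz", "ref": "LSRK", "values": []},
]

# One JSON string per row; sort_keys makes the encoding deterministic
encoded = [json.dumps(rec, sort_keys=True) for rec in records]

# Plain strings fit an ordinary string column; decoding restores the records
decoded = [json.loads(s) for s in encoded]
```

The trade-off is that the record structure becomes opaque to Arrow: it can be stored and moved around, but not queried without decoding.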

Installation

Binary wheels are provided for Linux and macOS on both x86_64 and arm64 architectures.

$ pip install arcae

Usage

Example usage with Arrow Tables:

import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()        # read entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # read a range of rows
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

pq.write_table(arrow_table, "measurementset.parquet")

Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy Advanced Indexing.

casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, and channels 16 to 32, and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1*1j, index=([10, 2], slice(16, 32), None))

See the test cases for further use cases.

Exporting Measurement Sets to Arrow Parquet Datasets

Install the applications optional extra.

$ pip install arcae[applications]

Then, an export script is available:

$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet

This data can be loaded into an Arrow Dataset:

>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")

Etymology

Noun: arca f (genitive arcae); first declension. A chest, box, coffer, safe (safe place for storing items, or anything of a similar shape).

Pronounced: ar-ki.

Download files

Source Distribution

  • arcae-0.2.9.tar.gz (120.4 kB): source

Built Distributions

  • arcae-0.2.9-cp312-cp312-manylinux_2_28_x86_64.whl (33.7 MB): CPython 3.12, manylinux (glibc 2.28+), x86-64
  • arcae-0.2.9-cp312-cp312-manylinux_2_28_aarch64.whl (31.0 MB): CPython 3.12, manylinux (glibc 2.28+), ARM64
  • arcae-0.2.9-cp312-cp312-macosx_14_0_arm64.whl (12.9 MB): CPython 3.12, macOS 14.0+, ARM64
  • arcae-0.2.9-cp312-cp312-macosx_13_0_x86_64.whl (15.0 MB): CPython 3.12, macOS 13.0+, x86-64
  • arcae-0.2.9-cp311-cp311-manylinux_2_28_x86_64.whl (33.6 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64
  • arcae-0.2.9-cp311-cp311-manylinux_2_28_aarch64.whl (31.0 MB): CPython 3.11, manylinux (glibc 2.28+), ARM64
  • arcae-0.2.9-cp311-cp311-macosx_14_0_arm64.whl (12.9 MB): CPython 3.11, macOS 14.0+, ARM64
  • arcae-0.2.9-cp311-cp311-macosx_13_0_x86_64.whl (15.0 MB): CPython 3.11, macOS 13.0+, x86-64
  • arcae-0.2.9-cp310-cp310-manylinux_2_28_x86_64.whl (33.6 MB): CPython 3.10, manylinux (glibc 2.28+), x86-64
  • arcae-0.2.9-cp310-cp310-manylinux_2_28_aarch64.whl (30.9 MB): CPython 3.10, manylinux (glibc 2.28+), ARM64
  • arcae-0.2.9-cp310-cp310-macosx_14_0_arm64.whl (12.9 MB): CPython 3.10, macOS 14.0+, ARM64
  • arcae-0.2.9-cp310-cp310-macosx_13_0_x86_64.whl (15.0 MB): CPython 3.10, macOS 13.0+, x86-64

File details

Details for the file arcae-0.2.9.tar.gz.

File metadata

  • Download URL: arcae-0.2.9.tar.gz
  • Upload date:
  • Size: 120.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for arcae-0.2.9.tar.gz
Algorithm Hash digest
SHA256 51ad1061e7edaf7a83373041972b84f1431e3a5f6c2b2a3b2a1e1686a4b9f2df
MD5 ec97612362b1f05956b7783e2cf47314
BLAKE2b-256 90ec29ed5cb7f002328bdc1c781cc99e537152384dd28c1d04b014fa2888fa04

See more details on using hashes here.

Provenance

The following attestation bundles were made for arcae-0.2.9.tar.gz:

Publisher: ci.yml on ratt-ru/arcae

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Hashes for arcae-0.2.9-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7edbf7fd1fb672c8e452146ea6a847683ab247e6571d19622085b491364d5101
MD5 1fa8e25433d793b7407f029700b237a8
BLAKE2b-256 3661f19b84fe0181b0ca5ee7733ddeee6244527254eecbb3a29d24b5e347462c

Hashes for arcae-0.2.9-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 7d301a1252513cd84cb156c4cb00c652de800c5ab818f65c00fe4ce5e8a5ca13
MD5 f2479280f5375749468a20d8ddb90872
BLAKE2b-256 43bcc003c84c894128f51b4f960872f0bd05660f65d5ceef2fce58be51650d57

Hashes for arcae-0.2.9-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 e6fbbe4f48c00887429c0caf78f05cf12b4abed15120c00c0354ea63a60611d8
MD5 dfc3ac826b5fb9403d16acdd0bc1035d
BLAKE2b-256 af76abed8d51a84f2608d7145e570d1076b50ec26caf6c7bffd375270723835a

Hashes for arcae-0.2.9-cp312-cp312-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 b28cd9269a92971005c30fe51b73e1c1acf7ca8f841faa086a19e33e3e3c1b59
MD5 80c2e8832184bf2da28eb0836d155ff4
BLAKE2b-256 8dcf7e1d6314b881c65a1bf764610ee873d1a2345bf0240af4deb33f026a8686

Hashes for arcae-0.2.9-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 38343951ddccada61e7f5f61451aec452e52189a544a0b7e9ac8ac5505f55851
MD5 275fba4e4a93ffeaff3ae1ce222a2aba
BLAKE2b-256 fdf38a91e0cfa04bf153df5739a77cbe0fa1e15fb4e5d59be3a2b3a3e6b34c27

Hashes for arcae-0.2.9-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2c1db64e7526317952ab2e5267080cf84c63f8db8673dadf4f17315d77d6b021
MD5 afabbdd0580205cad0a321f7e67d34b8
BLAKE2b-256 9e8258622a211bc511c9f7edd5fa289e0ca069c4faf18db8ac5321d1b65c6d9b

Hashes for arcae-0.2.9-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 b342cbd02f383327eec200a74eee14d9a9fc541be3f54d08d223ad2c9ba6374e
MD5 ef50e5daeaa8625c41d4614fd8f8b3ee
BLAKE2b-256 d07379ca50e04ae047494de1748ec44e77fa110a9ac3f7e97bcb226dbcd92fbf

Hashes for arcae-0.2.9-cp311-cp311-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 060b0a5c1826610a47bb62403d69e3d011e0f4278acd6f1bac2c265fe5337033
MD5 39ff3f83bd9ff4b5099787f7fb2eb0b6
BLAKE2b-256 fdaee16878ed4a8cf5bae0ec73e0d3509a25f0873f33f2c0e226c07634d5b3b5

Hashes for arcae-0.2.9-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 586035348a4405e97af104a305a093cf2d6ee8eb2b52b9fdd282361c0865ca05
MD5 45bf290e65c2142ad56a748c5ff187ad
BLAKE2b-256 d4c0bc18e9baf0e9c452eb3c5a33aef3b229be96c36f36e71b4e6469e516f32b

Hashes for arcae-0.2.9-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 521323825a56f297662aec7ca179fe9f6d0ba9174f7e07dd9ff3102296d208a7
MD5 4cacb07f172fcdf0920d5a36ad635e7d
BLAKE2b-256 ea6c747a5ec2a325c3046510be5d1f6c2cca307be8bfd6c34345fd41c996ecf7

Hashes for arcae-0.2.9-cp310-cp310-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 e72c29f98c29253568700dfe7c3593216541db794e400ea00a49cdea08fa4b39
MD5 c70f1bf0cd1c12ac8d709a0792a9f138
BLAKE2b-256 c6966d018754ec00d74006c304f9a4c1c5e6e1a9251ca075944372dfcd3bf1ed

Hashes for arcae-0.2.9-cp310-cp310-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 ef753670511b6b2c44f68bbe95322af8aa93b83d14a734bc0dcfeb745549b757
MD5 7c992a7e6c7a1b019b7f04cb556ed873
BLAKE2b-256 a66305c9d0ba6ce1e64a418e406a9da3caf1efa049225993bbd535dd3eb7887b
