Arrow bindings for casacore

arcae implements a limited subset of functionality from the more mature python-casacore package. It bypasses some existing limitations in python-casacore to provide safe, multi-threaded access to CASA formats, thereby enabling export into newer cloud-native formats such as Apache Arrow and Zarr.

Rationale

casacore and the python-casacore Python bindings provide access to the CASA Table Data System (CTDS) and Measurement Sets created within this system. The CTDS, as of casacore 3.5.0, is subject to the following limitations:

Resolving these concerns is potentially a major effort, involving invasive changes across the CTDS system.

Since the CTDS was developed, newer open-source formats such as Apache Arrow and Zarr have emerged that are suitable for representing radio astronomy data.

  • The Apache Arrow project defines a language-independent, in-memory columnar format.

  • Translating CTDS data to Arrow is relatively simple, with some limitations mentioned below.

  • It is easy to exchange Arrow Tables between many different languages.

  • Once in Apache Arrow format, it is easy to store data in modern, cloud-native disk formats such as Parquet and Zarr.

  • Converting CASA Tables to Arrow in the C++ layer avoids the GIL.

  • Access to non-thread-safe CASA Tables is constrained to a ThreadPool containing a single thread.

  • It also allows us to write astrometry routines in C++, potentially side-stepping thread-safety and GIL issues with the CASA Measures server.
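The single-thread constraint mentioned above can be illustrated in pure Python. This is only a minimal sketch of the pattern, assuming a hypothetical `SerialisedResource` wrapper and a `FakeTable` stand-in; arcae applies the idea in its C++ layer and its actual implementation is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class SerialisedResource:
    """Hypothetical wrapper: every call on the wrapped object runs on one worker thread."""

    def __init__(self, resource):
        self._resource = resource
        # A pool with exactly one worker serialises all access to the resource.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, method, *args, **kwargs):
        # Submit the bound method to the single worker and return a Future.
        return self._pool.submit(getattr(self._resource, method), *args, **kwargs)

    def close(self):
        self._pool.shutdown(wait=True)


class FakeTable:
    """Stand-in for a non-thread-safe table; records which threads touched it."""

    def __init__(self):
        self.threads = set()

    def getcol(self, name):
        self.threads.add(threading.get_ident())
        return f"data for {name}"


table = FakeTable()
safe = SerialisedResource(table)

# Several caller threads request columns concurrently...
columns = ["DATA", "UVW", "TIME", "FLAG"]
with ThreadPoolExecutor(max_workers=4) as callers:
    futures = list(callers.map(lambda c: safe.call("getcol", c), columns))

# ...but the table itself is only ever touched by the single worker thread.
results = [f.result() for f in futures]
safe.close()
```

Callers stay free to run in parallel, while the object that is not thread-safe only ever sees one thread.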

Limitations

Arrow supports both 1D arrays and nested structures:

  1. Fixed-shape multi-dimensional data (e.g. visibility data) is currently represented as nested FixedSizeListArrays.

  2. Variably shaped multi-dimensional data (e.g. subtable data) is currently represented as nested ListArrays.

  3. Complex values are represented as an extra FixedSizeListArray nesting of two floats.

  4. Currently, it is not straightforward to convert between the above and NumPy via to_numpy calls on Arrow Arrays, but it is relatively simple to reinterpret the underlying data buffers from either API. This is done transparently in the getcol and putcol functions (see usage below).
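To make the buffer reinterpretation in item 4 concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical fixed-shape column of 2 rows, 3 channels and 2 correlations, stored as one flat buffer of interleaved (real, imag) float32 pairs, which mirrors the nested FixedSizeListArray layout described above. It does not use the Arrow or NumPy APIs and is not arcae's own code.

```python
from array import array

# Assumed shape for illustration: 2 rows x 3 channels x 2 correlations.
nrow, nchan, ncorr = 2, 3, 2

# Flat values buffer: interleaved (real, imag) float32 pairs.
flat = array("f", range(nrow * nchan * ncorr * 2))

# Reinterpret the same memory with a multi-dimensional shape; no copy is made.
view = memoryview(flat).cast("B").cast("f", shape=(nrow, nchan, ncorr, 2))


def complex_at(row, chan, corr):
    """Read one complex value from the reinterpreted buffer."""
    return complex(view[row, chan, corr, 0], view[row, chan, corr, 1])


first = complex_at(0, 0, 0)
last = complex_at(1, 2, 1)

# Writing through the flat buffer is visible through the shaped view.
flat[0] = 100.0
shared = view[0, 0, 0, 0]
```

Because only the shape metadata changes, the cost of the conversion does not depend on the size of the column.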

Going forward, FixedShapeTensorArray and VariableShapeTensorArray will provide more ergonomic structures for representing multi-dimensional data. First-class support for complex values in Apache Arrow will require implementing a C++ extension type within Arrow itself.

Some other edge cases have not yet been implemented, but could be with some thought.

  • Columns with unconstrained rank (ndim == -1) whose rows, in practice, have differing dimensions. Unconstrained rank columns whose rows actually have the same rank are catered for.

  • Not yet able to handle TpRecord columns. It is probably simplest to convert these rows to JSON and store them as strings.

  • Not yet able to handle TpQuantity columns. Possible to represent as a run-time parametric Arrow DataType.
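The JSON idea suggested for TpRecord columns could look roughly like the sketch below. The record contents and the helper names `record_to_string` and `string_to_record` are hypothetical, and this is not something arcae currently implements.

```python
import json

# Hypothetical record value, as a TpRecord cell might look once converted to a dict.
record = {"type": "direction", "refer": "J2000", "m0": {"value": 1.5, "unit": "rad"}}


def record_to_string(rec):
    # sort_keys gives a deterministic string, which helps when comparing rows.
    return json.dumps(rec, sort_keys=True)


def string_to_record(text):
    return json.loads(text)


# One JSON string per row, suitable for storing in a string column.
encoded = [record_to_string(record)]
decoded = string_to_record(encoded[0])
```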

Installation

Binary wheels are provided for Linux and macOS for both x86_64 and arm64 architectures.

$ pip install arcae

Usage

Example usage with Arrow Tables:

import json
from pprint import pprint

import arcae
import pyarrow as pa
import pyarrow.parquet as pq

# Obtain (partial) Apache Arrow Table from a CASA Table
casa_table = arcae.table("/path/to/measurementset.ms")
arrow_table = casa_table.to_arrow()        # read entire table
arrow_table = casa_table.to_arrow(index=(slice(10, 20),))  # read rows 10 to 19
assert isinstance(arrow_table, pa.Table)

# Print JSON-encoded Table and Column keywords
pprint(json.loads(arrow_table.schema.metadata[b"__arcae_metadata__"]))
pprint(json.loads(arrow_table.schema.field("DATA").metadata[b"__arcae_metadata__"]))

pq.write_table(arrow_table, "measurementset.parquet")

Some reading and writing functionality from python-casacore is replicated, with added support for some NumPy Advanced Indexing.

casa_table = arcae.table("/path/to/measurementset.ms", readonly=False)
# Get rows 10 and 2, channels 16 to 31, and all correlations
data = casa_table.getcol("DATA", index=([10, 2], slice(16, 32), None))
# Write some modified data back
casa_table.putcol("DATA", data + 1j, index=([10, 2], slice(16, 32), None))

See the test cases for further use cases.
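To spell out what an index tuple such as ([10, 2], slice(16, 32), None) selects, the sketch below expands each entry into explicit positions. `resolve_index` is a hypothetical helper written for illustration, and the column shape of 100 rows, 64 channels and 4 correlations is an assumption; neither is part of arcae.

```python
def resolve_index(index, shape):
    """Expand an index tuple into explicit positions per dimension.

    Hypothetical helper for illustration only; not part of arcae.
    """
    resolved = []
    for dim_index, dim_size in zip(index, shape):
        if dim_index is None:
            # None selects every position along this dimension.
            resolved.append(list(range(dim_size)))
        elif isinstance(dim_index, slice):
            # Slices follow Python semantics: the stop value is exclusive.
            resolved.append(list(range(*dim_index.indices(dim_size))))
        else:
            # Explicit positions are kept in the order given.
            resolved.append(list(dim_index))
    return resolved


# Assumed column shape: 100 rows, 64 channels, 4 correlations.
rows, chans, corrs = resolve_index(([10, 2], slice(16, 32), None), (100, 64, 4))
```

Here `rows` keeps the requested order (10 before 2), `chans` runs from 16 up to and including 31, and `corrs` covers all four correlations.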

Exporting Measurement Sets to Arrow Parquet Datasets

Install the applications optional extra.

$ pip install "arcae[applications]"

Then, an export script is available:

$ arcae export /path/to/the.ms --nrow 50000
$ tree output.arrow/
output.arrow/
├── ANTENNA
│   └── data0.parquet
├── DATA_DESCRIPTION
│   └── data0.parquet
├── FEED
│   └── data0.parquet
├── FIELD
│   └── data0.parquet
├── MAIN
│   └── FIELD_ID=0
│       └── PROCESSOR_ID=0
│           ├── DATA_DESC_ID=0
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=1
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           ├── DATA_DESC_ID=2
│           │   ├── data0.parquet
│           │   ├── data1.parquet
│           │   ├── data2.parquet
│           │   └── data3.parquet
│           └── DATA_DESC_ID=3
│               ├── data0.parquet
│               ├── data1.parquet
│               ├── data2.parquet
│               └── data3.parquet
├── OBSERVATION
│   └── data0.parquet

This data can be loaded into an Arrow Dataset:

>>> import pyarrow as pa
>>> import pyarrow.dataset as pad
>>> main_ds = pad.dataset("output.arrow/MAIN")
>>> spw_ds = pad.dataset("output.arrow/SPECTRAL_WINDOW")
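The MAIN directory in the export uses KEY=value directory names (FIELD_ID, PROCESSOR_ID, DATA_DESC_ID), which is the Hive-style partitioning layout. pyarrow's dataset API can be asked to interpret such names by passing partitioning="hive" to pad.dataset. As a rough standard-library sketch of what that layout encodes, the hypothetical helper `partition_keys` below recovers the keys from a file path; it assumes the partition values are integers.

```python
from pathlib import PurePosixPath


def partition_keys(path, root):
    """Return {key: value} for each 'KEY=value' directory between root and the file.

    Hypothetical helper for illustration; assumes integer partition values.
    """
    relative = PurePosixPath(path).relative_to(root)
    keys = {}
    for part in relative.parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep:
            keys[key] = int(value)
    return keys


keys = partition_keys(
    "output.arrow/MAIN/FIELD_ID=0/PROCESSOR_ID=0/DATA_DESC_ID=2/data1.parquet",
    "output.arrow/MAIN",
)
```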

Etymology

Noun: arca f (genitive arcae); first declension.

A chest, box, coffer, safe (a safe place for storing items, or anything of a similar shape).

Pronounced: ar-ki.
