Parallel CLEAN imaging using Dask and CASA tools

pclean — Parallel CLEAN Imaging with Dask

pclean is a modular, Dask-accelerated radio-interferometric imaging package that wraps CASA's synthesis imaging C++ tools (casatools) to provide transparent parallelism for cube (channel-distributed) and continuum (row-distributed) imaging workflows.

Features

| Feature | Description |
| --- | --- |
| Cube parallelism | Channels are distributed across Dask workers; each worker runs a complete imaging and deconvolution cycle on its sub-cube. |
| Continuum parallelism | Visibility rows are partitioned across Dask workers for major-cycle gridding; minor cycles run on the gathered, normalized image. |
| tclean-compatible API | Drop-in `pclean()` function accepting the same parameters as CASA `tclean`. |
| Hierarchical config | Pydantic v2 YAML-based configuration with presets, layered merging, and CASA bridge methods. |
| CLI support | Run imaging from the command line via `python -m pclean`. |
| SLURM clusters | Native Dask-Jobqueue integration for HPC batch scheduling. |
| Modular internals | Every building block — imager, deconvolver, normalizer, partitioner, cluster manager — is independently importable. |
| ADIOS2 support | Convert MeasurementSet columns to Adios2StMan for I/O benchmarking. Requires the casatools openmpi variant from conda-forge. |
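The continuum scheme gathers per-worker partial images and normalizes them by the accumulated weights before the minor cycle. A minimal NumPy sketch of that gather/normalize step (function and variable names are illustrative, not pclean's API):

```python
import numpy as np

def gather_and_normalize(partial_images, partial_weights):
    """Sum per-worker partial dirty images and divide by the total weight.

    Mimics the gather step of row-partitioned continuum gridding: each
    worker grids its visibility rows into a partial image plus a weight
    image; the coordinator sums both and normalizes.
    """
    image_sum = np.sum(partial_images, axis=0)
    weight_sum = np.sum(partial_weights, axis=0)
    # Leave pixels with zero accumulated weight at zero.
    return np.divide(image_sum, weight_sum,
                     out=np.zeros_like(image_sum),
                     where=weight_sum > 0)

# Two workers, each contributing a 2x2 partial image and weight image.
parts = [np.array([[2.0, 0.0], [4.0, 0.0]]),
         np.array([[2.0, 0.0], [2.0, 0.0]])]
wts = [np.array([[1.0, 0.0], [1.0, 0.0]]),
       np.array([[1.0, 0.0], [2.0, 0.0]])]
norm = gather_and_normalize(parts, wts)
# norm[0, 0] == 2.0 and norm[1, 0] == 2.0; zero-weight pixels stay 0.
```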

Quick start

from pclean import pclean

# Parallel cube imaging (channels distributed across workers)
pclean(
    vis='my.ms',
    imagename='cube_out',
    specmode='cube',
    imsize=[512, 512],
    cell='1arcsec',
    niter=1000,
    deconvolver='hogbom',
    parallel=True,
    nworkers=8,
    cube_chunksize=1,       # one sub-cube per channel (max parallelism)
)

# Parallel continuum imaging (visibility rows chunked)
pclean(
    vis='my.ms',
    imagename='cont_out',
    specmode='mfs',
    imsize=[2048, 2048],
    cell='0.5arcsec',
    niter=5000,
    deconvolver='mtmfs',
    nterms=2,
    parallel=True,
    nworkers=4,
)

Command-line interface

python -m pclean --vis my.ms --imagename out --specmode cube \
    --imsize 512 512 --cell 1arcsec --niter 1000 \
    --parallel --nworkers 8

Additional parameters

Beyond the standard tclean parameters, pclean accepts:

| Parameter | Default | Description |
| --- | --- | --- |
| `parallel` | `False` | Enable Dask-distributed parallelism. |
| `nworkers` | `None` | Number of Dask workers; `None` defaults to the available CPU count. |
| `scheduler_address` | `None` | Address of an existing Dask scheduler; when set, no local cluster is created. |
| `threads_per_worker` | `1` | Threads per Dask worker. Kept at 1 because CASA tools are not thread-safe. |
| `memory_limit` | `'0'` | Per-worker memory cap. `'0'` disables Dask memory management, preventing CASA C++ allocations from being paused or killed. |
| `local_directory` | `None` | Scratch directory for Dask spill-to-disk. |
| `cube_chunksize` | `-1` | Channels per sub-cube task. `-1` assigns one sub-cube per worker; `1` assigns one per channel. |
| `keep_subcubes` | `False` | Retain intermediate sub-cube images after concatenation. |
| `keep_partimages` | `False` | Retain partial images after continuum gather. |
| `concat_mode` | `'auto'` | Concatenation strategy: `'auto'` (derive from `keep_subcubes`), `'paged'` (physical copy), `'virtual'` (reference catalog), `'movevirtual'` (rename into output). |
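The `cube_chunksize` semantics above reduce to simple partitioning arithmetic. A hedged, dependency-free sketch of how channels might map to sub-cube tasks (the function name is hypothetical, not pclean's internal API):

```python
def partition_channels(nchan, nworkers, cube_chunksize=-1):
    """Split channel indices into contiguous sub-cube chunks.

    cube_chunksize == -1 -> roughly one sub-cube per worker;
    cube_chunksize == 1  -> one sub-cube per channel (max parallelism).
    """
    if cube_chunksize == -1:
        # Ceiling division: spread channels evenly over the workers.
        size = -(-nchan // nworkers)
    else:
        size = cube_chunksize
    return [list(range(start, min(start + size, nchan)))
            for start in range(0, nchan, size)]

chunks = partition_channels(nchan=10, nworkers=4)
# -> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
per_channel = partition_channels(nchan=4, nworkers=4, cube_chunksize=1)
# -> [[0], [1], [2], [3]]
```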

Architecture

pclean/
├── src/pclean/
│   ├── __init__.py                # Package init, exposes pclean()
│   ├── __main__.py                # CLI entry point (python -m pclean)
│   ├── pclean.py                  # Top-level tclean-like interface
│   ├── params.py                  # Parameter container & validation
│   ├── imaging/
│   │   ├── serial_imager.py       # Single-process imager (base engine)
│   │   ├── deconvolver.py         # Deconvolution wrapper
│   │   └── normalizer.py          # Image normalization (gather/scatter)
│   ├── parallel/
│   │   ├── cluster.py             # Dask cluster lifecycle management
│   │   ├── cube_parallel.py       # Channel-parallel cube imaging
│   │   ├── continuum_parallel.py  # Row-parallel continuum imaging
│   │   └── worker_tasks.py        # Serialisable functions for workers
│   └── utils/
│       ├── partition.py           # Data / image partitioning helpers
│       ├── image_concat.py        # Sub-cube image concatenation
│       ├── memory_estimate.py     # Worker RAM estimation heuristics
│       ├── check_adios2.py        # Adios2StMan availability check
│       └── convert_adios2.py      # MS → ADIOS2 conversion utility

Documentation

Full documentation is hosted at pclean.readthedocs.io.

Requirements

  • Python ≥ 3.10
  • casatools ≥ 6.5
  • dask + distributed
  • numpy
  • pydantic ≥ 2.0

Pixi environments

The project uses pixi for reproducible environment management. Four environments are defined in pyproject.toml:

| Environment | Features | Description |
| --- | --- | --- |
| `default` | `casa` | Runtime with casatools/casatasks from PyPI. |
| `default-forge` | `casa-forge` | Runtime with casatools/casatasks from conda-forge (includes the openmpi variant required for Adios2StMan). |
| `dev` | `casa`, `dev` | Runtime plus pytest, pytest-cov, and ruff. |
| `test` | `dev` | Linting and testing only (no casatools). |
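In pixi, environments like these are typically composed from named feature tables in `pyproject.toml`. An illustrative fragment, not the project's verbatim configuration:

```toml
# Illustrative sketch only -- see the project's pyproject.toml for the
# actual feature and dependency definitions.
[tool.pixi.feature.dev.dependencies]
pytest = "*"
pytest-cov = "*"
ruff = "*"

[tool.pixi.environments]
default = ["casa"]
default-forge = ["casa-forge"]
dev = ["casa", "dev"]
test = ["dev"]
```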

Common tasks are exposed as pixi scripts:

pixi run -e dev test          # pytest -v
pixi run -e dev test-cov      # pytest with coverage
pixi run -e dev lint          # ruff check
pixi run -e dev fmt           # ruff format

References and acknowledgements

pclean builds on the imaging and calibration infrastructure developed by the CASA team at NRAO / ESO / NAOJ. The scientific algorithms — gridding, deconvolution, self-calibration — are the product of decades of CASA development; pclean is purely a computing-engineering effort that re-orchestrates those mature tools with a modern distributed runtime.

If this package contributes to published research, please cite the CASA software:

CASA Team, Bean, B., Bhatnagar, S., et al. 2022, "CASA, the Common Astronomy Software Applications for Radio Astronomy," PASP, 134, 114501. doi:10.1088/1538-3873/ac9642

McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, "CASA Architecture and Applications," ASP Conf. Ser., 376, 127. ads:2007ASPC..376..127M

Relation to CASA's built-in parallel imaging

pclean's parallel design closely follows the Python orchestration layer that CASA's tclean task already provides through the casatasks.private.imagerhelpers module:

| CASA Python class | pclean equivalent | Role |
| --- | --- | --- |
| `PySynthesisImager` | `SerialImager` | Serial imaging loop (init → PSF → major/minor → restore). |
| `PyParallelCubeSynthesisImager` | `ParallelCubeImager` | Each worker runs an independent `SerialImager` on a frequency sub-cube. |
| `PyParallelContSynthesisImager` | `ParallelContinuumImager` | Row-partitioned gridding across workers; minor cycles run serially on the coordinator. |
| `PyParallelImagerHelper` | `DaskClusterManager` | Cluster lifecycle, job dispatch, and result collection. |

The structural decomposition is the same: partition → image → normalize → deconvolve → iterate, with the same split between embarrassingly parallel cube channels and gather/scatter continuum cycles. Both codebases use polymorphic dispatch: task_tclean.py picks between PySynthesisImager, PyParallelCubeSynthesisImager, and PyParallelContSynthesisImager based on specmode and MPI availability, while pclean makes the same choice based on its own parallel and is_cube flags.
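The flag-based selection described above can be sketched as follows; the class names mirror those in the table, but the selection function itself is hypothetical:

```python
def select_imager(specmode, parallel):
    """Choose an imager the way task_tclean.py and pclean do:
    cube vs continuum, serial vs parallel."""
    is_cube = specmode.startswith("cube")
    if not parallel:
        return "SerialImager"
    return "ParallelCubeImager" if is_cube else "ParallelContinuumImager"

assert select_imager("cube", parallel=True) == "ParallelCubeImager"
assert select_imager("mfs", parallel=True) == "ParallelContinuumImager"
assert select_imager("mfs", parallel=False) == "SerialImager"
```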

The key difference is the parallelism transport. CASA's PyParallelImagerHelper sends Python code strings to MPI workers via casampi.MPIInterface, requiring mpicasa and a shared filesystem. pclean replaces this with Dask Distributed futures and actors, eliminating the MPI dependency in exchange for Dask scheduling overhead.
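The future-based dispatch pattern has the same shape in Dask (`client.submit` / `client.gather`) as in the stdlib sketch below, which uses a `ThreadPoolExecutor` as a stand-in so the example stays dependency-free; `image_subcube` is a hypothetical worker task, not pclean's actual one:

```python
from concurrent.futures import ThreadPoolExecutor

def image_subcube(channels):
    # Stand-in for a worker task that images one frequency sub-cube;
    # a real worker would run a full major/minor cycle here.
    return {"channels": channels, "niter_done": 10 * len(channels)}

subcubes = [[0, 1], [2, 3], [4]]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Dask equivalent: futures = [client.submit(image_subcube, c) for c in subcubes]
    futures = [pool.submit(image_subcube, c) for c in subcubes]
    results = [f.result() for f in futures]  # Dask: client.gather(futures)

total = sum(r["niter_done"] for r in results)  # 50 iterations across 5 channels
```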

See also CASA Memo 13 (Sekhar, Rau & Xue 2024), whose benchmarking of per-channel cube imaging distributed via SLURM job arrays motivated this work.

License

Copyright 2026 the pclean authors.

GPL-3.0-or-later — see LICENSE for details.

Disclaimer

This project is an independent, personal effort developed on the authors' own time. It is not affiliated with, endorsed by, or conducted as part of any employer's projects or responsibilities.

AI Disclosure

This project was developed with the assistance of AI coding agents (GitHub Copilot, Claude). The AI contributed to code generation, debugging, and documentation under human direction and review.
