Parallel CLEAN imaging using Dask and CASA tools
pclean — Parallel CLEAN Imaging with Dask
pclean is a modular, Dask-accelerated radio-interferometric imaging package
that wraps CASA's synthesis imaging C++ tools (casatools) to provide
transparent parallelism for cube (channel-distributed) and continuum
(row-distributed) imaging workflows.
Features
| Feature | Description |
|---|---|
| Cube parallelism | Channels are distributed across Dask workers; each worker runs a complete imaging and deconvolution cycle on its sub-cube. |
| Continuum parallelism | Visibility rows are partitioned across Dask workers for major-cycle gridding; minor cycles run on the gathered, normalized image. |
| tclean-compatible API | Drop-in pclean() function accepting the same parameters as CASA tclean. |
| Hierarchical config | Pydantic v2 YAML-based configuration with presets, layered merging, and CASA bridge methods. |
| CLI support | Run imaging from the command line via python -m pclean. |
| SLURM clusters | Native Dask-Jobqueue integration for HPC batch scheduling. |
| Modular internals | Every building block — imager, deconvolver, normalizer, partitioner, cluster manager — is independently importable. |
| ADIOS2 support | Convert MeasurementSet columns to Adios2StMan for I/O benchmarking. Requires the casatools openmpi variant from conda-forge. |
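The continuum row split described above (partitioned gridding, then a gather and normalization before the minor cycle) can be illustrated with a toy, pure-Python sketch. Everything in it is hypothetical: partition_rows, grid_partial, and gather_and_normalize are not pclean functions, and the "gridding" is a per-pixel weighted sum rather than CASA's convolutional gridding. What it shows is that summing the partial grids and weights from each row chunk, then normalizing once, gives the same result as a single serial pass over all rows.

```python
"""Toy illustration of row-partitioned gridding with a gather/normalize step.

Hypothetical helpers only; this is not pclean's or CASA's implementation.
"""


def partition_rows(nrows, nparts):
    """Split row indices [0, nrows) into at most nparts contiguous half-open ranges."""
    nparts = max(1, min(nparts, nrows))
    base, extra = divmod(nrows, nparts)
    ranges, start = [], 0
    for i in range(nparts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        ranges.append((start, start + size))
        start += size
    return ranges


def grid_partial(rows, npix):
    """Toy 'gridding': accumulate weighted values and weights per pixel for one chunk."""
    num = [0.0] * npix
    den = [0.0] * npix
    for pix, value, weight in rows:
        num[pix] += weight * value
        den[pix] += weight
    return num, den


def gather_and_normalize(partials, npix):
    """Sum the partial grids from all workers, then divide by the summed weights."""
    num = [0.0] * npix
    den = [0.0] * npix
    for pnum, pden in partials:
        for i in range(npix):
            num[i] += pnum[i]
            den[i] += pden[i]
    return [n / d if d > 0 else 0.0 for n, d in zip(num, den)]


# Each row is (pixel index, visibility-like value, weight).
rows = [(0, 1.0, 1.0), (1, 2.0, 1.0), (0, 3.0, 1.0), (2, 4.0, 2.0), (1, 6.0, 3.0)]
npix = 3

ranges = partition_rows(len(rows), nparts=2)
partials = [grid_partial(rows[a:b], npix) for a, b in ranges]  # one "worker" per range
parallel_image = gather_and_normalize(partials, npix)
serial_image = gather_and_normalize([grid_partial(rows, npix)], npix)
print(ranges, parallel_image, serial_image)
```

Normalizing only after the gather matters: dividing each chunk by its own weights first and then summing would not reproduce the serial result.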
Quick start
```python
from pclean import pclean

# Parallel cube imaging (channels distributed across workers)
pclean(
    vis='my.ms',
    imagename='cube_out',
    specmode='cube',
    imsize=[512, 512],
    cell='1arcsec',
    niter=1000,
    deconvolver='hogbom',
    parallel=True,
    nworkers=8,
    cube_chunksize=1,  # one sub-cube per channel (max parallelism)
)

# Parallel continuum imaging (visibility rows chunked)
pclean(
    vis='my.ms',
    imagename='cont_out',
    specmode='mfs',
    imsize=[2048, 2048],
    cell='0.5arcsec',
    niter=5000,
    deconvolver='mtmfs',
    nterms=2,
    parallel=True,
    nworkers=4,
)
```
Command-line interface
```shell
python -m pclean --vis my.ms --imagename out --specmode cube \
    --imsize 512 512 --cell 1arcsec --niter 1000 \
    --parallel --nworkers 8
```
Additional parameters
Beyond the standard tclean parameters, pclean accepts:
| Parameter | Default | Description |
|---|---|---|
| parallel | False | Enable Dask-distributed parallelism. |
| nworkers | None | Number of Dask workers. None defaults to the available CPU count. |
| scheduler_address | None | Address of an existing Dask scheduler; when set, no local cluster is created. |
| threads_per_worker | 1 | Threads per Dask worker. Kept at 1 because CASA tools are not thread-safe. |
| memory_limit | '0' | Per-worker memory cap. '0' disables Dask's memory limits, so workers whose memory is dominated by CASA C++ allocations are not paused or killed. |
| local_directory | None | Scratch directory for Dask spill-to-disk. |
| cube_chunksize | -1 | Channels per sub-cube task. -1 assigns one sub-cube per worker; 1 assigns one per channel. |
| keep_subcubes | False | Retain intermediate sub-cube images after concatenation. |
| keep_partimages | False | Retain partial images after continuum gather. |
| concat_mode | 'auto' | Concatenation strategy: 'auto' (derive from keep_subcubes), 'paged' (physical copy), 'virtual' (reference catalog), 'movevirtual' (rename into output). |
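The channel-chunking rule for cube_chunksize, together with the nworkers default, can be sketched as follows. This is a hypothetical illustration of the semantics stated in the table, not the code in utils/partition.py; the function name plan_subcubes, the contiguous half-open channel ranges, and the even-split rule used for -1 are all assumptions.

```python
import os


def plan_subcubes(nchan, nworkers=None, cube_chunksize=-1):
    """Return contiguous half-open channel ranges, one per sub-cube task.

    Hypothetical sketch of the documented cube_chunksize semantics:
      -1 -> one sub-cube per worker (channels split as evenly as possible)
       k -> sub-cubes of k channels each (the last one may be shorter)
    """
    if nchan < 1:
        raise ValueError("nchan must be positive")
    if nworkers is None:
        # Mirrors the documented default: fall back to the available CPU count.
        nworkers = os.cpu_count() or 1
    if nworkers < 1:
        raise ValueError("nworkers must be positive")
    if cube_chunksize == -1:
        nsub = min(nworkers, nchan)  # never create empty sub-cubes
        base, extra = divmod(nchan, nsub)
        ranges, start = [], 0
        for i in range(nsub):
            size = base + (1 if i < extra else 0)
            ranges.append((start, start + size))
            start += size
        return ranges
    if cube_chunksize < 1:
        raise ValueError("cube_chunksize must be -1 or a positive integer")
    return [(s, min(s + cube_chunksize, nchan)) for s in range(0, nchan, cube_chunksize)]


print(plan_subcubes(10, nworkers=4))                   # one sub-cube per worker
print(plan_subcubes(5, nworkers=4, cube_chunksize=1))  # one sub-cube per channel
print(plan_subcubes(5, nworkers=4, cube_chunksize=2))  # fixed-size chunks
```

With cube_chunksize=1 and more channels than workers, there are more tasks than workers; the surplus sub-cubes simply wait for a free worker, trading scheduling overhead for finer load balancing.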
Architecture
```
pclean/
├── src/pclean/
│   ├── __init__.py              # Package init, exposes pclean()
│   ├── __main__.py              # CLI entry point (python -m pclean)
│   ├── pclean.py                # Top-level tclean-like interface
│   ├── params.py                # Parameter container & validation
│   ├── imaging/
│   │   ├── serial_imager.py     # Single-process imager (base engine)
│   │   ├── deconvolver.py       # Deconvolution wrapper
│   │   └── normalizer.py        # Image normalization (gather/scatter)
│   ├── parallel/
│   │   ├── cluster.py           # Dask cluster lifecycle management
│   │   ├── cube_parallel.py     # Channel-parallel cube imaging
│   │   ├── continuum_parallel.py # Row-parallel continuum imaging
│   │   └── worker_tasks.py      # Serialisable functions for workers
│   └── utils/
│       ├── partition.py         # Data / image partitioning helpers
│       ├── image_concat.py      # Sub-cube image concatenation
│       ├── memory_estimate.py   # Worker RAM estimation heuristics
│       ├── check_adios2.py      # Adios2StMan availability check
│       └── convert_adios2.py    # MS → ADIOS2 conversion utility
```
Documentation
Full documentation is hosted at pclean.readthedocs.io.
Requirements
- Python ≥ 3.10
- casatools ≥ 6.5
- dask + distributed
- numpy
- pydantic ≥ 2.0
Pixi environments
The project uses pixi for reproducible environment
management. Four environments are defined in pyproject.toml:
| Environment | Features | Description |
|---|---|---|
| default | casa | Runtime with casatools/casatasks from PyPI. |
| default-forge | casa-forge | Runtime with casatools/casatasks from conda-forge (includes the openmpi variant required for Adios2StMan). |
| dev | casa, dev | Runtime plus pytest, pytest-cov, and ruff. |
| test | dev | Linting and testing only (no casatools). |
Common tasks are exposed as pixi tasks:
```shell
pixi run -e dev test      # pytest -v
pixi run -e dev test-cov  # pytest with coverage
pixi run -e dev lint      # ruff check
pixi run -e dev fmt       # ruff format
```
References and acknowledgements
pclean builds on the imaging and calibration infrastructure developed by
the CASA team at NRAO / ESO / NAOJ. The scientific algorithms — gridding,
deconvolution, self-calibration — are the product of decades of CASA
development; pclean is purely a computing-engineering effort that
re-orchestrates those mature tools with a modern distributed runtime.
If this package contributes to published research, please cite the CASA software:
- CASA Team, Bean, B., Bhatnagar, S., et al. 2022, "CASA, the Common Astronomy Software Applications for Radio Astronomy," PASP, 134, 114501. doi:10.1088/1538-3873/ac9642
- McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, "CASA Architecture and Applications," ASP Conf. Ser., 376, 127. ads:2007ASPC..376..127M
Relation to CASA's built-in parallel imaging
pclean's parallel design closely follows the Python orchestration layer that
CASA's tclean task already provides through the
casatasks.private.imagerhelpers module:
| CASA Python class | pclean equivalent | Role |
|---|---|---|
| PySynthesisImager | SerialImager | Serial imaging loop (init → PSF → major/minor → restore) |
| PyParallelCubeSynthesisImager | ParallelCubeImager | Each worker runs an independent SerialImager on a frequency sub-cube |
| PyParallelContSynthesisImager | ParallelContinuumImager | Row-partitioned gridding across workers; minor cycles run serially on the coordinator |
| PyParallelImagerHelper | DaskClusterManager | Cluster lifecycle, job dispatch, and result collection |
The structural decomposition is the same: partition → image → normalize →
deconvolve → iterate, with the same split between embarrassingly parallel cube
channels and gather/scatter continuum cycles. Both code bases use polymorphic
dispatch: task_tclean.py picks among PySynthesisImager,
PyParallelCubeSynthesisImager, and PyParallelContSynthesisImager based on
specmode and MPI availability, while pclean makes the same choice based on its
own parallel and is_cube flags.
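A minimal sketch of that dispatch, assuming is_cube is derived from specmode and that non-parallel runs always fall back to the serial engine. The classes are empty placeholders that only borrow pclean's class names for readability; select_imager_class and CUBE_SPECMODES are hypothetical, and the set of cube-like specmodes is an assumption based on tclean's options.

```python
"""Hypothetical sketch of the serial/cube/continuum dispatch choice.

The classes below are empty placeholders that reuse pclean's class names;
they are not the real implementations.
"""

# Assumption: tclean's cube-like specmodes; 'mfs' and 'mvc' are treated as continuum.
CUBE_SPECMODES = frozenset({"cube", "cubedata", "cubesource"})


class SerialImager:
    """Placeholder for the single-process imaging engine."""


class ParallelCubeImager:
    """Placeholder for channel-parallel cube imaging."""


class ParallelContinuumImager:
    """Placeholder for row-parallel continuum imaging."""


def select_imager_class(specmode, parallel):
    """Pick an imager class from the specmode and the parallel flag."""
    is_cube = specmode in CUBE_SPECMODES
    if not parallel:
        return SerialImager
    return ParallelCubeImager if is_cube else ParallelContinuumImager


for specmode, parallel in [("cube", True), ("mfs", True), ("cube", False)]:
    print(specmode, parallel, select_imager_class(specmode, parallel).__name__)
```

Returning a class rather than an instance keeps the choice separate from construction, so each engine can take its own constructor arguments.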
The key difference is the parallelism transport. CASA's
PyParallelImagerHelper sends Python code strings to MPI workers via
casampi.MPIInterface, requiring mpicasa and a
shared filesystem. pclean replaces this with
Dask Distributed futures and actors,
eliminating the MPI dependency in exchange for Dask scheduling overhead.
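The futures-based pattern that replaces MPI message passing can be sketched with the standard library's concurrent.futures as a stand-in for Dask: submit one independent task per sub-cube, then gather the results in submission order. Dask's Client.submit and Client.gather follow the same shape. The task function image_subcube and the helper run_cube_tasks are hypothetical, and the task only returns a summary tuple. A thread pool keeps the example portable; real CASA work would need separate worker processes, since the CASA tools are not thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor


def image_subcube(chan_range):
    """Hypothetical per-sub-cube task; a real worker would run a full imaging cycle."""
    start, stop = chan_range
    return (start, stop, stop - start)


def run_cube_tasks(chan_ranges, nworkers):
    """Submit one task per sub-cube and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        futures = [pool.submit(image_subcube, r) for r in chan_ranges]
        return [f.result() for f in futures]  # blocks until each task finishes


results = run_cube_tasks([(0, 3), (3, 6), (6, 8), (8, 10)], nworkers=4)
print(results)
```

Because each sub-cube task is independent, no worker needs to exchange messages with another; the coordinator only submits work and collects results.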
See also CASA Memo 13 (Sekhar, Rau & Xue 2024), which benchmarks per-channel cube imaging distributed via SLURM job arrays and motivated this work (see the accompanying benchmarking scripts).
License
Copyright 2026 the pclean authors.
GPL-3.0-or-later — see LICENSE for details.
Disclaimer
This project is an independent, personal effort developed on the authors' own time. It is not affiliated with, endorsed by, or conducted as part of any employer's projects or responsibilities.
AI Disclosure
This project was developed with the assistance of AI coding agents (GitHub Copilot, Claude). The AI contributed to code generation, debugging, and documentation under human direction and review.
Download files
File details
Details for the file casa_pclean-0.2.3.tar.gz.
File metadata
- Download URL: casa_pclean-0.2.3.tar.gz
- Size: 504.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ea7b6fa85d66ff0bb9e14a4acee0d0318aebaa67b49976950f87c1976a2455f |
| MD5 | 4c520c923d9bd46b2cd75692259b4ba8 |
| BLAKE2b-256 | 456c237637a7dcca6a8700a0f9034f3367bb9ebc4aa0f43e2d5b7dd90ca1dc8b |
Provenance
The following attestation bundles were made for casa_pclean-0.2.3.tar.gz:
- Publisher: publish.yml on r-xue/pclean
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: casa_pclean-0.2.3.tar.gz
  - Subject digest: 8ea7b6fa85d66ff0bb9e14a4acee0d0318aebaa67b49976950f87c1976a2455f
  - Sigstore transparency entry: 1219712334
  - Permalink: r-xue/pclean@ac2fcc2cdc81be08e1c17404f9f0388c1eb476f5
  - Branch / Tag: refs/heads/main
  - Owner: https://github.com/r-xue
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: publish.yml@ac2fcc2cdc81be08e1c17404f9f0388c1eb476f5
  - Trigger Event: workflow_dispatch
File details
Details for the file casa_pclean-0.2.3-py3-none-any.whl.
File metadata
- Download URL: casa_pclean-0.2.3-py3-none-any.whl
- Size: 107.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1541f40ed26092b7fba253f49ffe5973d68893f64a4ede8cb5af83af09fc995d |
| MD5 | 498cd7213600b287f37cd3491c9d55c0 |
| BLAKE2b-256 | 2a9ee3d8b18585ab4fed9a21e45210536da1673c9147add2e73d2507a340bc03 |
Provenance
The following attestation bundles were made for casa_pclean-0.2.3-py3-none-any.whl:
- Publisher: publish.yml on r-xue/pclean
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: casa_pclean-0.2.3-py3-none-any.whl
  - Subject digest: 1541f40ed26092b7fba253f49ffe5973d68893f64a4ede8cb5af83af09fc995d
  - Sigstore transparency entry: 1219712346
  - Permalink: r-xue/pclean@ac2fcc2cdc81be08e1c17404f9f0388c1eb476f5
  - Branch / Tag: refs/heads/main
  - Owner: https://github.com/r-xue
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: publish.yml@ac2fcc2cdc81be08e1c17404f9f0388c1eb476f5
  - Trigger Event: workflow_dispatch