Source classification using supervised and self-supervised learning
Project description
sclassifier
This Python module performs radio source classification analysis using different ML methods, in a supervised, self-supervised, or unsupervised way:
- convolutional neural networks (CNNs)
- convolutional autoencoders (CAEs)
- decision trees & LightGBM
- HDBSCAN clustering algorithm
- UMAP dimensionality reduction
- SimCLR & BYOL self-supervised frameworks
Status
This software is under development. It requires Python 3 and TensorFlow 2.x.
Credit
This software is distributed under the GPLv3 license. If you use it in your research, please add a link to the repository or acknowledge the authors in your papers.
Installation
To build and install the package:
- Clone this repository into a local directory (e.g. $SRC_DIR):
git clone https://github.com/SKA-INAF/sclassifier.git
- Create a virtual environment with your preferred Python version (e.g. python3.6) in a local install directory (e.g. $INSTALL_DIR):
python3.6 -m venv $INSTALL_DIR
- Activate your virtual environment:
source $INSTALL_DIR/bin/activate
- Install the module dependencies listed in requirements.txt:
pip install -r requirements.txt
- Build and install the package:
python setup.py build
python setup.py install
- If required (e.g. outside a virtual environment), add the installation path to your PYTHONPATH environment variable:
export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python3.6/site-packages
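To quickly check the installation (assuming the package exposes a top-level module named sclassifier):
python -c "import sclassifier"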
Usage
Several Python scripts are provided in the scripts directory to run the desired tasks, described below.
Image supervised classification with CNNs
The script run_classifier_nn.py performs binary and multi-class (single- or multi-label) classification of radio images (single- or multi-channel, FITS format) using custom or predefined CNN architectures (resnet18/resnet34/resnet50/resnet101). Custom networks can be built by the user through input options, piling up stacks of Conv2D/MaxPool/BatchNorm/Dropout layers, enabled or disabled as desired. Several user options are provided to customize the network architecture, data pre-processing, and augmentation. A list is available with: python run_classifier_nn.py --help
Input data (train/validation) must be given in JSON format with the following structure:
{
"data": [
{
"filepaths": [
"G340.743+00.313_ch1.fits",
"G340.743+00.313_ch2.fits",
"G340.743+00.313_ch3.fits"
],
"sname": "G340.743+00.313",
"id": 6,
"label": "HII"
},
...
...
]
}
For multi-label classification the id and label keys must be lists.
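As an illustration, a datalist file with the above structure can be generated with a few lines of Python (file names and class values below are examples, not real data):

import json

# Build a datalist in the format expected by run_classifier_nn.py
entries = [
    {
        "filepaths": [
            "G340.743+00.313_ch1.fits",
            "G340.743+00.313_ch2.fits",
            "G340.743+00.313_ch3.fits"
        ],
        "sname": "G340.743+00.313",
        "id": 6,
        "label": "HII"
    }
    # For multi-label classification use lists instead,
    # e.g. "id": [2, 6], "label": ["EXTENDED", "HII"]
]

with open("datalist_train.json", "w") as f:
    json.dump({"data": entries}, f, indent=2)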
Two run modes are supported: training and inference. To perform inference you need to specify the --predict option. To perform binary or multi-label classification you must specify the --binary_class and --multilabel options, respectively.
To customize the desired class id/label names and their targets, possibly remapping them with respect to the values given in the input data list, you must specify the following options:
--nclasses=$NCLASSES
--classid_remap=$CLASSID_REMAP
--target_label_map=$TARGET_LABEL_MAP
--classid_label_map=$CLASSID_LABEL_MAP
--target_names=$TARGET_NAMES
For example:
NCLASSES=4
CLASS_PROBS='{"BACKGROUND":1.0,"COMPACT":0.1,"EXTENDED":1.0,"DIFFUSE":1.0}'
CLASSID_REMAP='{0:-1,1:0,2:1,3:2,4:3}'
TARGET_LABEL_MAP='{-1:"UNKNOWN",0:"BACKGROUND",1:"COMPACT",2:"EXTENDED",3:"DIFFUSE"}'
CLASSID_LABEL_MAP='{0:"UNKNOWN",1:"BACKGROUND",2:"COMPACT",3:"EXTENDED",4:"DIFFUSE"}'
TARGET_NAMES="BACKGROUND,COMPACT,EXTENDED,DIFFUSE"
Below we report some run examples:
- To train a custom model (2 conv layers + 1 dense layer) from scratch:
python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV --nepochs=10 \
  --nfilters_cnn=16,32 --kernsizes_cnn=3,3 --strides_cnn=1,1 --add_maxpooling_layer \
  --add_dense_layer --dense_layer_sizes=16 \
  --add_dropout --dropout_rate=0.4 --add_conv_dropout --conv_dropout_rate=0.2 \
  --batch_size=64 --optimizer=adam --learning_rate=1e-4 \
  --augment --augmenter=cnn --augment_scale_factor=5 \
  --resize_size=64 --scale_to_abs_max
- To train a predefined model (resnet18) using pre-trained backbone .h5 weights (e.g. $WEIGHTFILE):
python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV [OPTIONS] \
  --use_predefined_arch --predefined_arch=resnet18 --weightfile_backbone=$WEIGHTFILE
- To perform inference with a saved .h5 model (e.g. $MODELFILE) and weights (e.g. $WEIGHTFILE):
python run_classifier_nn.py --datalist=$DATALIST_TEST [OPTIONS] \
  --modelfile=$MODELFILE --weightfile=$WEIGHTFILE \
  --predict
Image feature extraction with CAE
WRITE ME
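While this section is being written, the idea behind CAE feature extraction can be sketched with a minimal Keras convolutional autoencoder (illustrative code, not this module's API; the architecture and layer sizes are assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cae(input_shape=(64, 64, 1), latent_dim=32):
    # Encoder: stacked strided Conv2D layers ending in a dense latent code
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(latent_dim, name="latent")(x)
    # Decoder: mirror of the encoder, reconstructing the input image
    x = layers.Dense(16 * 16 * 32, activation="relu")(latent)
    x = layers.Reshape((16, 16, 32))(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(x)
    autoencoder = models.Model(inputs, outputs)
    encoder = models.Model(inputs, latent)  # encoder alone is used for feature extraction
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# Train on the images themselves, then extract features:
# autoencoder.fit(images, images, epochs=10, batch_size=64)
# feats = encoder.predict(images)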
Image feature extraction with SimCLR
WRITE ME
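Likewise, the core of SimCLR is the NT-Xent contrastive loss computed over two augmented views of each batch image; below is a minimal TensorFlow sketch (illustrative, not this module's implementation):

import tensorflow as tf

def nt_xent_loss(z1, z2, temperature=0.1):
    # z1[i], z2[i] are projection-head embeddings of two augmented views of image i
    z1 = tf.math.l2_normalize(z1, axis=1)
    z2 = tf.math.l2_normalize(z2, axis=1)
    n = tf.shape(z1)[0]
    z = tf.concat([z1, z2], axis=0)                      # (2N, dim)
    sim = tf.matmul(z, z, transpose_b=True) / temperature
    sim = sim - tf.eye(2 * n) * 1e9                      # exclude self-similarity
    # Row i's positive is row i+N (and vice versa); all other rows act as negatives
    positives = tf.concat([tf.range(n, 2 * n), tf.range(0, n)], axis=0)
    loss = tf.keras.losses.sparse_categorical_crossentropy(positives, sim, from_logits=True)
    return tf.reduce_mean(loss)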
Feature reduction with UMAP
WRITE ME
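Until the script documentation is ready, the underlying reduction step can be sketched with the public umap-learn package (a minimal illustration, not this module's script; file names are placeholders):

import numpy as np
import umap  # pip install umap-learn

# Load extracted feature data, shape (n_samples, n_features)
feats = np.load("features.npy")

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1)
embedding = reducer.fit_transform(feats)  # shape (n_samples, 2)
np.save("features_umap.npy", embedding)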
Clustering feature data with HDBSCAN
WRITE ME
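Similarly, the core clustering step can be sketched with the public hdbscan package (a minimal illustration, not this module's script; file names are placeholders):

import numpy as np
import hdbscan  # pip install hdbscan

# Load (reduced) feature data, e.g. the UMAP embedding from the previous section
embedding = np.load("features_umap.npy")

clusterer = hdbscan.HDBSCAN(min_cluster_size=5, metric="euclidean")
labels = clusterer.fit_predict(embedding)  # label -1 marks noise/outlier points
print("clusters:", labels.max() + 1, "outliers:", (labels == -1).sum())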
Project details
Download files
Source Distribution
File details
Details for the file sclassifier-1.0.7.tar.gz.
File metadata
- Download URL: sclassifier-1.0.7.tar.gz
- Upload date:
- Size: 255.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 596473ca6791769f2d590f41ebdaf235cc570ba305a310de12fc1414220ff7c5
MD5 | c87e3fbbbb1527725442e855a7e49241
BLAKE2b-256 | 01bfec4842d40262463671602f84673f2dfd06c1cac6f02c7d4229e69d3bdcf8