AstroMorph

The AstroMorph project is an ML project to automatically separate a collection of astronomical objects based on their morphology. The method and science demonstration are detailed in Boschman et al. (in preparation). If you use AstroMorph in your research, please consider citing our paper:

astromorph: self-supervised machine learning pipeline for astronomical morphology analysis
L. Boschman, O. Maya Lucas, P. Bjerkeli, J. Kainulainen, and M. C. Toribio (In preparation)

Installation

This project has been developed for Python 3.12. Lower versions of Python might work, but there is no guarantee. Anything below Python 3.9 will definitely not work.

The easiest way to set this project up is inside its own virtual environment. If you are not familiar with virtual environments, the documentation of Python's venv module is a good place to read up on them. In short, they are a convenient way of separating projects with conflicting requirements.

# Run this command inside your working directory to create a virtual environment
$ python -m venv .venv
# The virtual environment is created inside its own subdirectory
$ ls -a
.
..
.venv

It is very easy to activate this environment, and deactivate it when you no longer need it.

# activate the virtual environment
$ source .venv/bin/activate
# deactivate the venv when no longer using it, or switching to a different project
$ deactivate

The next step is to install the dependencies in the virtual environment:

# If you have deactivated your venv, make sure to activate it again
$ source .venv/bin/activate
# Install requirements using pip
$ pip install -r requirements.txt

Package contents

This package provides the following components:

  • the BYOL class, a PyTorch implementation of the BYOL framework;
  • the ByolTrainer class, which wraps BYOL to provide an easy training interface;
  • a FilelistDataset for easy handling of sets of FITS files;
  • a lightweight 2D convolutional neural network called AstroMorphologyModel;
  • a configurable training script for basic command line use;
  • a configurable inference script for easy inspection of the resulting embeddings.

The BYOL class provides the most flexibility, but requires some experience with setting up a training routine in PyTorch. ByolTrainer and the training script provide more ease-of-use at the cost of some flexibility.

BYOL Class

The BYOL class is a subclass of torch.nn.Module, and can therefore be used like any other PyTorch module.

NB: remember to call the update_moving_average() method after every optimization step. If you do not know why you have to do this, please have a look at the original paper at https://arxiv.org/abs/2006.07733.
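The reason for this call can be illustrated without PyTorch. In BYOL, the target network is an exponential moving average of the online network, so it only changes when the moving-average update is applied after an optimization step. The sketch below uses plain lists of floats as stand-ins for network weights; the function name ema_update and the decay value of 0.99 are illustrative assumptions, not part of the astromorph API.

```python
def ema_update(target, online, decay=0.99):
    """Return the target weights moved a small step towards the online weights."""
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


online_weights = [1.0, 2.0]
target_weights = [0.0, 0.0]

for step in range(3):
    # Stand-in for an optimizer step that changes the online network
    online_weights = [w + 0.1 for w in online_weights]
    # The target network only changes through this moving-average update
    target_weights = ema_update(target_weights, online_weights)

print(target_weights)
```

If the update is skipped, the target weights stay at their initial values and no longer track the online network.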

ByolTrainer

The ByolTrainer class is a wrapper around BYOL that provides an easy-to-use interface for those less experienced with neural networks. At a minimum, one only needs to specify the core network and the dimensionality of the resulting embeddings. Further customization is possible by providing an augmentation function, an optimizer, a learning rate, etc. Training only requires two PyTorch DataLoader instances, one for the training set and one for the test set.

See below for a basic example using the AstroMorphologyModel network from this package:

from torch.utils.data import DataLoader
from astromorph import AstroMorphologyModel, ByolTrainer

train_data = DataLoader(...)
test_data = DataLoader(...)

model = ByolTrainer(AstroMorphologyModel(), representation_size=128)

model.train_model(train_data=train_data, test_data=test_data, epochs=10)

Training Script

Basic configuration

The settings of the training run are specified through a TOML file. An example of such a file can be found in example_settings.toml. This file can be passed to the script with the -c or --config-file flag. The script should be invoked from the main folder of the repository:

astromorph training -c example_settings.toml

Training

Filelist

Input is specified as a filelist, a text file with one filename per line. It is passed to the script with the -d flag or with the datafile keyword in the config file. Such a filelist can be made using the find command line program. In the example below, we want to use all the FITS files inside the data directory that are smaller than 10 MB as input for our model. We do this using the following commands:

# Find the filenames and store them in data/inputfiles.txt
find /full/path/to/datadirectory/ -type f -size -10M -name "*.fits" > data/inputfiles.txt

astromorph training -c training_settings.toml
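On systems without find, a similar filelist can be written with a few lines of Python. This is a sketch rather than part of the package: the function name write_filelist is hypothetical, and the size limit is applied as an exact byte count, which differs slightly from the rounding that find applies.

```python
from pathlib import Path


def write_filelist(data_dir, output_file, max_bytes=10 * 1024 * 1024):
    """Write the absolute paths of all small FITS files below data_dir."""
    root = Path(data_dir).resolve()
    paths = sorted(
        p for p in root.rglob("*.fits")
        if p.is_file() and p.stat().st_size < max_bytes
    )
    output = Path(output_file)
    output.parent.mkdir(parents=True, exist_ok=True)
    # One filename per line, matching the output of the find command
    output.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling write_filelist("/full/path/to/datadirectory/", "data/inputfiles.txt") would then produce the same kind of file as the find command above.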

In this example, training_settings.toml would look similar to

# Configfile for using a filelist
datafile = "data/inputfiles.txt"
epochs = 5
network_name = "n_layer_resnet"

Epochs

Optionally, the number of training epochs can be specified with the epochs keyword, with a default of 10.

# Configfile for using a filelist
datafile = "data/inputfiles.txt"
epochs = 5
network_name = "n_layer_resnet"

Reduced ResNet18 network

It is possible to use only the first few convolutional stages of the ResNet18 network. ResNet18 groups its convolutional layers into four stages, named layer1, layer2, layer3, and layer4. If layer2 is selected as the last stage, layer3 and layer4 are removed from the network.
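The effect of this selection can be pictured as truncating an ordered list. The sketch below only manipulates the names layer1 to layer4 from the paragraph above; the helper layers_to_keep is hypothetical and does not reflect how n_layer_resnet is implemented.

```python
RESNET18_LAYERS = ("layer1", "layer2", "layer3", "layer4")


def layers_to_keep(last_layer="layer4"):
    """Return the layer names up to and including last_layer."""
    if last_layer not in RESNET18_LAYERS:
        raise ValueError(f"unknown layer: {last_layer!r}")
    cutoff = RESNET18_LAYERS.index(last_layer) + 1
    return RESNET18_LAYERS[:cutoff]


print(layers_to_keep("layer2"))
print(layers_to_keep())
```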

Truncating the network might be beneficial, as the earlier layers usually capture more generic features. To enable it, use the last_layer keyword inside the network_settings table of the config file. By default, n_layer_resnet will use the full ResNet18 network.

# Configfile for using a filelist
datafile = "data/inputfiles.txt"
epochs = 5
network_name = "n_layer_resnet"

[network_settings]
last_layer = "layer2"

Other settings

Other settings that can be set in a config file are the following:

# Limit the number of cores used in the training process
core_limit = 4

# If the network expects 3-channel RGB images, but you have single-channel images
[data_settings]
stacksize = 3

# Specify dimensions for the BYOL components
[byol_settings]
representation_size = 128
projection_size = 16
projection_hidden_size = 512
use_momentum = true # Target network is exponential MA of online network.

Inference

To run a trained network on some data, we will have to specify the location of the trained neural network. We do this with the trained_network_name keyword in the config file.

# Configfile for using a filelist
datafile = "data/inputfiles.txt"
epochs = 5
network_name = "n_layer_resnet"
trained_network_name = "saved_models/newly_trained_network.pt"

export_to_csv = true # Export embeddings and metadata to a CSV file

[network_settings]
last_layer = "layer2"

Options that are not relevant for inference (e.g. epochs) will be ignored, so you can reuse the config file from the training run.

Alternatively, you can specify the relevant options using the command line, as shown here:

astromorph inference -d <data-file> -m <mask-file> -n <trained-network-file>

It is also possible to combine a config file with command line options. Options given on the command line override the settings specified in the config file.

astromorph inference -c example_settings.toml -n saved_models/newly_trained_network.pt
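This precedence rule can be sketched with argparse and a dictionary merge. It is an illustration under assumptions, not the package's code: the long option names, the maskfile key, and the helper merge_settings are hypothetical; only the short flags and the datafile and trained_network_name keys come from this document.

```python
import argparse


def merge_settings(config, cli_args):
    """Combine config-file settings with command line options.

    Options that were not given on the command line are None and
    leave the config-file value untouched.
    """
    merged = dict(config)
    for key, value in vars(cli_args).items():
        if value is not None:
            merged[key] = value
    return merged


parser = argparse.ArgumentParser(prog="inference-sketch")
parser.add_argument("-d", "--datafile")
parser.add_argument("-m", "--maskfile")
parser.add_argument("-n", "--trained-network-name", dest="trained_network_name")

# Settings as they might be read from a config file
config = {
    "datafile": "data/inputfiles.txt",
    "trained_network_name": "saved_models/old_network.pt",
}
args = parser.parse_args(["-n", "saved_models/newly_trained_network.pt"])
print(merge_settings(config, args))
```

Here the network file given with -n replaces the one from the config, while datafile keeps its config-file value.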

Visualisation

To view a visualisation of the embeddings after inference, use TensorBoard with:

tensorboard --logdir=./runs
