
Galaxy morphology classifiers

Project description

Zoobot

Documentation Status · Build Status · DOI · ascl:2203.027

Zoobot classifies galaxy morphology with deep learning. This code will let you:

  • Reproduce and improve the Galaxy Zoo DECaLS automated classifications
  • Finetune the classifier for new tasks

For example, you can train a new classifier like so (this assumes the relevant Zoobot modules, define_model, losses and training_config, are imported, along with your schema, image sizes, and tf.data train/test datasets):

model = define_model.get_model(
    output_dim=len(schema.label_cols),  # schema defines the questions and answers
    input_size=initial_size, 
    crop_size=int(initial_size * 0.75),
    resize_size=resize_size
)

model.compile(
    loss=losses.get_multiquestion_loss(schema.question_index_groups),
    optimizer=tf.keras.optimizers.Adam()
)

training_config.train_estimator(
    model, 
    train_config,  # parameters for how to train e.g. epochs, patience
    train_dataset,
    test_dataset
)

You can finetune Zoobot with a free GPU using this Google Colab notebook. To install locally, keep reading.

Installation

Development Use

If you will be making changes to the Zoobot package itself (e.g. to add a new architecture), download the code using git:

# I recommend using a virtual environment, see below
git clone git@github.com:mwalmsley/zoobot.git

And then install Zoobot using pip, specifying either the pytorch dependencies, the tensorflow dependencies, or both:

pip install -e zoobot[pytorch]  # pytorch dependencies
pip install -e zoobot[tensorflow]  # tensorflow dependencies
pip install -e zoobot[pytorch,tensorflow]  # both
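
Note that some shells (zsh in particular) treat the square brackets as glob patterns, so you may need to quote the package spec:

pip install -e "zoobot[pytorch]"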

The main branch is for stable-ish releases. The dev branch includes the shiniest features but may change at any time.

Direct Use

I expect most users will make small changes. But if you won't be making any changes to Zoobot itself (e.g. you just want to apply it, or you're in a production environment), you can simply install directly from pip:

pip install zoobot[pytorch]  # pytorch dependencies
# other dependency options as above

Getting Started

To get started, see the documentation. For pretrained model weights, precalculated representations, catalogues, and so forth, see the data notes in particular.

I also include some working examples for you to copy and adapt.

TensorFlow:

PyTorch:

I also include some examples which record how the models in W+22a (the GZ DECaLS data release) were trained:

There's also gz_decals_data_release_analysis_demo.ipynb, which describes Zoobot's statistical predictions. When trained from scratch, Zoobot predicts the parameters of distributions, not simple class labels!
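
As context for that notebook: the GZ DECaLS models predict Dirichlet concentration parameters for each question's answers, and the expected vote fraction for an answer is its concentration divided by the total for that question. A minimal numpy sketch (the numbers are illustrative, not real Zoobot output):

import numpy as np

# Illustrative concentrations for one galaxy and one three-answer question;
# these values are made up, not real predictions.
concentrations = np.array([8.0, 2.0, 0.5])

# Under a Dirichlet distribution, the expected vote fraction for each answer
# is its concentration divided by the sum over that question's answers.
expected_fractions = concentrations / concentrations.sum()
print(expected_fractions.round(2))  # [0.76 0.19 0.05]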

Latest features

  • Added to PyPI/pip! Convenient for production or simple use.
  • PyTorch version! Integrates with PyTorch Lightning and WandB. Multi-GPU support. Trains on jpeg images, rather than TFRecords, and does not yet have a finetuning example script.
  • Train on colour (3-band) images: Add --color (American-friendly) to train_model.py
  • Select which EfficientNet variant to train using the get_architecture arg in define_model.py, or replace it with a function returning your own architecture!
  • New predict_on_dataset.py and save_predictions.py modules with useful functions for making predictions on large sets of images. Predictions are now saved to .hdf5 by default, which is much more convenient than csv for multi-forward-pass predictions (see the sketch after this list).
  • Multi-GPU (single node) training
  • Support for Weights and Biases (wandb)
  • Worked examples for custom representations
  • Colab notebook for GZ predictions and fine-tuning
  • Schemas (questions and answers for the decision trees) extended to include DECaLS DR1/2 and DR8, in various combinations. See zoobot.shared.label_metadata.py.
  • Test time augmentations are now off by default but can be enabled with --test-time-augs on train_model.py
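
As a rough sketch of loading the .hdf5 predictions mentioned above (the dataset names 'id_str' and 'predictions' are assumptions; check your own file's keys rather than relying on them):

import h5py

# Hypothetical output file from save_predictions.py; the dataset names and
# shapes below are assumptions, so inspect the keys first.
with h5py.File('predictions.hdf5', 'r') as f:
    print(list(f.keys()))
    ids = f['id_str'][:]          # one identifier per galaxy (assumed name)
    preds = f['predictions'][:]   # e.g. (galaxy, answer, forward pass) (assumed)
    print(ids.shape, preds.shape)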

Contributions are welcome and will be credited in any future work.

Note on Environments

I recommend installing in a virtual environment such as Anaconda. Do not install packages directly with Anaconda itself (e.g. conda install tensorflow), as Anaconda may install older versions; use pip instead, as above. Python 3.7 or greater is required. For example, to create an environment and install Zoobot:
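
conda create --name zoobot python=3.7
conda activate zoobot
pip install zoobot[pytorch]  # or pip install -e zoobot[pytorch] for a development install; other dependency options as above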

Replication

For replication of the GZ DECaLS classifier see /replicate. This contains slurm scripts to:

  • Create training TFRecords equivalent to those used to train the published classifier
  • Train the classifier itself (by calling zoobot/tensorflow/examples/train_model.py)

Citing

If you use this repo for your research, please cite the paper and the code (via Zenodo).

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zoobot-0.0.4.tar.gz (90.0 kB)

Uploaded Source

Built Distribution

zoobot-0.0.4-py3-none-any.whl (108.6 kB)

Uploaded Python 3

File details

Details for the file zoobot-0.0.4.tar.gz.

File metadata

  • Download URL: zoobot-0.0.4.tar.gz
  • Upload date:
  • Size: 90.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.11

File hashes

Hashes for zoobot-0.0.4.tar.gz

  • SHA256: d50c2bb4b37ad0b857db828058de1eb51f13226436877b40a63ddc9c26ffb9ed
  • MD5: 354fd23a4f3bac02a9e6903190f59abc
  • BLAKE2b-256: a0af57e1d723daef82aff89d14db2419d2673cc5bc20bee2a6ca41fd1c2e1fff

See more details on using hashes here.
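
For instance, to check a downloaded archive against the SHA256 digest above (a quick sketch, assuming zoobot-0.0.4.tar.gz is in your working directory):

import hashlib

# SHA256 digest listed above for zoobot-0.0.4.tar.gz
expected = 'd50c2bb4b37ad0b857db828058de1eb51f13226436877b40a63ddc9c26ffb9ed'

with open('zoobot-0.0.4.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print(digest == expected)  # True if the download matches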

File details

Details for the file zoobot-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: zoobot-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 108.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.11

File hashes

Hashes for zoobot-0.0.4-py3-none-any.whl

  • SHA256: 710c9fad8511c19357534ca05b3574659ae377a6c23f7e6fad548de94406c337
  • MD5: c565d1b4138257b23c4369aa02a5882f
  • BLAKE2b-256: 273c062c94f1b43309aa9bf76cf7b017a292d88c9737231ac145d68ea53884a3

See more details on using hashes here.
