U-Net based cell and nucleus segmentation for brightfield microscopy

aiSEGcell - Overview

This repository contains a torch implementation of U-Net (Ronneberger et al., 2015). We provide trained models to semantically segment nuclei and whole cells in bright field images. Please cite this paper if you are using this code in your research.

Installation

If you do not have python installed already, we recommend installing it using the Anaconda distribution. aisegcell was tested with python 3.8.6.

Virtual environment setup

If you do not use an IDE that handles virtual environments for you (e.g. PyCharm), use your command line application (e.g. Terminal) and one of the many virtual environment tools (see here). We will use conda:

  1. Create new virtual environment

    conda create -n aisegcell python=3.8.6
    
  2. Activate virtual environment

    conda activate aisegcell
    

pip installation

Recommended if you do not want to develop the aisegcell code base.

  1. Install aisegcell

    # update pip
    pip install -U pip==23.2.1
    pip install aisegcell
    
  2. (Optional) GPUs greatly speed up training and inference of U-Net and are available for torch (v1.10.2) for Windows and Linux. Check if your GPU(s) are CUDA compatible (Windows, Linux) and update their drivers if necessary.

  3. Install torch/torchvision compatible with your system. aisegcell was tested with torch version 1.10.2, torchvision version 0.11.3, and CUDA version 11.3.1. Depending on your OS, your CPU or GPU, and your CUDA version, the installation command may differ:

# Windows/Linux CPU
pip install torch==1.10.2+cpu torchvision==0.11.3+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html

# Windows/Linux GPU (CUDA 11.3.X)
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# macOS CPU
pip install torch==1.10.2 torchvision==0.11.3
  4. Install pytorch-lightning. aisegcell was tested with version 1.5.9.

# note the installation of v1.5.9 does not use pip install lightning
pip install pytorch-lightning==1.5.9
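
To verify the installation, you can run a short sanity check in Python (this snippet is not part of aisegcell; it only prints the installed versions and whether a CUDA-capable GPU is visible to torch):

# quick sanity check of the installed versions and GPU visibility
import torch
import torchvision
import pytorch_lightning as pl

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("pytorch-lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())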

Source installation

Installation requires a command line application (e.g. Terminal) with git and python installed. If you operate on Windows we recommend using Ubuntu on Windows. Alternatively, you can install Anaconda and use Anaconda Powershell Prompt. An introductory tutorial on how to use git and GitHub can be found here.

  1. (Optional) If you use Anaconda Powershell Prompt, install git through conda

    conda install -c anaconda git
    
  2. Clone the repository (consider the ssh alternative)

    # change directory
    cd /path/to/directory/to/clone/repository/to
    
    git clone https://github.com/CSDGroup/aisegcell.git
    
  3. Navigate to the cloned directory

    cd aisegcell
    
  4. Install aisegcell

    # update pip
    pip install -U pip==23.2.1
    
    1. as a user

      pip install .
      
    2. as a developer (in editable mode with development dependencies and pre-commit hooks)

      pip install -e ".[dev]"
      pre-commit install
      
  5. (Optional) GPUs greatly speed up training and inference of U-Net and are available for torch (v1.10.2) for Windows and Linux. Check if your GPU(s) are CUDA compatible (Windows, Linux) and update their drivers if necessary.

  6. Install torch/torchvision compatible with your system. aisegcell was tested with torch version 1.10.2, torchvision version 0.11.3, and CUDA version 11.3.1. Depending on your OS, your CPU or GPU, and your CUDA version, the installation command may differ:

# Windows/Linux CPU
pip install torch==1.10.2+cpu torchvision==0.11.3+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html

# Windows/Linux GPU (CUDA 11.3.X)
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# macOS CPU
pip install torch==1.10.2 torchvision==0.11.3
  7. Install pytorch-lightning. aisegcell was tested with version 1.5.9.

# note the installation of v1.5.9 does not use pip install lightning
pip install pytorch-lightning==1.5.9

Data

U-Net is currently intended for single-class semantic segmentation. Input images are expected to be 8-bit or 16-bit greyscale images. Segmentation masks are expected to encode the background as intensity 0; all intensities >0 are converted to a single foreground value (255). Consequently, different instances of a class (instance segmentation) or multi-class segmentations are handled as single-class segmentations. Have a look at this notebook for a data example.
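
For illustration, an existing instance or multi-class mask can be converted to this single-class convention with a few lines of Python (a minimal sketch using numpy and scikit-image, which are not aisegcell utilities; the file names are placeholders):

# collapse any label image into a single foreground class (255) on background 0
import numpy as np
from skimage import io

mask = io.imread("mask.png")                          # instance or multi-class labels
binary = np.where(mask > 0, 255, 0).astype(np.uint8)  # every label >0 becomes 255
io.imsave("mask_binary.png", binary)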

Training

Training U-Net is as simple as calling the command aisegcell_train. We provide a notebook on how to train U-Net with a minimal working example. aisegcell_train is available once you activate the virtual environment in which you installed aisegcell, and can be called with the following arguments:

  • --help: show help message
  • --data: Path to CSV file containing training image file paths. The CSV file must have the columns bf and mask.
  • --data_val: Path to CSV file containing validation image file paths (same format as --data).
  • --output_base_dir: Path to output directory.
  • --model: Model type to train (currently only U-Net). Default is "Unet".
  • --checkpoint: Path to checkpoint file matching --model. Only necessary if continuing a model training. Default is None.
  • --devices: Devices to use for model training. If you want to use GPU(s), you have to provide their integer IDs. Multiple GPU IDs have to be separated by spaces (e.g. 2 5 9). If you want to use the CPU, use "cpu". Default is "cpu".
  • --epochs: Number of training epochs. Default is 5.
  • --batch_size: Number of samples per mini-batch. Default is 2.
  • --lr: Learning rate of the optimizer. Default is 1e-4.
  • --base_filters: Number of base_filters of Unet. Default is 32.
  • --shape: Shape [height, width] that all images will be cropped/padded to before model submission. Height and width cannot be smaller than --receptive_field. Default is [1024,1024].
  • --receptive_field: Receptive field of a neuron in the deepest layer. Default is 128.
  • --log_frequency: Log performance metrics every N gradient steps during training. Default is 50.
  • --loss_weight: Weight of the foreground class compared to the background class for the binary cross entropy loss. Default is 1.
  • --bilinear: If flag is used, use bilinear upsampling, else transposed convolutions.
  • --multiprocessing: If flag is used, all GPUs given in --devices will be used for training. Does not support CPU.
  • --retrain: If flag is used, best scores for model saving will be reset (required for training on new data).
  • --transform_intensity: If flag is used, random intensity transformations will be applied to input images.
  • --seed: None or Int to use for random seeding. Default is None.

The command aisegcell_generate_list can be used to write CSV files for --data and --data_val and has the following arguments:

  • --help: show help message
  • --bf: Path (glob pattern) to input images (e.g. bright field). Naming convention must match naming convention of --mask.
  • --mask: Path (glob pattern) to segmentation masks corresponding to --bf.
  • --out: Directory to which output file is saved.
  • --prefix: Prefix for output file name (i.e. {PREFIX}_paths.csv). Default is "train".

Use wildcard characters like * to select all files you want to input to --bf and --mask (see example below).
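
For reference, the generated file (e.g. train_paths.csv) lists one image/mask pair per row in the documented columns bf and mask. The paths below are purely illustrative and the exact layout may differ slightly:

bf,mask
/path/to/train_images/position01/image01.png,/path/to/train_masks/position01/image01_mask.png
/path/to/train_images/position02/image02.png,/path/to/train_masks/position02/image02_mask.png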

Consider the following example:

# activate the virtual environment
conda activate aisegcell

# generate CSV files for data and data_val
# --bf selects all PNG files in all sub-directories of /path/to/train_images
# --mask selects all files in all sub-directories that end with "mask.png"
aisegcell_generate_list \
  --bf "/path/to/train_images/*/*.png" \
  --mask "/path/to/train_masks/*/*mask.png" \
  --out /path/to/output_directory \
  --prefix train

aisegcell_generate_list \
  --bf "/path/to/val_images/*.png" \
  --mask "/path/to/val_masks/*.png" \
  --out /path/to/output_directory \
  --prefix val

# start multi-GPU training
# --devices 2 4 uses GPUs 2 and 4; --multiprocessing below is required for multiple --devices
aisegcell_train \
  --data /path/to/output_directory/train_paths.csv \
  --data_val /path/to/output_directory/val_paths.csv \
  --model Unet \
  --devices 2 4 \
  --output_base_dir /path/to/results/folder \
  --epochs 10 \
  --batch_size 8 \
  --lr 1e-3 \
  --base_filters 32 \
  --shape 1024 512 \
  --receptive_field 128 \
  --log_frequency 5 \
  --loss_weight 1 \
  --bilinear  \
  --multiprocessing \
  --transform_intensity \
  --seed 123

# OR retrain an existing checkpoint with single GPU
aisegcell_train \
  --data /path/to/output_directory/train_paths.csv \
  --data_val /path/to/output_directory/val_paths.csv \
  --model Unet \
  --checkpoint /path/to/checkpoint/file.ckpt \
  --devices 0 \
  --output_base_dir /path/to/results/folder \
  --epochs 10 \
  --batch_size 8 \
  --lr 1e-3 \
  --base_filters 32 \
  --shape 1024 1024 \
  --receptive_field 128 \
  --log_frequency 5 \
  --loss_weight 1 \
  --bilinear  \
  --transform_intensity \
  --seed 123

The output of aisegcell_train will be stored in subdirectories {DATE}_Unet_{ID1}/lightning_logs/version_{ID2}/ at --output_base_dir. Its contents are:

  • hparams.yaml: stores hyper-parameters of the model (used by pytorch_lightning.LightningModule)
  • metrics.csv: contains all metrics tracked during training
    • loss_step: training loss (binary cross-entropy) per gradient step
    • epoch: training epoch
    • step: training gradient step
    • loss_val_step: validation loss (binary cross-entropy) per validation mini-batch
    • f1_step: f1 score per validation mini-batch
    • iou_step: average of iou_small_step and iou_big_step per validation mini-batch
    • iou_big_step: intersection over union of objects with > 2000 px in size per validation mini-batch
    • iou_small_step: intersection over union of objects with <= 2000 px in size per validation mini-batch
    • loss_val_epoch: average loss_val_step over all validation steps per epoch
    • f1_epoch: average f1_step over all validation steps per epoch
    • iou_epoch: average iou_step over all validation steps per epoch
    • iou_big_epoch: average iou_big_step over all validation steps per epoch
    • iou_small_epoch: average iou_small_step over all validation steps per epoch
    • loss_epoch: average loss_step over all training gradient steps per epoch
  • checkpoints: model checkpoints are stored in this directory. Paths to model checkpoints are used as input to --checkpoint of aisegcell_train or --model of aisegcell_test and aisegcell_predict.
    • best-f1-epoch={EPOCH}-step={STEP}.ckpt: model weights with the (currently) highest f1_epoch
    • best-iou-epoch={EPOCH}-step={STEP}.ckpt: model weights with the (currently) highest iou_epoch
    • best-loss-epoch={EPOCH}-step={STEP}.ckpt: model weights with the (currently) lowest loss_val_epoch
    • latest-epoch={EPOCH}-step={STEP}.ckpt: model weights of the (currently) latest checkpoint
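
To inspect a finished run, metrics.csv can be loaded with standard tools, for example with pandas (a minimal sketch, assuming pandas is installed and using the placeholder path components {DATE}, {ID1}, {ID2} from above):

# inspect epoch-level validation metrics of a training run
import pandas as pd

metrics = pd.read_csv(
    "/path/to/results/folder/{DATE}_Unet_{ID1}/lightning_logs/version_{ID2}/metrics.csv"
)

# epoch-level validation columns are only filled at the end of each epoch,
# so drop rows where they are missing before inspecting them
val = metrics.dropna(subset=["f1_epoch"])
print(val[["epoch", "loss_val_epoch", "f1_epoch", "iou_epoch"]])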

Trained models

We provide trained models:

  • Nucleus segmentation: 2D grayscale images. Trained on a data set (link to data set) of 9849 images (~620k nuclei). Available via the ETH Research Collection.
  • Whole cell segmentation: 2D grayscale images. Trained on a data set (link to data set) of 224 images (~12k cells). Available via the ETH Research Collection.

Testing

A trained U-Net can be tested with aisegcell_test. We provide a notebook on how to test with U-Net. aisegcell_test returns predicted masks and performance metrics. aisegcell_test can be called with the following arguments:

  • --help: show help message
  • --data: Path to CSV file containing test image file paths. The CSV file must have the columns bf and mask.
  • --model: Path to checkpoint file of trained pytorch_lightning.LightningModule.
  • --suffix: Suffix to append to all mask file names.
  • --output_base_dir: Path to output directory.
  • --devices: Devices to use for model testing. If you want to use GPU(s), you have to provide their integer IDs. Multiple GPU IDs have to be separated by spaces (e.g. 2 5 9). If multiple GPUs are provided, only the first ID will be used. If you want to use the CPU, use "cpu". Default is "cpu".

Make sure to activate the virtual environment created during installation before calling aisegcell_test.

Consider the following example:

# activate the virtual environment
conda activate aisegcell

# generate CSV file for data
aisegcell_generate_list \
  --bf "/path/to/test_images/*.png" \
  --mask "/path/to/test_masks/*.png" \
  --out /path/to/output_directory \
  --prefix test

# run testing
aisegcell_test \
  --data /path/to/output_directory/test_paths.csv \
  --model /path/to/checkpoint/file.ckpt \
  --suffix mask \
  --output_base_dir /path/to/results/folder \
  --devices 0 # run testing on GPU 0

The output of aisegcell_test will be stored in subdirectories lightning_logs/version_{ID}/ at --output_base_dir. Its contents are:

  • hparams.yaml: stores hyper-parameters of the model (used by pytorch_lightning.LightningModule)
  • metrics.csv: contains all metrics tracked during testing. Column IDs are identical to metrics.csv during training
  • test_masks: directory containing segmentation masks obtained from U-Net
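
If you want to recompute a metric yourself from the written masks, the intersection over union of a single predicted/ground-truth pair can be obtained with numpy and scikit-image (a sketch; the file names and the version_{ID} directory are placeholders, and it assumes the masks follow the background-0/foreground-255 convention described in the Data section):

# compute foreground IoU for one predicted mask against its ground truth
import numpy as np
from skimage import io

pred = io.imread("/path/to/results/folder/lightning_logs/version_{ID}/test_masks/image01_mask.png") > 0
true = io.imread("/path/to/test_masks/image01.png") > 0

iou = np.logical_and(pred, true).sum() / np.logical_or(pred, true).sum()
print("IoU:", iou)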

Predicting

A trained U-Net can be used for predictions with aisegcell_predict. We provide a notebook on how to predict with U-Net. aisegcell_predict returns only predicted masks (no performance metrics) and can be called with the following arguments:

  • --help: show help message
  • --data: Path to CSV file containing predict image file paths. The CSV file must have the columns bf and mask.
  • --model: Path to checkpoint file of trained pytorch_lightning.LightningModule.
  • --suffix: Suffix to append to all mask file names.
  • --output_base_dir: Path to output directory.
  • --devices: Devices to use for model prediction. If you want to use GPU(s), you have to provide their integer IDs. Multiple GPU IDs have to be separated by spaces (e.g. 2 5 9). If multiple GPUs are provided, only the first ID will be used. If you want to use the CPU, use "cpu". Default is "cpu".

Make sure to activate the virtual environment created during installation before calling aisegcell_predict.

Consider the following example:

# activate the virtual environment
conda activate aisegcell

# generate CSV file for data
# aisegcell_generate_list requires --mask, so the input images are provided again as a placeholder
aisegcell_generate_list \
  --bf "/path/to/predict_images/*.png" \
  --mask "/path/to/predict_images/*.png" \
  --out /path/to/output_directory \
  --prefix predict

# run prediction
aisegcell_predict \
  --data /path/to/output_directory/predict_paths.csv \
  --model /path/to/checkpoint/file.ckpt \
  --suffix mask \
  --output_base_dir /path/to/results/folder \
  --devices 0 # predict with GPU 0

The output of aisegcell_predict will be stored in subdirectories lightning_logs/version_{ID}/ at --output_base_dir. Its contents are:

  • hparams.yaml: stores hyper-parameters of the model (used by pytorch_lightning.LightningModule)
  • predicted_masks: directory containing segmentation masks obtained from U-Net
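
Since the masks are single-class, a simple downstream analysis such as counting segmented objects can be done by connected-component labelling, e.g. with scikit-image (a minimal sketch; the file name and the version_{ID} directory are placeholders):

# count individual objects in one predicted mask
from skimage import io, measure

mask = io.imread("/path/to/results/folder/lightning_logs/version_{ID}/predicted_masks/image01_mask.png")
labels = measure.label(mask > 0)   # label connected foreground components
print("number of segmented objects:", labels.max())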

napari plugin

aisegcell_predict is also available as a plug-in for napari (link to napari-hub page and github page).

Image annotation tools

Available tools to annotate segmentations include:

Troubleshooting & support

In case you are experiencing issues with aisegcell, inform us via the issue tracker. Before you submit an issue, check whether it has already been addressed in a previous issue.

Citation

t.b.d.
