
:microscope: Biom3d

Documentation

Warning: This repository is still a work in progress!

Biom3d automatically configures the training of a 3D U-Net for 3D semantic segmentation.

The default configuration matches the performance of nnUNet but is much easier to use, both for community users and for developers. Biom3d is also flexible for developers: the code is easy to understand and easy to edit.

[Figure: dependency graphs of the Biom3d modules and of the nnUNet modules, generated with the pydeps module]

Disclaimer: Biom3d does not yet include ensemble learning, 2D U-Net, 3D-cascade U-Net or PyTorch parallel computing. However, these options could easily be added if needed.

There are two main types of users of Biom3d:

  • Community users, who are interested in using the basic features of Biom3d: GUI, predictions with ready-to-use models or default training.
  • Deep-learning developers, who are interested in more advanced features: changing the default configuration, writing new Biom3d modules, editing the Biom3d core, etc.

In the following documentation, advanced features are marked with the :rocket: symbol.

Warning: For Windows users, the paths here are written in Linux-like format. You will have to replace the '/' symbols with '\' in the command lines.

:hammer: Installation

For the installation details, please check our documentation here: Documentation-Installation

:hand: Usage

For Graphical User Interface users, please check our documentation here: Documentation-GUI

Two options:

  • If you have a trained model, you can do predictions directly.
  • If you do not have a trained model, you must train one and, to do so, you must preprocess your data and create a configuration file.

Three steps to train a new model:

Training preprocessing

Preprocessing consists of transforming the training images and masks into the appropriate format for both training and prediction.

Folder structure

The training images and masks must all be placed inside two distinct folders:

training_folder
├── images
│   ├── image_01.tif
│   ├── image_02.tif
│   └── ...
└── masks
    ├── image_01.tif
    ├── image_02.tif
    └── ...

Regarding naming, the only constraint is that each image and its corresponding mask have exactly the same file name. The folders themselves can have any name without spaces, and the parent folder structure does not matter.

Image format

To help format the images correctly, we provide a preprocessing script (preprocess.py). More details are available in the next section.

Constraints:

  • The images and masks must be .tif files.
  • The images and masks must all have 4 dimensions: (channel, height, width, depth).
  • Each dimension of each image must be identical to the corresponding dimension of its mask, except for the channel dimension.
  • Images must be stored in float32 format (numpy.float32).
  • Masks must be stored in byte format (numpy.byte) or int64 format (numpy.int64 or python int type).
  • Mask values must be 0 or 1. Each mask channel represents one type of object. Masks do not have to be 'one-hot' encoded because we use a sigmoid activation rather than a softmax activation.

Recommendations (training might work well without these):

  • Image values should be Z-normalized (zero mean, unit standard deviation)
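
As a concrete illustration of the constraints above, here is a minimal sketch (not part of Biom3d; the file paths are placeholders) that converts a 3D grayscale image and its mask to the expected 4D format and Z-normalizes the image:

import numpy as np
import tifffile

# load a 3D image and its mask (placeholder paths, same file name in both folders)
img = tifffile.imread("training_folder/images/image_01.tif").astype(np.float32)
msk = tifffile.imread("training_folder/masks/image_01.tif")

# add a channel axis to reach the required 4 dimensions
if img.ndim == 3:
    img = img[np.newaxis]
if msk.ndim == 3:
    msk = msk[np.newaxis]

# Z-normalization: zero mean, unit standard deviation
img = (img - img.mean()) / img.std()

# masks must contain only 0s and 1s, stored as integers
msk = (msk > 0).astype(np.byte)

tifffile.imwrite("preprocessed/images/image_01.tif", img)
tifffile.imwrite("preprocessed/masks/image_01.tif", msk)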

Helper function

We defined a function in biom3d/preprocess.py to help preprocess the images.

Here is an example of how to use it:

python biom3d/preprocess.py --img_dir path/to/image/folder --img_out_dir path/to/preprocessed/image/folder --msk_dir path/to/mask/folder --msk_out_dir path/to/preprocessed/mask/folder --auto_config

The --auto_config option is recommended. It helps you complete the configuration file by suggesting the ideal patch size, batch size and number of poolings depending on the median size of the images in the dataset.
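
For reference, the median image size that auto-config relies on can also be computed by hand. Here is a minimal sketch (assuming single-channel 3D .tif images; the folder path is a placeholder):

from pathlib import Path
import numpy as np
import tifffile

img_dir = Path("path/to/image/folder")  # placeholder path

# collect the spatial shape (last 3 axes) of every image in the folder
shapes = np.array([tifffile.imread(p).shape[-3:] for p in sorted(img_dir.glob("*.tif"))])

# median size per axis: the statistic auto-config bases its suggestions on
print(np.median(shapes, axis=0))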

Training configuration file definition

All of the hyper-parameters are defined in the configuration file. Configuration files are stored in Python format in the configs folder. You can create a new config file by copying one of the existing ones and adapting the parameters described below. For instance, copy unet_pancreas.py in the same folder, rename it, and open the new Python script with your favourite text editor, as shown below.
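
For example, from the repository root (the new file name is a placeholder):

cp configs/unet_pancreas.py configs/unet_my_dataset.py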

There are two types of hyper-parameters in the configuration file: builder parameters and module parameters.

Builder parameters

Builder parameters are written as follows: NAME=value. The dataset builder parameters must be adapted to your own dataset, and the auto-config builder parameters can be set with the pre-processing values. The rest of the builder parameters are optional.

Here is the exhaustive list of builder parameters:

#---------------------------------------------------------------------------
# Dataset builder-parameters
# EDIT THE FOLLOWING PARAMETERS WITH YOUR OWN DATASET PARAMETERS

# Folder where pre-processed images are stored
IMG_DIR = 'data/pancreas/tif_imagesTr_small'

# Folder where pre-processed masks are stored
MSK_DIR = 'data/pancreas/tif_labelsTr_small'

# (optional) path to the .csv file storing "filename,hold_out,fold", where:
# "filename" is the image name,
# "hold_out" is either 0 (training image) or 1 (testing image),
# "fold" (non-negative integer) indicates the k-th fold, 
# by default fold 0 of the training image (hold_out=0) is the validation set.
CSV_DIR = 'data/pancreas/folds_pancreas.csv'

# CSV_DIR can be set to None, in which case the validation set will be
# automatically chosen from the training set (20% of the training images/masks)
# CSV_DIR = None 

# model name
DESC = 'unet_mine-pancreas_21'

# number of classes of objects
# the background does not count, so the minimum is 1 (the max is 255)
NUM_CLASSES=2

#---------------------------------------------------------------------------
# Auto-config builder-parameters
# PASTE AUTO-CONFIG RESULTS HERE

# batch size
BATCH_SIZE = 2

# patch size passed to the model
PATCH_SIZE = [40,224,224]

# larger patch size used prior to the rotation augmentation to avoid "empty" corners.
AUG_PATCH_SIZE = [48,263,263]

# number of pooling operations done in the U-Net (one value per axis)
NUM_POOLS = [3,5,5]
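# note (general U-Net constraint, not stated in the original config): each axis
# of PATCH_SIZE is typically divisible by 2**NUM_POOLS on that axis
# (here 40 = 5*2**3 and 224 = 7*2**5), so the feature maps can be pooled
# NUM_POOLS times without rounding issues.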

# median spacing is used only during prediction to normalize the output images
# it is commented out here because we did not notice any improvement
# MEDIAN_SPACING=[0.79492199, 0.79492199, 2.5]
MEDIAN_SPACING=[]

#---------------------------------------------------------------------------
# Advanced parameters (can be left as such)
# training configs

# whether to also store the best model
SAVE_BEST = True 

# number of epochs
# the number of epochs can be reduced for a small training set (e.g. 10 images/masks of 128x128x64)
NB_EPOCHS = 1000

# optimizer parameters
LR_START = 1e-2 # comment this line if you need to reload the learning rate after a training interruption
WEIGHT_DECAY = 3e-5

# whether to use deep-supervision loss:
# a loss is placed at each stage of the UNet model
USE_DEEP_SUPERVISION = False

# whether to use softmax loss instead of sigmoid
# should not be set to True if object classes are overlapping in the masks
USE_SOFTMAX=False 

# training loop parameters
USE_FP16 = True
NUM_WORKERS = 4

#---------------------------------------------------------------------------
# callback setup (can be left as such) 
# callbacks are routines that execute periodically during the training loop

# folder where the training logs will be stored, including:
# - model .pth files (state_dict)
# - image snapshots of model training (only if USE_IMAGE_CLBK is True)
# - this configuration stored in .yaml format, plus the TensorBoard logs
LOG_DIR = 'logs/'

SAVE_MODEL_EVERY_EPOCH = 1
USE_IMAGE_CLBK = True
VAL_EVERY_EPOCH = SAVE_MODEL_EVERY_EPOCH
SAVE_IMAGE_EVERY_EPOCH = SAVE_MODEL_EVERY_EPOCH
USE_FG_CLBK = True
#---------------------------------------------------------------------------
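
The optional CSV_DIR file described above can be generated with a short script. Here is a minimal sketch (assuming scikit-learn is available; the folder path and output name are placeholders, and the columns follow the "filename,hold_out,fold" format given in the config comments):

from pathlib import Path
import csv
from sklearn.model_selection import KFold

filenames = sorted(p.name for p in Path("path/to/image/folder").glob("*.tif"))

# assign every training image (hold_out=0) to one of 5 folds;
# by default, fold 0 will then serve as the validation set
rows = [None] * len(filenames)
for fold, (_, val_idx) in enumerate(KFold(n_splits=5, shuffle=True, random_state=0).split(filenames)):
    for i in val_idx:
        rows[i] = (filenames[i], 0, fold)

with open("folds.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "hold_out", "fold"])
    writer.writerows(rows)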

:rocket: Module parameters

The module parameters are written as follows in the configuration file:

NAME=Dict(
  fct="RegisterName",
  kwargs=Dict(
    key_word=arguments,
  )
)

The fct argument corresponds to one of the module names listed in the register.py file, which lists all existing modules in Biom3d. For more details about a specific module, we recommend reading its documentation. There are currently 5 main module types: dataset, model, metric, trainer and predictor. Not every module is compatible with every other module; read the documentation for more details.
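
For illustration only, a module entry could look like the sketch below; the register name and keyword arguments here are hypothetical placeholders, not actual entries of register.py:

TRAINER = Dict(
  fct="SegTrainer",  # hypothetical register name: check register.py for the real ones
  kwargs=Dict(
    use_deep_supervision=USE_DEEP_SUPERVISION,  # placeholder kwargs, for illustration
  )
)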

:muscle: Training

Please create a folder named logs/ in the current directory.

Once the configuration file is defined, the training can start with the following command:

python biom3d/train.py --config configs.your_config_file

Careful: do not append .py to the end of your config file name; the --config argument is a Python module path (with dots instead of slashes).

A new sub-folder, which we call the base-folder in this documentation, will be created in the logs/ folder. The base-folder contains 3 sub-folders:

  • image: with the snapshots of the current training results
  • log: with the configuration file stored in YAML format and the TensorBoard event files
  • model: with the PyTorch model(s).

You can plot the training curves during model training with the following command:

tensorboard --logdir=logs/

:rocket: Advanced training/evaluation/prediction

Biom3d was originally designed to speed up the development of state-of-the-art tools for 3D biomedical imaging. That is why the training, the test predictions and the test metrics computation can all be run in a single command. Use python biom3d/train.py --help to get more details.

:dart: Prediction

Once your model is trained, it is ready to use for prediction with the following command:

python biom3d/pred.py --log path/to/base-folder --dir_in path/to/raw/data --dir_out path/of/the/future/predictions 

For Omero users, you can use the following command to download an Omero Dataset or an Omero Project and directly run the prediction over it:

python biom3d/omero_pred.py --obj Dataset:ID

or with an Omero Project:

python biom3d/omero_pred.py --obj Project:ID

The previous commands will ask you to provide your Omero server name, your Omero username and your Omero password.

:rocket: Advanced prediction

pred.py can also be used to compare the prediction results with existing test annotations. Use python biom3d/pred.py --help for more details.

:bookmark_tabs: Citation

If you find Biom3d useful in your research, please cite:

@misc{biom3d,
  title={{Biom3d} Easy-to-use Tool for 3D Semantic Segmentation of Volumetric Images using Deep Learning},
  author={Guillaume Mougeot},
  howpublished = {\url{https://github.com/GuillaumeMougeot/biom3d}},
  year={2022}
}

:moneybag: Fundings and Acknowledgements

This project was inspired by the following publication: "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation", Fabian Isensee et al., Nature Methods, 2021.

This project has been supported by Oxford Brookes University and the European Regional Development Fund (FEDER). It was carried out between the laboratories of iGReD (France), Institut Pascal (France) and Plant Nuclear Envelope (UK).

