# Scaling the U-net

This project contains the source code for our paper.
## Requirements

- poetry
- Python 3.9

There are many ways to obtain a specific Python version; we recommend Conda.

### Conda installation

Once Conda is installed, create an environment with a Python 3.9 interpreter:

```bash
conda create -n sun_py39 python=3.9
conda activate sun_py39
```

Install poetry:

```bash
# Poetry for Ubuntu
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
```
## How to use

```bash
# clone project
git clone https://gitlab.desy.de/ivo.matteo.baltruschat/scaling_the_u_net.git
cd scaling_the_u_net
poetry install
```

- Rename `.env.example` to `.env`. Set `PROJECT_DIR` to your root path. If you want wandb support, set `WANDB_USER` to your user name (see the example below).
- Run `dvc checkout --summary` to restore the raw and processed data from the shared_cache (`/beegfs/desy/group/it/ReferenceData/dvc/shared_cache`).
- Now you can simply run `python run_training.py experiment=baseline`.
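A minimal `.env` might look like this; the path and user name below are placeholders, not values from the project:

```bash
# .env -- example values only, adjust to your setup
PROJECT_DIR=/path/to/scaling_the_u_net
WANDB_USER=your_wandb_username
```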
Afterwards, you can use `run_testing.py` to evaluate your trained model on the test data:

- Run `python run_testing.py experiment_dir=[your_experiment_name]` (replace `[...]` with your experiment subdirectory in the experiments folder).
## Guide

### How To Get Started

- First, you should probably get familiar with TensorFlow.
- Next, go through the Hydra quick start guide and the basic Hydra tutorial.
### How it works

By design, every run is initialized by the `run_training.py` file. All modules are dynamically instantiated from the module paths specified in the config. Example model config:
```yaml
filters: 64                   # Channel output of the first layer -> the others depend on it
kernel_size: [3, 3]           # Kernel size for all Conv layers
num_layers: 5                 # Number of U-net blocks
regularization_factor_l1: -1  # If -1, no regularization
regularization_factor_l2: -1  # If -1, no regularization
layer_order: "CNAD"           # Order of Conv, Norm, Act, Drop ["CADN", "CAND", "CDAN", "CDNA", "CNDA", "CNAD"]
dropout_type: "spatial"       # ["standard", "spatial"] Type of dropout used in a "CNAD" stack
dropout_conv: -1              # If -1, no dropout is used in a "CNAD" stack
use_norm: "BatchNorm"         # ["none", "BatchNorm", "GroupNorm"] Type of normalization used in a "CNAD" stack
activation: "mish"            # ["relu", "leakyReLU", "mish"] Type of activation used in a "CNAD" stack
dropout: -1                   # Dropout before the Conv block of the latent space. If -1, no dropout is used.
output_activation: "linear"   # ["linear", "softmax"] If "linear", the model outputs logits rather than prediction probabilities
```
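To make the `layer_order` semantics concrete, here is a minimal TensorFlow/Keras sketch of how one such stack could be assembled from these parameters. The function `conv_stack` and its layer choices are illustrative assumptions, not the project's actual implementation (which lives in `scalingtheunet/models/simple_unet.py`):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_stack(x, filters=64, kernel_size=(3, 3), layer_order="CNAD",
               use_norm="BatchNorm", activation="relu",
               dropout_conv=-1, dropout_type="spatial"):
    """Apply Conv/Norm/Act/Drop to tensor `x` in the order given by `layer_order`."""
    for op in layer_order:
        if op == "C":
            x = layers.Conv2D(filters, kernel_size, padding="same")(x)
        elif op == "N" and use_norm == "BatchNorm":
            # "GroupNorm" and "none" elided in this sketch
            x = layers.BatchNormalization()(x)
        elif op == "A":
            # "relu" works out of the box; "leakyReLU"/"mish" would need dedicated layers
            x = layers.Activation(activation)(x)
        elif op == "D" and dropout_conv != -1:
            drop = layers.SpatialDropout2D if dropout_type == "spatial" else layers.Dropout
            x = drop(dropout_conv)(x)
    return x

# usage: two stacked "CNAD" blocks, e.g. the first U-net level
inputs = tf.keras.Input(shape=(256, 256, 1))
out = conv_stack(conv_stack(inputs))
```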
This allows you to easily iterate over new models!
Every time you create a new one, just specify its module path and parameters in the appropriate config file.
The whole pipeline managing the instantiation logic lives in `scalingtheunet/executor/training.py`.
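As a hedged sketch of the dynamic-instantiation pattern (the real wiring is in `scalingtheunet/executor/training.py`; the `_target_` path in the comment is a hypothetical example, not the project's actual module path):

```python
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra resolves the `_target_` module path in the config group and
    # constructs the object with the remaining keys as keyword arguments, e.g.
    #   _target_: scalingtheunet.models.simple_unet.build_model  (hypothetical)
    #   filters: 64
    model = hydra.utils.instantiate(cfg.model)

if __name__ == "__main__":
    main()
```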
### Main Project Configuration

Location: `configs/config.yaml`

The main project config contains the default training configuration.
It determines how the config is composed when you simply execute `python run_training.py`.
You can overwrite any parameter from the command line, e.g. `python run_training.py name=test123 model.filters=16`.
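For example (the parameter values are placeholders; `--multirun` is standard Hydra and writes each job under the `sweep` directory configured below):

```bash
# single run with overrides
python run_training.py name=test123 model.filters=16

# sweep over several values with Hydra multirun
python run_training.py --multirun name=sweep_filters model.filters=16,32,64
```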
The full default configuration:

```yaml
# @package _global_

# specify here default training configuration
defaults:
  - _self_
  - trainer: default.yaml
  - model: simple_unet.yaml
  - datamodule: mg_full.yaml
  - callbacks: default.yaml
  - logger: default.yaml
  #- mode: default.yaml
  - experiment: null
  # enable color logging
  - override hydra/hydra_logging: colorlog
  - override hydra/job_logging: colorlog

# name of the run, accessed by loggers
# allows for custom naming of the experiment
name: ???

current_time: ${now:%Y-%m-%d}_${now:%H-%M-%S}

hydra:
  # sets output paths for all file logs to `experiments/${name}/${current_time}`
  run:
    dir: experiments/${name}/${current_time}
  sweep:
    dir: experiments/${name}/${current_time}
    subdir: ${hydra.job.num}
  output_subdir: "hydra_training"

# path to original working directory
# hydra hijacks the working directory by changing it to the current log directory,
# so it's useful to have this path as a special variable
# https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
work_dir: ${hydra:runtime.cwd}

# path to folder with data
data_dir: ${work_dir}/data/

# mlflow path
mlflow_dir: null #${work_dir}/mlflow/

# tensorboard path
tensorboard_dir: ${work_dir}/tensorboard/

# pretty print config at the start of the run using the Rich library
print_config: True

# pretty print history after the run using the Rich library
print_history: True

# disable python warnings if they annoy you
ignore_warnings: True

# seed for random number generators in tensorflow, numpy and python.random
seed: "0xCAFFEE"
```
### Experiment Configuration

Location: `configs/experiment`

You should store all your experiment configurations in this folder.
Experiment configurations allow you to overwrite parameters from the main project configuration.

Simple example:

```yaml
# @package _global_

# to execute this experiment run:
# python run_training.py experiment=example_simple.yaml

defaults:
  - override /trainer: default.yaml
  - override /model: simple_unet.yaml
  - override /datamodule: mg_full.yaml
  - override /callbacks: default.yaml
  - override /logger: default.yaml

# all parameters below will be merged with parameters from the default configurations set above
# this allows you to overwrite only specified parameters
name: simple_example
seed: "0xCAFFEE"

trainer:
  epochs: 5

model:
  filters: 16
  num_layers: 4
  activation: "relu"

datamodule:
  batch_size: 8
```
## Project Organization

```
├── .venv              <- Local poetry environment
│   └── .gitkeep
├── configs            <- Hydra configuration files
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── logs               <- Tensorboard logging folder; will be created by training
├── experiments        <- All your runs will be saved here; will be created by training
├── mlflow             <- MLflow logging folder; will be created by training
├── models             <- Trained and serialized models, model predictions, or model summaries
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
├── test               <- Tests for the project code.
├── scalingtheunet     <- Source code for use in this project.
│   │
│   ├── data           <- Scripts to download or generate data
│   │   ├── __init__.py
│   │   └── make_dataset.py <- TODO
│   ├── dataloaders    <- Scripts to handle and load the preprocessed data
│   │   ├── __init__.py <- TODO
│   │   └── hzg_mg_tomo.py <- TODO
│   ├── evaluation     <- Scripts to evaluate the results
│   │   └── __init__.py <- TODO
│   ├── executor       <- Scripts to train, evaluate and test models
│   │   ├── __init__.py <- TODO
│   │   ├── testing_9axes.py
│   │   └── training.py
│   ├── models         <- Scripts to define model architectures
│   │   ├── __init__.py <- TODO
│   │   └── simple_unet.py
│   ├── utils          <- TODO
│   │   ├── __init__.py <- TODO
│   │   ├── file_system.py <- TODO
│   │   ├── logger.py <- TODO
│   │   ├── losses.py <- TODO
│   │   ├── metrics.py <- TODO
│   │   ├── my_callback.py <- TODO
│   │   └── utils.py <- TODO
│   │
│   ├── visualization  <- Scripts to create exploratory and results-oriented
│   │                     visualizations
│   └── __init__.py    <- Makes scalingtheunet a Python module
│
├── .editorconfig      <- Formatting specification; install the matching plugin
│                         for your IDE to enable it.
├── .gitignore         <- Specifies which files should not be committed to the repository.
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── poetry.toml        <- Poetry config file to install the environment locally
├── poetry.lock        <- Lock file for dependencies; used to install exactly
│                         the same versions of dependencies on each build
├── pyproject.toml     <- The project's dependencies for reproducing the
│                         analysis environment
├── README.md          <- The top-level README for developers using this project.
└── setup.cfg          <- Configuration file used by several tools in this project
```