# Scaling the U-net

This project contains the source code for our paper.
## Requirements

- poetry
- Python 3.9

There are many ways to obtain a specific Python version; we recommend Conda.

### Conda installation

Once Conda is installed, create an environment with a Python 3.9 interpreter:

```bash
conda create -n sun_py39 python=3.9
conda activate sun_py39
```

Install poetry:

```bash
# Poetry for Ubuntu
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
```
## How to use

```bash
# clone project
git clone https://gitlab.desy.de/ivo.matteo.baltruschat/scaling_the_u_net.git
cd scaling_the_u_net
poetry install
```

- Rename `.env.example` to `.env`. Set `PROJECT_DIR` to your root path. If you want wandb support, set `WANDB_USER` to your user name (see the example below).
- Run `dvc checkout --summary` to restore the raw and processed data from the shared_cache (`/beegfs/desy/group/it/ReferenceData/dvc/shared_cache`).
- Now you can simply run `python run_training.py experiment=baseline`.
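A minimal `.env` might look like this; the path and user name below are placeholders, not values from the project:

```bash
# .env -- example values only, adjust to your setup
PROJECT_DIR=/path/to/scaling_the_u_net
WANDB_USER=your_wandb_username
```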
Afterwards, you can use `run_testing.py` to evaluate your trained model on the test data:

- Run `python run_testing.py experiment_dir=[your_experiment_name]` (replace `[...]` with your experiment subdirectory in the experiments folder).
## Guide

### How To Get Started

- First, you should probably get familiar with TensorFlow.
- Next, go through the Hydra quick start guide and the basic Hydra tutorial.
### How it works

By design, every run is initialized by the `run_training.py` file. All modules are dynamically instantiated from the module paths specified in the config. Example model config:
```yaml
filters: 64                   # Channel output of the first layer -> the others depend on it
kernel_size: [3, 3]           # Kernel size for all Conv layers
num_layers: 5                 # Number of U-net blocks
regularization_factor_l1: -1  # If -1, no regularization
regularization_factor_l2: -1  # If -1, no regularization
layer_order: "CNAD"           # Order of Conv, Norm, Act, Drop ["CADN", "CAND", "CDAN", "CDNA", "CNDA", "CNAD"]
dropout_type: "spatial"       # ["standard", "spatial"] Type of dropout used in a "CNAD" stack
dropout_conv: -1              # If -1, no dropout is used in a "CNAD" stack
use_norm: "BatchNorm"         # ["none", "BatchNorm", "GroupNorm"] Type of normalization used in a "CNAD" stack
activation: "mish"            # ["relu", "leakyReLU", "mish"] Type of activation used in a "CNAD" stack
dropout: -1                   # Dropout before the Conv block of the latent space. If -1, no dropout is used.
output_activation: "linear"   # ["linear", "softmax"] If "linear", the model outputs logits rather than prediction probabilities
```
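To make the `layer_order` semantics concrete, here is a minimal TensorFlow/Keras sketch of how one such stack could be assembled from these parameters. The function `conv_stack` and its layer choices are illustrative assumptions, not the project's actual implementation (which lives in `scalingtheunet/models/simple_unet.py`):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_stack(x, filters=64, kernel_size=(3, 3), layer_order="CNAD",
               use_norm="BatchNorm", activation="relu",
               dropout_conv=-1, dropout_type="spatial"):
    """Apply Conv/Norm/Act/Drop to tensor `x` in the order given by `layer_order`."""
    for op in layer_order:
        if op == "C":
            x = layers.Conv2D(filters, kernel_size, padding="same")(x)
        elif op == "N" and use_norm == "BatchNorm":
            # "GroupNorm" and "none" elided in this sketch
            x = layers.BatchNormalization()(x)
        elif op == "A":
            # "relu" works out of the box; "leakyReLU"/"mish" would need dedicated layers
            x = layers.Activation(activation)(x)
        elif op == "D" and dropout_conv != -1:
            drop = layers.SpatialDropout2D if dropout_type == "spatial" else layers.Dropout
            x = drop(dropout_conv)(x)
    return x

# usage: two stacked "CNAD" blocks, e.g. the first U-net level
inputs = tf.keras.Input(shape=(256, 256, 1))
out = conv_stack(conv_stack(inputs))
```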
This allows you to easily iterate over new models!
Every time you create a new one, just specify its module path and parameters in the appropriate config file.
The whole pipeline managing the instantiation logic lives in `scalingtheunet/executor/training.py`.
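As a hedged sketch of the dynamic-instantiation pattern (the real wiring is in `scalingtheunet/executor/training.py`; the `_target_` path in the comment is a hypothetical example, not the project's actual module path):

```python
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra resolves the `_target_` module path in the config group and
    # constructs the object with the remaining keys as keyword arguments, e.g.
    #   _target_: scalingtheunet.models.simple_unet.build_model  (hypothetical)
    #   filters: 64
    model = hydra.utils.instantiate(cfg.model)

if __name__ == "__main__":
    main()
```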
### Main Project Configuration

Location: `configs/config.yaml`

The main project config contains the default training configuration.
It determines how the config is composed when you simply execute `python run_training.py`.
You can overwrite any parameter from the command line, e.g. `python run_training.py name=test123 model.filters=16`.
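For example (the parameter values are placeholders; `--multirun` is standard Hydra and writes each job under the `sweep` directory configured below):

```bash
# single run with overrides
python run_training.py name=test123 model.filters=16

# sweep over several values with Hydra multirun
python run_training.py --multirun name=sweep_filters model.filters=16,32,64
```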
The full default configuration:

```yaml
# @package _global_

# specify here default training configuration
defaults:
  - _self_
  - trainer: default.yaml
  - model: simple_unet.yaml
  - datamodule: mg_full.yaml
  - callbacks: default.yaml
  - logger: default.yaml
  #- mode: default.yaml
  - experiment: null
  # enable color logging
  - override hydra/hydra_logging: colorlog
  - override hydra/job_logging: colorlog

# name of the run, accessed by loggers
# allows for custom naming of the experiment
name: ???

current_time: ${now:%Y-%m-%d}_${now:%H-%M-%S}

hydra:
  # sets output paths for all file logs to `experiments/${name}/${current_time}`
  run:
    dir: experiments/${name}/${current_time}
  sweep:
    dir: experiments/${name}/${current_time}
    subdir: ${hydra.job.num}
  output_subdir: "hydra_training"

# path to original working directory
# hydra hijacks the working directory by changing it to the current log directory,
# so it's useful to have this path as a special variable
# https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
work_dir: ${hydra:runtime.cwd}

# path to folder with data
data_dir: ${work_dir}/data/

# mlflow path
mlflow_dir: null #${work_dir}/mlflow/

# tensorboard path
tensorboard_dir: ${work_dir}/tensorboard/

# pretty print config at the start of the run using the Rich library
print_config: True

# pretty print history after the run using the Rich library
print_history: True

# disable python warnings if they annoy you
ignore_warnings: True

# seed for random number generators in tensorflow, numpy and python.random
seed: "0xCAFFEE"
```
### Experiment Configuration

Location: `configs/experiment`

You should store all your experiment configurations in this folder.
Experiment configurations allow you to overwrite parameters from the main project configuration.

Simple example:

```yaml
# @package _global_

# to execute this experiment run:
# python run_training.py experiment=example_simple.yaml

defaults:
  - override /trainer: default.yaml
  - override /model: simple_unet.yaml
  - override /datamodule: mg_full.yaml
  - override /callbacks: default.yaml
  - override /logger: default.yaml

# all parameters below will be merged with parameters from the default configurations set above
# this allows you to overwrite only specified parameters
name: simple_example
seed: "0xCAFFEE"

trainer:
  epochs: 5

model:
  filters: 16
  num_layers: 4
  activation: "relu"

datamodule:
  batch_size: 8
```
## Project Organization

```
├── .venv              <- Local poetry environment
│   └── .gitkeep
├── configs            <- Hydra configuration files
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── logs               <- Tensorboard logging folder; will be created by training
├── experiments        <- All your runs will be saved here; will be created by training
├── mlflow             <- MLflow logging folder; will be created by training
├── models             <- Trained and serialized models, model predictions, or model summaries
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
├── test               <- Tests for the project code.
├── scalingtheunet     <- Source code for use in this project.
│   │
│   ├── data           <- Scripts to download or generate data
│   │   ├── __init__.py
│   │   └── make_dataset.py <- TODO
│   ├── dataloaders    <- Scripts to handle and load the preprocessed data
│   │   ├── __init__.py <- TODO
│   │   └── hzg_mg_tomo.py <- TODO
│   ├── evaluation     <- Scripts to evaluate the results
│   │   └── __init__.py <- TODO
│   ├── executor       <- Scripts to train, evaluate and test models
│   │   ├── __init__.py <- TODO
│   │   ├── testing_9axes.py
│   │   └── training.py
│   ├── models         <- Scripts to define model architectures
│   │   ├── __init__.py <- TODO
│   │   └── simple_unet.py
│   ├── utils          <- TODO
│   │   ├── __init__.py <- TODO
│   │   ├── file_system.py <- TODO
│   │   ├── logger.py <- TODO
│   │   ├── losses.py <- TODO
│   │   ├── metrics.py <- TODO
│   │   ├── my_callback.py <- TODO
│   │   └── utils.py <- TODO
│   │
│   ├── visualization  <- Scripts to create exploratory and results-oriented
│   │                     visualizations
│   └── __init__.py    <- Makes scalingtheunet a Python module
│
├── .editorconfig      <- Formatting specification; install the matching plugin
│                         for your IDE to enable it.
├── .gitignore         <- Specifies which files should not be committed to the repository.
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── poetry.toml        <- Poetry config file to install the environment locally
├── poetry.lock        <- Lock file for dependencies; used to install exactly
│                         the same versions of dependencies on each build
├── pyproject.toml     <- The project's dependencies for reproducing the
│                         analysis environment
├── README.md          <- The top-level README for developers using this project.
└── setup.cfg          <- Configuration file used by several tools in this project
```