PathoPatch - Accelerating Artificial Intelligence Based Whole Slide Image Analysis with an Optimized Preprocessing Pipeline
Project description
PathoPatch:
Accelerating Artificial Intelligence Based Whole Slide Image Analysis with an Optimized Preprocessing Pipeline
Installation
Prerequisite
- Openslide (>= 3.4.1) needs to be installed (either directly https://openslide.org/download/ or via conda)
OpenSlide conda
- Recommended:conda install conda-forge::openslide=4.0.0
for DICOM support - Generic/minimum version:conda-forge::openslide>=3.4.1
- Openslide python:
pip install openslide-python
- Optional for speedup: cuCIM installation instructions
PIP-Package
The package can be found here: https://pypi.org/project/pathopatch/
Installation: pip install pathopatch
Development
- Install pre-commit with
pre-commit install
Usage
We provide different use cases - Offline-Dataset (Store on Disk :floppy_disk:) and Inference-Dataset for :zap: PyTorch :zap:
In our Pre-Processing pipeline, we are able to extract quadratic patches from detected tissue areas, load annotation files (.json
) and apply color normlizations. We make use of the popular OpenSlide library, but extended it with the RAPIDS cuCIM framework for a speedup in patch-extraction.
We support all OpenSlide file formats + .dcm-File format (DICOM), by utilizing
wsidicom
andwsidicomizer
.
Offline-Dataset
In general, our framework has the following commands registered in your shell:
wsi_extraction: Extract patches with specific configuration and store them on the disk annotation_conversion: Can be used to convert annotations macenko_vector_generation: To generate new macenko vectors for a new dataset, if custom vectors are tend to be used
Parameter handover and CLI
Option 1: Config.yaml
Arguments are passed via CLIs. In addition to the CLI, also a configuration file can be passed via
wsi_extraction --config path/to/config.yaml
Exemplary configuration file: patch_extraction.yaml.
Option 2: CLI
The CLI of the main script for patch extraction (wsi_extraction) is as follows:
wsi_extraction [-h]
[--wsi_paths WSI_PATHS]
[--wsi_filelist WSI_FILELIST]
[--output_path OUTPUT_PATH]
[--wsi_extension {svs}]
[--config CONFIG]
[--patch_size PATCH_SIZE]
[--patch_overlap PATCH_OVERLAP]
[--target_mpp TARGET_MPP]
[--target_mag TARGET_MAG]
[--downsample DOWNSAMPLE]
[--level LEVEL]
[--context_scales [CONTEXT_SCALES ...]]
[--check_resolution CHECK_RESOLUTION]
[--processes PROCESSES]
[--overwrite]
[--annotation_paths ANNOTATION_PATHS]
[--annotation_extension {json,xml}]
[--incomplete_annotations]
[--label_map_file LABEL_MAP_FILE]
[--save_only_annotated_patches]
[--save_context_without_mask]
[--exclude_classes EXCLUDE_CLASSES]
[--store_masks]
[--overlapping_labels]
[--normalize_stains]
[--normalization_vector_json NORMALIZATION_VECTOR_JSON]
[--min_intersection_ratio MIN_INTERSECTION_RATIO]
[--tissue_annotation TISSUE_ANNOTATION]
[--tissue_annotation_intersection_ratio TISSUE_ANNOTATION_INTERSECTION_RATIO]
[--masked_otsu]
[--otsu_annotation OTSU_ANNOTATION]
[--filter_patches FILTER_PATCHES]
[--apply_prefilter APPLY_PREFILTER]
[--log_path LOG_PATH]
[--log_level {critical,error,warning,info,debug}]
[--hardware_selection {cucim,openslide,wsidicom}]
[--wsi_magnification WSI_MAGNIFICATION]
[--wsi_mpp WSI_MPP]
options:
-h, --help show this help message and exit
--wsi_paths WSI_PATHS
Path to the folder where all WSI are stored or path to a
single WSI-file. (default: None)
--wsi_filelist WSI_FILELIST
Path to a csv-filelist with WSI files (separator: `,`), if
provided just these files are used.Must include full paths
to WSIs, including suffixes.Can be used as an replacement
for the wsi_paths option.If both are provided, yields an
error. (default: None)
--output_path OUTPUT_PATH
Path to the folder where the resulting dataset should be
stored. (default: None)
--wsi_extension {svs,tiff,tif,bif,scn,ndpi,vms,vmu}
The extension types used for the WSI files, the options
are: ['svs', 'tiff', 'tif', 'bif', 'scn', 'ndpi', 'vms',
'vmu'] (default: None)
--config CONFIG Path to a config file. The config file can hold the same
parameters as the CLI. Parameters provided with the CLI are
always having precedence over the parameters in the config
file. (default: None)
--patch_size PATCH_SIZE
The size of the patches in pixel that will be retrieved
from the WSI, e.g. 256 for 256px (default: None)
--patch_overlap PATCH_OVERLAP
The percentage amount pixels that should overlap between
two different patches. Please Provide as integer between 0
and 100, indicating overlap in percentage. (default: None)
--target_mpp TARGET_MPP
If this parameter is provided, the output level of the WSI
corresponds to the level that is at the target microns per
pixel of the WSI. Alternative to target_mag, downsaple and
level. Highest priority, overwrites all other setups for
magnifcation, downsample, or level. (default: None)
--target_mag TARGET_MAG
If this parameter is provided, the output level of the WSI
corresponds to the level that is at the target
magnification of the WSI. Alternative to target_mpp,
downsaple and level. High priority, just target_mpp has a
higher priority, overwrites downsample and level if
provided. (default: None)
--downsample DOWNSAMPLE
Each WSI level is downsampled by a factor of 2, downsample
expresses which kind of downsampling should be used with
respect to the highest possible resolution. Medium
priority, gets overwritten by target_mag and target_mpp if
provided, but overwrites level. (default: None)
--level LEVEL The tile level for sampling, alternative to downsample.
Lowest priority, gets overwritten by target_mag and
downsample if they are provided. (default: None)
--context_scales [CONTEXT_SCALES ...]
Define context scales for context patches. Context patches
are centered around a central patch. The context-patch size
is equal to the patch-size, but downsampling is different
(default: None)
--check_resolution CHECK_RESOLUTION
If a float value is supplies, the program checks whether
the resolution of all images corresponds to the given value
(default: None)
--processes PROCESSES
The number of processes to use. (default: None)
--overwrite Overwrite the patches that have already been created in
case they already exist. Removes dataset. Handle with care!
(default: None)
--annotation_paths ANNOTATION_PATHS
Path to the subfolder where the XML/JSON annotations are
stored or path to a file (default: None)
--annotation_extension {json}
The extension types used for the annotation files, the
options are: ['json'] (default: None)
--incomplete_annotations
Set to allow WSI without annotation file (default: None)
--label_map_file LABEL_MAP_FILE
The path to a json file that contains the mapping between
the annotation labels and some integers; an example can be
found in examples (default: None)
--save_only_annotated_patches
If true only patches containing annotations will be stored
(default: None)
--save_context_without_mask
This is helpful for extracting patches, that are not within
a mask, but needed for the Valuing Vicinity Segmentation
Algorithms. This flag is specifically helpful if only fully
annotated patches should be extracted from a region of
interest (ROI) and their masks are stored, but also
sourrounding neighbourhood patches (without mask) are
needed. (default: None)
--exclude_classes EXCLUDE_CLASSES
Can be used to exclude annotation classes (default: None)
--store_masks Set to store masks per patch. Defaults to false (default:
None)
--overlapping_labels Per default, labels (annotations) are mutually exclusive.
If labels overlap, they are overwritten according to the
label_map.json ordering (highest number = highest priority)
(default: None)
--normalize_stains Uses Macenko normalization on a portion of the whole slide
image (default: None)
--normalization_vector_json NORMALIZATION_VECTOR_JSON
The path to a JSON file where the normalization vectors are
stored (default: None)
--adjust_brightness Normalize brightness in a batch by clipping to 90 percent.
Not recommended, but kept for legacy reasons (default:
None)
--min_intersection_ratio MIN_INTERSECTION_RATIO
The minimum intersection between the tissue mask and the
patch. Must be between 0 and 1. 0 means that all patches
are extracted. (default: None)
--tissue_annotation TISSUE_ANNOTATION
Can be used to name a polygon annotation to determine the
tissue area. If a tissue annotation is provided, no Otsu-
thresholding is performed (default: None)
--tissue_annotation_intersection_ratio TISSUE_ANNOTATION_INTERSECTION_RATIO
Intersection ratio with tissue annotation. Helpful, if ROI
annotation is passed, which should not interfere with
background ratio. If not provided, the default
min_intersection_ratio with the background is used.
(default: None)
--masked_otsu Use annotation to mask the thumbnail before otsu-
thresholding is used (default: None)
--otsu_annotation OTSU_ANNOTATION
Can be used to name a polygon annotation to determine the
area for masked otsu thresholding. Seperate multiple labels
with ' ' (whitespace) (default: None)
--filter_patches Post-extraction patch filtering to sort out artefacts,
marker and other non-tissue patches with a DL model. Time
consuming. Defaults to False. (default: None)
--apply_prefilter Pre-extraction mask filtering to remove marker from mask
before applying otsu. Defaults to False. (default: None)
--log_path LOG_PATH Path where log files should be stored. Otherwise, log files
are stored in the output folder (default: None)
--log_level {critical,error,warning,info,debug}
Set the logging level. Options are ['critical', 'error',
'warning', 'info', 'debug'] (default: None)
--hardware_selection {cucim,openslide,wsidicom}
Select hardware device (just if available, otherwise always
cucim). Defaults to None. (default: None)
--wsi_magnification WSI_MAGNIFICATION
Manual WSI magnification, but just applies if metadata
cannot be derived from OpenSlide (e.g., for .tiff files).
(default: None)
--wsi_mpp WSI_MPP Manual WSI MPP, but just applies if metadata cannot be
derived from OpenSlide (e.g., for .tiff files). (default:
None)
Option 3: CLI + Config
Both can be combined, but arguments in the CLI have precedence!
Inference-Dataset (PyTorch)
TBD, Elements: LivePatchWSIConfig, LivePatchWSIDataset, LivePatchWSIDataloader Link
Usage:
patch_config = LivePatchWSIConfig(
wsi_path="/Users/fhoerst/Fabian-Projekte/Selocan/RicardoScans/266819.svs",
patch_size=256,
patch_overlap=0,
target_mpp=0.3,
target_mpp_tolerance=0.1,
)
patch_dataset = LivePatchWSIDataset(patch_config, logger)
patch_dataloader = LivePatchWSIDataloader(patch_dataset, batch_size=8)
for batch in patch_dataloader:
...
Resulting Dataset Structure
In general, the folder structure for a preprocessed dataset looks like this: The aim of pre-processing is to create one dataset per WSI in the following structure:
WSI_Name
├── annotation_masks # thumbnails of extracted annotation masks
│ ├── all_overlaid.png # all with same dimension as the thumbnail
│ ├── tumor.png
│ └── ...
├── context # context patches, if extracted
│ ├── 2 # subfolder for each scale
│ │ ├── WSI_Name_row1_col1_context_2.png
│ │ ├── WSI_Name_row2_col1_context_2.png
│ │ └── ...
│ └── 4
│ │ ├── WSI_Name_row1_col1_context_2.png
│ │ ├── WSI_Name_row2_col1_context_2.png
│ │ └── ...
├── masks # Mask (numpy) files for each patch -> optional folder for segmentation
│ ├── WSI_Name_row1_col1.npy
│ ├── WSI_Name_row2_col1.npy
│ └── ...
├── metadata # Metadata files for each patch
│ ├── WSI_Name_row1_col1.yaml
│ ├── WSI_Name_row2_col1.yaml
│ └── ...
├── patches # Patches as .png files
│ ├── WSI_Name_row1_col1.png
│ ├── WSI_Name_row2_col1.png
│ └── ...
├── thumbnails # Different kind of thumbnails
│ ├── thumbnail_mpp_5.png
│ ├── thumbnail_downsample_32.png
│ └── ...
├── tissue_masks # Tissue mask images for checking
│ ├── mask.png # all with same dimension as the thumbnail
│ ├── mask_nogrid.png
│ └── tissue_grid.png
├── mask.png # tissue mask with green grid
├── metadata.yaml # WSI metdata for patch extraction
├── patch_metadata.json # Patch metadata of WSI merged in one file
└── thumbnail.png # WSI thumbnail
Further information
For more information, check out the git.
License
PathoPatcher by Fabian Hörst, University Hospital Essen, is licensed under CC BY-NC-SA 4.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for pathopatch-1.0.3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d81b135fb81778c8dfdf8a7a2fbd3d93c272793a21cdbd740302b79fe5791bc9 |
|
MD5 | db37f8fa92c1aa91860538c9a974773a |
|
BLAKE2b-256 | 422a1943d2bb630c1b4e827d9936e8aba7241dec260528b33255022982f7e4ee |