Read and process histological slide images with Python!

HistoPrep

Preprocessing large medical images for machine learning made easy!

Description • Installation • Usage • API Documentation • Citation

Description

HistoPrep makes it easy to prepare your histological slide images for deep learning models. You can easily cut large slide images into smaller tiles and then preprocess those tiles (remove tiles with shitty tissue, finger marks, etc.).

Installation

Install OpenSlide on your system and then install histoprep with pip!

pip install histoprep

Usage

A typical workflow for training deep learning models with histological images is the following:

1. Cut each slide image into smaller tile images.
2. Preprocess the tile images by removing tiles with bad tissue, staining artifacts, etc.
3. Overfit a pretrained ResNet50 model, report 100% validation accuracy and publish it in Nature like everyone else.

With HistoPrep, steps 1 and 2 are as easy as accidentally drinking too much at the research group Christmas party and proceeding to work remotely until June.

Let's start by cutting a slide from the PANDA Kaggle challenge into small tiles.

from histoprep import SlideReader

# Read slide image.
reader = SlideReader("./slides/slide_with_ink.jpeg")
# Detect tissue.
threshold, tissue_mask = reader.get_tissue_mask(level=-1)
# Extract overlapping tile coordinates with less than 50% background.
tile_coordinates = reader.get_tile_coordinates(
    tissue_mask, width=512, overlap=0.5, max_background=0.5
)
# Save tile images with image metrics for preprocessing.
tile_metadata = reader.save_regions(
    "./train_tiles/", tile_coordinates, threshold=threshold, save_metrics=True
)

slide_with_ink: 100%|██████████| 390/390 [00:01<00:00, 295.90it/s]
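
To get a feel for what the `width`, `overlap` and `max_background` arguments mean, here is a minimal pure-Python sketch of the idea. It is not HistoPrep's implementation: the helper names (`threshold_mask`, `tile_grid`, `background_fraction`, `filter_tiles`) are made up for illustration, the mask is assumed to have the same resolution as the tile coordinates, tissue is assumed to be darker than the background, and a tile is kept when its background fraction is at most `max_background`.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height)
Mask = List[List[int]]  # 1 = tissue, 0 = background


def threshold_mask(gray: List[List[int]], threshold: int) -> Mask:
    """Mark pixels at or below `threshold` as tissue (assumes dark tissue)."""
    return [[1 if pixel <= threshold else 0 for pixel in row] for row in gray]


def tile_grid(
    image_width: int, image_height: int, width: int, overlap: float = 0.0
) -> List[Tile]:
    """Return square tiles whose neighbours overlap by the given proportion."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # With 50% overlap, each tile starts half a tile width after the previous one.
    stride = max(1, round(width * (1.0 - overlap)))
    return [
        (x, y, width, width)
        for y in range(0, image_height, stride)
        for x in range(0, image_width, stride)
    ]


def background_fraction(mask: Mask, tile: Tile) -> float:
    """Fraction of the tile that is background; pixels outside the mask count as background."""
    x, y, w, h = tile
    tissue = sum(
        mask[row][col]
        for row in range(y, min(y + h, len(mask)))
        for col in range(x, min(x + w, len(mask[0])))
    )
    return 1.0 - tissue / (w * h)


def filter_tiles(mask: Mask, tiles: List[Tile], max_background: float) -> List[Tile]:
    """Keep tiles whose background fraction is at most `max_background`."""
    return [t for t in tiles if background_fraction(mask, t) <= max_background]


# Toy 8x8 "slide": dark tissue (100) on the left half, bright background (255) on the right.
gray = [[100] * 4 + [255] * 4 for _ in range(8)]
mask = threshold_mask(gray, threshold=200)
tiles = tile_grid(8, 8, width=4, overlap=0.5)
kept = filter_tiles(mask, tiles, max_background=0.5)
print(len(tiles), len(kept))  # 16 7
```

Tiles hanging over the right and bottom edges are still generated here and simply count the missing pixels as background, which is one of several reasonable ways to handle borders.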

Let's take a look at the output and visualise the thumbnails.

jopo666@~$ tree train_tiles
train_tiles
└── slide_with_ink
    ├── metadata.parquet       # tile metadata
    ├── properties.json        # tile properties
    ├── thumbnail.jpeg         # thumbnail image
    ├── thumbnail_tiles.jpeg   # thumbnail with tiles
    ├── thumbnail_tissue.jpeg  # thumbnail of the tissue mask
    └── tiles [390 entries exceeds filelimit, not opening dir]

[Images: prostate biopsy sample, tissue mask, thumbnail with tiles]
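
Once several slides have been cut, you will probably want to gather the per-slide metadata files. The sketch below assumes only the directory layout shown in the tree above (one sub-directory per slide, each containing a `metadata.parquet`); the function name `find_slide_metadata` is hypothetical and not part of HistoPrep.

```python
from pathlib import Path
from typing import Dict


def find_slide_metadata(output_dir: str) -> Dict[str, Path]:
    """Map each slide name to its metadata.parquet file under `output_dir`."""
    root = Path(output_dir)
    return {
        path.parent.name: path
        for path in sorted(root.glob("*/metadata.parquet"))
    }
```

Each returned path could then be loaded with, for example, `pandas.read_parquet(path)` (which needs a parquet engine such as `pyarrow` installed). The column names inside the file are not listed here, so inspect them before relying on any of them.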

That was easy, but it can be annoying to whip up a new Python script every time you want to cut slides, so it is recommended to use the HistoPrep CLI program!

# Repeat the above code for all images in the PANDA dataset!
jopo666@~$ HistoPrep --input './train_images/*.tiff' --output ./tiles --width 512 --overlap 0.5 --max-background 0.5

As we can see from the above images, histological slide images often contain areas that we would not like to include in our training data. Removing them might seem like a daunting task, but let's try it out!

from histoprep.utils import OutlierDetector

# Let's wrap the tile metadata with a helper class.
detector = OutlierDetector(tile_metadata)
# Cluster tiles based on image metrics.
clusters = detector.cluster_kmeans(num_clusters=4, random_state=666)
# Visualise first cluster.
reader.get_annotated_thumbnail(
    image=reader.read_level(-1), coordinates=detector.coordinates[clusters == 0]
)

[Image: tiles in cluster 0]

I said it was gonna be easy! Now we can mark tiles in cluster 0 as outliers and start overfitting our neural network! This was a simple example, but the same code can be used to cluster all of the several million tiles extracted from the PANDA dataset and discard outliers simultaneously!
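
For intuition about what clustering tiles by their image metrics does, here is a tiny pure-Python k-means sketch. It is an illustration under stated assumptions, not the `OutlierDetector` implementation: the two metric columns (think "brightness" and "sharpness"), the toy values and the helper names `kmeans` and `select_cluster` are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Point], num_clusters: int, random_state: int = 0, max_iter: int = 100
) -> List[int]:
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(random_state)
    centroids = [list(p) for p in rng.sample(list(points), num_clusters)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(num_clusters), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for k in range(num_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def select_cluster(
    coordinates: List[Tuple[int, int, int, int]], labels: List[int], cluster: int
) -> List[Tuple[int, int, int, int]]:
    """Return the tile coordinates that belong to the given cluster."""
    return [xywh for xywh, label in zip(coordinates, labels) if label == cluster]


# Hypothetical metrics for five tiles: three normal-looking ones and two odd ones.
metrics = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.10), (0.90, 0.95), (0.88, 0.90)]
coordinates = [(0, 0, 512, 512), (256, 0, 512, 512), (512, 0, 512, 512),
               (0, 256, 512, 512), (256, 256, 512, 512)]

labels = kmeans(metrics, num_clusters=2, random_state=666)
outliers = select_cluster(coordinates, labels, cluster=labels[3])
print(outliers)
```

Cluster numbers are arbitrary, which is why you still need to look at each cluster (as with the annotated thumbnail above) before deciding which ones to throw away.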

Citation

If you use HistoPrep to process the images for your publication, please cite the GitHub repository.

@misc{histoprep,
  author = {Pohjonen, Joona and Ariotta, Valeria},
  title = {HistoPrep: Preprocessing large medical images for machine learning made easy!},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/jopo666/HistoPrep},
}

This version: 2.0.5 (released Jun 16, 2023)
