Read and process histological slide images with Python!

HistoSlice

Preprocessing large medical images for machine learning made easy!

Documentation | PyPI

Description

HistoSlice makes it easy to prepare histological slide images for deep learning models. You can cut large slide images into smaller tiles and then preprocess those tiles (for example, by removing tiles with poor-quality tissue, finger marks, and so on).

Note: This project was forked from HistoPrep and modified to add features and improvements.

Installation

uv add histoslice
# or
pip install histoslice

Usage

Note: HistoSlice uses pyvips as its only slide backend. The backend argument is still accepted for compatibility, but it always resolves to pyvips.

If Pillow is built without JPEG support, HistoSlice will automatically save tiles/thumbnails as .png and update filenames accordingly. Developers can check availability via histoslice.functional.has_jpeg_support().
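The fallback described above can be pictured with a small sketch. It is not HistoSlice's implementation: `resolve_image_path` and the example file name are hypothetical, and the `jpeg_available` flag stands in for the result of `histoslice.functional.has_jpeg_support()`.

```python
from pathlib import Path


def resolve_image_path(path: str, jpeg_available: bool) -> Path:
    """Return the path unchanged if JPEG can be written, else swap the suffix to .png.

    Hypothetical helper for illustration only; not part of HistoSlice.
    """
    target = Path(path)
    if not jpeg_available and target.suffix.lower() in {".jpeg", ".jpg"}:
        return target.with_suffix(".png")
    return target


# In real code the flag would come from histoslice.functional.has_jpeg_support().
print(resolve_image_path("train_tiles/slide_with_ink/thumbnail.jpeg", jpeg_available=False))
```

Code that later opens saved tiles or thumbnails should therefore not hard-code the `.jpeg` suffix.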

A typical workflow for training deep learning models on histological images is as follows:

  1. Cut each slide image into smaller tile images.
  2. Preprocess the tile images by removing tiles with bad tissue or staining artifacts.

From the command line:

histoslice --input './train_images/*.tiff' --output ./tiles --width 512 --overlap 0.5 --max-background 0.5 --backend pyvips --metrics --thumbnail

Or you can use the HistoSlice Python API to do the same thing!

from histoslice import SlideReader

# Read slide image.
reader = SlideReader("./slides/slide_with_ink.jpeg")
# Detect tissue.
threshold, tissue_mask = reader.get_tissue_mask(level=-1)
# Extract overlapping tile coordinates with less than 50% background.
tile_coordinates = reader.get_tile_coordinates(
    tissue_mask, width=512, overlap=0.5, max_background=0.5
)
# Save tile images with image metrics for preprocessing.
tile_metadata, failures = reader.save_regions(
    "./train_tiles/",
    tile_coordinates,
    threshold=threshold,
    save_metrics=True,
    save_thumbnail=True
)
if failures:
    print(f"Some tiles failed: {len(failures)}")
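To build intuition for the `width`, `overlap` and `max_background` parameters used above, here is a minimal pure-Python sketch of a tile grid with a background filter. It is not HistoSlice's actual algorithm: `tile_grid`, `background_fraction` and `filter_tiles` are hypothetical helpers, the stride formula `width * (1 - overlap)` is an assumption, and the mask is assumed to share the pixel grid of the tile coordinates.

```python
def tile_grid(image_width, image_height, width, overlap):
    """Return square (x, y, w, h) tiles on a regular grid with fractional overlap."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Assumption: neighbouring tiles share `overlap` of their width.
    stride = max(1, round(width * (1 - overlap)))
    return [
        (x, y, width, width)
        for y in range(0, image_height, stride)
        for x in range(0, image_width, stride)
    ]


def background_fraction(mask, tile):
    """Fraction of the tile that is background (mask value 0).

    Pixels that fall outside the mask are counted as background.
    """
    x, y, w, h = tile
    tissue = sum(
        1 for row in mask[y : y + h] for value in row[x : x + w] if value
    )
    return 1 - tissue / (w * h)


def filter_tiles(mask, tiles, max_background):
    """Keep tiles whose background fraction does not exceed max_background.

    Assumption: the boundary value itself is accepted.
    """
    return [t for t in tiles if background_fraction(mask, t) <= max_background]


# Toy 4x4 mask whose left half is tissue (1) and right half is background (0).
mask = [[1, 1, 0, 0] for _ in range(4)]
tiles = tile_grid(4, 4, width=2, overlap=0.0)
kept = filter_tiles(mask, tiles, max_background=0.5)
print(len(tiles), len(kept))
```

With `width=512` and `overlap=0.5`, this sketch would place tile origins 256 pixels apart in each direction.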

Let's take a look at the output and visualise the thumbnails.

train_tiles
└── slide_with_ink
    ├── metadata.parquet       # tile metadata
    ├── failures.json          # per-tile failures (only written if any failures occur)
    ├── properties.json        # tile properties
    ├── thumbnail.jpeg         # thumbnail image (or .png if JPEG support is unavailable)
    ├── thumbnail_tiles.jpeg   # thumbnail with tiles (or .png if JPEG support is unavailable)
    ├── thumbnail_tissue.jpeg  # thumbnail of the tissue mask (or .png if JPEG support is unavailable)
    └── tiles [390 entries exceeds filelimit, not opening dir]
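Because failures.json is only written when at least one tile fails, code that inspects an output directory should treat the file as optional. A minimal sketch, assuming only that the file contains valid JSON (its exact schema is not described here); `load_failures` is a hypothetical helper:

```python
import json
from pathlib import Path


def load_failures(slide_dir):
    """Return the parsed contents of failures.json, or None if the file is absent."""
    path = Path(slide_dir) / "failures.json"
    if not path.exists():
        return None  # No failures were recorded for this slide.
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


failures = load_failures("./train_tiles/slide_with_ink")
if failures is None:
    print("No failures.json found; all tiles were saved.")
```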

Figures: prostate biopsy sample, tissue mask, and thumbnail with tiles.

As the images above show, histological slide images often contain areas that we do not want to include in our training data. Removing them might seem like a daunting task, but let's try it out!

from histoslice.utils import OutlierDetector

# Let's wrap the tile metadata with a helper class.
detector = OutlierDetector(tile_metadata)
# Cluster tiles based on image metrics.
clusters = detector.cluster_kmeans(num_clusters=4, random_state=666)
# Visualise first cluster.
reader.get_annotated_thumbnail(
    image=reader.read_level(-1), coordinates=detector.coordinates[clusters == 0]
)

Figure: tiles in cluster 0.

Now we can mark tiles in cluster 0 as outliers!
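One way to act on the cluster labels is to split the tile coordinates into kept and discarded sets. The sketch below uses plain Python lists instead of the arrays returned by the detector, and `split_outliers` is a hypothetical helper, not part of the HistoSlice API.

```python
def split_outliers(coordinates, clusters, outlier_clusters):
    """Split tile coordinates into (kept, outliers) based on their cluster label."""
    if len(coordinates) != len(clusters):
        raise ValueError("coordinates and clusters must have the same length")
    kept, outliers = [], []
    for coord, label in zip(coordinates, clusters):
        (outliers if label in outlier_clusters else kept).append(coord)
    return kept, outliers


# Toy data: three tiles as (x, y, w, h), two of them assigned to cluster 0.
coordinates = [(0, 0, 512, 512), (256, 0, 512, 512), (512, 0, 512, 512)]
clusters = [0, 2, 0]
kept, outliers = split_outliers(coordinates, clusters, outlier_clusters={0})
print(f"kept {len(kept)} tiles, discarded {len(outliers)}")
```

Passing a set of labels makes it easy to discard several clusters at once after inspecting each one visually.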
