Tool for reading WSI files from proprietary formats and optionally converting them to DICOM

wsidicomizer

wsidicomizer is a Python library for opening WSIs in proprietary formats and optionally converting them to DICOM. The aims of the project are:

Provide read support for various proprietary formats.
Provide lossless conversion for files supported by opentile.
Provide 'as good as possible' conversion for other formats.
Simplify the encoding of WSI metadata into DICOM.

Supported formats

wsidicomizer currently supports the following formats:

Aperio svs (lossless)
Hamamatsu ndpi (lossless)
Philips tiff (lossless)
Zeiss czi (lossy)
Optional: Formats supported by Bioformats (lossy)

With the openslide extra the following formats are also supported:

Mirax mrxs (lossy)
Leica scn (lossy)
Sakura svslide (lossy)
Trestle tif (lossy)
Ventana bif, tif (lossy)
Hamamatsu vms, vmu (lossy)

The bioformats extra by default enables lossy support for the formats covered by the BSD-licensed Bioformats library.

The isyntax extra enables lossy single-thread support for isyntax files.

For czi and isyntax only the base level is read from file. To produce a conversion with full levels, use add_missing_levels in the save() method.
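
To illustrate what adding missing dyadic levels up to the single tile level could mean in practice, the sketch below lists the level sizes obtained by repeatedly halving the base level until the image fits in one tile. It is a simplified illustration and not the implementation used by wsidicom; the function name dyadic_level_sizes and the example dimensions are assumptions.

```python
import math


def dyadic_level_sizes(width, height, tile_size):
    """Return (width, height) per level, halving until one tile is enough.

    Hypothetical helper for illustration only; rounding in the real
    library may differ.
    """
    sizes = [(width, height)]
    # Keep halving while the largest dimension still exceeds one tile.
    while max(sizes[-1]) > tile_size:
        w, h = sizes[-1]
        sizes.append((math.ceil(w / 2), math.ceil(h / 2)))
    return sizes


# A 4096 x 2048 base level with 1024 px tiles needs two extra levels.
for level, size in enumerate(dyadic_level_sizes(4096, 2048, 1024)):
    print(level, size)
```

Under these assumptions, a file where only the base level is read would gain levels 1 and 2 in the converted output.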

Installation

Install wsidicomizer from PyPI

pip install wsidicomizer

See Openslide support and Bioformats support for how to install optional extras.

Install libjpeg-turbo

Install libjpeg-turbo either as a binary from https://libjpeg-turbo.org/ or using your package manager. On Windows, you also need to add libjpeg-turbo's bin folder to the environment variable 'Path'.

Important note

Please note that this is an early release and the API is not frozen yet. Function names and functionality are subject to change.

Requirements

wsidicomizer requires Python >=3.8 and uses numpy, pydicom, highdicom, imagecodecs, PyTurboJPEG, opentile, and wsidicom.
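
A quick way to check an environment against these requirements is sketched below. The helper names are hypothetical, and the import names are assumptions based on the package names (PyTurboJPEG is assumed to be importable as turbojpeg).

```python
import importlib.util
import sys

# Import names assumed from the requirement list above.
REQUIRED_MODULES = [
    "numpy",
    "pydicom",
    "highdicom",
    "imagecodecs",
    "turbojpeg",  # assumed import name of PyTurboJPEG
    "opentile",
    "wsidicom",
]


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.8."""
    return tuple(version_info[:2]) >= (3, 8)


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python supported:", python_is_supported())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```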

Basic cli-usage

Convert a WSI file into DICOM using the command-line interface

wsidicomizer -i 'path_to_wsi_file' -o 'path_to_output_folder'

Options

  -i, --input PATH                Path to input wsi file.  [required]
  -o, --output PATH               Path to output folder. Folder will be
                                  created and must not exist. If not specified
                                  a folder named after the input file is
                                  created in the same path.
  -t, --tile-size INTEGER         Tile size (same for width and height).
                                  Required for ndpi and openslide formats.
  -m, --metadata PATH             Path to json metadata that will override
                                  metadata from source image file.
  -d, --default-metadata PATH     Path to json metadata that will be used as
                                  default values.
  -l, --levels INTEGER            Pyramid levels to include, if not all. E.g.
                                  0 1 for base and first pyramid layer. Can be
                                  specified multiple times.
  --add-missing-levels            If to add missing dyadic levels up to the
                                  single tile level.
  --label PATH                    Optional label image to use instead of label
                                  found in file.
  --no-label                      If not to include label
  --no-overview                   If not to include overview
  --no-confidential               If not to include confidential metadata
  -w, --workers INTEGER           Number of worker threads to use
  --chunk-size INTEGER            Number of tiles to give each worker at a
                                  time
  --format [jpeg|jpeg2000|htjpeg2000|jpegxl]
                                  Encoding format to use if re-encoding.
  --quality FLOAT                 Quality to use if re-encoding. It is not
                                  recommended to use > 95 for jpeg. Use < 1 or
                                  > 1000 for lossless jpeg2000.
  --subsampling [r444|r422|r420|r411|r440]
                                  Subsampling option if using jpeg for re-
                                  encoding. Use '444' for no subsampling,
                                  '422' for 2x1 subsampling, and '420' for 2x2
                                  subsampling.
  --offset-table [basic|extended|empty]
                                  Offset table to use.
  --source [opentile|tiffslide|openslide|czi|isyntax|bioformats]
                                  Source library to use for reading the input
                                  file. If not specified, the library will be
                                  chosen based on file type.
  --help                          Show this message and exit.
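
When the CLI is driven from a script, the argument list can be assembled in Python before it is handed to subprocess. The sketch below uses only flags from the help text above; the helper build_command and the file names are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_command(input_path, output_path=None, tile_size=None,
                  metadata_path=None, levels=None,
                  add_missing_levels=False, workers=None):
    """Build a wsidicomizer argument list (hypothetical helper)."""
    command = ["wsidicomizer", "-i", str(input_path)]
    if output_path is not None:
        command += ["-o", str(output_path)]
    if tile_size is not None:
        command += ["-t", str(tile_size)]
    if metadata_path is not None:
        command += ["-m", str(metadata_path)]
    # -l is given once per level, following "Can be specified multiple times".
    for level in levels or []:
        command += ["-l", str(level)]
    if add_missing_levels:
        command.append("--add-missing-levels")
    if workers is not None:
        command += ["-w", str(workers)]
    return command


# Placeholder paths; replace with a real input file and output folder.
command = build_command("slide.ndpi", "slide_dicom", tile_size=1024,
                        levels=[0, 1], add_missing_levels=True)
print(shlex.join(command))
# To run it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues for paths that contain spaces.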

When the --no-confidential flag is used, properties covered by the DICOM Basic Confidentiality Profile are not included in the output file. The properties that are otherwise included are currently:

Acquisition DateTime
Device Serial Number

Basic usage

Create metadata (Optional)

from wsidicom.conceptcode import (
    AnatomicPathologySpecimenTypesCode,
    ContainerTypeCode,
    SpecimenCollectionProcedureCode,
    SpecimenEmbeddingMediaCode,
    SpecimenFixativesCode,
    SpecimenSamplingProcedureCode,
    SpecimenStainsCode,
)
from wsidicom.metadata import (
    Collection,
    Embedding,
    Equipment,
    Fixation,
    Label,
    Patient,
    Sample,
    Series,
    Slide,
    SlideSample,
    Specimen,
    Staining,
    Study,
)
from wsidicomizer.metadata import WsiDicomizerMetadata

study = Study(identifier="Study identifier")
series = Series(number=1)
patient = Patient(name="FamilyName^GivenName")
label = Label(text="Label text")
equipment = Equipment(
    manufacturer="Scanner manufacturer",
    model_name="Scanner model name",
    device_serial_number="Scanner serial number",
    software_versions=["Scanner software versions"],
)

specimen = Specimen(
    identifier="Specimen",
    extraction_step=Collection(method=SpecimenCollectionProcedureCode("Excision")),
    type=AnatomicPathologySpecimenTypesCode("Gross specimen"),
    container=ContainerTypeCode("Specimen container"),
    steps=[Fixation(fixative=SpecimenFixativesCode("Neutral Buffered Formalin"))],
)

block = Sample(
    identifier="Block",
    sampled_from=[specimen.sample(method=SpecimenSamplingProcedureCode("Dissection"))],
    type=AnatomicPathologySpecimenTypesCode("tissue specimen"),
    container=ContainerTypeCode("Tissue cassette"),
    steps=[Embedding(medium=SpecimenEmbeddingMediaCode("Paraffin wax"))],
)

slide_sample = SlideSample(
    identifier="Slide sample",
    sampled_from=block.sample(method=SpecimenSamplingProcedureCode("Block sectioning")),
)

slide = Slide(
    identifier="Slide",
    stainings=[
        Staining(
            substances=[
                SpecimenStainsCode("hematoxylin stain"),
                SpecimenStainsCode("water soluble eosin stain"),
            ]
        )
    ],
    samples=[slide_sample],
)
metadata = WsiDicomizerMetadata(
    study=study,
    series=series,
    patient=patient,
    equipment=equipment,
    slide=slide,
    label=label,
)

Convert a WSI file into DICOM using the Python interface

from wsidicomizer import WsiDicomizer
created_files = WsiDicomizer.convert(
    filepath=path_to_wsi_file,
    output_path=path_to_output_folder,
    metadata=metadata,
    tile_size=tile_size
)

Import a wsi file as a WsiDicom object.

from wsidicomizer import WsiDicomizer
wsi = WsiDicomizer.open(path_to_wsi_file)
region = wsi.read_region((1000, 1000), 6, (200, 200))
wsi.close()
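
To get a feeling for how large such a region is on the slide, the sketch below converts a region at a given level to its footprint at the base level. It assumes that levels are dyadic (level n is downsampled by a factor of 2^n) and that the location and size passed to read_region() are given in pixels of the requested level; both are assumptions that should be checked against the wsidicom documentation. The helper name is hypothetical.

```python
def base_level_footprint(location, level, size):
    """Scale a region at `level` to base-level pixels.

    Assumes dyadic levels and level-pixel coordinates (see lead-in).
    """
    scale = 2 ** level
    x, y = location
    width, height = size
    return (x * scale, y * scale), (width * scale, height * scale)


# Same arguments as the read_region() call above.
print(base_level_footprint((1000, 1000), 6, (200, 200)))
```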

Metadata handling

The open() and convert() methods of WsiDicomizer take three parameters that are important for inserting additional metadata into the DICOM dataset of the converted image:

metadata
default_metadata
metadata_post_processor

Metadata merging

When creating the DICOM dataset, the metadata provided in the metadata and default_metadata parameters are merged with metadata that is parsed from the source image file, with the following descending preference:

Metadata from the metadata parameter
Metadata from the source image
Metadata from the default_metadata parameter

For example:

equipment set in the metadata parameter will override the equipment metadata from the source image (if present).
optical_paths set in the default_metadata parameter will be overridden by any optical_paths present in the metadata parameter or in the source image metadata.

Note that merging is also performed on nested metadata, e.g. focus_method in an Image can be merged from the different sources.
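
The precedence order can be sketched with plain dictionaries standing in for the metadata classes. This is a simplified model of the behaviour described above and not the library's merge code; the function name merge, the example values, and the treatment of None as "missing" are assumptions.

```python
def merge(*sources):
    """Merge dicts; earlier sources take precedence over later ones."""
    result = {}
    # Apply the lowest-precedence source first so later ones overwrite it.
    for source in reversed(sources):
        for key, value in source.items():
            if value is None:
                continue  # treat None as "not set"
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                # Nested metadata is merged attribute by attribute.
                result[key] = merge(value, result[key])
            else:
                result[key] = value
    return result


user = {"equipment": {"manufacturer": "Override Inc."}}
source_image = {
    "equipment": {"manufacturer": "Scanner Co.", "model_name": "X1"},
    "image": {"focus_method": "auto"},
}
default = {
    "optical_paths": ["brightfield"],
    "image": {"focus_method": "manual"},
}

# Order of arguments: metadata, source image, default_metadata.
merged = merge(user, source_image, default)
print(merged)
```

In this model the manufacturer comes from the metadata parameter, the model name and focus method from the source image, and the optical paths from the defaults.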

Metadata post processing

After the metadata merge a pydicom Dataset is created from the result. Additional post processing can be performed using the metadata_post_processor parameter. This can be another Dataset, in which case the merged dataset is updated with (i.e. overwritten by) the provided dataset:

from pydicom import Dataset

dataset = Dataset()
dataset.PatientAge = "042Y"

WsiDicomizer.convert(
    filepath=path_to_wsi_file,
    output_path=path_to_output_folder,
    metadata_post_processor=dataset
)

For more complex processing a callback function that takes the merged Dataset and WsiMetadata as parameters and returns an updated Dataset can be used:

from pydicom import Dataset
from wsidicom.metadata import WsiMetadata

def metadata_post_processor(dataset: Dataset, metadata: WsiMetadata) -> Dataset:
    dataset.PatientAge = "042Y"
    return dataset

WsiDicomizer.convert(
    filepath=path_to_wsi_file,
    output_path=path_to_output_folder,
    metadata_post_processor=metadata_post_processor
)
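
The two forms of metadata_post_processor can be pictured as the dispatch below. Plain dictionaries stand in for pydicom Dataset objects, and apply_post_processor is a hypothetical helper that only models the behaviour described above; it is not taken from the wsidicomizer source.

```python
def apply_post_processor(dataset, metadata, post_processor=None):
    """Apply a dataset-like or callable post processor (illustrative)."""
    if post_processor is None:
        return dataset
    if callable(post_processor):
        # Callback form: receives the merged dataset and the metadata.
        return post_processor(dataset, metadata)
    # Dataset form: values in the post processor overwrite merged values.
    updated = dict(dataset)
    updated.update(post_processor)
    return updated


merged = {"PatientName": "FamilyName^GivenName", "PatientAge": "040Y"}

# Dataset form
print(apply_post_processor(merged, None, {"PatientAge": "042Y"}))


# Callback form
def set_age(dataset, metadata):
    dataset = dict(dataset)
    dataset["PatientAge"] = "042Y"
    return dataset


print(apply_post_processor(merged, None, set_age))
```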

JSON metadata

WsiDicom provides methods for serializing and deserializing metadata to and from JSON. This is useful, for example, for providing metadata when performing conversion using the CLI. As there is not yet any documentation on the JSON schema, the simplest way to produce metadata in JSON format is to first construct it in Python and then call the provided serializer:

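import json
from wsidicom.metadata.schema.json import WsiMetadataJsonSchema
from wsidicomizer.metadata import WsiDicomizerMetadata

# Reuses study, series, patient, equipment, slide, and label from "Create metadata" above.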
metadata = WsiDicomizerMetadata(
    study=study,
    series=series,
    patient=patient,
    equipment=equipment,
    slide=slide,
    label=label,
)
with open('metadata.json', 'w') as f:
    json.dump(WsiMetadataJsonSchema().dump(metadata), f, indent=4)

Openslide support

Installation

Support for reading images using the OpenSlide C library can optionally be enabled by installing wsidicomizer with the openslide extra:

pip install wsidicomizer[openslide]

The OpenSlide extra requires the OpenSlide library to be installed separately. This can be done through pip:

pip install openslide-bin

Alternative instructions for how to install OpenSlide are available at https://openslide.org/download/

Bioformats support

Installation

Support for reading images using the Bioformats Java library can optionally be enabled by installing wsidicomizer with the bioformats extra:

pip install wsidicomizer[bioformats]

The bioformats extra enables usage of the bioformats module. The required Bioformats Java library (jar file) is downloaded automatically using scyjava when the module is imported.

Using

As the Bioformats library is a Java library, it needs to run in a Java virtual machine (JVM). A JVM is started automatically when the bioformats module is imported. The JVM can't be restarted in the same Python interpreter, and is therefore left running once started. If you want to shut down the JVM (without closing the Python interpreter), you can call the shutdown_jvm() function:

import scyjava
scyjava.shutdown_jvm()

Due to the need to start a JVM, the bioformats module is not imported when using the default WsiDicomizer class unless SourceIdentifier.BIOFORMATS is used as preferred_source:

from wsidicomizer import SourceIdentifier, WsiDicomizer

with WsiDicomizer('input file', preferred_source=SourceIdentifier.BIOFORMATS) as wsi:
    ...

Bioformats version

The Bioformats Java library is available in two versions, one with a BSD license and one with a GPL2 license, and can read several WSI formats. However, most formats are only available in the GPL2 version. Due to the licensing incompatibility between Apache 2.0 and GPL2, wsidicomizer is distributed with a default setting of using the BSD-licensed library. The loaded Bioformats version can be changed by the user by setting the BIOFORMATS_VERSION environment variable from the default value bsd:8.3.0.
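
The environment variable can be set from Python as well as from the shell. The sketch below pins the documented default value explicitly. It assumes that the variable is read when the bioformats module is imported, so it has to be set before that point, and that the value has the form license:version; both are assumptions inferred from the text above.

```python
import os

# Set before the bioformats module is imported (assumed to be read at import).
os.environ["BIOFORMATS_VERSION"] = "bsd:8.3.0"

# Split the value into its two assumed parts for inspection.
flavor, version = os.environ["BIOFORMATS_VERSION"].split(":", 1)
print(flavor, version)

# Only after this point should a file be opened with
# preferred_source=SourceIdentifier.BIOFORMATS.
```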

Limitations

Files with z-stacks or multiple focal paths are currently not fully supported.

Other DICOM python tools

Contributing

We welcome any contributions to help improve this tool for the WSI DICOM community!

We recommend first creating an issue before creating potential contributions to check that the contribution is in line with the goals of the project. To submit your contribution, please issue a pull request on the imi-bigpicture/wsidicomizer repository with your changes for review.

Our aim is to provide constructive and positive code reviews for all submissions. The project relies on gradual typing and roughly follows PEP8. However, we are not dogmatic. Most important is that the code is easy to read and understand.

Acknowledgement

This project is part of a project that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 945358. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. IMI website: <www.imi.europa.eu>


Version 0.24.0, released Dec 9, 2025


