🤖🔬 PathML: Tools for computational pathology
⭐ PathML's objective is to lower the barrier to entry to digital pathology
Imaging datasets in cancer research are growing exponentially in both quantity and information density. These massive datasets may enable derivation of insights for cancer research and clinical care, but only if researchers are equipped with the tools to leverage advanced computational analysis approaches such as machine learning and artificial intelligence. In this work, we highlight three themes to guide development of such computational tools: scalability, standardization, and ease of use. We then apply these principles to develop PathML, a general-purpose research toolkit for computational pathology. We describe the design of the PathML framework and demonstrate applications in diverse use cases.
🚀 The fastest way to get started?
docker pull pathml/pathml && docker run -it -p 8888:8888 pathml/pathml
Note: the dev branch is under active development, with experimental features, bug fixes, and refactors that may happen at any time!
Stable versions are available as tagged releases on GitHub, or as versioned releases on PyPI.
There are several ways to install PathML:
- pip install from PyPI (recommended for users)
- Clone repo to local machine and install from source (recommended for developers/contributors)
- Use the PathML Docker container
Options (1) and (2) require that you first install all external dependencies:
- OpenSlide
- JDK 8
We recommend using conda for environment management. Download Miniconda here
Note: these instructions are for Linux. Commands may be different for other platforms.
Installation option 1: pip install
Create conda environment:
conda create --name pathml python=3.8
conda activate pathml
Install external dependencies (Linux) with Apt:
sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev
Install external dependencies (macOS) with Homebrew:
brew install openslide
Install OpenJDK 8:
conda install openjdk==8.0.152
Optionally install CUDA (see the CUDA instructions below)
Install PathML from PyPI:
pip install pathml
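As an optional check after installation, the following minimal sketch confirms that the package is visible in the active environment. It uses only the standard library's importlib.metadata; the helper name installed_version is illustrative and is not part of PathML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pathml")
    if found is None:
        print("pathml is not installed in this environment")
    else:
        print(f"pathml version: {found}")
```

Run it inside the activated pathml conda environment. A result of None usually means that pip installed into a different environment.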
Installation option 2: clone repo and install from source
git clone https://github.com/Dana-Farber-AIOS/pathml.git
cd pathml
Create conda environment:
conda env create -f environment.yml
conda activate pathml
Optionally install CUDA (see the CUDA instructions below)
Install PathML from source:
pip install -e .
Installation option 3: Docker
First, download or build the PathML Docker container:
Option A: download PathML container from Docker Hub
docker pull pathml/pathml:latest
Optionally specify a tag for a particular version, e.g.:
docker pull pathml/pathml:2.0.2
To view available tags, please refer to the PathML DockerHub page.
Option B: build docker container from source
git clone https://github.com/Dana-Farber-AIOS/pathml.git
cd pathml
docker build -t pathml/pathml .
Then connect to the container:
docker run -it -p 8888:8888 pathml/pathml
The above command runs the container, which is configured to spin up a Jupyter Lab session and expose it on port 8888.
The terminal should display a URL to the Jupyter Lab session.
Navigate to that URL and you should connect to the Jupyter Lab session running in the container, with the pathml environment fully configured. If a password is requested, copy the string of characters following token= in that URL.
Note that the Docker container requires extra configuration to use a GPU.
Note that these instructions assume that there are no other processes using port 8888.
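The assumption that port 8888 is unused can be checked before starting the container. The sketch below is one way to do so with Python's standard socket module; the function name port_is_free is hypothetical, and the check only tells you whether the port could be bound at the moment it runs.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
    return True


if __name__ == "__main__":
    if port_is_free(8888):
        print("Port 8888 looks free")
    else:
        print("Port 8888 is in use; consider mapping a different host port")
```

If the port is taken, Docker's -p flag accepts a different host port, for example -p 8889:8888. Jupyter Lab is then reached on port 8889 of the host.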
Please refer to the Docker run documentation for further instructions on accessing the container, e.g. for mounting volumes to access files on a local machine from within the container.
Option 4: Google Colab
To get PathML running in a Colab environment:
import os
!pip install openslide-python
!apt-get install openslide-tools
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
!update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
!java -version
!pip install pathml
Thanks to all of our open-source collaborators for helping maintain these installation instructions!
Please open an issue for any bugs or other problems during the installation process.
To use GPU acceleration for model training or other tasks, you must install CUDA. This guide should work, but for the most up-to-date instructions, refer to the official PyTorch installation instructions.
First, check which version of CUDA is available on your system.
Then install the matching version of cudatoolkit:
# update this command with your CUDA version number
conda install cudatoolkit=11.0
After installing PyTorch, optionally verify successful PyTorch installation with CUDA support:
python -c "import torch; print(torch.cuda.is_available())"
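When the one-line check prints False, a little more detail helps narrow down the cause. The sketch below gathers that detail with standard PyTorch calls (torch.version.cuda, torch.cuda.device_count, torch.cuda.get_device_name). The function name cuda_report and the dictionary keys are illustrative choices, not part of PathML or PyTorch.

```python
def cuda_report() -> dict:
    """Collect basic facts about the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing, so nothing else can be checked.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A cuda_build of None points to a CPU-only PyTorch build. A non-empty cuda_build with cuda_available set to False more often points to a driver or toolkit mismatch.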
Using with Jupyter
Jupyter notebooks are a convenient way to work interactively. To use PathML in Jupyter notebooks:
Set JAVA_HOME environment variable
PathML relies on Java to enable support for reading a wide range of file formats.
Before using PathML in Jupyter, you may need to manually set the JAVA_HOME environment variable, specifying the path to Java. To do so:
- Get the path to Java by running echo $JAVA_HOME in the terminal in your pathml conda environment (outside of Jupyter)
- Set that path as the JAVA_HOME environment variable in Jupyter:
import os
os.environ["JAVA_HOME"] = "/opt/conda/envs/pathml" # change path as needed
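If echo $JAVA_HOME prints nothing, the path can sometimes be inferred from the location of the java executable. The sketch below assumes the common layout in which java lives at JAVA_HOME/bin/java; this holds for a conda-installed OpenJDK but is not guaranteed for every installation. The helper names guess_java_home and ensure_java_home are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def guess_java_home() -> Optional[str]:
    """Infer JAVA_HOME from the `java` executable on PATH, if there is one."""
    java = shutil.which("java")
    if java is None:
        return None
    # Assumes <JAVA_HOME>/bin/java, so go two levels up from the resolved binary.
    return str(Path(java).resolve().parent.parent)


def ensure_java_home() -> Optional[str]:
    """Keep a valid JAVA_HOME if already set; otherwise try to set a guessed one."""
    current = os.environ.get("JAVA_HOME")
    if current and Path(current).is_dir():
        return current
    guessed = guess_java_home()
    if guessed is not None:
        os.environ["JAVA_HOME"] = guessed
    return guessed


if __name__ == "__main__":
    print("JAVA_HOME:", ensure_java_home())
```

Run this in a notebook cell before importing PathML. If it prints None, Java is not on the PATH of the Jupyter process, and the path has to be set by hand as described above.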
Register environment as an IPython kernel
conda activate pathml
conda install ipykernel
python -m ipykernel install --user --name=pathml
This makes the pathml environment available as a kernel in jupyter lab or notebook.
PathML is an open source project. Consider contributing to benefit the entire community!
There are many ways to contribute to PathML, including:
- Submitting bug reports
- Submitting feature requests
- Writing documentation and examples
- Fixing bugs
- Writing code for new features
- Sharing workflows
- Sharing trained model parameters
- Sharing PathML with colleagues, students, etc.
See contributing for more details.
If you use PathML, please cite:
- J. Rosenthal et al., "Building tools for machine learning and artificial intelligence in cancer research: best practices and a case study with the PathML toolkit for computational pathology." Molecular Cancer Research, 2022.
So far, PathML has been used in the following manuscripts:
- J. Linares et al. Molecular Cell 2021
- A. Shmatko et al. Nature Cancer 2022
- J. Pocock et al. Nature Communications Medicine 2022
- S. Orsulic et al. Frontiers in Oncology 2022
- D. Brundage et al. arXiv 2022
- A. Marcolini et al. SoftwareX 2022
- M. Rahman et al. Bioengineering 2022
- C. Lama et al. bioRxiv 2022
- the list continues here 🔗 for 2023 and onwards
The GNU GPL v2 version of PathML is made available via Open Source licensing. The user is free to use, modify, and distribute under the terms of the GNU General Public License version 2.
Commercial license options are also available.
Questions? Comments? Suggestions? Get in touch!