
Toolbox for analysis on segmented images from MIBI


ark-analysis

Toolbox for analyzing multiplexed imaging data.

Full documentation for the project can be found here.

Pipeline Flowchart

Getting Started

Overview

This repo contains tools for analyzing multiplexed imaging data. It assumes that you've already performed any necessary image processing on your data (such as denoising, background subtraction, autofluorescence correction, etc.), and that the data is ready to be analyzed. For MIBI data, we recommend using the toffy processing pipeline.

We have recorded workshop talks which complement the repository. MIBI Workshop Playlist.

1. Segmentation

The segmentation notebook will walk you through the process of using Mesmer to segment your image data. This includes selecting the appropriate channel(s) for segmentation, running your data through the network, and then extracting single-cell statistics from the resulting segmentation mask. Workshop Talk - Session V - Part 1: Segmentation

  • Note: It is assumed that the cell table uses the default column names as in ark/settings.py. Refer to the docs to get descriptions of the cell table columns, and methods to adjust them if necessary.

2. Pixel clustering with Pixie

The first step in the Pixie pipeline is to run the pixel clustering notebook. The notebook walks you through the process of generating pixel clusters for your data, and lets you specify what markers to use for the clustering, train a model, use it to classify your entire dataset, and generate pixel cluster overlays. The notebook includes a GUI for manual cluster adjustment and annotation. Workshop Talk - Session IV - Pixel Level Analysis

3. Cell clustering with Pixie

The second step in the Pixie pipeline is to run the cell clustering notebook. This notebook will use the pixel clusters generated in the first notebook to cluster the cells in your dataset. The notebook walks you through generating cell clusters for your data and generates cell cluster overlays. The notebook includes a GUI for manual cluster adjustment and annotation. Workshop Talk - Session V - Cell-level Analysis - Part 2: Cell Clustering

4. Post Clustering Tasks

After the Pixie pipeline, the user can inspect and fine-tune their results with the post clustering notebook. This notebook covers cleaning up artifacts left over from clustering and working with functional markers.

5. Spatial Analysis

Workshop Talk - Session VI - Spatial Analysis - Part 1: Choosing the Right Analysis Tool.

  1. Pairwise Enrichment Analysis

    The pairwise enrichment notebook allows the user to investigate the interaction between the phenotypes present in their data. In addition, users can cluster based on phenotypes around a particular feature, such as an artery or gland. Workshop Talk - Session VI - Spatial Analysis - Part 2: Pairwise Spatial Enrichment.

  2. K-means Neighborhood Analysis

    The neighborhood analysis notebook sheds light on neighborhoods made of micro-environments which consist of a collection of cell phenotypes. Workshop Talk - Session VI - Spatial Analysis - Part 3: K-means Neighborhood Analysis.

  3. Spatial LDA

    The preprocessing and training / inference notebooks draw on language analysis, specifically topic modelling. Spatial LDA overlays a probability distribution on cells belonging to any particular micro-environment. Workshop Talk - Session VI - Spatial Analysis - Part 4: Spatial LDA.

Installation Steps

Download the Repo

We recommend using the latest release of ark. You can find all the versions available in the Releases Section. Open a terminal and navigate to where you want the code stored.

Currently, the latest release is v0.5.2. Install it with:

git clone -b v0.5.2 https://github.com/angelolab/ark-analysis.git

Or head to Releases (https://github.com/angelolab/ark-analysis/releases) and download the source code under the Assets subsection of the latest release.

You may also install previous releases by simply changing the version after the -b.

Setting up Docker

There is a complementary setup video.

Next, you'll need Docker Desktop:

  • First, download Docker Desktop.
  • Once it's successfully installed, make sure it is running by looking in the toolbar for the Docker whale icon.

Running on Windows

Our repo runs best on Unix-like systems (Linux and macOS). If you need to run on Windows, please consult our Windows guide for additional instructions.

Using the Repository (Running the Docker)

Enter the following command in the terminal, from the same directory where you ran the commands above:

./start_docker.sh

If you are running it for the first time, or if our Docker image has been updated, it may take a while to build and set up.

This will generate a link to a Jupyter notebook. Copy the last URL (the one with 127.0.0.1:8888 at the beginning) into your web browser.

Be sure to keep this terminal open. Do not exit the terminal or enter control-c until you are finished with the notebooks.

NOTE:

If you already have a Jupyter session open when you run ./start_docker.sh, you will receive a couple additional prompts.

Copy the URL listed after Enter this URL instead to access the notebooks:

You will need to authenticate. Note the last URL (the one with 127.0.0.1:8888 at the beginning), copy the token that appears there (it will be after token= in the URL), paste it into the password prompt of the Jupyter notebook, and log in.
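If copying the token by hand is error-prone, the value after token= can be pulled out of the URL programmatically. The following is a minimal sketch using only the Python standard library; the function name and the example URL (including its token) are made up for illustration and are not part of ark.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(url: str) -> str:
    """Return the value of the `token` query parameter of a Jupyter URL."""
    query = parse_qs(urlparse(url).query)
    tokens = query.get("token")
    if not tokens:
        raise ValueError("no token= parameter found in the URL")
    return tokens[0]


# Hypothetical URL of the kind printed in the terminal; the token is invented.
example_url = "http://127.0.0.1:8888/?token=abc123def456"
token = extract_token(example_url)
print(token)  # abc123def456
```

Paste the printed value into the password prompt as described above.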

You can shut down the notebooks and close docker by entering control-c in the terminal window.

REMEMBER TO DUPLICATE AND RENAME NOTEBOOKS

If you don't rename the notebooks in the templates folder, they will be overwritten when you update the repo. Read about updating Ark here.

External Tools

Mantis Viewer

Mantis is a multiplexed image viewer developed by the Parker Institute. It has built-in functionality for easily viewing multichannel images, creating overlays, and concurrently displaying image features alongside raw channels. We have found it to be extremely useful for analyzing the output of our analysis pipeline. There are detailed instructions on their download page for how to install and use the tool. Below are some details specifically related to how we use it in ark. Workshop Talk - Session V - Cell-level Analysis - Part 3: Assessing Accuracy with Mantis Viewer.

Mantis directory structure

Mantis expects image data to have a specific organization in order to display it. It is quite similar to how MIBI data is already stored, with a unique folder for each FOV and all channels as individual tifs within that folder. Any notebooks that suggest using Mantis Viewer to inspect results will automatically format the data in the format shown below.

mantis
│ 
├── fov0
│   ├── cell_segmentation.tiff
│   ├── chan0.tiff
│   ├── chan1.tiff
│   ├── chan2.tiff
│   ├── ...
│   ├── population_mask.csv
│   └── population_mask.tiff
├── fov1
│   ├── cell_segmentation.tiff
│   ├── chan0.tiff
│   ├── chan1.tiff
│   ├── chan2.tiff
│   ├── ...
│   ├── population_mask.csv
│   └── population_mask.tiff
└── marker_counts.csv
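
Before opening a folder in Mantis, it can help to confirm that every FOV folder matches the layout above. The sketch below is one possible check using only the Python standard library; it is not part of ark, the function name is hypothetical, and treating all three per-FOV files as required is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Per-FOV files taken from the tree above. Treating all three as required
# is an assumption made for this sketch.
EXPECTED_FOV_FILES = (
    "cell_segmentation.tiff",
    "population_mask.csv",
    "population_mask.tiff",
)


def find_missing_files(mantis_dir: Path) -> Dict[str, List[str]]:
    """Map each FOV folder name to the expected files it lacks."""
    missing: Dict[str, List[str]] = {}
    for fov_dir in sorted(p for p in mantis_dir.iterdir() if p.is_dir()):
        absent = [n for n in EXPECTED_FOV_FILES if not (fov_dir / n).is_file()]
        if absent:
            missing[fov_dir.name] = absent
    return missing


# Demo on a throwaway directory: fov0 is complete, fov1 lacks its CSV.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"fov0": EXPECTED_FOV_FILES, "fov1": EXPECTED_FOV_FILES[::2]}
    for fov, files in layout.items():
        (root / fov).mkdir()
        for name in files:
            (root / fov / name).touch()
    result = find_missing_files(root)

print(result)  # {'fov1': ['population_mask.csv']}
```

An empty dictionary means every FOV folder contains the three files.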

Loading image-specific files

In addition to the images, there are additional files in the directory structure which can be read into mantis.

cell_segmentation: This file contains the predicted segmentation for each cell in the image, and allows mantis to identify individual cells.

population_pixel_mask: This file maps the individual pixel clusters generated by Pixie in the pixel clustering notebook to the image data.

population_cell_mask: Same as above, but for cell clusters instead of pixel clusters.

These files should be specified when first initializing a project in mantis as indicated below:

Loading project-wide files

When inspecting the output of the clustering notebooks, it is often useful to add project-wide .csv files, such as marker_counts.csv. These files contain information, such as the average expression of a given marker, across all the cells in the project. Project-wide files can either be loaded at project initialization, as shown below:

Or they can be loaded into an existing project via Import -> Segment Features -> For project from CSV

View cell features

Once you have loaded the project-wide files into Mantis, you'll need to decide which of the features you want to view. Click on Show Plot Plane at the bottom right, then select the marker you want to assess. This will then allow you to view the cell expression of that marker when you mouse over the cell in Mantis.

External Hard Drives and Google File Stream

To configure external hard drive (or Google File Stream) access, you will have to add the drive's path to Docker's file paths in the Preferences menu.

On Docker for macOS, this can be found in Preferences -> Resources -> File Sharing. Adding /Volumes will allow Docker to see external drives.

On Docker for Windows with the WSL2 backend, no paths need to be added. However, if using the Hyper-V backend, these paths will need to be added as in the macOS case.

Once the path is added, you can run:

bash start_docker.sh --external 'path/added/to/preferences'

or

bash start_docker.sh -e 'path/added/to/preferences'

to mount the drive into the virtual /data/external path inside the docker.
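
Paths to external drives often contain spaces, which must be quoted on the command line. As a small sketch (not part of ark), Python's shlex.join can assemble the command above with safe quoting; the function name and the path /Volumes/My Drive are hypothetical.

```python
import shlex


def external_mount_command(path: str) -> str:
    """Build the start_docker.sh command line with the path safely quoted."""
    return shlex.join(["bash", "start_docker.sh", "--external", path])


# Hypothetical path containing a space; replace it with the path you added
# to Docker's file sharing preferences.
command = external_mount_command("/Volumes/My Drive")
print(command)  # bash start_docker.sh --external '/Volumes/My Drive'
```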

Updating the Repository

This project is still under development, and we are making frequent changes and improvements. If you want to update the version on your computer to have the latest changes, perform the following steps. Otherwise, we recommend waiting for new releases.

First, get the latest version of the repository.

git pull

Then, run the command below to update the Jupyter notebooks to the latest version:

./start_docker.sh --update

or

./start_docker.sh -u

If you have made changes to these notebooks that you would like to keep (specific file paths, settings, custom routines, etc.), rename them before updating!

For example, rename your existing copy of 1_Segment_Image_Data.ipynb to 1_Segment_Image_Data_old.ipynb. Then, after running the update command, a new version of 1_Segment_Image_Data.ipynb will be created with the newest code, and your old copy will exist with the new name that you gave it.

After updating, you can copy over any important paths or modifications from the old notebooks into the new notebook.
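
The renaming step can also be scripted. The following is a minimal sketch using the Python standard library, not a utility shipped with ark; the function name and the "_old" default suffix are illustrative choices, and the demo runs in a temporary directory standing in for the templates folder.

```python
import tempfile
from pathlib import Path


def rename_with_suffix(notebook: Path, suffix: str = "_old") -> Path:
    """Rename e.g. foo.ipynb to foo_old.ipynb and return the new path."""
    target = notebook.with_name(f"{notebook.stem}{suffix}{notebook.suffix}")
    if target.exists():
        # Refuse to clobber an earlier backup.
        raise FileExistsError(f"{target} already exists; pick another suffix")
    notebook.rename(target)
    return target


# Demo in a throwaway directory standing in for the templates folder.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "1_Segment_Image_Data.ipynb"
    original.write_text("{}")
    kept_name = rename_with_suffix(original).name
    original_still_there = original.exists()

print(kept_name)  # 1_Segment_Image_Data_old.ipynb
```

Run it before the update command so that the fresh notebook does not replace your edited copy.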

Example Dataset

If you would like to test out the pipeline, we have incorporated an example dataset into the notebooks. Currently the dataset contains 11 FOVs with 22 channels (CD3, CD4, CD8, CD14, CD20, CD31, CD45, CD68, CD163, CK17, Collagen1, ECAD, Fibronectin, GLUT1, H3K9ac, H3K27me3, HLADR, IDO, Ki67, PD1, SMA, Vim), and intermediate data necessary for each notebook in the pipeline.

The dataset is split into several smaller components, with each Jupyter Notebook using a combination of those components. We use Hugging Face to store the dataset and its APIs to create these configurations. You can view the dataset's repository as well.

Dataset Compartments

Image Data: This compartment stores the tiff files for each channel, for every FOV.

image_data/
├── fov0/
│  ├── CD3.tiff
│  ├── ...
│  └── Vim.tiff
├── fov1/
│  ├── CD3.tiff
│  ├── ...
│  └── Vim.tiff
├── .../
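
To see which FOVs and channels a folder with this layout contains, the directory can be walked with the Python standard library. This is an illustrative sketch rather than an ark function; the function name is hypothetical, and the demo builds a two-FOV, two-channel stand-in for the real data.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_channels(image_dir: Path) -> Dict[str, List[str]]:
    """Map each FOV folder to the sorted channel names of its .tiff files."""
    return {
        fov.name: sorted(tiff.stem for tiff in fov.glob("*.tiff"))
        for fov in sorted(image_dir.iterdir())
        if fov.is_dir()
    }


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for fov in ("fov0", "fov1"):
        (root / fov).mkdir()
        for chan in ("CD3", "Vim"):
            (root / fov / f"{chan}.tiff").touch()
    channels = list_channels(root)

print(channels)  # {'fov0': ['CD3', 'Vim'], 'fov1': ['CD3', 'Vim']}
```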

Cell Table: This compartment stores the various cell tables which get generated by Notebook 1.

segmentation/cell_table/
├── cell_table_arcsinh_transformed.csv
├── cell_table_size_normalized.csv
└── cell_table_size_normalized_cell_labels.csv

Deepcell Output: This compartment stores the segmentation images after running deepcell.

segmentation/deepcell_output/
├── fov0_whole_cell.tiff
├── fov0_nuclear.tiff
├── ...
├── fov10_whole_cell.tiff
└── fov10_nuclear.tiff

Example Pixel Output: This compartment stores feather files, csvs and pixel masks generated by pixel clustering.

segmentation/example_pixel_output_dir/
├── cell_clustering_params.json
├── channel_norm.feather
├── channel_norm_post_rowsum.feather
├── pixel_thresh.feather
├── pixel_channel_avg_meta_cluster.csv
├── pixel_channel_avg_som_cluster.csv
├── pixel_masks/
│  ├── fov0_pixel_mask.tiff
│  └── fov1_pixel_mask.tiff
├── pixel_mat_data/
│  ├── fov0.feather
│  ├── ...
│  └── fov10.feather
├── pixel_mat_subset/
│  ├── fov0.feather
│  ├── ...
│  └── fov10.feather
├── pixel_meta_cluster_mapping.csv
└── pixel_som_weights.feather

Example Cell Output: This compartment stores feather files, csvs and cell masks generated by cell clustering.

segmentation/example_cell_output_dir/
├── cell_masks/
│  ├── fov0_cell_mask.tiff
│  └── fov1_cell_mask.tiff
├── cell_meta_cluster_channel_avg.csv
├── cell_meta_cluster_count_avg.csv
├── cell_meta_cluster_mapping.csv
├── cell_som_cluster_channel_avg.csv
├── cell_som_cluster_count_avg.csv
├── cell_som_weights.feather
├── cluster_counts.feather
├── cluster_counts_size_norm.feather
└── weighted_cell_channel.csv

Dataset Configurations

  • 1 - Segment Image Data:
    • Image Data
  • 2 - Pixie Cluster Pixels:
    • Image Data
    • Cell Table
    • Deepcell Output
  • 3 - Pixie Cluster Cells:
    • Image Data
    • Cell Table
    • Deepcell Output
    • Example Pixel Output
  • 4 - Post Clustering:
    • Image Data
    • Cell Table
    • Deepcell Output
    • Example Cell Output
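
The configurations above can be written down as a plain mapping, which makes it easy to ask which notebooks depend on a given compartment. This is only a restatement of the list in Python, not how ark or Hugging Face represents the dataset; the names CONFIGS and notebooks_using are hypothetical.

```python
from typing import Dict, List

# Notebook configuration -> dataset compartments, copied from the list above.
CONFIGS: Dict[str, List[str]] = {
    "1 - Segment Image Data": ["Image Data"],
    "2 - Pixie Cluster Pixels": ["Image Data", "Cell Table", "Deepcell Output"],
    "3 - Pixie Cluster Cells": [
        "Image Data", "Cell Table", "Deepcell Output", "Example Pixel Output",
    ],
    "4 - Post Clustering": [
        "Image Data", "Cell Table", "Deepcell Output", "Example Cell Output",
    ],
}


def notebooks_using(compartment: str) -> List[str]:
    """Return the configurations that include the given compartment."""
    return [name for name, parts in CONFIGS.items() if compartment in parts]


pixel_users = notebooks_using("Example Pixel Output")
print(pixel_users)  # ['3 - Pixie Cluster Cells']
```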

Questions?

If you have a general question or are having trouble with part of the repo, you can refer to our FAQ or head to the discussions tab to get help. If you've found a bug with the codebase, first make sure there's not already an open issue, and if not, you can then open an issue describing the bug.

Want to contribute?

If you would like to help make ark better, please take a look at our contributing guidelines.

How to Cite

Please cite the following papers if you found our repo useful!

  1. Greenwald, Miller et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning [2021]
  2. Liu et al. Robust phenotyping of highly multiplexed tissue imaging data using pixel-level clustering [2022]
