Skip to main content

A python3-based image analysis package to achieve fully-documented and reproducible visualization and analysis of bio-medical microscopy images.

Project description

cmIF Version 3

Author: engje

Date: 2021-01-05

License: GPLv3

Language: Python3

Description: cmIF is a Python3 library for automated image processing and analysis of multiplex immunofluorescence images

Setup

0. Follow instructions to install python 3.8, if necessary

1. From the command line, clone the repository git clone https://gitlab.com/engje/cmif.git

2. Navigate to PipelineExample cd cmif/PipelineExample/

3. Make a new conda environment conda env create -f cmif_environment.yml

4. Activate the cmif environment source activate cmif or conda activate cmif

5. Run PipelineExample.py -- Programmatically download example images from https://www.synapse.org/#!Synapse:syn22315952 in step 0

PipelineExample Code

0 Dowload Example Images

To run this example code, download tiffs and czis from https://www.synapse.org/#!Synapse:syn22315952 Proper folder naming/structure will be:

  • ./PipelineExample/RawImages/
  • ./PipelineExample/czis/
#libraries
import os
import synapseclient
import synapseutils

#navigate to code folder
os.chdir('PipelineExample')
codedir = os.getcwd()

#input/output folders
czidir = f'{codedir}/czis'
tiffdir = f'{codedir}/RawImages'

# login to Synapse
syn = synapseclient.login(email='me@example.com', password='secret', rememberMe=True) #change to your username and password

# download tiffs and czis
all_files = synapseutils.syncFromSynapse(syn, entity='syn22315956', path=tiffdir)
all_files = synapseutils.syncFromSynapse(syn, entity='syn22315953', path=czidir)

1 Import Libraries and Generate Folders

Within the root directory, the code will auto-generate a standardized folder structure. Folders are for Raw Images (unregistered tiffs single channel), QC outputs, Registered Images, Autofluorescence Subtracted Images, Segmentation outputs and Cropped images for viewing. So only codedirvariable should be modified, retaining the standardized folder structure.

List the names of the slides to be processed as ls_sample. Each slide may have 1 or more scenes acquired during scanning.

# Import libraries
import os
import sys
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt
import re
from skimage import io

# Start Preprocessing
if os.getcwd().split('/')[-1] != 'PipelineExample':
    os.chdir('PipelineExample')
codedir = os.getcwd()

#input/output folders
czidir = f'{codedir}/czis'
tiffdir = f'{codedir}/RawImages'
qcdir = f'{codedir}/QC'
regdir = f'{codedir}/RegisteredImages'
subdir = f'{codedir}/SubtractedRegisteredImages'
segdir = f'{codedir}/Segmentation'
cropdir = f'{codedir}/Cropped'
from mplex_image import preprocess, mpimage, cmif, process
preprocess.cmif_mkdir([tiffdir,qcdir,regdir,segdir,subdir,cropdir,czidir])

ls_sample = ['BC44290-146']

2 QC and export metadata from .czi

"Level-1" i.e. original microscope image file QC is performed (check number of files + naming), changing file names if there are any typos. Important metadata, including exposure time (used to normalize images for autofluorescence subtraction) and coordinates of czi scenes is exported to csv.

FILE STRUCTURE: All images from each sample should be in subfolders named by sample ID within czidir.

from mplex_image import metadata
import javabridge
import bioformats
javabridge.start_vm(class_path=bioformats.JARS)
for s_sample in ls_sample:
    os.chdir(f'{czidir}/{s_sample}')
    #1 rename undescores to dot to match convention (done)
    d_rename = mpimage.underscore_to_dot(s_sample,s_end='.czi')
    d_rename.update({'HER2_ER':'HER2.ER'})
    preprocess.dchange_fname(d_rename) #,b_test=False)

    #2 Check files/naming
    df_img = cmif.parse_czi(f'{czidir}/{s_sample}',b_scenes=True)
    cmif.count_images(df_img)
    preprocess.check_names(df_img,s_type='czi')
    #Example: change file name and change back
    d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'} 
    preprocess.dchange_fname(d_rename)#,b_test=False)

    #3 Export useful imaging metadata (done)
    df_img = metadata.scene_position(f'{czidir}/{s_sample}',type='r')
    df_img.to_csv(f'{codedir}/{s_sample}_ScenePositions.csv')
    metadata.exposure_times_slide(df_img,codedir,czidir=f'{czidir}/{s_sample}')
javabridge.kill_vm()

3 QC Tiff Images

Unregistered raw tiff format is 16 bit (uint16) single channel grayscale tiff, for example, exported from .czi using Zeiss' Zen software. This section QC's files/naming and generates overviews of all rounds for visual inspection.

FILE STRUCTURE: Each sample's tiff image stack should be in a separate subfolder named by sample ID within rootdir/RawImages.

# Raw tiffs: check/change names 

for s_sample in ls_sample: 
    os.chdir(f'{tiffdir}/{s_sample}')
    #Example: change file name and change back
    d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'} 
    preprocess.dchange_fname(d_rename) #,b_test=False)
    #sort and count images 
    df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw') 
    cmif.count_images(df_img[df_img.slide==s_sample])
    preprocess.check_names(df_img,s_type='tiff')
# QC Raw tiffs: visual inspection #

preprocess.cmif_mkdir([f'{qcdir}/RawImages'])

for s_sample in ls_sample:
    os.chdir(f'{tiffdir}/{s_sample}')
    #investigate tissues
    df_img = mpimage.parse_org(s_end="ORG.tif",type='raw')
    cmif.visualize_raw_images(df_img,qcdir,color='c1')

4 Register Images

This section registers all images to round 1 (R1), based on DAPI staining in each round.

FILE STRUCTURE: Registered tiff images are generated and saved in regdir in subfolders named by sample ID and scene.

for s_sample in ls_sample:
    cmif.registration_python(s_sample,tiffdir,regdir,qcdir)

5 Check Registration

This section generates overviews of all rounds of each registered image stack, for QC purposes.

cmif.visualize_reg_images(f'{regdir}',qcdir,color='c1')

6 Create AF Subtracted Images

Images acquired of background autofluorescence by are scaled by exposure time and subtracted from the respective channel, producing a new image. d_channel specifies the name of the background marker to subtract from each channel. If d_early is included, the background image will also be scaled by relative round. ls_exclude lists which markers for which AF subtraction is skipped(typically c5 images). A companion csv listing channels and corresponding markers is generated to reflect this fully processed (i.e. level-2) image data.

FILE STRUCTURE: AF substracted tiff images are output to subdir in subfolders nemed by sample ID and scene

#parameters
d_channel = {'c2':'R8Qc2','c3':'R8Qc3','c4':'R8Qc4','c5':'R8Qc5'}
d_early={'c2':'R0c2','c3':'R0c3','c4':'R0c4','c5':'R0c5'}

for s_sample in ls_sample:
    preprocess.cmif_mkdir([f'{subdir}/{s_sample}'])
    os.chdir(f'{regdir}')
    for s_file in os.listdir():
        if s_file.find(s_sample) > -1:
            os.chdir(s_file)
            df_img = mpimage.parse_org()
            ls_exclude = sorted(set(df_img[df_img.color=='c5'].marker)) + ['DAPI'] + [item for key, item in d_channel.items()] + [item for key, item in d_early.items()]
            #subtract
            df_markers = cmif.autofluorescence_subtract(s_sample,df_img,codedir,d_channel,ls_exclude,subdir=f'{subdir}/{s_sample}',d_early=d_early) #
            os.chdir('..')
#generate channel/marker metadata csv
cmif.metadata_table(regdir,segdir)

7 Cellpose Segmentation

The generalist segmentation algorithim cellpose is used for nuclear and cell segmentation. Custom python/numba code matches labelled nuclei and cells that overlap from the two segmentation results. Note: the cellpose_segment_job is a spawner that starts a job for each slide/scene on the server.

FILE STRUCTURE: Labelled tiffs (uint32) with each pixel's grayscale value reflecting the label, are output as "Nuclei Segmentation Basins" and "Matched Cell Segmentation Basins" in segdir sample ID subfolders.

from mplex_image import segment
nuc_diam = 30
cell_diam = 30 

s_seg_markers = "['Ecad']"
s_type = 'cell' #'nuclei'#

print(f'Predicting {s_type}')
for s_sample in ls_sample:
    segment.segment_spawner(s_sample,segdir,regdir,nuc_diam,cell_diam,s_type,s_seg_markers,s_job='short',s_match='both')

8 Extract features

Extract intensity, shape and location features from each AF subtracted image.

from mplex_image import features

nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']

for s_sample in ls_sample: 
    df_sample, df_thresh = features.extract_cellpose_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam)
    df_sample.to_csv(f'{segdir}/features_{s_sample}_MeanIntensity_Centroid_Shape.csv')
    df_thresh.to_csv(f'{segdir}/thresh_{s_sample}_ThresholdLi.csv')

8b (optional) Enhanced Features Extraction

When we know that segmentation is not entirely accurate, e.g. the cell membrane, we can extract "enhanced" features, that is, the brightest 25% of pixels within the segmentation region.

from mplex_image import features

nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']
ls_membrane = ['HER2','EGFR']

for s_sample in ls_sample: 
    df_sample = features.extract_bright_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam,ls_membrane)
    df_sample.to_csv(f'{segdir}/features_{s_sample}_BrightMeanIntensity.csv')

9 Filter Data

Post-processing includes filling in any missing data (since cellpose typically segments more nuclei than cells) filtering out lost cells based on last round DAPI staining and selection of dataframe columns based on desired biomarker sub-cellular location.

from mplex_image import process, features
#parameters
nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']
s_thresh='Ecad'
ls_membrane = ['HER2']
ls_marker_cyto = ['CK14','CK5','CK17','CK19','CK7','CK5R','Ecad','HER2']
ls_custom = ['CD8R_perinuc5','CD20R_perinuc5','CD4R_perinuc5']
ls_filter = ['DAPI11_nuclei','DAPI2_nuclei']
ls_shrunk = ['Vim_perinuc3','CD44_perinuc3']
man_thresh = 600

#filtering
for s_sample in ls_sample: 
    os.chdir(segdir)
    #replace nas, select segmentation region and filter cells negative for dapi
    df_mi_full,df_img_all = process.filter_cellpose_df(s_sample,segdir,qcdir,s_thresh,ls_membrane,ls_marker_cyto,
     ls_custom,ls_filter,ls_shrunk,man_thresh)
    #Expand nuclei without matching cell labels for cells touching analysis
    __ = features.combine_labels(s_sample,segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, df_mi_full,s_thresh)
    process.marker_table(df_img_all,qcdir)

#Expand nuclei without matching cell labels for cells' touching analysis
for s_sample in ls_sample:
    se_neg = df_mi_full[df_mi_full.slide==s_sample].Ecad_negative
    labels,combine,dd_result = features.combine_labels(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, se_neg)

10 Generate multicolor pngs and ome-tiff overlays (cropped)

Crop smaller regions of images and segmentation basins and create custom overlays for efficient viewing of related markers.

#crop coordinates  x, y upper corner
d_crop = {'BC44290-146-Scene-1': (1800,9000)}

#10-1 PNGs: nice images for powerpoint presentations

#PNG parameters
d_overlay = {'R1':['CD20','CD8','CD4','CK19'],
     'R2':[ 'PCNA','HER2','ER','CD45'],
     'R3':['pHH3', 'CK14', 'CD44', 'CK5'],
     'R4':[ 'Vim', 'CK7', 'PD1', 'LamAC',],
     'R5':['aSMA', 'CD68', 'Ki67', 'Ecad'],
     'R6':['CK17','PDPN','CD31','CD3'],
     'R7':['CK5R','CD8R','CD4R','CD20R'],
     'R8':['LamB1','AR','ColIV','ColI'],
     'subtype':['PCNA','HER2','ER','Ki67'],
     'diff':['Ecad', 'CK14', 'CD44', 'CK5'],
     'immune':['PD1','CD8R','CD4R','CD20R'],
     'stromal':['aSMA','Vim','CD68','CD31'],
     }
es_bright = {'pHH3','CD68','CK14','CK5','CK17'} 
high_thresh=0.999

for s_scene in sorted(d_crop.keys()):
    cmif.visualize_multicolor_overlay(s_scene,subdir,qcdir,d_overlay,d_crop,es_bright,high_thresh)

#10 -2 ome-tiff

#ome-tiff parameters
s_dapi = 'DAPI1'
d_combos = {'AF':{'R0c2','R0c3','R0c4','R0c5'}, 
        'Stromal':{'PDPN','Vim','CD31','aSMA'}, 
        'Tumor':{'HER2','ER','AR','Ecad','CD44','Ki67','pHH3','PCNA'},
        'Immune':{'CD45','CD20R','CD68','PD1', 'CD8R', 'CD4R','CD3',},
        'Differentiation':{'CK7','CK19','CK14','CK17','CK5','CD44','Vim'},
        'Nuclear_ImmuneEarly':{ 'LamB1', 'LamAC','CD20','CD8', 'CD4',},
    }

for s_scene in sorted(d_crop.keys()):
    cmif.cropped_ometiff(s_scene,subdir,cropdir,d_crop,d_combos,s_dapi,tu_dim)

#10-3 crop basins to match cropped overlays

cmif.load_crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='exp5_CellSegmentationBasins')
cmif.load_crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='Nuclei Segmentation Basins')

11 Tissue edge detection

"Edge-effect" is a common artifact in FFPE tissues. Here we detect cells on edge of tissue.

from mplex_image import features
nuc_diam = 30
i_pixel = 308
for s_sample in ls_sample:
    features.edge_mask(s_sample,segdir,subdir,i_pixel=i_pixel, dapi_thresh=350)
    df_sample = features.edge_cells(s_sample,segdir,nuc_diam,i_pixel=i_pixel)
    df_sample.to_csv(f'{segdir}/features_{s_sample}_EdgeCells{i_pixel}pixels_CentroidXY.csv')

Next Steps

Filtered data from step 9 and cropped overlays from step 10 are used for manual thresholding and gating. Unsupervised clustering may be run on filtered data. Images, segmentation, and clustering or thresholding results are viewed in napari.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

mplex_image-0.0.7-py3-none-any.whl (117.9 kB view details)

Uploaded Python 3

File details

Details for the file mplex_image-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: mplex_image-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 117.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.3

File hashes

Hashes for mplex_image-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 7f12738be95769d42b5927ddee8ba2335a259c8042a04e36b491d44124cd47c2
MD5 8bbce7d4c93e86bbb43fe9d414cdb4e0
BLAKE2b-256 b8949180e06a109ca1da4b6ddf4c53d4b61b974b2fe3a91ec5ddea64e1ab9630

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page