Skip to main content

A python3-based image analysis package to achieve fully-documented and reproducible visualization and analysis of bio-medical microscopy images. This is a fork from Jennifer Eng`s mplex_image software library.

Project description

cmIF Version 2

Author: engje

Date: 2020-08-19

License: GPLv3

Language: Python3

Description: cmIF is a Python3 library for automated image processing and analysis of multiplex immunofluorescence images

Example Code

1 Import Libraries and Set Paths

Within the root directory, cmIF will auto-generate a standardized folder structure. Folders are for Raw Images (unregistered tiffs single channel), QC outputs, Registered Images, Autofluorescence Subtracted Images, and Segmentation outputs. So only codedir and rootdir variables should be modified, retaining the standardized folder structure.

To run this example code, download Raw Images and czis from https://www.synapse.org/#!Synapse:syn22315952 Proper folder naming/structure will be:

  • rootdir/data/PipelineExample/RawImages/
  • rootdir/data/PipelineExample/czis/

List the names of the slides to be processed as ls_sample. Each slide may have 1 or more scenes acquired during scanning.

# Import libraries
import os
import sys
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt
import re
from skimage import io

# Set Paths
codedir = os.getcwd()
rootdir = f'{codedir}/data/PipelineExample'
tiffdir = f'{rootdir}/RawImages'
qcdir = f'{rootdir}/QC'
regdir = f'{rootdir}/RegisteredImages'
subdir = f'{rootdir}/SubtractedRegisteredImages'
segdir = f'{rootdir}/Segmentation'
czidir = f'{rootdir}/czis'


# Start Preprocessing

from mplex_image import preprocess, mpimage, cmif
preprocess.cmif_mkdir([tiffdir,qcdir,regdir,segdir,subdir])

# List slides to be processed
ls_sample = ['BC44290-146'
 ]

2 QC and export metadata from .czi

"Level-1" i.e. original microscope image file QC is performed (check number of files + naming), changing file names if there are any typos. Important metadata, including exposure time (used to normalize images for autofluorescence subtraction) and coordinates of czi scenes is exported to csv.

FILE STRUCTURE: All images from each sample should be in subfolders named by sample ID within czidir.

from mplex_image import metadata

for s_sample in ls_sample:
    os.chdir(f'{czidir}/{s_sample}')
    #1 rename undescores to dot to match convention (done)
    d_rename = mpimage.underscore_to_dot(s_sample,s_end='.czi')
    d_rename.update({'HER2_ER':'HER2.ER'})
    #preprocess.dchange_fname(d_rename) #,b_test=False)

    #2 Check files/naming
    df_img = cmif.parse_czi(f'{czidir}/{s_sample}',b_scenes=True)
    cmif.count_images(df_img)
    preprocess.check_names(df_img,s_type='czi')
    #Example: change file name and change back
    d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'} 
    preprocess.dchange_fname(d_rename)#,b_test=False)

    #3 Export useful imaging metadata (done)
    #df_img = metadata.scene_position(f'{czidir}/{s_sample}',type='r')
    #df_img.to_csv(f'{codedir}/{s_sample}_ScenePositions.csv')
    #metadata.exposure_times_scenes(df_img,rootdir,czidir=f'{czidir}/{s_sample}',s_end='.czi')#

3 QC Tiff Images

Unregistered raw tiff format is 16 bit (uint16) single channel grayscale tiff, for example, exported from .czi using Zeiss' Zen software. This section QC's files/naming and generates overviews of all rounds for visual inspection.

FILE STRUCTURE: Each sample's tiff image stack should be in a separate subfolder named by sample ID within rootdir/RawImages.

# Raw tiffs: check/change names 

for s_sample in ls_sample: 
    os.chdir(f'{tiffdir}/{s_sample}')
    #Example: change file name and change back
    d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'} 
    preprocess.dchange_fname(d_rename) #,b_test=False)
    #sort and count images 
    df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw') 
    cmif.count_images(df_img[df_img.slide==s_sample])

# QC Raw tiffs: visual inspection #

preprocess.cmif_mkdir([f'{qcdir}/RawImages'])

for s_sample in ls_sample:
    os.chdir(f'{tiffdir}/{s_sample}')
    #investigate tissues
    df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw')
    cmif.visualize_raw_images(df_img,qcdir,color='c1')

4 Register Images

This section registers all images to round 1 (R1), based on DAPI staining in each round.

FILE STRUCTURE: Registered tiff images are generated and saved in regdir in subfolders named by sample ID and scene.

for s_sample in ls_sample:
    cmif.registration_python(s_sample,tiffdir,regdir,qcdir)

5 Check Registration

This section generates overviews of all rounds of each registered image stack, for QC purposes.

cmif.visualize_reg_images(f'{regdir}',qcdir,color='c1')

6 Create AF Subtracted Images

Images acquired of background autofluorescence by are scaled by exposure time and subtracted from the respective channel, producing a new image. d_channel specifies the name of the background marker to subtract from each channel. ls_exclude lists which markers for which AF subtraction is skipped(typically c5 images). A companion csv listing channels and corresponding markers is generated to reflect this fully processed (i.e. level-2) image data.

FILE STRUCTURE: AF substracted tiff images are output to subdir in subfolders nemed by sample ID and scene

#parameters
d_channel = {'c2':'R8Qc2','c3':'R8Qc3','c4':'R8Qc4','c5':'R8Qc5'}

for s_sample in ls_sample:
    preprocess.cmif_mkdir([f'{subdir}/{s_sample}'])
    os.chdir(f'{regdir}/{s_sample}')
    df_img = mpimage.parse_org()
    ls_exclude = sorted(set(df_img[df_img.color=='c5'].marker)) + ['DAPI'] + [item for key, item in d_channel.items()]
    #subtract
    df_img, df_exp,df_markers,df_copy = cmif.autofluorescence_subtract(regdir,f'{codedir}/data/PipelineExample',d_channel,ls_exclude,subdir=f'{subdir}/{s_sample}')
#generate channel/marker metadata csv
cmif.segmentation_inputs(subdir,segdir,d_segment={},tma_bool=True,b_start=False, i_counter=0)

7 Cellpose Segmentation

The generalist segmentation algorithim cellpose is used for nuclear and cell segmentation. Custom python/numba code matches labelled nuclei and cells that overlap from the two segmentation results. Note: the cellpose_segment_job is a spawner that starts a job for each slide/scene on the server.

FILE STRUCTURE: Labelled tiffs (uint32) with each pixel's grayscale value reflecting the label, are output as "Nuclei Segmentation Basins" and "Matched Cell Segmentation Basins" in segdir sample ID subfolders.

from mplex_image import segment
import time

nuc_diam = 30
cell_diam = 30 

s_seg_markers = "['Ecad']"
s_type = 'cell' #'nuclei'#

print(f'Predicting {s_type}')
for s_sample in ls_sample:
    preprocess.cmif_mkdir([f'{segdir}/{s_sample}Cellpose_Segmentation'])
    os.chdir(f'{regdir}')
    for s_file in os.listdir():
        if s_file.find(s_sample) > -1:
            os.chdir(f'{regdir}/{s_file}')
            print(f'Processing {s_file}')
            df_img = segment.parse_org()
            for s_scene in sorted(set(df_img.scene)):
                s_slide_scene= f'{s_sample}-Scene-{s_scene}'
                s_find = df_img[(df_img.rounds=='R1') & (df_img.color=='c1') & (df_img.scene==s_scene)].index[0]
                segment.cellpose_segment_job(s_file,s_slide_scene,
                 s_find,f'{segdir}/{s_sample}Cellpose_Segmentation',
                 f'{regdir}/{s_sample}',nuc_diam,cell_diam,
                 s_type,s_seg_markers)#,b_match=True) 
                os.chdir(f'{segdir}/{s_sample}Cellpose_Segmentation')
                os.system(f'sbatch cellpose_{s_type}_{s_slide_scene}.sh')
                time.sleep(12)
                print('Next')

8 Extract features

Extract intesity, shape and location features from each AF subtracted image.

from mplex_image import features

nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']

for s_sample in ls_sample: 
    df_sample, df_thresh = features.extract_cellpose_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam)
    df_sample.to_csv(f'{segdir}/features_{s_sample}_MeanIntensity_Centroid_Shape.csv')
    df_thresh.to_csv(f'{segdir}thresh_{s_sample}_ThresholdLi.csv')

9 Filter Data

Post-processing includes filtering out lost cells based on last round DAPI staining and selection of dataframe columns based on desired biomarker sub-cellular location.

from mplex_image import process
#parameters
ls_marker_cyto = ['CK14_cytoplasm','CK5_cytoplasm','CK17_cytoplasm','CK19_cytoplasm','CK7_cytoplasm','CK5R_cytoplasm','Ecad_cytoplasm','HER2_cytoplasm']
ls_custom = ['CD20R_perinuc5','CD4R_perinuc5','CD8R_perinuc5']
ls_filter = ['DAPI8Q_nuclei','DAPI2_nuclei']
#filtering
os.chdir(rootdir)
df_img_all = process.load_li(ls_sample)
df_mi_full = process.load_cellpose_df(ls_sample, segdir)
df_xy = process.filter_cellpose_xy(df_mi_full)
df_mi_filled = process.fill_cellpose_nas(df_img_all,df_mi_full,ls_marker_cyto,s_thresh='Ecad',ls_celline=[],ls_shrunk = ['CD44','Vim'],qcdir=qcdir)
df_mi = process.filter_loc_cellpose(df_mi_filled, ls_marker_cyto, ls_custom)
df_pos_auto,d_thresh_record = process.auto_threshold(df_mi,df_img_all)
ls_color = df_mi.columns[df_mi.columns.str.contains('DAPI')].tolist()
process.positive_scatterplots(df_pos_auto,d_thresh_record,df_xy,ls_color + ['Ecad_cytoplasm'],qcdir)
df_mi_filter = process.filter_dapi_cellpose(df_pos_auto,ls_color,df_mi,ls_filter,df_img_all,qcdir)
s_sample = ls_sample[0]
df_mi_filter.to_csv(f'{rootdir}/features_{s_sample}_FilteredMeanIntensity_{"_".join([item.split("_")[0] for item in ls_filter])}.csv')
df_xy.to_csv(f'{rootdir}/features_{s_sample}_CentroidXY.csv')

#Expand nuclei without matching cell labels for cells' touching analysis
for s_sample in ls_sample:
    se_neg = df_mi_full[df_mi_full.slide==s_sample].Ecad_negative
    labels,combine,dd_result = features.combine_labels(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, se_neg)

10 generate multicolor pngs and ome-tiff overlays (cropped)

Crop smaller regions of images and segmentation basins and create custom overlays for efficient viewing of related markers.

#crop coordinates  x, y upper corner
d_crop = {'BC44290-146-Scene-1': (2287,9202)}

#10-1 PNGs: nice images for powerpoint presentations

#PNG parameters
d_overlay = {'R1':['CD20','CD8','CD4','CK19'],
     'R2':[ 'PCNA','HER2','ER','CD45'],
     'R3':['pHH3', 'CK14', 'CD44', 'CK5'],
     'R4':[ 'Vim', 'CK7', 'PD1', 'LamAC',],
     'R5':['aSMA', 'CD68', 'Ki67', 'Ecad'],
     'R6':['CK17','PDPN','CD31','CD3'],
     'R7':['CK5R','CD8R','CD4R','CD20R'],
     'R8':['LamB1','AR','ColIV','ColI'],
     'subtype':['PCNA','HER2','ER','Ki67'],
     'diff':['Ecad', 'CK14', 'CD44', 'CK5'],
     'immune':['PD1','CD8R','CD4R','CD20R'],
     'stromal':['aSMA','Vim','CD68','CD31'],
     }
es_bright = {'pHH3','CD68','CK14','CK5','CK17'} 

for s_sample in ls_sample:
    preprocess.cmif_mkdir([f'{qcdir}/{s_sample}'])
    s_path = f'{subdir}/{s_sample}'
    os.chdir(s_path)
    df_img = mpimage.parse_org()
    df_img['path'] = [f'{s_path}/{item}' for item in df_img.index]
    for s_scene in sorted(set(df_img.scene)):
        df_dapi = df_img[(df_img.color=='c1')&(df_img.scene==s_scene)]
        df_scene = df_img[(df_img.color!='c1') & (df_img.scene==s_scene)]
        for s_round in ['R1','R2','R3','R4','R5','R6','R7','R8']:
            df_round = df_scene[df_scene.rounds==s_round]
            df_dapi_round = df_dapi[(df_dapi.rounds==s_round)]
            high_thresh=0.999
            d_overlay_round = {s_round:d_overlay[s_round]}
            d_result = mpimage.multicolor_png(df_round,df_dapi_round,s_scene=s_scene,d_overlay=d_overlay_round,d_crop=d_crop,es_dim={'nada'},es_bright=es_bright,low_thresh=2000,high_thresh=high_thresh)
            for key, tu_result in d_result.items():
                io.imsave(f'{qcdir}/{s_sample}/ColorArray_{s_scene}_{key}_{".".join(tu_result[0])}.png',tu_result[1])

#10-2 ome-tiff overlays of related markers

import tifffile

s_dapi = 'DAPI1'
d_combos = {
        'Stromal':{'PDPN','Vim','CD31','aSMA'}, 
        'Tumor':{'HER2','ER','AR','Ecad','CD44','Ki67','pHH3','PCNA'},
        'Immune':{'CD45','CD20R','CD68','PD1', 'CD8R', 'CD4R','FoxP3',},
        'Differentiation':{'CK7','CK19','CK14','CK17','CK5','CD44','Vim'},
        #'Nuclear':{ 'LamB1', 'LamAC'},
    }
for s_sample in ls_sample:
    os.chdir(f'{subdir}/{s_sample}')
    df_img = mpimage.parse_org()
    dd_result = mpimage.overlay_crop(d_combos,d_crop,df_img,s_dapi,tu_dim)
    for s_crop, d_result in dd_result.items():
        for s_type, (ls_marker, array) in d_result.items():
            new_array = array[np.newaxis,np.newaxis,:]
            s_xml =  mpimage.gen_xml(new_array, ls_marker)
            with tifffile.TiffWriter(f'{cropdir}/{s_crop}_{s_type}.ome.tif') as tif:
                tif.save(new_array,  photometric = "minisblack", description=s_xml, metadata = None)

#10-3 crop segmentation labels to match cropped overlays

cmif.crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='exp5_CellSegmentationBasins')
cmif.crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='Nuclei Segmentation Basins')

Next Steps

Filtered data from step 9 and cropped overlays from step 10 are used for manual thresholding in this notebook. Images, segmentation, and thresholding results are viewed in napari, using this code.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

jinxif-0.0.1-py3-none-any.whl (18.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page