A python3-based image analysis package to achieve fully-documented and reproducible visualization and analysis of bio-medical microscopy images. This is a fork from Jennifer Eng`s mplex_image software library.
Project description
cmIF Version 2
Author: engje
Date: 2020-08-19
License: GPLv3
Language: Python3
Description: cmIF is a Python3 library for automated image processing and analysis of multiplex immunofluorescence images
Example Code
1 Import Libraries and Set Paths
Within the root directory, cmIF will auto-generate a standardized folder structure. Folders are for Raw Images (unregistered tiffs single channel),
QC outputs, Registered Images, Autofluorescence Subtracted Images, and Segmentation outputs. So only
codedir
and rootdir
variables should be modified, retaining the standardized folder structure.
To run this example code, download Raw Images and czis from https://www.synapse.org/#!Synapse:syn22315952 Proper folder naming/structure will be:
rootdir
/data/PipelineExample/RawImages/rootdir
/data/PipelineExample/czis/
List the names of the slides to be processed as ls_sample
. Each slide may have 1 or more scenes acquired during scanning.
# Import libraries
import os
import sys
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt
import re
from skimage import io
# Set Paths
codedir = os.getcwd()
rootdir = f'{codedir}/data/PipelineExample'
tiffdir = f'{rootdir}/RawImages'
qcdir = f'{rootdir}/QC'
regdir = f'{rootdir}/RegisteredImages'
subdir = f'{rootdir}/SubtractedRegisteredImages'
segdir = f'{rootdir}/Segmentation'
czidir = f'{rootdir}/czis'
# Start Preprocessing
from mplex_image import preprocess, mpimage, cmif
preprocess.cmif_mkdir([tiffdir,qcdir,regdir,segdir,subdir])
# List slides to be processed
ls_sample = ['BC44290-146'
]
2 QC and export metadata from .czi
"Level-1" i.e. original microscope image file QC is performed (check number of files + naming), changing file names if there are any typos. Important metadata, including exposure time (used to normalize images for autofluorescence subtraction) and coordinates of czi scenes is exported to csv.
FILE STRUCTURE: All images from each sample should be in subfolders named by sample ID within czidir
.
from mplex_image import metadata
for s_sample in ls_sample:
os.chdir(f'{czidir}/{s_sample}')
#1 rename undescores to dot to match convention (done)
d_rename = mpimage.underscore_to_dot(s_sample,s_end='.czi')
d_rename.update({'HER2_ER':'HER2.ER'})
#preprocess.dchange_fname(d_rename) #,b_test=False)
#2 Check files/naming
df_img = cmif.parse_czi(f'{czidir}/{s_sample}',b_scenes=True)
cmif.count_images(df_img)
preprocess.check_names(df_img,s_type='czi')
#Example: change file name and change back
d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'}
preprocess.dchange_fname(d_rename)#,b_test=False)
#3 Export useful imaging metadata (done)
#df_img = metadata.scene_position(f'{czidir}/{s_sample}',type='r')
#df_img.to_csv(f'{codedir}/{s_sample}_ScenePositions.csv')
#metadata.exposure_times_scenes(df_img,rootdir,czidir=f'{czidir}/{s_sample}',s_end='.czi')#
3 QC Tiff Images
Unregistered raw tiff format is 16 bit (uint16
) single channel grayscale tiff, for example, exported from .czi using Zeiss' Zen software.
This section QC's files/naming and generates overviews of all rounds for visual inspection.
FILE STRUCTURE: Each sample's tiff image stack should be in a separate subfolder named by sample ID within rootdir
/RawImages.
# Raw tiffs: check/change names
for s_sample in ls_sample:
os.chdir(f'{tiffdir}/{s_sample}')
#Example: change file name and change back
d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'}
preprocess.dchange_fname(d_rename) #,b_test=False)
#sort and count images
df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw')
cmif.count_images(df_img[df_img.slide==s_sample])
# QC Raw tiffs: visual inspection #
preprocess.cmif_mkdir([f'{qcdir}/RawImages'])
for s_sample in ls_sample:
os.chdir(f'{tiffdir}/{s_sample}')
#investigate tissues
df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw')
cmif.visualize_raw_images(df_img,qcdir,color='c1')
4 Register Images
This section registers all images to round 1 (R1), based on DAPI staining in each round.
FILE STRUCTURE: Registered tiff images are generated and saved in regdir
in subfolders named by sample ID and scene.
for s_sample in ls_sample:
cmif.registration_python(s_sample,tiffdir,regdir,qcdir)
5 Check Registration
This section generates overviews of all rounds of each registered image stack, for QC purposes.
cmif.visualize_reg_images(f'{regdir}',qcdir,color='c1')
6 Create AF Subtracted Images
Images acquired of background autofluorescence by are scaled by exposure time and subtracted from the respective channel, producing a new image. d_channel
specifies the name of the background marker to subtract from each channel. ls_exclude
lists which markers for which AF subtraction is skipped(typically c5 images).
A companion csv listing channels and corresponding markers is generated to reflect this fully processed (i.e. level-2) image data.
FILE STRUCTURE: AF substracted tiff images are output to subdir
in subfolders nemed by sample ID and scene
#parameters
d_channel = {'c2':'R8Qc2','c3':'R8Qc3','c4':'R8Qc4','c5':'R8Qc5'}
for s_sample in ls_sample:
preprocess.cmif_mkdir([f'{subdir}/{s_sample}'])
os.chdir(f'{regdir}/{s_sample}')
df_img = mpimage.parse_org()
ls_exclude = sorted(set(df_img[df_img.color=='c5'].marker)) + ['DAPI'] + [item for key, item in d_channel.items()]
#subtract
df_img, df_exp,df_markers,df_copy = cmif.autofluorescence_subtract(regdir,f'{codedir}/data/PipelineExample',d_channel,ls_exclude,subdir=f'{subdir}/{s_sample}')
#generate channel/marker metadata csv
cmif.segmentation_inputs(subdir,segdir,d_segment={},tma_bool=True,b_start=False, i_counter=0)
7 Cellpose Segmentation
The generalist segmentation algorithim cellpose
is used for nuclear and cell segmentation. Custom python/numba code matches labelled nuclei and cells that overlap from the two segmentation results.
Note: the cellpose_segment_job
is a spawner that starts a job for each slide/scene on the server.
FILE STRUCTURE: Labelled tiffs (uint32
) with each pixel's grayscale value reflecting the label, are output as "Nuclei Segmentation Basins" and "Matched Cell Segmentation Basins" in segdir
sample ID subfolders.
from mplex_image import segment
import time
nuc_diam = 30
cell_diam = 30
s_seg_markers = "['Ecad']"
s_type = 'cell' #'nuclei'#
print(f'Predicting {s_type}')
for s_sample in ls_sample:
preprocess.cmif_mkdir([f'{segdir}/{s_sample}Cellpose_Segmentation'])
os.chdir(f'{regdir}')
for s_file in os.listdir():
if s_file.find(s_sample) > -1:
os.chdir(f'{regdir}/{s_file}')
print(f'Processing {s_file}')
df_img = segment.parse_org()
for s_scene in sorted(set(df_img.scene)):
s_slide_scene= f'{s_sample}-Scene-{s_scene}'
s_find = df_img[(df_img.rounds=='R1') & (df_img.color=='c1') & (df_img.scene==s_scene)].index[0]
segment.cellpose_segment_job(s_file,s_slide_scene,
s_find,f'{segdir}/{s_sample}Cellpose_Segmentation',
f'{regdir}/{s_sample}',nuc_diam,cell_diam,
s_type,s_seg_markers)#,b_match=True)
os.chdir(f'{segdir}/{s_sample}Cellpose_Segmentation')
os.system(f'sbatch cellpose_{s_type}_{s_slide_scene}.sh')
time.sleep(12)
print('Next')
8 Extract features
Extract intesity, shape and location features from each AF subtracted image.
from mplex_image import features
nuc_diam = 30
cell_diam = 30
ls_seg_markers = ['Ecad']
for s_sample in ls_sample:
df_sample, df_thresh = features.extract_cellpose_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam)
df_sample.to_csv(f'{segdir}/features_{s_sample}_MeanIntensity_Centroid_Shape.csv')
df_thresh.to_csv(f'{segdir}thresh_{s_sample}_ThresholdLi.csv')
9 Filter Data
Post-processing includes filtering out lost cells based on last round DAPI staining and selection of dataframe columns based on desired biomarker sub-cellular location.
from mplex_image import process
#parameters
ls_marker_cyto = ['CK14_cytoplasm','CK5_cytoplasm','CK17_cytoplasm','CK19_cytoplasm','CK7_cytoplasm','CK5R_cytoplasm','Ecad_cytoplasm','HER2_cytoplasm']
ls_custom = ['CD20R_perinuc5','CD4R_perinuc5','CD8R_perinuc5']
ls_filter = ['DAPI8Q_nuclei','DAPI2_nuclei']
#filtering
os.chdir(rootdir)
df_img_all = process.load_li(ls_sample)
df_mi_full = process.load_cellpose_df(ls_sample, segdir)
df_xy = process.filter_cellpose_xy(df_mi_full)
df_mi_filled = process.fill_cellpose_nas(df_img_all,df_mi_full,ls_marker_cyto,s_thresh='Ecad',ls_celline=[],ls_shrunk = ['CD44','Vim'],qcdir=qcdir)
df_mi = process.filter_loc_cellpose(df_mi_filled, ls_marker_cyto, ls_custom)
df_pos_auto,d_thresh_record = process.auto_threshold(df_mi,df_img_all)
ls_color = df_mi.columns[df_mi.columns.str.contains('DAPI')].tolist()
process.positive_scatterplots(df_pos_auto,d_thresh_record,df_xy,ls_color + ['Ecad_cytoplasm'],qcdir)
df_mi_filter = process.filter_dapi_cellpose(df_pos_auto,ls_color,df_mi,ls_filter,df_img_all,qcdir)
s_sample = ls_sample[0]
df_mi_filter.to_csv(f'{rootdir}/features_{s_sample}_FilteredMeanIntensity_{"_".join([item.split("_")[0] for item in ls_filter])}.csv')
df_xy.to_csv(f'{rootdir}/features_{s_sample}_CentroidXY.csv')
#Expand nuclei without matching cell labels for cells' touching analysis
for s_sample in ls_sample:
se_neg = df_mi_full[df_mi_full.slide==s_sample].Ecad_negative
labels,combine,dd_result = features.combine_labels(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, se_neg)
10 generate multicolor pngs and ome-tiff overlays (cropped)
Crop smaller regions of images and segmentation basins and create custom overlays for efficient viewing of related markers.
#crop coordinates x, y upper corner
d_crop = {'BC44290-146-Scene-1': (2287,9202)}
#10-1 PNGs: nice images for powerpoint presentations
#PNG parameters
d_overlay = {'R1':['CD20','CD8','CD4','CK19'],
'R2':[ 'PCNA','HER2','ER','CD45'],
'R3':['pHH3', 'CK14', 'CD44', 'CK5'],
'R4':[ 'Vim', 'CK7', 'PD1', 'LamAC',],
'R5':['aSMA', 'CD68', 'Ki67', 'Ecad'],
'R6':['CK17','PDPN','CD31','CD3'],
'R7':['CK5R','CD8R','CD4R','CD20R'],
'R8':['LamB1','AR','ColIV','ColI'],
'subtype':['PCNA','HER2','ER','Ki67'],
'diff':['Ecad', 'CK14', 'CD44', 'CK5'],
'immune':['PD1','CD8R','CD4R','CD20R'],
'stromal':['aSMA','Vim','CD68','CD31'],
}
es_bright = {'pHH3','CD68','CK14','CK5','CK17'}
for s_sample in ls_sample:
preprocess.cmif_mkdir([f'{qcdir}/{s_sample}'])
s_path = f'{subdir}/{s_sample}'
os.chdir(s_path)
df_img = mpimage.parse_org()
df_img['path'] = [f'{s_path}/{item}' for item in df_img.index]
for s_scene in sorted(set(df_img.scene)):
df_dapi = df_img[(df_img.color=='c1')&(df_img.scene==s_scene)]
df_scene = df_img[(df_img.color!='c1') & (df_img.scene==s_scene)]
for s_round in ['R1','R2','R3','R4','R5','R6','R7','R8']:
df_round = df_scene[df_scene.rounds==s_round]
df_dapi_round = df_dapi[(df_dapi.rounds==s_round)]
high_thresh=0.999
d_overlay_round = {s_round:d_overlay[s_round]}
d_result = mpimage.multicolor_png(df_round,df_dapi_round,s_scene=s_scene,d_overlay=d_overlay_round,d_crop=d_crop,es_dim={'nada'},es_bright=es_bright,low_thresh=2000,high_thresh=high_thresh)
for key, tu_result in d_result.items():
io.imsave(f'{qcdir}/{s_sample}/ColorArray_{s_scene}_{key}_{".".join(tu_result[0])}.png',tu_result[1])
#10-2 ome-tiff overlays of related markers
import tifffile
s_dapi = 'DAPI1'
d_combos = {
'Stromal':{'PDPN','Vim','CD31','aSMA'},
'Tumor':{'HER2','ER','AR','Ecad','CD44','Ki67','pHH3','PCNA'},
'Immune':{'CD45','CD20R','CD68','PD1', 'CD8R', 'CD4R','FoxP3',},
'Differentiation':{'CK7','CK19','CK14','CK17','CK5','CD44','Vim'},
#'Nuclear':{ 'LamB1', 'LamAC'},
}
for s_sample in ls_sample:
os.chdir(f'{subdir}/{s_sample}')
df_img = mpimage.parse_org()
dd_result = mpimage.overlay_crop(d_combos,d_crop,df_img,s_dapi,tu_dim)
for s_crop, d_result in dd_result.items():
for s_type, (ls_marker, array) in d_result.items():
new_array = array[np.newaxis,np.newaxis,:]
s_xml = mpimage.gen_xml(new_array, ls_marker)
with tifffile.TiffWriter(f'{cropdir}/{s_crop}_{s_type}.ome.tif') as tif:
tif.save(new_array, photometric = "minisblack", description=s_xml, metadata = None)
#10-3 crop segmentation labels to match cropped overlays
cmif.crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='exp5_CellSegmentationBasins')
cmif.crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='Nuclei Segmentation Basins')
Next Steps
Filtered data from step 9 and cropped overlays from step 10 are used for manual thresholding in this notebook. Images, segmentation, and thresholding results are viewed in napari, using this code.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file jinxif-0.0.3-py3-none-any.whl
.
File metadata
- Download URL: jinxif-0.0.3-py3-none-any.whl
- Upload date:
- Size: 50.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 64217b007b77ba2ec4ec36c962002112526c09f316a7ef823f7e5a5bbea858a7 |
|
MD5 | 14e3dd4ba3684a7e3c21add1aa1a3ec1 |
|
BLAKE2b-256 | e1b7a793f357d622359f8a7ee29fd377cf712308ec55844e2a4de5503bfc8a74 |