A python3-based image analysis package to achieve fully-documented and reproducible visualization and analysis of bio-medical microscopy images.
Project description
cmIF Version 3
Author: engje
Date: 2021-01-05
License: GPLv3
Language: Python3
Description: cmIF is a Python3 library for automated image processing and analysis of multiplex immunofluorescence images
Setup
0. Follow instructions to install python 3.8, if necessary
1. From the command line, clone the repository git clone https://gitlab.com/engje/cmif.git
2. Navigate to PipelineExample cd cmif/PipelineExample/
3. Make a new conda environment conda env create -f cmif_environment.yml
4. Activate the cmif environment source activate cmif
or conda activate cmif
5. Run PipelineExample.py -- Programmatically download example images from https://www.synapse.org/#!Synapse:syn22315952 in step 0
PipelineExample Code
0 Dowload Example Images
To run this example code, download tiffs and czis from https://www.synapse.org/#!Synapse:syn22315952 Proper folder naming/structure will be:
- ./PipelineExample/RawImages/
- ./PipelineExample/czis/
#libraries
import os
import synapseclient
import synapseutils
#navigate to code folder
os.chdir('PipelineExample')
codedir = os.getcwd()
#input/output folders
czidir = f'{codedir}/czis'
tiffdir = f'{codedir}/RawImages'
# login to Synapse
syn = synapseclient.login(email='me@example.com', password='secret', rememberMe=True) #change to your username and password
# download tiffs and czis
all_files = synapseutils.syncFromSynapse(syn, entity='syn22315956', path=tiffdir)
all_files = synapseutils.syncFromSynapse(syn, entity='syn22315953', path=czidir)
1 Import Libraries and Generate Folders
Within the root directory, the code will auto-generate a standardized folder structure. Folders are for Raw Images (unregistered tiffs single channel),
QC outputs, Registered Images, Autofluorescence Subtracted Images, Segmentation outputs and Cropped images for viewing. So only
codedir
variable should be modified, retaining the standardized folder structure.
List the names of the slides to be processed as ls_sample
. Each slide may have 1 or more scenes acquired during scanning.
# Import libraries
import os
import sys
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt
import re
from skimage import io
# Start Preprocessing
if os.getcwd().split('/')[-1] != 'PipelineExample':
os.chdir('PipelineExample')
codedir = os.getcwd()
#input/output folders
czidir = f'{codedir}/czis'
tiffdir = f'{codedir}/RawImages'
qcdir = f'{codedir}/QC'
regdir = f'{codedir}/RegisteredImages'
subdir = f'{codedir}/SubtractedRegisteredImages'
segdir = f'{codedir}/Segmentation'
cropdir = f'{codedir}/Cropped'
from mplex_image import preprocess, mpimage, cmif, process
preprocess.cmif_mkdir([tiffdir,qcdir,regdir,segdir,subdir,cropdir,czidir])
ls_sample = ['BC44290-146']
2 QC and export metadata from .czi
"Level-1" i.e. original microscope image file QC is performed (check number of files + naming), changing file names if there are any typos. Important metadata, including exposure time (used to normalize images for autofluorescence subtraction) and coordinates of czi scenes is exported to csv.
FILE STRUCTURE: All images from each sample should be in subfolders named by sample ID within czidir
.
from mplex_image import metadata
import javabridge
import bioformats
javabridge.start_vm(class_path=bioformats.JARS)
for s_sample in ls_sample:
os.chdir(f'{czidir}/{s_sample}')
#1 rename undescores to dot to match convention (done)
d_rename = mpimage.underscore_to_dot(s_sample,s_end='.czi')
d_rename.update({'HER2_ER':'HER2.ER'})
preprocess.dchange_fname(d_rename) #,b_test=False)
#2 Check files/naming
df_img = cmif.parse_czi(f'{czidir}/{s_sample}',b_scenes=True)
cmif.count_images(df_img)
preprocess.check_names(df_img,s_type='czi')
#Example: change file name and change back
d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'}
preprocess.dchange_fname(d_rename)#,b_test=False)
#3 Export useful imaging metadata (done)
df_img = metadata.scene_position(f'{czidir}/{s_sample}',type='r')
df_img.to_csv(f'{codedir}/{s_sample}_ScenePositions.csv')
metadata.exposure_times_slide(df_img,codedir,czidir=f'{czidir}/{s_sample}')
javabridge.kill_vm()
3 QC Tiff Images
Unregistered raw tiff format is 16 bit (uint16
) single channel grayscale tiff, for example, exported from .czi using Zeiss' Zen software.
This section QC's files/naming and generates overviews of all rounds for visual inspection.
FILE STRUCTURE: Each sample's tiff image stack should be in a separate subfolder named by sample ID within rootdir
/RawImages.
# Raw tiffs: check/change names
for s_sample in ls_sample:
os.chdir(f'{tiffdir}/{s_sample}')
#Example: change file name and change back
d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'}
preprocess.dchange_fname(d_rename) #,b_test=False)
#sort and count images
df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw')
cmif.count_images(df_img[df_img.slide==s_sample])
preprocess.check_names(df_img,s_type='tiff')
# QC Raw tiffs: visual inspection #
preprocess.cmif_mkdir([f'{qcdir}/RawImages'])
for s_sample in ls_sample:
os.chdir(f'{tiffdir}/{s_sample}')
#investigate tissues
df_img = mpimage.parse_org(s_end="ORG.tif",type='raw')
cmif.visualize_raw_images(df_img,qcdir,color='c1')
4 Register Images
This section registers all images to round 1 (R1), based on DAPI staining in each round.
FILE STRUCTURE: Registered tiff images are generated and saved in regdir
in subfolders named by sample ID and scene.
for s_sample in ls_sample:
cmif.registration_python(s_sample,tiffdir,regdir,qcdir)
5 Check Registration
This section generates overviews of all rounds of each registered image stack, for QC purposes.
cmif.visualize_reg_images(f'{regdir}',qcdir,color='c1')
6 Create AF Subtracted Images
Images acquired of background autofluorescence by are scaled by exposure time and subtracted from the respective channel, producing a new image. d_channel
specifies the name of the background marker to subtract from each channel. If d_early
is included, the background image will also be scaled by relative round. ls_exclude
lists which markers for which AF subtraction is skipped(typically c5 images).
A companion csv listing channels and corresponding markers is generated to reflect this fully processed (i.e. level-2) image data.
FILE STRUCTURE: AF substracted tiff images are output to subdir
in subfolders nemed by sample ID and scene
#parameters
d_channel = {'c2':'R8Qc2','c3':'R8Qc3','c4':'R8Qc4','c5':'R8Qc5'}
d_early={'c2':'R0c2','c3':'R0c3','c4':'R0c4','c5':'R0c5'}
for s_sample in ls_sample:
preprocess.cmif_mkdir([f'{subdir}/{s_sample}'])
os.chdir(f'{regdir}')
for s_file in os.listdir():
if s_file.find(s_sample) > -1:
os.chdir(s_file)
df_img = mpimage.parse_org()
ls_exclude = sorted(set(df_img[df_img.color=='c5'].marker)) + ['DAPI'] + [item for key, item in d_channel.items()] + [item for key, item in d_early.items()]
#subtract
df_markers = cmif.autofluorescence_subtract(s_sample,df_img,codedir,d_channel,ls_exclude,subdir=f'{subdir}/{s_sample}',d_early=d_early) #
os.chdir('..')
#generate channel/marker metadata csv
cmif.metadata_table(regdir,segdir)
7 Cellpose Segmentation
The generalist segmentation algorithim cellpose
is used for nuclear and cell segmentation. Custom python/numba code matches labelled nuclei and cells that overlap from the two segmentation results.
Note: the cellpose_segment_job
is a spawner that starts a job for each slide/scene on the server.
FILE STRUCTURE: Labelled tiffs (uint32
) with each pixel's grayscale value reflecting the label, are output as "Nuclei Segmentation Basins" and "Matched Cell Segmentation Basins" in segdir
sample ID subfolders.
from mplex_image import segment
nuc_diam = 30
cell_diam = 30
s_seg_markers = "['Ecad']"
s_type = 'cell' #'nuclei'#
print(f'Predicting {s_type}')
for s_sample in ls_sample:
segment.segment_spawner(s_sample,segdir,regdir,nuc_diam,cell_diam,s_type,s_seg_markers,s_job='short',s_match='both')
8 Extract features
Extract intensity, shape and location features from each AF subtracted image.
from mplex_image import features
nuc_diam = 30
cell_diam = 30
ls_seg_markers = ['Ecad']
for s_sample in ls_sample:
df_sample, df_thresh = features.extract_cellpose_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam)
df_sample.to_csv(f'{segdir}/features_{s_sample}_MeanIntensity_Centroid_Shape.csv')
df_thresh.to_csv(f'{segdir}/thresh_{s_sample}_ThresholdLi.csv')
8b (optional) Enhanced Features Extraction
When we know that segmentation is not entirely accurate, e.g. the cell membrane, we can extract "enhanced" features, that is, the brightest 25% of pixels within the segmentation region.
from mplex_image import features
nuc_diam = 30
cell_diam = 30
ls_seg_markers = ['Ecad']
ls_membrane = ['HER2','EGFR']
for s_sample in ls_sample:
df_sample = features.extract_bright_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam,ls_membrane)
df_sample.to_csv(f'{segdir}/features_{s_sample}_BrightMeanIntensity.csv')
9 Filter Data
Post-processing includes filling in any missing data (since cellpose typically segments more nuclei than cells) filtering out lost cells based on last round DAPI staining and selection of dataframe columns based on desired biomarker sub-cellular location.
from mplex_image import process, features
#parameters
nuc_diam = 30
cell_diam = 30
ls_seg_markers = ['Ecad']
s_thresh='Ecad'
ls_membrane = ['HER2']
ls_marker_cyto = ['CK14','CK5','CK17','CK19','CK7','CK5R','Ecad','HER2']
ls_custom = ['CD8R_perinuc5','CD20R_perinuc5','CD4R_perinuc5']
ls_filter = ['DAPI11_nuclei','DAPI2_nuclei']
ls_shrunk = ['Vim_perinuc3','CD44_perinuc3']
man_thresh = 600
#filtering
for s_sample in ls_sample:
os.chdir(segdir)
#replace nas, select segmentation region and filter cells negative for dapi
df_mi_full,df_img_all = process.filter_cellpose_df(s_sample,segdir,qcdir,s_thresh,ls_membrane,ls_marker_cyto,
ls_custom,ls_filter,ls_shrunk,man_thresh)
#Expand nuclei without matching cell labels for cells touching analysis
__ = features.combine_labels(s_sample,segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, df_mi_full,s_thresh)
process.marker_table(df_img_all,qcdir)
#Expand nuclei without matching cell labels for cells' touching analysis
for s_sample in ls_sample:
se_neg = df_mi_full[df_mi_full.slide==s_sample].Ecad_negative
labels,combine,dd_result = features.combine_labels(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, se_neg)
10 Generate multicolor pngs and ome-tiff overlays (cropped)
Crop smaller regions of images and segmentation basins and create custom overlays for efficient viewing of related markers.
#crop coordinates x, y upper corner
d_crop = {'BC44290-146-Scene-1': (1800,9000)}
#10-1 PNGs: nice images for powerpoint presentations
#PNG parameters
d_overlay = {'R1':['CD20','CD8','CD4','CK19'],
'R2':[ 'PCNA','HER2','ER','CD45'],
'R3':['pHH3', 'CK14', 'CD44', 'CK5'],
'R4':[ 'Vim', 'CK7', 'PD1', 'LamAC',],
'R5':['aSMA', 'CD68', 'Ki67', 'Ecad'],
'R6':['CK17','PDPN','CD31','CD3'],
'R7':['CK5R','CD8R','CD4R','CD20R'],
'R8':['LamB1','AR','ColIV','ColI'],
'subtype':['PCNA','HER2','ER','Ki67'],
'diff':['Ecad', 'CK14', 'CD44', 'CK5'],
'immune':['PD1','CD8R','CD4R','CD20R'],
'stromal':['aSMA','Vim','CD68','CD31'],
}
es_bright = {'pHH3','CD68','CK14','CK5','CK17'}
high_thresh=0.999
for s_scene in sorted(d_crop.keys()):
cmif.visualize_multicolor_overlay(s_scene,subdir,qcdir,d_overlay,d_crop,es_bright,high_thresh)
#10 -2 ome-tiff
#ome-tiff parameters
s_dapi = 'DAPI1'
d_combos = {'AF':{'R0c2','R0c3','R0c4','R0c5'},
'Stromal':{'PDPN','Vim','CD31','aSMA'},
'Tumor':{'HER2','ER','AR','Ecad','CD44','Ki67','pHH3','PCNA'},
'Immune':{'CD45','CD20R','CD68','PD1', 'CD8R', 'CD4R','CD3',},
'Differentiation':{'CK7','CK19','CK14','CK17','CK5','CD44','Vim'},
'Nuclear_ImmuneEarly':{ 'LamB1', 'LamAC','CD20','CD8', 'CD4',},
}
for s_scene in sorted(d_crop.keys()):
cmif.cropped_ometiff(s_scene,subdir,cropdir,d_crop,d_combos,s_dapi,tu_dim)
#10-3 crop basins to match cropped overlays
cmif.load_crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='exp5_CellSegmentationBasins')
cmif.load_crop_labels(d_crop,tu_dim,segdir,cropdir,s_find='Nuclei Segmentation Basins')
11 Tissue edge detection
"Edge-effect" is a common artifact in FFPE tissues. Here we detect cells on edge of tissue.
from mplex_image import features
nuc_diam = 30
i_pixel = 308
for s_sample in ls_sample:
features.edge_mask(s_sample,segdir,subdir,i_pixel=i_pixel, dapi_thresh=350)
df_sample = features.edge_cells(s_sample,segdir,nuc_diam,i_pixel=i_pixel)
df_sample.to_csv(f'{segdir}/features_{s_sample}_EdgeCells{i_pixel}pixels_CentroidXY.csv')
Next Steps
Filtered data from step 9 and cropped overlays from step 10 are used for manual thresholding and gating. Unsupervised clustering may be run on filtered data. Images, segmentation, and clustering or thresholding results are viewed in napari.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file mplex_image-0.0.7-py3-none-any.whl
.
File metadata
- Download URL: mplex_image-0.0.7-py3-none-any.whl
- Upload date:
- Size: 117.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7f12738be95769d42b5927ddee8ba2335a259c8042a04e36b491d44124cd47c2 |
|
MD5 | 8bbce7d4c93e86bbb43fe9d414cdb4e0 |
|
BLAKE2b-256 | b8949180e06a109ca1da4b6ddf4c53d4b61b974b2fe3a91ec5ddea64e1ab9630 |