Digital pathology feature extraction
giExtract
A universal framework for extracting features from digital H&E images using multiple pretrained CNN models. Combining features from several CNN models captures a wider range of functionally relevant features.
The core of this tool is written in Python 3.8 with a TensorFlow backend and the Keras functional API; the downstream analysis is implemented in R.
Installation and running the tool
The best way to get giExtract, along with all its dependencies, is to install the release with the Python package installer (pip).
pip install giExtract
This will add two command line scripts:
Script | Context | Usage |
---|---|---|
giCube | Create image patches | giCube -h |
giExtract | Extract features from patches | giExtract -h |
Utility functions can be imported in the usual Python way, for example from giExtract.util import generator
Input giCube
The main input is the path to the H&E slide images (.jpg or .png) from which patches are created, specified by -p.
All other arguments are optional and have reasonable defaults. Use giCube -h to show the options and their default settings.
Output giCube
Image patches from the H&E slides, saved in a "cubes" directory at the path provided as input.
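To make the idea of patch generation concrete, here is a minimal sketch of how a slide image could be divided into non-overlapping square patches. It is not giCube's actual implementation: the function name `patch_boxes`, the patch size of 256 and the choice to drop incomplete edge patches are all assumptions made for illustration.

```python
def patch_boxes(width, height, size):
    """Return (left, upper, right, lower) boxes for non-overlapping square patches.

    Hypothetical helper: edge regions smaller than `size` are skipped.
    """
    boxes = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            boxes.append((left, top, left + size, top + size))
    return boxes


# Example: a 1000 x 600 pixel image cut into 256 x 256 patches (assumed size).
boxes = patch_boxes(width=1000, height=600, size=256)
for box in boxes:
    print(box)
```

Each box uses the (left, upper, right, lower) convention, so it could be passed to an image library's crop function to write the patch to disk.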
Input giExtract
The two main inputs are the path to the H&E cubes generated by giCube (.jpg), specified by -p, and the path to the meta file (.csv) used to flow the patches during feature extraction, specified by -c. The meta (context) file must have a column with file names matching the patches in the path.
All other arguments are optional and have reasonable defaults. Use giExtract -h to see the options and their default settings.
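As an illustration of the meta file requirement, the sketch below lists the .jpg patches in a directory and writes their file names to a one-column csv. The column name "Name" and the helper `write_meta` are assumptions; check giExtract -h for the column name the tool actually expects.

```python
import csv
import tempfile
from pathlib import Path


def write_meta(patch_dir, out_csv, column="Name"):
    """Write a csv whose single column lists the .jpg file names in patch_dir.

    Hypothetical helper; the column name "Name" is an assumption.
    """
    names = sorted(p.name for p in Path(patch_dir).glob("*.jpg"))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([column])
        writer.writerows([name] for name in names)
    return names


# Demonstration on a temporary directory holding two empty placeholder patches.
with tempfile.TemporaryDirectory() as tmp:
    cubes = Path(tmp) / "cubes"
    cubes.mkdir()
    for name in ("slide1_0.jpg", "slide1_1.jpg"):
        (cubes / name).touch()
    meta_path = Path(tmp) / "meta.csv"
    written = write_meta(cubes, meta_path)
    with open(meta_path, newline="") as fh:
        rows = list(csv.reader(fh))
```

Generating the file names from the directory itself guarantees that every entry in the meta file matches a patch on disk.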
Output giExtract
A table of features extracted by the different CNN models, with patches as rows and features as columns. Each column in the output file is named to indicate the CNN the feature came from, for example "inception_46".
Name | feature 1 | feature 2 | feature 3 |
---|---|---|---|
patch 1 | 0.2 | 0.1 | 0.6 |
patch 2 | 5.2 | 0.14 | 0.6 |
patch 3 | 0.6 | 0.1 | 0.7 |
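Because each column name carries its CNN of origin, the columns can be grouped by model before analysis. The sketch below assumes the output is a csv whose first column is Name and whose feature columns follow the "model_index" pattern seen in "inception_46"; the sample data and the model name "vgg" are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed layout: Name column, then "model_index" features.
sample = """Name,inception_0,inception_1,vgg_0
patch_1,0.2,0.1,0.6
patch_2,5.2,0.14,0.6
"""


def columns_by_model(header):
    """Group feature column names by the model prefix before the last underscore."""
    groups = defaultdict(list)
    for col in header[1:]:  # skip the Name column
        model, _, _index = col.rpartition("_")
        groups[model].append(col)
    return dict(groups)


reader = csv.reader(io.StringIO(sample))
header = next(reader)
groups = columns_by_model(header)
print(groups)
```

Splitting on the last underscore keeps model names that themselves contain underscores intact.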
Extras
An R script for analysing the output of giExtract and identifying differential features (see Manuscript) is included under the R/ directory, with a README file on usage. The giFeature.R script requires two mandatory inputs:
- Path to a csv file with meta information (must have only three columns: Name, slide and Group).
- Path to a csv file with the CNN features to analyse (must be an output of giExtract).
Details about the optional arguments and the requirements for R and the tidyverse package are given in the README file.
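Before running the R script, it can help to confirm that the meta file has exactly the three required columns. The following is a small sketch of such a check; the helper `check_meta_header` is hypothetical and not part of giExtract, and it assumes column order does not matter.

```python
import csv
import io

REQUIRED = ["Name", "slide", "Group"]


def check_meta_header(fh):
    """Return True if the csv header holds exactly the three required columns.

    Hypothetical helper; column order is assumed not to matter.
    """
    header = next(csv.reader(fh), [])
    return sorted(header) == sorted(REQUIRED)


# Made-up examples: one valid header and one with an extra column.
good = io.StringIO("Name,slide,Group\npatch_1.jpg,slide1,A\n")
bad = io.StringIO("Name,slide,Group,extra\npatch_1.jpg,slide1,A,x\n")
good_ok = check_meta_header(good)
bad_ok = check_meta_header(bad)
print(good_ok, bad_ok)
```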
Manuscript analysis
To reproduce the analysis reported in the manuscript, run the run.sh script inside the manuscript folder.
This assumes giExtract has been installed via pip, as described above, and that R is installed on your system.
The run.sh script performs the three core analyses: 1) patch generation, 2) feature extraction and 3) differential feature analysis.
To generate the plots and automatically extract images, run the code in downstream.R.
Example data
Example datasets are provided inside manuscript/data. These show what to expect for the input and output files. Note that only a subset of the data is provided, due to size limits and access control. The full dataset used for our computational histology subtype inference analysis can be requested from the corresponding authors.
To clone the source repository
git clone https://github.com/caanene1/giExtract