
Digital pathology feature extraction


giExtract

A universal framework for extracting features from digital H&E images using multiple pretrained CNN models. Extracting features from several CNN models captures a wider range of functionally relevant features than a single model does.

The core of this tool is built in Python 3.8 with a TensorFlow backend and the Keras functional API, while the downstream analysis is implemented in R.

Installation and running the tool

The best way to get giExtract along with all its dependencies is to install the release with the Python package installer (pip):

pip install giExtract

This adds two command-line scripts:

Script      Purpose                          Help
giCube      Create image patches             giCube -h
giExtract   Extract features from patches    giExtract -h
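A quick check that the install put both scripts on your PATH can be sketched with the Python standard library alone. This is an illustrative helper, not part of giExtract: it only looks the two script names up and does not run them.

```python
import shutil
from typing import Dict, Iterable, Optional

# Console scripts that `pip install giExtract` is described as adding.
SCRIPTS = ("giCube", "giExtract")


def find_scripts(names: Iterable[str] = SCRIPTS) -> Dict[str, Optional[str]]:
    """Map each script name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_scripts().items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If a script is reported as not found, the usual cause is that pip installed into an environment whose scripts directory is not on PATH.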

Utility functions can be imported with a standard Python import, for example:

from giExtract.util import generator

Input giCube

The main input is the path to the H&E slide images (.jpg or .png), specified with -p; giCube loads these images and creates patches from them. All other arguments are optional and have reasonable defaults. Run giCube -h to show the options and their default settings.
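The README does not state giCube's patch size, stride or edge handling, so the following is only a sketch of the general tiling idea under assumed settings: square, non-overlapping 512-pixel patches (a hypothetical default), with partial tiles at the right and bottom edges skipped. It computes crop boxes in pure Python; it is not giCube's implementation.

```python
from typing import Iterator, Optional, Tuple

# (left, upper, right, lower): the box format accepted by Pillow's Image.crop.
Box = Tuple[int, int, int, int]


def patch_boxes(width: int, height: int, size: int = 512,
                stride: Optional[int] = None) -> Iterator[Box]:
    """Yield crop boxes for square patches tiling a width x height image.

    Hypothetical defaults: 512-pixel patches with stride == size, i.e.
    non-overlapping tiles. Partial tiles at the right/bottom edges are skipped.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    step = size if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")
    for upper in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            yield (left, upper, left + size, upper + size)
```

With Pillow installed, each box could be passed to `Image.open(path).crop(box)` and the result saved as a .jpg patch. A stride smaller than `size` gives overlapping patches.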

Output giCube

Image patches from the H&E slides, saved in a "cubes" directory at the path given as input.

Input giExtract

The two main inputs are the path to the H&E patches ("cubes") generated by giCube (.jpg), specified with -p, and the path to a meta file (.csv), specified with -c, which is used to flow the patches through the models during feature extraction. The meta file must have a column of file names matching the patches in the -p path. All other arguments are optional and have reasonable defaults. Run giExtract -h to see the options and default settings.
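A meta file for giExtract can be generated from the cubes directory with a few lines of standard-library Python. This is a sketch, not a giExtract utility: the column name giExtract reads is not given here, so "Name" is an assumption (it matches the identifier column in the output table and in the R meta file); check giExtract -h for the actual setting.

```python
import csv
from pathlib import Path
from typing import List

PATCH_SUFFIXES = {".jpg", ".jpeg"}  # giCube patches are described as .jpg


def write_patch_meta(cube_dir: str, out_csv: str, column: str = "Name") -> List[str]:
    """Write a one-column CSV listing the patch file names found in cube_dir.

    The column name "Name" is an assumption; match whatever column your
    giExtract run is configured to read (see giExtract -h).
    """
    names = sorted(
        p.name for p in Path(cube_dir).iterdir()
        if p.is_file() and p.suffix.lower() in PATCH_SUFFIXES
    )
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        writer.writerows([name] for name in names)
    return names
```

Listing bare file names (not full paths) follows the requirement that the column's entries match the patches inside the -p directory.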

Output giExtract

A table of features extracted by the different CNN models, with patches as rows and features as columns. Each column name indicates the CNN the feature comes from, for example "inception_46".

Name      feature 1   feature 2   feature 3
patch 1   0.2         0.1         0.6
patch 2   5.2         0.14        0.6
patch 3   0.6         0.1         0.7
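Because each column name carries its CNN of origin, the features can be split by model after loading. A minimal sketch, assuming columns follow a "<model>_<index>" pattern like "inception_46" and that the identifier column is "Name" as in the table above; model names other than "inception" in the example are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, List


def read_header(path: str) -> List[str]:
    """Return the header row of a giExtract output CSV."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle))


def features_by_model(header: List[str], id_column: str = "Name") -> Dict[str, List[str]]:
    """Group feature columns by CNN prefix (everything before the last "_")."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for col in header:
        if col == id_column:
            continue
        model, sep, _ = col.rpartition("_")
        # Columns without an underscore are kept under their own name.
        groups[model if sep else col].append(col)
    return dict(groups)
```

Splitting on the last underscore keeps model names that themselves contain underscores (for example a hypothetical "inception_v3_12") intact.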

Extras

An R script for analysing the output of giExtract and identifying differential features (see Manuscript) is included in the R/ directory, together with a README describing its usage. The script, giFeature.R, requires two mandatory inputs:

  • Path to a CSV file with meta information (it must have exactly three columns: Name, slide and Group).
  • Path to a CSV file with the CNN features to analyse (this must be an output of giExtract).

Details about the optional arguments, and the requirements for R and the tidyverse package, are given in the README file.
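Before running giFeature.R, the two inputs can be sanity-checked from Python. This pre-flight check is a hypothetical helper, not part of the package: it verifies that the meta file has exactly the columns Name, slide and Group, and it assumes (not stated above) that meta rows are matched to feature rows through a shared "Name" column.

```python
import csv
from typing import List, Tuple

# Columns the meta file must contain, and nothing else.
REQUIRED_META = ["Name", "slide", "Group"]


def _read(path: str) -> Tuple[List[str], List[dict]]:
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def check_inputs(meta_csv: str, features_csv: str, id_column: str = "Name") -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    meta_cols, meta_rows = _read(meta_csv)
    feat_cols, feat_rows = _read(features_csv)
    if sorted(meta_cols) != sorted(REQUIRED_META):
        problems.append(f"meta file has columns {meta_cols}; expected exactly {REQUIRED_META}")
    if id_column not in feat_cols:
        problems.append(f"feature file has no {id_column!r} column")
    if id_column in meta_cols and id_column in feat_cols:
        # Assumed join key: every meta row should have a matching feature row.
        missing = {r[id_column] for r in meta_rows} - {r[id_column] for r in feat_rows}
        if missing:
            problems.append(f"{len(missing)} meta rows have no matching features: {sorted(missing)}")
    return problems
```

Column order is not checked, since only the set of columns is specified.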

Manuscript analysis

To reproduce the analysis reported in the manuscript, run the run.sh script inside the manuscript folder. This assumes that giExtract has been installed via pip as described above and that R is installed on your system. The run.sh script performs the three core analyses: 1) patch generation, 2) feature extraction and 3) differential feature analysis. To generate the plots and automatically extract images, run the code in downstream.R.
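The first two pipeline steps can be sketched as a small Python driver. This is not the content of run.sh: it uses only the -p and -c flags documented above (all other arguments keep their defaults) and assumes giCube writes patches to a "cubes" directory under the slide path. The meta file passed to giExtract must list patches that already exist, so in practice it is created after step 1. Step 3 is left out because giFeature.R's argument syntax is documented only in the R/ README.

```python
import os
import subprocess
from typing import List


def pipeline_commands(slide_dir: str, meta_csv: str) -> List[List[str]]:
    """Return argv lists for patch generation and feature extraction."""
    # giCube is described as saving patches in a "cubes" directory under the -p path.
    cube_dir = os.path.join(slide_dir, "cubes")
    return [
        ["giCube", "-p", slide_dir],                    # 1) patch generation
        ["giExtract", "-p", cube_dir, "-c", meta_csv],  # 2) feature extraction
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each step in order, raising CalledProcessError at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Passing argv lists (rather than a shell string) avoids quoting problems with slide paths that contain spaces.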

Example data

Example datasets are provided in manuscript/data. These show what to expect for the input and output files. Note that only a subset of the data is provided, because of size limits and access control. The full dataset used for our computational histology subtype inference analysis can be requested from the corresponding authors.

To clone the source repository

git clone https://github.com/caanene1/giExtract


Download files

Download the file for your platform.

Source Distribution

giExtract-1.0.4.tar.gz (7.7 kB)

Built Distribution

giExtract-1.0.4-py3-none-any.whl (8.5 kB, Python 3)
