Digital pathology feature extraction

giExtract

A universal framework for extracting features from digital H&E images using multiple pretrained CNN models. Extracting features from multiple CNN models captures a wider range of functionally relevant features.

The core of this tool is built in Python 3.8 with a TensorFlow backend and the Keras functional API, while the downstream analysis is implemented in R.

Installation and running the tool

The best way to get giExtract along with all its dependencies is to install the release with the Python package installer (pip).

pip install giExtract

This will add two command line scripts:

Script	Context	Usage
giCube	Create image patches	`giCube -h`
giExtract	Extract features from patches	`giExtract -h`

Utility functions can be imported in the usual Python way, for example `from giExtract.util import generator`.
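A quick way to confirm that the installation worked is to check that the package is importable and that both scripts are on the PATH. The following is a minimal sketch using only the standard library; the helper name `check_install` is hypothetical and not part of giExtract.

```python
import importlib.util
import shutil


def check_install(package="giExtract", scripts=("giCube", "giExtract")):
    """Report whether the package is importable and each script is on PATH."""
    report = {"package": importlib.util.find_spec(package) is not None}
    for script in scripts:
        # shutil.which returns None when the executable is not found
        report[script] = shutil.which(script) is not None
    return report


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either script is reported missing, the directory pip installs scripts into is probably not on your PATH.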

Input giCube

The main input is the path to the H&E slide images (.jpg or .png), specified with -p, from which the patches are created. All other arguments are optional and have reasonable defaults. Use giCube -h to show the options and their default settings.

Output giCube

Image patches from the H&E slides, saved in a "cubes" directory at the path provided in the input.
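To make the idea of patch creation concrete, the sketch below computes the pixel boxes of square patches for an image of a given size. It is an illustration only, not giCube's actual implementation: the patch size of 224, the non-overlapping grid, and the dropping of partial edge tiles are all assumptions.

```python
def patch_boxes(width, height, patch_size=224):
    """Return (left, upper, right, lower) boxes tiling the image.

    Assumptions (not taken from giCube): square, non-overlapping patches;
    partial tiles at the right and bottom edges are dropped.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    boxes = []
    for upper in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            boxes.append((left, upper, left + patch_size, upper + patch_size))
    return boxes


boxes = patch_boxes(width=1000, height=500, patch_size=224)
print(len(boxes))  # 4 columns x 2 rows = 8 patches
```

Each box follows the (left, upper, right, lower) convention accepted by Pillow's `Image.crop`, should you want to cut patches yourself.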

Input giExtract

The two main inputs are the path to the H&E cubes generated by giCube (.jpg), specified with -p, and the path to the meta (context) file (.csv) used to flow the patches during feature extraction, specified with -c. The context file must have a column of file names matching the patches in the path. All other arguments are optional and have reasonable defaults. Use giExtract -h to see the options and their default settings.
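If no context file exists yet, one can be assembled by listing the patches in the cubes directory. This is a minimal sketch: the column name "Name" and the helper `write_context_csv` are assumptions, so adjust the column name to whatever your giExtract setup expects.

```python
import csv
from pathlib import Path


def write_context_csv(cubes_dir, out_csv, column="Name"):
    """Write one row per .jpg patch in cubes_dir; return the number of rows.

    The column name is an assumption, not a documented giExtract default.
    """
    patches = sorted(p.name for p in Path(cubes_dir).glob("*.jpg"))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        writer.writerows([name] for name in patches)
    return len(patches)
```

For example, `write_context_csv("slides/cubes", "context.csv")` (hypothetical paths) would produce a one-column file that could then be passed with -c.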

Output giExtract

A table of features extracted by the different CNN models, with patches as rows and features as columns. The columns in the output file are named to indicate the CNN of origin of each feature, for example "inception_46".

Name	feature 1	feature 2	feature 3
patch 1	0.2	0.1	0.6
patch 2	5.2	0.14	0.6
patch 3	0.6	0.1	0.7
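
Because each feature column carries its CNN of origin as a prefix, the columns can be grouped per model before downstream analysis. The sketch below assumes the "model_index" pattern shown in the example ("inception_46") and a patch-name column called "Name"; the function name and the "resnet" prefix are hypothetical.

```python
from collections import defaultdict


def group_columns_by_model(columns, id_column="Name"):
    """Map each CNN prefix to the list of its feature columns.

    Assumes names look like "<model>_<index>", as in "inception_46".
    """
    groups = defaultdict(list)
    for column in columns:
        if column == id_column:
            continue  # skip the patch-name column
        # Split on the last underscore so model names may contain underscores
        model, sep, index = column.rpartition("_")
        if not model or not sep or not index.isdigit():
            model = "unknown"
        groups[model].append(column)
    return dict(groups)


# "resnet" is a placeholder prefix used only for illustration
header = ["Name", "inception_0", "inception_46", "resnet_3"]
print(group_columns_by_model(header))
```

Columns that do not match the pattern are collected under "unknown" rather than dropped, so nothing disappears silently.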

Extras

An R script for analysing the output of giExtract and identifying differential features (see Manuscript) is included under the R/ directory, with a README file on usage. The giFeature.R script requires two mandatory inputs:

Path to a csv file with meta information (must have exactly three columns: Name, slide and Group).
Path to a csv file with the CNN features to analyse (must be an output of giExtract).

Details about the optional arguments and the R and tidyverse requirements are given in the README file.
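
Since giFeature.R expects the meta file to contain exactly the columns Name, slide and Group, checking the header first can save a failed run. A minimal sketch in Python follows; the helper name is hypothetical, and treating the names as case-sensitive while accepting any column order is an assumption.

```python
import csv

REQUIRED_COLUMNS = {"Name", "slide", "Group"}


def check_meta_header(csv_path):
    """Raise ValueError unless the header is exactly Name, slide and Group (any order)."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    if len(header) != 3 or set(header) != REQUIRED_COLUMNS:
        raise ValueError(f"expected columns {sorted(REQUIRED_COLUMNS)}, got {header}")
    return header
```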

Manuscript analysis

To reproduce the analysis reported in the manuscript, execute the run.sh script inside the manuscript folder. This assumes giExtract has been installed via pip as described above and that R is installed on your system. The run.sh script performs the three core analyses: 1) patch generation, 2) feature extraction and 3) differential feature analysis. To generate the plots and automatically extract images, run the code in downstream.R.
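
The first two steps of that pipeline can be sketched as plain command lines. The snippet below only builds the commands from the documented -p and -c flags and does not run them, and it is not a copy of run.sh. The example paths are hypothetical, and the R step is left out because its arguments are described in the R/ README rather than here.

```python
import shlex
from pathlib import Path


def build_commands(slides_dir, context_csv):
    """Return the giCube and giExtract command lines as argument lists."""
    slides = Path(slides_dir)
    cubes = slides / "cubes"  # giCube saves patches in "cubes" under the input path
    return [
        ["giCube", "-p", str(slides)],
        ["giExtract", "-p", str(cubes), "-c", str(context_csv)],
    ]


for command in build_commands("data/slides", "data/context.csv"):
    print(shlex.join(command))
```

Passing each list to `subprocess.run(command, check=True)` would execute the steps in order and stop at the first failure.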

Example data

Example datasets are provided inside manuscript/data. These illustrate what to expect for the input and output files. Note that only a subset of the data is provided, due to size limits and access control. The full dataset used for our computational histology subtype inference analysis can be requested from the corresponding authors.

To clone the source repository

git clone https://github.com/caanene1/giExtract

Release history

Version	Date
1.0.4 (this version)	Apr 11, 2024
1.0.3	Mar 11, 2024
1.0.2	Mar 8, 2024
1.0.1	Mar 8, 2024
1.0.0	Mar 6, 2024

Download files

File	Type	Size	Uploaded
giExtract-1.0.4.tar.gz	Source distribution	7.7 kB	Apr 11, 2024
giExtract-1.0.4-py3-none-any.whl	Built distribution (Python 3)	8.5 kB	Apr 11, 2024

Hashes for giExtract-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`7be4b90c98e33ffd2d4699fd4a9c840c0935c2154533264efc2825035bc90b10`
MD5	`a84d191bc1774d5a04ad2e86ed15a26d`
BLAKE2b-256	`c27377573e388b001aedd4f4ccaa08a82a0a11ed19c576795f6d7aeb4db9345d`

Hashes for giExtract-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7112033fabaf28bb1d990504da1b142c6053335872925a0b3c0d57bc99eae63e`
MD5	`8e974972cc2f3d78f2eddd2f69ddf721`
BLAKE2b-256	`2675fdb58e054f527dc71fb82d79903256cd6949e2fa41b11079d16e34d87fc3`