Machine Learning in NeuroImaging for various tasks, e.g., regression, classification and clustering.
MLNI
Machine Learning in NeuroImaging (MLNI) is a Python package that performs various tasks using neuroimaging data: i) binary classification for disease diagnosis, following the good practices proposed in AD-ML; ii) regression prediction, such as age prediction; and iii) semi-supervised clustering with HYDRA.
License
Copyright (c) 2016 University of Pennsylvania. All rights reserved. See https://www.cbica.upenn.edu/sbia/software/license.html.
Installation
Anaconda allows you to install, run, and update Python packages and their dependencies. We highly recommend installing Anaconda3 on your machine. After installing Anaconda3, there are three ways to use MLNI.
Use MLNI as a python package
We recommend using a Conda virtual environment:
1) conda create --name mlni python=3.6
Activate the virtual environment:
2) source activate mlni
Install the other Python package dependencies (from the root folder of MLNI):
3) ./install_requirements.sh
Finally, install mlni from PyPI:
4) pip install mlni==0.0.4
Use MLNI from the command line:
After installing all dependencies listed in the requirements.txt file, go to the root folder of MLNI, where setup.py is located:
pip install -e .
Use the development version of MLNI:
python -m pip install git+https://github.com/anbai106/mlni.git
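After any of the installation routes above, it can be useful to confirm that the active environment actually sees the package. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of MLNI.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can locate the given top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Run this inside the activated "mlni" environment.
print("mlni importable:", is_installed("mlni"))
```

If this prints False, the most likely cause is that the Conda environment was not activated before running pip.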
Input structure
MLNI requires a specific input structure inspired by BIDS. Conventions for the group label/diagnosis: -1 represents healthy control (CN) and 1 represents patient (PT). Categorical variables, such as sex, should be encoded as numbers, for instance 0 for Female and 1 for Male. For regression, simply replace the diagnosis column with the variable to be predicted, such as age in an age prediction task.
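The encoding conventions above can be sketched as a small helper. This is only an illustration: the names DIAGNOSIS_CODES, SEX_CODES and encode_row are hypothetical, and the text labels "CN", "PT", "Female" and "Male" are assumed to be how the raw data is stored.

```python
# Mappings follow the conventions stated above; the names are hypothetical.
DIAGNOSIS_CODES = {"CN": -1, "PT": 1}
SEX_CODES = {"Female": 0, "Male": 1}


def encode_row(row):
    """Return a copy of row with diagnosis (and sex, if present) as numeric codes."""
    encoded = dict(row)
    encoded["diagnosis"] = DIAGNOSIS_CODES[row["diagnosis"]]
    if "sex" in row:
        encoded["sex"] = SEX_CODES[row["sex"]]
    return encoded


raw = {
    "participant_id": "sub-CLNC0001",
    "session_id": "ses-M00",
    "diagnosis": "CN",
    "sex": "Female",
}
print(encode_row(raw))
```

An unknown label raises a KeyError, which surfaces unexpected values early instead of silently passing them on.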
Input data format
Clustering
Clustering uses the ROI features in feature_tsv; covariate_tsv is optional. Example for feature_tsv:
participant_id session_id diagnosis ROI1 ROI2 ...
sub-CLNC0001 ses-M00 -1 432.1 596.9
sub-CLNC0002 ses-M00 1 398.2 601.3
sub-CLNC0003 ses-M00 -1 412.0 567.3
sub-CLNC0004 ses-M00 -1 487.4 600.1
sub-CLNC0005 ses-M00 1 346.5 529.5
sub-CLNC0006 ses-M00 1 443.2 663.2
sub-CLNC0007 ses-M00 -1 450.2 599.3
sub-CLNC0008 ses-M00 1 443.2 509.4
Example for covariate_tsv:
participant_id session_id diagnosis age sex ...
sub-CLNC0001 ses-M00 -1 56.1 0
sub-CLNC0002 ses-M00 1 57.2 0
sub-CLNC0003 ses-M00 -1 43.0 1
sub-CLNC0004 ses-M00 -1 25.4 1
sub-CLNC0005 ses-M00 1 74.5 1
sub-CLNC0006 ses-M00 1 44.2 0
sub-CLNC0007 ses-M00 -1 40.2 0
sub-CLNC0008 ses-M00 1 43.2 1
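Before running the clustering, it may help to sanity-check the input file. The sketch below is not part of MLNI; it assumes the file is tab-separated (as the .tsv name suggests), that the three identifying columns shown above are required, and that diagnosis uses the -1/1 convention. The function name check_feature_tsv is hypothetical.

```python
import csv
import io

# Assumed to be required, based on the examples above.
REQUIRED_COLUMNS = ["participant_id", "session_id", "diagnosis"]


def check_feature_tsv(handle):
    """Return a list of human-readable problems found in a feature_tsv-style file object."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    if len(header) == len(REQUIRED_COLUMNS):
        problems.append("no feature columns found")
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        if (row["diagnosis"] or "").strip() not in ("-1", "1"):
            problems.append("line {}: diagnosis must be -1 or 1".format(line_no))
    return problems


sample = (
    "participant_id\tsession_id\tdiagnosis\tROI1\tROI2\n"
    "sub-CLNC0001\tses-M00\t-1\t432.1\t596.9\n"
    "sub-CLNC0002\tses-M00\t1\t398.2\t601.3\n"
)
print(check_feature_tsv(io.StringIO(sample)))
```

For a real file, pass an open file object, for example open("mlni/data/test_feature.tsv", newline=""). The diagnosis check applies to clustering and classification inputs only, since regression replaces that column with a continuous variable.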
Classification with ROIs
Note: For classification, nested feature selection has also been implemented for ROI-wise and voxel-wise features. Only feature_tsv is required. Example for feature_tsv:
participant_id session_id diagnosis ROI1 ROI2 ...
sub-CLNC0001 ses-M00 -1 432.1 596.9
sub-CLNC0002 ses-M00 1 398.2 601.3
sub-CLNC0003 ses-M00 -1 412.0 567.3
sub-CLNC0004 ses-M00 -1 487.4 600.1
sub-CLNC0005 ses-M00 1 346.5 529.5
sub-CLNC0006 ses-M00 1 443.2 663.2
sub-CLNC0007 ses-M00 -1 450.2 599.3
sub-CLNC0008 ses-M00 1 443.2 509.4
Classification with images
Only participant_tsv is required. Example for participant_tsv for voxel-wise classification:
participant_id session_id diagnosis path ...
sub-CLNC0001 ses-M00 -1 path1
sub-CLNC0002 ses-M00 1 path2
sub-CLNC0003 ses-M00 -1 path3
sub-CLNC0004 ses-M00 -1 path4
sub-CLNC0005 ses-M00 1 path5
sub-CLNC0006 ses-M00 1 path6
sub-CLNC0007 ses-M00 -1 path7
sub-CLNC0008 ses-M00 1 path8
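A participant_tsv like the one above can be generated programmatically. The sketch below uses only the standard library and assumes a tab-separated layout with exactly the four columns shown; the function name write_participant_tsv is hypothetical, and the paths are placeholders rather than files shipped with MLNI.

```python
import csv
import io

# Column names follow the participant_tsv example above.
COLUMNS = ["participant_id", "session_id", "diagnosis", "path"]


def write_participant_tsv(rows, handle):
    """Write rows (dicts keyed by COLUMNS) to handle as tab-separated text."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Placeholder paths; replace them with the locations of your own images.
rows = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00", "diagnosis": -1, "path": "path1"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00", "diagnosis": 1, "path": "path2"},
]
buffer = io.StringIO()
write_participant_tsv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open(filename, "w", newline="").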
Classification with multi-scale features extracted by opNMF
After processing the images with opNMF, only participant_tsv is required as input. Example for participant_tsv for voxel-wise classification:
participant_id session_id diagnosis
sub-CLNC0001 ses-M00 -1
sub-CLNC0002 ses-M00 1
sub-CLNC0003 ses-M00 -1
sub-CLNC0004 ses-M00 -1
sub-CLNC0005 ses-M00 1
sub-CLNC0006 ses-M00 1
sub-CLNC0007 ses-M00 -1
sub-CLNC0008 ses-M00 1
Regression with ROIs
Note: For regression with ROI-wise features, please replace the diagnosis column with the predicted variable (e.g., age). Only feature_tsv is required. Example for feature_tsv:
participant_id session_id diagnosis ROI1 ROI2 ...
sub-CLNC0001 ses-M00 23 432.1 596.9
sub-CLNC0002 ses-M00 44 398.2 601.3
sub-CLNC0003 ses-M00 65 412.0 567.3
sub-CLNC0004 ses-M00 15 487.4 600.1
sub-CLNC0005 ses-M00 22 346.5 529.5
sub-CLNC0006 ses-M00 78 443.2 663.2
sub-CLNC0007 ses-M00 90 450.2 599.3
sub-CLNC0008 ses-M00 33 443.2 509.4
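If the target variable lives in a separate covariate table, the replacement described in the note can be sketched as follows. This is an illustration only: the function name to_regression_rows is hypothetical, and rows are assumed to be dictionaries matched on participant_id and session_id.

```python
def to_regression_rows(feature_rows, covariate_rows, target="age"):
    """Return feature rows whose diagnosis value is replaced by the target covariate."""
    def key(row):
        return (row["participant_id"], row["session_id"])

    # Look up the target value for each (participant, session) pair.
    targets = {key(row): row[target] for row in covariate_rows}
    converted = []
    for row in feature_rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row["diagnosis"] = targets[key(row)]
        converted.append(new_row)
    return converted


features = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00", "diagnosis": -1, "ROI1": 432.1, "ROI2": 596.9},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00", "diagnosis": 1, "ROI1": 398.2, "ROI2": 601.3},
]
covariates = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00", "age": 56.1, "sex": 0},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00", "age": 57.2, "sex": 0},
]
print(to_regression_rows(features, covariates))
```

The column keeps the name diagnosis, as in the example above, even though it now holds the variable to be predicted.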
Example
A toy dataset is provided in the mlni/data folder.
Running MLNI for clustering CN vs Subtype1 vs Subtype2 vs ...:
from mlni.hydra_clustering import clustering
feature_tsv = "mlni/data/test_feature.tsv"
output_dir = "PATH_OUTPUT_DIR"
k_min = 2
k_max = 8
cv_repetition = 100
clustering(feature_tsv, output_dir, k_min, k_max, cv_repetition)
Note that the above example assumes that the input features have already been corrected for covariate effects, such as age and sex. If not, one can run:
from mlni.hydra_clustering import clustering
feature_tsv = "mlni/data/test_feature.tsv"
output_dir = "PATH_OUTPUT_DIR"
k_min = 2
k_max = 8
cv_repetition = 100
covariate_tsv = "mlni/data/test_covariate.tsv"
clustering(feature_tsv, output_dir, k_min, k_max, cv_repetition, covariate_tsv=covariate_tsv)
Running MLNI for binary classification CN vs PT:
from mlni.adml_classification import classification_roi
feature_tsv = "mlni/data/test_feature.tsv"
output_dir = "PATH_OUTPUT_DIR"
cv_repetition = 250
classification_roi(feature_tsv, output_dir, cv_repetition)
or
from mlni.adml_classification import classification_voxel
feature_tsv = "mlni/data/test_feature_voxel.tsv"
output_dir = "PATH_OUTPUT_DIR"
cv_repetition = 250
classification_voxel(feature_tsv, output_dir, cv_repetition)
Running MLNI for regression tasks:
from mlni.adml_regression import regression_roi
feature_tsv = "mlni/data/test_feature_regression_age.tsv"
output_dir = "PATH_OUTPUT_DIR"
cv_repetition = 250
regression_roi(feature_tsv, output_dir, cv_repetition)
Citing this work
If you use this software for clustering:
Varol, E., Sotiras, A., Davatzikos, C., 2017. HYDRA: Revealing heterogeneity of imaging and genetic patterns through a multiple max-margin discriminative analysis framework. Neuroimage, 145, pp.346-364. doi:10.1016/j.neuroimage.2016.02.041
If you use this software for classification or regression:
Wen, J., Samper-González, J., Bottani, S., Routier, A., Burgos, N., Jacquemont, T., Fontanella, S., Durrleman, S., Epelbaum, S., Bertrand, A. and Colliot, O., 2020. Reproducible evaluation of diffusion MRI features for automatic classification of patients with Alzheimer’s disease. Neuroinformatics, pp.1-22. doi:10.1007/s12021-020-09469-5
Samper-González, J., Burgos, N., Bottani, S., Fontanella, S., Lu, P., Marcoux, A., Routier, A., Guillon, J., Bacci, M., Wen, J., Bertrand, A., Bertin, H., Habert, M.-O., Durrleman, S., Evgeniou, T. and Colliot, O., 2018. Reproducible evaluation of classification methods in Alzheimer’s disease: Framework and application to MRI and PET data. NeuroImage, 183, pp.504-521. doi:10.1016/j.neuroimage.2018.08.042
Usage demonstration
Wen, J., Varol, E., Davatzikos, C., 2020. Multi-scale feature reduction and semi-supervised learning for parsing neuroanatomical heterogeneity. Organization for Human Brain Mapping.
Wen, J., Varol, E., Davatzikos, C., 2021. Multi-scale semi-supervised clustering of brain images: deriving disease subtypes. Submitted to MedIA.