Python package to run Machine Learning Experiments, within the Hive Framework.

Hive-ML

Hive-ML is a Python Package collecting the tools and scripts to run Machine Learning experiments on Radiological Medical Imaging.

Install

To install Hive-ML:

pip install hive-ml

or from GitHub:

git clone 
pip install -e Hive_ML

Description

The Hive-ML workflow consists of several sequential steps: Radiomics extraction, Sequential Forward Feature Selection, and Model Fitting. Classifier performance (ROC-AUC, Sensitivity, Specificity, Accuracy) is reported in a tabular format, and all steps are tracked on an MLflow server.

In addition, Hive-ML provides a Docker Image, Kubernetes Deployment and Slurm Job, with the corresponding set of instructions to easily reproduce the experiments.

Finally, Hive-ML also supports model serving through MLflow, providing easy access to the trained classifier for future predictions.

In the tutorial explained below, Hive-ML is used to predict the Pathological Complete Response after Neo-Adjuvant chemotherapy, from DCE-MRI.
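Sensitivity, specificity, and accuracy, three of the reported metrics, follow directly from the confusion-matrix counts of a binary classifier. The snippet below is a minimal sketch of those standard definitions, not Hive-ML's own reporting code; the function name `binary_metrics` and the toy labels are made up for illustration, with label `1` standing for pCR and `0` for non-pCR.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity and accuracy for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }


# Toy labels, invented for this example (1 = pCR, 0 = non-pCR).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

ROC-AUC is not covered here because it needs predicted scores rather than hard labels.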

Usage

The Hive-ML workflow is controlled from a JSON configuration file, which the user can customize for each experiment run.

Example:

    {
      "image_suffix": "_image.nii.gz",  # File suffix (or list of File suffixes) of the files containing the image volume.
      "mask_suffix": "_mask.nii.gz",    # File suffix (including file extension) of the files containing the segmentation mask of the ROI.
      "label_dict": {                   # Dictionary describing the classes. The key-value pair contains the label value as key (starting from 0) and the class description as value.
        "0": "non-pCR",
        "1": "pCR"
      },
      "models": {                       # Dictionary for all the classifiers to evaluate. Each element includes the classifier class name and an additional dictionary with the kwargs to pass to the classifier object.
        "rf": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "adab": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "knn": {},
        "lda": {},
        "qda": {},
        "logistic_regression": {},
        "svm": {
          "kernel": "rbf"
        },
        "naive": {}
      },
      "perfusion_maps": {                # Dictionary describing the perfusion maps to extract. Each element includes the perfusion map name and the file suffix used to save the perfusion map.
        "distance_map": "_distance_map.nii.gz",
        "distance_map_depth": {
          "suffix": "_distance_map_depth.nii.gz",
          "kwargs": [
            2
          ]
        },
        "ttp": "_ttp_map.nii.gz",
        "cbv": "_cbv_map.nii.gz",
        "cbf": "_cbf_map.nii.gz",
        "mtt": "_mtt_map.nii.gz"
      },
      "feature_selection": "SFFS",       # Type of Feature Selection to perform. Supported values are SFFS and PCA.
      "n_features": 30,                  # Number of features to preserve when performing Feature Selection.
      "n_folds": 5,                      # Number of folds to run cross-validation.
      "random_seed": 12345,              # Random seed number used when randomizing events and actions.
      "feature_aggregator": "SD",        # Aggregation strategy used when extracting features from the 4D volume.
                                         # Supported values are: ``Flat`` (no aggregation, all features are preserved),
                                         #                       ``Mean`` (average over the 4th dimension),
                                         #                       ``SD`` (standard deviation over the 4th dimension),
                                         #                       ``Mean_Norm`` (independent channel normalization, followed by average over the 4th dimension),
                                         #                       ``SD_Norm`` (independent channel normalization, followed by SD over the 4th dimension)
      "k_ensemble": [1,5],               # List of k values to select top-k best models in ensembling.
      "metric_best_model": "roc_auc",    # Classification Metric to consider when determining the best models from CV results.
      "reduction_best_model": "mean"     # Reduction to perform on CV scores to determine the best models.
    }
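The `#` annotations in the example above are explanatory only; standard JSON has no comment syntax, so a file used with the commands below presumably needs them removed. As a minimal sketch (the helper `strip_hash_comments` is hypothetical and not part of Hive-ML), an annotated copy can be cleaned and parsed with the standard library like this:

```python
import json


def strip_hash_comments(text):
    """Remove '#' comments that appear outside double-quoted JSON strings."""
    cleaned_lines = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = in_string      # backslashes only matter inside strings
            elif ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i                  # comment starts here
                break
        cleaned_lines.append(line[:cut].rstrip())
    return "\n".join(cleaned_lines)


# Shortened, annotated excerpt of the configuration shown above.
annotated = """
{
  "image_suffix": "_image.nii.gz",  # suffix of the image volumes
  "label_dict": {"0": "non-pCR", "1": "pCR"},
  "feature_selection": "SFFS",      # SFFS or PCA
  "n_features": 30,
  "n_folds": 5,
  "k_ensemble": [1, 5]
}
"""
config = json.loads(strip_hash_comments(annotated))
print(config["feature_selection"], config["n_folds"], config["k_ensemble"])
```

The scanner tracks whether it is inside a quoted string, so a `#` that is part of a value (for example in a file suffix) is left untouched.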

Perfusion Maps Generation

Given a 4D Volume, to extract the perfusion maps (TTP, CBV, CBF, MTT) run:

 Hive_ML_generate_perfusion_maps -i </path/to/data_folder> --config-file <config_file.json>

For more details, follow the Jupyter Notebook Tutorial: Generate Perfusion Maps
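To give an intuition for one of these maps, time-to-peak (TTP) is commonly defined as the time at which a voxel's signal reaches its maximum. The sketch below applies that common definition to a toy 2x2 slice using plain Python lists. It is an illustration only: how `Hive_ML_generate_perfusion_maps` actually computes TTP is not described here, and the function name `time_to_peak` and the toy data are assumptions.

```python
def time_to_peak(curve, times=None):
    """Return the frame index (or time, if `times` is given) of the curve maximum."""
    if not curve:
        raise ValueError("curve must contain at least one time point")
    # First index holding the maximum value.
    peak_index = max(range(len(curve)), key=curve.__getitem__)
    return peak_index if times is None else times[peak_index]


# Toy 2x2 slice with 5 time points per voxel (time is the last axis).
slice_4d = [
    [[0, 2, 9, 4, 1], [0, 1, 3, 8, 2]],
    [[0, 7, 5, 2, 1], [1, 1, 1, 1, 6]],
]
ttp_map = [[time_to_peak(voxel) for voxel in row] for row in slice_4d]
print(ttp_map)  # [[2, 3], [1, 4]]

# With acquisition times in seconds (made-up values), the result is a time.
print(time_to_peak([0, 2, 9, 4, 1], times=[0.0, 1.5, 3.0, 4.5, 6.0]))  # 3.0
```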

Feature Extraction

To extract Radiomics/Radiodynamics from the 4D Volume, run:

 Hive_ML_extract_radiomics --data-folder </path/to/data_folder> --config-file <config_file.json> --feature-param-file </path/to/radiomics_config.yaml> --output-file </path/to/feature_file>

For more details, follow the Jupyter Notebook Tutorial: Extract Features
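The `feature_aggregator` option in the configuration decides how features computed at each time point of the 4D volume are combined. The following sketch illustrates the `Flat`, `Mean`, and `SD` strategies on a single feature. It is not the package's implementation: the function name, the feature name `firstorder_Mean`, the `_t<index>` naming for `Flat`, and the use of the population standard deviation are all assumptions. The `Mean_Norm` and `SD_Norm` variants are left out because their normalization step is not specified in this description.

```python
from statistics import mean, pstdev


def aggregate_features(per_timepoint, strategy):
    """Aggregate {feature_name: [value_t0, value_t1, ...]} over the time axis."""
    if strategy == "Flat":
        # Keep every time point as its own feature (naming scheme is hypothetical).
        return {
            f"{name}_t{t}": value
            for name, values in per_timepoint.items()
            for t, value in enumerate(values)
        }
    if strategy == "Mean":
        return {name: mean(values) for name, values in per_timepoint.items()}
    if strategy == "SD":
        # Population SD is an assumption; the package may use the sample SD.
        return {name: pstdev(values) for name, values in per_timepoint.items()}
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


# One feature measured at 8 time points (toy values).
features = {"firstorder_Mean": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}

flat = aggregate_features(features, "Flat")
avg = aggregate_features(features, "Mean")
sd = aggregate_features(features, "SD")
print(len(flat), avg, sd)
```

`Flat` multiplies the number of features by the number of time points, while `Mean` and `SD` keep one value per feature.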

Feature Selection

To run Feature Selection:

 Hive_ML_feature_selection --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>

The Feature Selection report (in JSON format, including the selected features and validation scores for each classifier) will be available at the following path:

$ROOT_FOLDER/<EXPERIMENT_ID>/SFFS

For more details, follow the Jupyter Notebook Tutorial: Feature Selection
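The idea behind sequential forward selection can be sketched in a few lines: start from an empty set and repeatedly add the single feature that yields the best score, until `n_features` are selected. The code below is a plain greedy version under stated assumptions, not the routine used by `Hive_ML_feature_selection`; `toy_score` is a made-up stand-in for a cross-validated classifier score, and the feature names are placeholders.

```python
def sequential_forward_selection(candidates, score_fn, n_features):
    """Greedily add the feature that gives the best score until n_features are chosen."""
    selected = []
    remaining = list(candidates)
    history = []  # (selected subset, score) after each step
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score_fn(selected)))
    return selected, history


# Toy per-feature usefulness; "b" and "c" carry redundant information.
usefulness = {"a": 0.30, "b": 0.25, "c": 0.24, "d": 0.05}


def toy_score(subset):
    score = sum(usefulness[f] for f in subset)
    if "b" in subset and "c" in subset:
        score -= 0.22  # penalty for redundancy
    return round(score, 2)


selected, history = sequential_forward_selection("abcd", toy_score, n_features=3)
print(selected)  # ['a', 'b', 'd']
```

Note that `c` scores better than `d` on its own, yet `d` is picked in the third step because `c` adds little once `b` is already selected. This is why the procedure evaluates subsets rather than ranking features individually.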

Model Fitting

To perform Model Fitting on the Selected features:

 Hive_ML_model_fitting --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>

The experiment validation reports, plots, and summaries will be available at the following path:

$ROOT_FOLDER/<EXPERIMENT_ID>

For more details, follow the Jupyter Notebook Tutorial: Model Fitting
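The configuration keys `k_ensemble`, `metric_best_model`, and `reduction_best_model` suggest that the best models are picked by reducing each classifier's cross-validation scores and keeping the top k. A minimal sketch of that selection step follows, assuming this reading is right; the function `top_k_models`, the `median` option, and the fold scores are invented for illustration and are not Hive-ML output. How the selected models are then combined into an ensemble is not shown.

```python
from statistics import mean, median

# Only "mean" appears in the example configuration; "median" is a hypothetical extra.
REDUCTIONS = {"mean": mean, "median": median}


def top_k_models(cv_scores, k, reduction="mean"):
    """Rank classifiers by their reduced CV score and return the k best names."""
    reduce_fn = REDUCTIONS[reduction]
    ranked = sorted(cv_scores, key=lambda name: reduce_fn(cv_scores[name]), reverse=True)
    return ranked[:k]


# Made-up ROC-AUC values for 5 folds per classifier.
cv_roc_auc = {
    "rf": [0.80, 0.85, 0.78, 0.82, 0.80],
    "svm": [0.75, 0.79, 0.77, 0.74, 0.75],
    "knn": [0.70, 0.72, 0.69, 0.71, 0.68],
    "lda": [0.81, 0.83, 0.84, 0.80, 0.82],
}

for k in [1, 5]:  # mirrors "k_ensemble": [1, 5]
    print(k, top_k_models(cv_roc_auc, k))
```

When k exceeds the number of evaluated classifiers, as with k = 5 here, the slice simply returns all of them in ranked order.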
