
A Python package to run Machine Learning experiments within the Hive Framework.

Project description

Hive-ML


Hive-ML is a Python package collecting the tools and scripts to run Machine Learning experiments on radiological medical imaging.

Install

To install Hive-ML:

pip install hive-ml

or from GitHub:

git clone 
pip install -e Hive_ML

Description

The Hive-ML workflow consists of several sequential steps, including Radiomics extraction, Sequential Forward Feature Selection, and Model Fitting, reporting classifier performance (ROC-AUC, Sensitivity, Specificity, Accuracy) in tabular format and tracking all the steps on an MLFlow server.

In addition, Hive-ML provides a Docker image, a Kubernetes deployment, and a Slurm job, with the corresponding instructions to easily reproduce the experiments.

Finally, Hive-ML also supports model serving through MLFlow, providing easy access to the trained classifier for future use in prediction.
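
For example, once a trained classifier has been logged to MLFlow, it can be loaded back for prediction with the standard MLFlow pyfunc API. A minimal sketch follows; the model URI and feature columns below are placeholders, and the actual artifact path depends on how the experiment was logged:

    import mlflow.pyfunc
    import pandas as pd

    # Placeholder URI: substitute the registered model name / version
    # (or a runs:/<run_id>/<artifact_path> URI) from your MLFlow server.
    model = mlflow.pyfunc.load_model("models:/hive_ml_classifier/1")

    # One row per subject, one column per selected feature (names are illustrative).
    features_df = pd.DataFrame([[0.12, 0.87, 1.43]], columns=["f1", "f2", "f3"])
    print(model.predict(features_df))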

In the tutorial described below, Hive-ML is used to predict the Pathological Complete Response (pCR) after neoadjuvant chemotherapy, from DCE-MRI.

Usage

The Hive-ML workflow is controlled by a JSON configuration file, which the user can customize for each experiment run.

Example:

    {
      "image_suffix": "_image.nii.gz",  # File suffix (or list of File suffixes) of the files containing the image volume.
      "mask_suffix": "_mask.nii.gz",    # File suffix (including file extension) of the files containing the segmentation mask of the ROI.
      "label_dict": {                   # Dictionary describing the classes. The key-value pair contains the label value as key (starting from 0) and the class description as value.
        "0": "non-pCR",
        "1": "pCR"
      },
      "models": {                       # Dictionary for all the classifiers to evaluate. Each element includes the classifier class name and an additional dictionary with the kwargs to pass to the classifier object.
        "rf": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "adab": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "knn": {},
        "lda": {},
        "qda": {},
        "logistic_regression": {},
        "svm": {
          "kernel": "rbf"
        },
        "naive": {}
      },
      "perfusion_maps": {                # Dictionary describing the perfusion maps to extract. Each element includes the perfusion map name and the file suffix used to save the perfusion map.
        "distance_map": "_distance_map.nii.gz",
        "distance_map_depth": {
          "suffix": "_distance_map_depth.nii.gz",
          "kwargs": [
            2
          ]
        },
        "ttp": "_ttp_map.nii.gz",
        "cbv": "_cbv_map.nii.gz",
        "cbf": "_cbf_map.nii.gz",
        "mtt": "_mtt_map.nii.gz"
      },
      "feature_selection": "SFFS",       # Type of Feature Selection to perform. Supported values are SFFS and PCA .
      "n_features": 30,                  # Number of features to preserve when performing Feature Selection.
      "n_folds": 5,                      # Number of folds to run cross-validation.
      "random_seed": 12345,              # Random seed number used when randomizing events and actions.
      "feature_aggregator": "SD"         # Aggregation strategy used when extracting features in the 4D. 
                                         # Supported values are: ``Flat`` (no aggregation, all features are preserved),
                                         #                       ``Mean`` (Average over the 4-th dimension),
                                         #                        ``SD`` (Standard Deviation over the 4-th dimension),
                                         #                        ``Mean_Norm`` (Independent channel-normalization, followed by average over the 4-th dimension),
                                         #                        ``SD_Norm`` (Independent channel-normalization, followed by SD over the 4-th dimension)
      "k_ensemble": [1,5],               # List of k values to select top-k best models in ensembling.
      "metric_best_model": "roc_auc",    # Classification Metric to consider when determining the best models from CV results.
      "reduction_best_model": "mean"     # Reduction to perform on CV scores to determine the best models.
    }
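
Note that the ``#`` annotations above are explanatory only: the configuration file on disk must be plain JSON, without comments. As a minimal sketch (hypothetical helper code, not part of the Hive-ML API), such a file can be loaded and sanity-checked before launching an experiment:

    import json

    with open("config_file.json") as f:
        config = json.load(f)

    # Check the keys the pipeline steps rely on before launching an experiment.
    required = ("image_suffix", "mask_suffix", "label_dict", "models",
                "feature_selection", "n_features", "n_folds", "random_seed")
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"Missing required config keys: {missing}")

    print(f"{len(config['models'])} classifiers, "
          f"{config['n_folds']}-fold CV, seed {config['random_seed']}")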

Perfusion Maps Generation

Given a 4D volume, to extract the perfusion maps (TTP, CBV, CBF, MTT), run:

 Hive_ML_generate_perfusion_maps -i </path/to/data_folder> --config-file <config_file.json>

For more details, follow the Jupyter Notebook tutorial: Generate Perfusion Maps

(Figures: perfusion curve and extracted perfusion maps)
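
As a rough intuition for what these maps encode, a simplified NumPy sketch is shown below: TTP is the time-to-peak of the enhancement curve, CBV relates to the area under the curve, CBF to the maximum upslope, and MTT to their ratio. This is illustrative only, not Hive-ML's actual implementation; baseline correction, AIF deconvolution, and indicator-dilution constants are omitted:

    import numpy as np

    def naive_perfusion_maps(volume_4d, dt=1.0):
        """Toy perfusion maps from a 4D array shaped (t, z, y, x)."""
        enhancement = volume_4d - volume_4d[0]    # assumes frame 0 is pre-contrast
        ttp = enhancement.argmax(axis=0) * dt     # time-to-peak (TTP)
        cbv = np.trapz(enhancement, dx=dt, axis=0)               # ~ area under the curve (CBV)
        cbf = np.gradient(enhancement, dt, axis=0).max(axis=0)   # ~ maximum upslope (CBF)
        mtt = np.divide(cbv, cbf, out=np.zeros_like(cbv),
                        where=cbf > 0)            # central volume theorem: MTT = CBV / CBF
        return ttp, cbv, cbf, mtt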

Feature Extraction

To extract Radiomics/Radiodynamics from the 4D Volume, run:

 Hive_ML_extract_radiomics --data-folder </path/to/data_folder> --config-file <config_file.json> --feature-param-file </path/to/radiomics_config.yaml> --output-file </path/to/feature_file>


For more details, follow the Jupyter Notebook tutorial: Extract Features
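
The --feature-param-file argument points to a radiomics parameter file. Assuming a PyRadiomics-style parameter file (which the .yaml extension suggests, though the exact schema accepted by Hive-ML should be checked against its documentation), a minimal example could look like:

    # Illustrative PyRadiomics-style parameter file (assumed schema).
    setting:
      binWidth: 25
      resampledPixelSpacing: [1, 1, 1]
      interpolator: sitkBSpline
    imageType:
      Original: {}
      Wavelet: {}
    featureClass:
      firstorder: []   # an empty list enables all features in the class
      glcm: []
      glrlm: []
      shape: []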

Feature Selection

To run Feature Selection:

 Hive_ML_feature_selection --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>

The Feature Selection report (in JSON format, including the selected features and validation scores for each classifier) will be available at the following path:

$ROOT_FOLDER/<EXPERIMENT_ID>/SFFS


For more details, follow the Jupyter Notebook tutorial: Feature Selection
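
Conceptually, Sequential Forward Floating Selection (SFFS) greedily adds the feature that most improves cross-validated performance, with a floating step that can discard previously selected features. A minimal sketch with mlxtend follows (illustrative only; Hive-ML's internal implementation may differ):

    from mlxtend.feature_selection import SequentialFeatureSelector as SFS
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Synthetic stand-in for the extracted radiomics feature matrix.
    X, y = make_classification(n_samples=60, n_features=100, random_state=12345)

    sffs = SFS(SVC(kernel="rbf", probability=True),
               k_features=30,       # "n_features" in the config
               forward=True,
               floating=True,       # the floating step makes this SFFS
               scoring="roc_auc",
               cv=5)                # "n_folds" in the config
    sffs = sffs.fit(X, y)
    print("Selected feature indices:", sffs.k_feature_idx_)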

Model Fitting

To perform Model Fitting on the selected features, run:

 Hive_ML_model_fitting --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>

The experiment validation reports, plots, and summaries will be available at the following path:

$ROOT_FOLDER/<EXPERIMENT_ID>

(Figure: example cross-validation plot)

For more details, follow the Jupyter Notebook tutorial: Model Fitting
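
The reported metrics map onto standard scikit-learn scorers: sensitivity is the recall of the positive class and specificity the recall of the negative class. A minimal sketch of this kind of cross-validated evaluation follows (illustrative only; the actual Hive_ML_model_fitting pipeline also handles ensembling and MLFlow logging):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import make_scorer, recall_score
    from sklearn.model_selection import cross_validate

    # Synthetic stand-in for the selected feature matrix and labels.
    X, y = make_classification(n_samples=60, n_features=30, random_state=12345)

    scoring = {
        "roc_auc": "roc_auc",
        "sensitivity": make_scorer(recall_score, pos_label=1),
        "specificity": make_scorer(recall_score, pos_label=0),
        "accuracy": "accuracy",
    }
    # The "rf" entry from the example configuration above.
    clf = RandomForestClassifier(criterion="gini", n_estimators=100, max_depth=10)
    scores = cross_validate(clf, X, y, cv=5, scoring=scoring)
    for name in scoring:
        fold_scores = scores[f"test_{name}"]
        print(f"{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")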

